
Example 36 with SystemStream

use of org.apache.samza.system.SystemStream in project samza by apache.

the class KafkaChangelogStateBackendFactory method getBackupManager.

@Override
public TaskBackupManager getBackupManager(JobContext jobContext, ContainerModel containerModel, TaskModel taskModel, ExecutorService backupExecutor, MetricsRegistry metricsRegistry, Config config, Clock clock, File loggedStoreBaseDir, File nonLoggedStoreBaseDir) {
    SystemAdmins systemAdmins = new SystemAdmins(config);
    StorageConfig storageConfig = new StorageConfig(config);
    Map<String, SystemStream> storeChangelogs = storageConfig.getStoreChangelogs();
    if (new TaskConfig(config).getTransactionalStateCheckpointEnabled()) {
        return new KafkaTransactionalStateTaskBackupManager(taskModel.getTaskName(), storeChangelogs, systemAdmins, taskModel.getChangelogPartition());
    } else {
        return new KafkaNonTransactionalStateTaskBackupManager(taskModel.getTaskName(), storeChangelogs, systemAdmins, taskModel.getChangelogPartition());
    }
}
Also used : StorageConfig(org.apache.samza.config.StorageConfig) SystemStream(org.apache.samza.system.SystemStream) TaskConfig(org.apache.samza.config.TaskConfig) SystemAdmins(org.apache.samza.system.SystemAdmins)

Example 37 with SystemStream

use of org.apache.samza.system.SystemStream in project samza by apache.

the class KafkaChangelogStateBackendFactory method filterStandbySystemStreams.

@VisibleForTesting
Map<String, SystemStream> filterStandbySystemStreams(Map<String, SystemStream> changelogSystemStreams, ContainerModel containerModel) {
    Map<SystemStreamPartition, String> changelogSSPToStore = new HashMap<>();
    changelogSystemStreams.forEach((storeName, systemStream) -> containerModel.getTasks().forEach((taskName, taskModel) -> changelogSSPToStore.put(new SystemStreamPartition(systemStream, taskModel.getChangelogPartition()), storeName)));
    Set<TaskModel> standbyTaskModels = containerModel.getTasks().values().stream().filter(taskModel -> taskModel.getTaskMode().equals(TaskMode.Standby)).collect(Collectors.toSet());
    // remove all standby task changelog ssps
    standbyTaskModels.forEach((taskModel) -> {
        changelogSystemStreams.forEach((storeName, systemStream) -> {
            SystemStreamPartition ssp = new SystemStreamPartition(systemStream, taskModel.getChangelogPartition());
            changelogSSPToStore.remove(ssp);
        });
    });
    // the remaining changelog SSPs correspond only to active tasks (those of standby tasks were removed above)
    return MapUtils.invertMap(changelogSSPToStore).entrySet().stream().collect(Collectors.toMap(Map.Entry::getKey, x -> x.getValue().getSystemStream()));
}
Also used : StreamMetadataCache(org.apache.samza.system.StreamMetadataCache) SSPMetadataCache(org.apache.samza.system.SSPMetadataCache) HashMap(java.util.HashMap) TaskModel(org.apache.samza.job.model.TaskModel) SystemStreamPartition(org.apache.samza.system.SystemStreamPartition) SystemStream(org.apache.samza.system.SystemStream) Duration(java.time.Duration) Map(java.util.Map) ExecutorService(java.util.concurrent.ExecutorService) JobModel(org.apache.samza.job.model.JobModel) MapUtils(org.apache.commons.collections4.MapUtils) StorageConfig(org.apache.samza.config.StorageConfig) TaskConfig(org.apache.samza.config.TaskConfig) JobContext(org.apache.samza.context.JobContext) ContainerContext(org.apache.samza.context.ContainerContext) Set(java.util.Set) Clock(org.apache.samza.util.Clock) MetricsRegistry(org.apache.samza.metrics.MetricsRegistry) Collectors(java.util.stream.Collectors) File(java.io.File) TaskMode(org.apache.samza.job.model.TaskMode) ContainerModel(org.apache.samza.job.model.ContainerModel) VisibleForTesting(com.google.common.annotations.VisibleForTesting) Config(org.apache.samza.config.Config) SystemAdmins(org.apache.samza.system.SystemAdmins)
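
The filtering logic above can be sketched without any Samza dependency. This is a simplified illustration, not the project's code: the `Task` class is a hypothetical stand-in for TaskModel, streams are plain strings, and each changelog SSP is modelled as a `stream#partition` string key.

```java
import java.util.HashMap;
import java.util.Map;

final class StandbyFilterSketch {
    /** Hypothetical stand-in for TaskModel: a changelog partition id and a standby flag. */
    static final class Task {
        final int changelogPartition;
        final boolean standby;

        Task(int changelogPartition, boolean standby) {
            this.changelogPartition = changelogPartition;
            this.standby = standby;
        }
    }

    /** Returns store name -> changelog stream, keeping only stores that have at least one active task. */
    static Map<String, String> filterStandby(Map<String, String> storeToStream, Map<String, Task> tasks) {
        // Build "stream#partition" -> store for every (store, task) pair.
        Map<String, String> sspToStore = new HashMap<>();
        storeToStream.forEach((store, stream) ->
            tasks.values().forEach(task -> sspToStore.put(stream + "#" + task.changelogPartition, store)));

        // Drop the entries that belong to standby tasks.
        tasks.values().stream()
            .filter(task -> task.standby)
            .forEach(task -> storeToStream.values()
                .forEach(stream -> sspToStore.remove(stream + "#" + task.changelogPartition)));

        // Invert back to store -> stream using whatever entries are left.
        Map<String, String> result = new HashMap<>();
        sspToStore.forEach((ssp, store) -> result.put(store, ssp.substring(0, ssp.lastIndexOf('#'))));
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> stores = new HashMap<>();
        stores.put("store-a", "kafka.store-a-changelog");
        Map<String, Task> tasks = new HashMap<>();
        tasks.put("task-0", new Task(0, false)); // active
        tasks.put("task-1", new Task(1, true));  // standby
        System.out.println(filterStandby(stores, tasks));
    }
}
```

A store whose tasks in the container are all standby ends up with no remaining entries, so it is absent from the returned map.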

Example 38 with SystemStream

use of org.apache.samza.system.SystemStream in project samza by apache.

the class TaskSideInputHandler method getOldestOffsets.

/**
 * Gets the oldest offset for the {@link SystemStreamPartition}s associated with all the store side inputs.
 *   1. Groups the list of the SSPs based on system stream
 *   2. Fetches the {@link SystemStreamMetadata} from {@link StreamMetadataCache}
 *   3. Fetches the partition metadata for each system stream and populates the oldest offset
 *      for the SSPs belonging to that system stream.
 *
 * @return a {@link Map} of {@link SystemStreamPartition} to their oldest offset. If partitionMetadata could not be
 * obtained for any {@link SystemStreamPartition} the offset for it is populated as null.
 */
@VisibleForTesting
Map<SystemStreamPartition, String> getOldestOffsets() {
    Map<SystemStreamPartition, String> oldestOffsets = new HashMap<>();
    // Step 1
    Map<SystemStream, List<SystemStreamPartition>> systemStreamToSsp = this.sspToStores.keySet().stream().collect(Collectors.groupingBy(SystemStreamPartition::getSystemStream));
    // Step 2
    Map<SystemStream, SystemStreamMetadata> metadata = JavaConverters.mapAsJavaMapConverter(this.streamMetadataCache.getStreamMetadata(JavaConverters.asScalaSetConverter(systemStreamToSsp.keySet()).asScala().toSet(), false)).asJava();
    // Step 3
    metadata.forEach((systemStream, systemStreamMetadata) -> {
        // get the partition metadata for each system stream
        Map<Partition, SystemStreamMetadata.SystemStreamPartitionMetadata> partitionMetadata = systemStreamMetadata.getSystemStreamPartitionMetadata();
        // Because of https://bugs.openjdk.java.net/browse/JDK-8148463, collecting with Collectors.toMap would NPE when getOldestOffset() is null, so a plain loop is used
        for (SystemStreamPartition ssp : systemStreamToSsp.get(systemStream)) {
            oldestOffsets.put(ssp, partitionMetadata.get(ssp.getPartition()).getOldestOffset());
        }
    });
    return oldestOffsets;
}
Also used : SystemStreamPartition(org.apache.samza.system.SystemStreamPartition) Partition(org.apache.samza.Partition) HashMap(java.util.HashMap) ConcurrentHashMap(java.util.concurrent.ConcurrentHashMap) SystemStream(org.apache.samza.system.SystemStream) SystemStreamMetadata(org.apache.samza.system.SystemStreamMetadata) List(java.util.List) VisibleForTesting(com.google.common.annotations.VisibleForTesting)
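
The grouping step and the null-offset caveat from the example above can be reproduced with the JDK alone. The following is a minimal sketch under stated assumptions: SSPs are modelled as `stream#partition` strings, the offset lookup is a plain map, and all class and method names are hypothetical. It shows that `HashMap.put` stores a null offset, while `Collectors.toMap` throws a `NullPointerException` for the same input.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class OldestOffsetSketch {
    /** Analogue of step 1: group "stream#partition" keys by their stream name. */
    static Map<String, List<String>> groupByStream(List<String> ssps) {
        return ssps.stream()
            .collect(Collectors.groupingBy(ssp -> ssp.substring(0, ssp.lastIndexOf('#'))));
    }

    /** Plain loop: HashMap.put accepts null values, so an unknown offset is stored as null. */
    static Map<String, String> collectWithLoop(List<String> ssps, Map<String, String> knownOffsets) {
        Map<String, String> result = new HashMap<>();
        for (String ssp : ssps) {
            result.put(ssp, knownOffsets.get(ssp));
        }
        return result;
    }

    /** Collectors.toMap rejects null values and throws NullPointerException. */
    static Map<String, String> collectWithToMap(List<String> ssps, Map<String, String> knownOffsets) {
        return ssps.stream().collect(Collectors.toMap(ssp -> ssp, ssp -> knownOffsets.get(ssp)));
    }

    public static void main(String[] args) {
        List<String> ssps = Arrays.asList("orders#0", "orders#1", "users#0");
        Map<String, String> knownOffsets = new HashMap<>();
        knownOffsets.put("orders#0", "42"); // the other SSPs have no oldest offset

        System.out.println(groupByStream(ssps));
        System.out.println(collectWithLoop(ssps, knownOffsets));
        try {
            collectWithToMap(ssps, knownOffsets);
        } catch (NullPointerException e) {
            System.out.println("Collectors.toMap rejected a null offset");
        }
    }
}
```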

Example 39 with SystemStream

use of org.apache.samza.system.SystemStream in project samza by apache.

the class TransactionalStateTaskRestoreManager method getCurrentChangelogOffsets.

/**
 * Get offset metadata for each changelog SSP for this task. A task may have multiple changelog streams
 * (e.g., for different stores), but will have the same partition for all of them.
 */
@VisibleForTesting
static Map<SystemStreamPartition, SystemStreamPartitionMetadata> getCurrentChangelogOffsets(TaskModel taskModel, Map<String, SystemStream> storeChangelogs, SSPMetadataCache sspMetadataCache) {
    Map<SystemStreamPartition, SystemStreamPartitionMetadata> changelogOffsets = new HashMap<>();
    Partition changelogPartition = taskModel.getChangelogPartition();
    for (Map.Entry<String, SystemStream> storeChangelog : storeChangelogs.entrySet()) {
        SystemStream changelog = storeChangelog.getValue();
        SystemStreamPartition changelogSSP = new SystemStreamPartition(changelog.getSystem(), changelog.getStream(), changelogPartition);
        SystemStreamPartitionMetadata metadata = sspMetadataCache.getMetadata(changelogSSP);
        changelogOffsets.put(changelogSSP, metadata);
    }
    LOG.info("Got current changelog offsets for taskName: {} as: {}", taskModel.getTaskName(), changelogOffsets);
    return changelogOffsets;
}
Also used : SystemStreamPartition(org.apache.samza.system.SystemStreamPartition) Partition(org.apache.samza.Partition) HashMap(java.util.HashMap) SystemStream(org.apache.samza.system.SystemStream) SystemStreamPartitionMetadata(org.apache.samza.system.SystemStreamMetadata.SystemStreamPartitionMetadata) Map(java.util.Map) ImmutableMap(com.google.common.collect.ImmutableMap) VisibleForTesting(com.google.common.annotations.VisibleForTesting)

Example 40 with SystemStream

use of org.apache.samza.system.SystemStream in project samza by apache.

the class ChangelogStreamManager method createChangelogStreams.

/**
 * Creates and validates the changelog streams of a samza job.
 *
 * @param config the configuration with changelog info.
 * @param maxChangeLogStreamPartitions the maximum number of changelog stream partitions to create.
 */
public static void createChangelogStreams(Config config, int maxChangeLogStreamPartitions) {
    // Get changelog store config
    StorageConfig storageConfig = new StorageConfig(config);
    ImmutableMap.Builder<String, SystemStream> storeNameSystemStreamMapBuilder = new ImmutableMap.Builder<>();
    storageConfig.getStoreNames().forEach(storeName -> {
        Optional<String> changelogStream = storageConfig.getChangelogStream(storeName);
        if (changelogStream.isPresent() && StringUtils.isNotBlank(changelogStream.get())) {
            storeNameSystemStreamMapBuilder.put(storeName, StreamUtil.getSystemStreamFromNames(changelogStream.get()));
        }
    });
    Map<String, SystemStream> storeNameSystemStreamMapping = storeNameSystemStreamMapBuilder.build();
    // Get SystemAdmin for changelog store's system and attempt to create the stream
    SystemConfig systemConfig = new SystemConfig(config);
    storeNameSystemStreamMapping.forEach((storeName, systemStream) -> {
        // Load system admin for this system.
        SystemAdmin systemAdmin = systemConfig.getSystemFactories().get(systemStream.getSystem()).getAdmin(systemStream.getSystem(), config, ChangelogStreamManager.class.getSimpleName());
        if (systemAdmin == null) {
            throw new SamzaException(String.format("Error creating changelog. Changelog on store %s uses system %s, which is missing from the configuration.", storeName, systemStream.getSystem()));
        }
        StreamSpec changelogSpec = StreamSpec.createChangeLogStreamSpec(systemStream.getStream(), systemStream.getSystem(), maxChangeLogStreamPartitions);
        systemAdmin.start();
        if (systemAdmin.createStream(changelogSpec)) {
            LOG.info(String.format("created changelog stream %s.", systemStream.getStream()));
        } else {
            LOG.info(String.format("changelog stream %s already exists.", systemStream.getStream()));
        }
        systemAdmin.validateStream(changelogSpec);
        if (storageConfig.getAccessLogEnabled(storeName)) {
            String accesslogStream = storageConfig.getAccessLogStream(systemStream.getStream());
            StreamSpec accesslogSpec = new StreamSpec(accesslogStream, accesslogStream, systemStream.getSystem(), maxChangeLogStreamPartitions);
            systemAdmin.createStream(accesslogSpec);
            systemAdmin.validateStream(accesslogSpec);
        }
        systemAdmin.stop();
    });
}
Also used : StreamSpec(org.apache.samza.system.StreamSpec) SystemConfig(org.apache.samza.config.SystemConfig) StorageConfig(org.apache.samza.config.StorageConfig) SystemStream(org.apache.samza.system.SystemStream) SamzaException(org.apache.samza.SamzaException) ImmutableMap(com.google.common.collect.ImmutableMap) SystemAdmin(org.apache.samza.system.SystemAdmin)
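
The example above turns each configured changelog name into a system and a stream via StreamUtil.getSystemStreamFromNames. As a rough sketch of that kind of parsing, assuming the name follows a `system.stream` convention and is split at the first dot, a stand-alone helper could look like the one below. The class and method names are hypothetical, and this is an illustration rather than Samza's implementation.

```java
final class SystemStreamNameSketch {
    /**
     * Splits "system.stream" at the first dot (an assumed convention).
     * Returns a two-element array: {system, stream}.
     */
    static String[] splitSystemStream(String name) {
        int idx = name.indexOf('.');
        if (idx < 0) {
            throw new IllegalArgumentException(
                "Expected a name of the form 'system.stream' but got: " + name);
        }
        // Everything after the first dot is the stream name, so stream names may contain dots.
        return new String[] {name.substring(0, idx), name.substring(idx + 1)};
    }

    public static void main(String[] args) {
        String[] parts = splitSystemStream("kafka.my-store.changelog");
        System.out.println("system=" + parts[0] + ", stream=" + parts[1]);
    }
}
```

Splitting at the first dot rather than the last keeps any further dots inside the stream name, which matters when stream names themselves are dotted.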

Aggregations

SystemStream (org.apache.samza.system.SystemStream): 143
HashMap (java.util.HashMap): 75
Test (org.junit.Test): 74
SystemStreamPartition (org.apache.samza.system.SystemStreamPartition): 72
Partition (org.apache.samza.Partition): 58
Map (java.util.Map): 55
TaskName (org.apache.samza.container.TaskName): 52
MapConfig (org.apache.samza.config.MapConfig): 49
Config (org.apache.samza.config.Config): 46
SystemAdmin (org.apache.samza.system.SystemAdmin): 42
SystemAdmins (org.apache.samza.system.SystemAdmins): 40
TaskModel (org.apache.samza.job.model.TaskModel): 39
Collections (java.util.Collections): 37
Set (java.util.Set): 37
TaskConfig (org.apache.samza.config.TaskConfig): 37
Clock (org.apache.samza.util.Clock): 36
File (java.io.File): 35
ImmutableMap (com.google.common.collect.ImmutableMap): 34
SystemStreamPartitionMetadata (org.apache.samza.system.SystemStreamMetadata.SystemStreamPartitionMetadata): 33
TaskMode (org.apache.samza.job.model.TaskMode): 32