
Example 41 with SystemStream

use of org.apache.samza.system.SystemStream in project samza by apache.

the class ControlMessageSender method broadcastToOtherPartitions.

void broadcastToOtherPartitions(ControlMessage message, SystemStreamPartition ssp, MessageCollector collector) {
    SystemStream systemStream = ssp.getSystemStream();
    int partitionCount = getPartitionCount(systemStream);
    int currentPartition = ssp.getPartition().getPartitionId();
    for (int i = 0; i < partitionCount; i++) {
        if (i != currentPartition) {
            OutgoingMessageEnvelope envelopeOut = new OutgoingMessageEnvelope(systemStream, i, null, message);
            collector.send(envelopeOut);
        }
    }
}
Also used : SystemStream(org.apache.samza.system.SystemStream) OutgoingMessageEnvelope(org.apache.samza.system.OutgoingMessageEnvelope)
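
The loop above selects every partition of the stream except the one the message arrived on. A minimal stand-alone sketch of that selection logic follows; the class and method names (`BroadcastSketch`, `otherPartitions`) are hypothetical and nothing here comes from Samza itself.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-alone illustration only; none of these names exist in Samza. */
final class BroadcastSketch {

    /** Returns every partition id in [0, partitionCount) except currentPartition. */
    static List<Integer> otherPartitions(int partitionCount, int currentPartition) {
        List<Integer> targets = new ArrayList<>();
        for (int i = 0; i < partitionCount; i++) {
            if (i != currentPartition) {
                targets.add(i);
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        // A control message seen on partition 2 of a 4-partition stream
        // would be forwarded to partitions 0, 1 and 3.
        System.out.println(otherPartitions(4, 2)); // prints [0, 1, 3]
    }
}
```

With a single-partition stream the result is empty, so in that case the original method would send nothing.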

Example 42 with SystemStream

use of org.apache.samza.system.SystemStream in project samza by apache.

the class StreamManager method clearStreamsFromPreviousRun.

/**
 * This is a best-effort approach to clearing the internal streams from the previous run, including intermediate
 * streams, the checkpoint stream and changelog streams.
 * For batch processing, we always clean up the previous internal streams and create a new set for each run.
 * @param prevConfig config of the previous run
 */
public void clearStreamsFromPreviousRun(Config prevConfig) {
    try {
        ApplicationConfig appConfig = new ApplicationConfig(prevConfig);
        LOGGER.info("run.id from previous run is {}", appConfig.getRunId());
        StreamConfig streamConfig = new StreamConfig(prevConfig);
        // Find all intermediate streams and clean up
        Set<StreamSpec> intStreams = streamConfig.getStreamIds().stream()
            .filter(streamConfig::getIsIntermediateStream)
            .map(id -> new StreamSpec(id, streamConfig.getPhysicalName(id), streamConfig.getSystem(id)))
            .collect(Collectors.toSet());
        intStreams.forEach(stream -> {
            LOGGER.info("Clear intermediate stream {} in system {}", stream.getPhysicalName(), stream.getSystemName());
            systemAdmins.getSystemAdmin(stream.getSystemName()).clearStream(stream);
        });
        // Find checkpoint stream and clean up
        TaskConfig taskConfig = new TaskConfig(prevConfig);
        taskConfig.getCheckpointManager(new MetricsRegistryMap()).ifPresent(CheckpointManager::clearCheckpoints);
        // Find changelog streams and remove them
        StorageConfig storageConfig = new StorageConfig(prevConfig);
        for (String store : storageConfig.getStoreNames()) {
            String changelog = storageConfig.getChangelogStream(store).orElse(null);
            if (changelog != null) {
                LOGGER.info("Clear store {} changelog {}", store, changelog);
                SystemStream systemStream = StreamUtil.getSystemStreamFromNames(changelog);
                StreamSpec spec = StreamSpec.createChangeLogStreamSpec(systemStream.getStream(), systemStream.getSystem(), 1);
                systemAdmins.getSystemAdmin(spec.getSystemName()).clearStream(spec);
            }
        }
    } catch (Exception e) {
        // For batch, we always create a new set of internal streams (checkpoint, changelog and intermediate) with unique
        // id. So if clearStream doesn't work, it won't affect the correctness of the results.
        // We log a warning here and rely on retention to clean up the streams later.
        LOGGER.warn("Fail to clear internal streams from previous run. Please clean up manually.", e);
    }
}
Also used : Logger(org.slf4j.Logger) Collection(java.util.Collection) LoggerFactory(org.slf4j.LoggerFactory) Set(java.util.Set) HashMap(java.util.HashMap) StreamSpec(org.apache.samza.system.StreamSpec) org.apache.samza.config(org.apache.samza.config) Multimap(com.google.common.collect.Multimap) Collectors(java.util.stream.Collectors) SystemStreamMetadata(org.apache.samza.system.SystemStreamMetadata) SamzaException(org.apache.samza.SamzaException) List(java.util.List) HashMultimap(com.google.common.collect.HashMultimap) CheckpointManager(org.apache.samza.checkpoint.CheckpointManager) SystemStream(org.apache.samza.system.SystemStream) Map(java.util.Map) SystemAdmin(org.apache.samza.system.SystemAdmin) VisibleForTesting(com.google.common.annotations.VisibleForTesting) StreamUtil(org.apache.samza.util.StreamUtil) MetricsRegistryMap(org.apache.samza.metrics.MetricsRegistryMap) SystemAdmins(org.apache.samza.system.SystemAdmins)
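
Because the method above wraps all three cleanup phases in one try/catch, the first failure skips whatever cleanup would have followed and is only logged. The sketch below illustrates that behaviour with plain Java types; `BestEffortCleanupSketch`, `clearAll` and the string stream names are hypothetical stand-ins, not Samza APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

/** Stand-alone illustration only; none of these names exist in Samza. */
final class BestEffortCleanupSketch {

    /**
     * Applies clearAction to each stream in order inside a single try/catch.
     * The first failure is recorded as a warning, the remaining streams are
     * skipped, and no exception reaches the caller.
     */
    static List<String> clearAll(List<String> streams, Consumer<String> clearAction, List<String> warnings) {
        List<String> cleared = new ArrayList<>();
        try {
            for (String stream : streams) {
                clearAction.accept(stream);
                cleared.add(stream);
            }
        } catch (RuntimeException e) {
            warnings.add("Failed to clear internal streams: " + e.getMessage());
        }
        return cleared;
    }

    public static void main(String[] args) {
        List<String> warnings = new ArrayList<>();
        List<String> cleared = clearAll(
            Arrays.asList("intermediate", "checkpoint", "changelog"),
            stream -> {
                if (stream.equals("checkpoint")) {
                    throw new IllegalStateException("admin unavailable");
                }
            },
            warnings);
        System.out.println(cleared);         // prints [intermediate]
        System.out.println(warnings.size()); // prints 1
    }
}
```

This trade-off is acceptable in the original only because, as its comment notes, each batch run creates a fresh set of internal streams, so leftover streams do not affect correctness.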

Example 43 with SystemStream

use of org.apache.samza.system.SystemStream in project samza by apache.

the class MetricsSnapshotReporterFactory method getSystemStream.

protected SystemStream getSystemStream(String reporterName, Config config) {
    MetricsConfig metricsConfig = new MetricsConfig(config);
    String metricsSystemStreamName = metricsConfig.getMetricsSnapshotReporterStream(reporterName).orElseThrow(() -> new SamzaException("No metrics stream defined in config."));
    SystemStream systemStream = StreamUtil.getSystemStreamFromNames(metricsSystemStreamName);
    LOG.info("Got system stream {}.", systemStream);
    return systemStream;
}
Also used : SystemStream(org.apache.samza.system.SystemStream) SamzaException(org.apache.samza.SamzaException) MetricsConfig(org.apache.samza.config.MetricsConfig)
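
The method above turns a configured name into a `SystemStream` via `StreamUtil.getSystemStreamFromNames`. The following is a rough sketch of what such a parse might look like, assuming the name has the form `system.stream` and is split at the first dot; that rule is an assumption made for illustration and the real utility may behave differently. `SystemStreamNameSketch` and `split` are hypothetical names.

```java
import java.util.Arrays;

/** Stand-alone illustration only; this is not Samza's StreamUtil. */
final class SystemStreamNameSketch {

    /**
     * Splits "system.stream" at the first dot (an assumed convention) and
     * returns {system, stream}. Rejects names that contain no dot.
     */
    static String[] split(String systemStreamName) {
        int idx = systemStreamName.indexOf('.');
        if (idx < 0) {
            throw new IllegalArgumentException(
                "Expected a name of the form 'system.stream' but got: " + systemStreamName);
        }
        return new String[] {
            systemStreamName.substring(0, idx),
            systemStreamName.substring(idx + 1)
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("kafka.metrics")));  // prints [kafka, metrics]
        // Under the first-dot assumption, later dots stay in the stream part.
        System.out.println(Arrays.toString(split("kafka.my.topic"))); // prints [kafka, my.topic]
    }
}
```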

Example 44 with SystemStream

use of org.apache.samza.system.SystemStream in project samza by apache.

the class MetricsSnapshotReporterFactory method getMetricsReporter.

@Override
public MetricsReporter getMetricsReporter(String reporterName, String containerName, Config config) {
    LOG.info("Creating new metrics snapshot reporter.");
    MetricsRegistryMap registry = new MetricsRegistryMap();
    SystemStream systemStream = getSystemStream(reporterName, config);
    SystemProducer producer = getProducer(reporterName, config, registry);
    Duration reportingInterval = Duration.ofSeconds(getReportingInterval(reporterName, config));
    String jobName = getJobName(config);
    String jobId = getJobId(config);
    Serde<MetricsSnapshot> serde = getSerde(reporterName, config);
    Optional<Pattern> blacklist = getBlacklist(reporterName, config);
    MetricsSnapshotReporter reporter = new MetricsSnapshotReporter(
        producer, systemStream, reportingInterval, jobName, jobId, containerName,
        Util.getTaskClassVersion(config), Util.getSamzaVersion(), Util.getLocalHost().getHostName(),
        serde, blacklist, SystemClock.instance());
    reporter.register(this.getClass().getSimpleName(), registry);
    return reporter;
}
Also used : Pattern(java.util.regex.Pattern) SystemStream(org.apache.samza.system.SystemStream) SystemProducer(org.apache.samza.system.SystemProducer) Duration(java.time.Duration) MetricsRegistryMap(org.apache.samza.metrics.MetricsRegistryMap)

Example 45 with SystemStream

use of org.apache.samza.system.SystemStream in project samza by apache.

the class TestCoordinatorStreamSystemConsumer method testOrderKeyRewrite.

/**
 * Verify that if a particular key-value is written, then another, then the original again,
 * the original occurs last in the set.
 */
@Test
public void testOrderKeyRewrite() throws InterruptedException {
    final SystemStream systemStream = new SystemStream("system", "stream");
    final SystemStreamPartition ssp = new SystemStreamPartition(systemStream, new Partition(0));
    final SystemConsumer systemConsumer = mock(SystemConsumer.class);
    final List<IncomingMessageEnvelope> list = new ArrayList<>();
    SetConfig setConfig1 = new SetConfig("source", "key1", "value1");
    SetConfig setConfig2 = new SetConfig("source", "key1", "value2");
    SetConfig setConfig3 = new SetConfig("source", "key1", "value1");
    list.add(createIncomingMessageEnvelope(setConfig1, ssp));
    list.add(createIncomingMessageEnvelope(setConfig2, ssp));
    list.add(createIncomingMessageEnvelope(setConfig3, ssp));
    Map<SystemStreamPartition, List<IncomingMessageEnvelope>> messages = new HashMap<SystemStreamPartition, List<IncomingMessageEnvelope>>() {
        {
            put(ssp, list);
        }
    };
    when(systemConsumer.poll(anySet(), anyLong())).thenReturn(messages, Collections.<SystemStreamPartition, List<IncomingMessageEnvelope>>emptyMap());
    CoordinatorStreamSystemConsumer consumer = new CoordinatorStreamSystemConsumer(systemStream, systemConsumer, new SinglePartitionWithoutOffsetsSystemAdmin());
    consumer.bootstrap();
    Set<CoordinatorStreamMessage> bootstrappedMessages = consumer.getBootstrappedStream(SetConfig.TYPE);
    // First message should have been removed as a duplicate
    assertEquals(2, bootstrappedMessages.size());
    CoordinatorStreamMessage[] coordinatorStreamMessages = bootstrappedMessages.toArray(new CoordinatorStreamMessage[2]);
    assertEquals(setConfig2, coordinatorStreamMessages[0]);
    // Config 3 MUST be the last message, not config 2
    assertEquals(setConfig3, coordinatorStreamMessages[1]);
}
Also used : SystemStreamPartition(org.apache.samza.system.SystemStreamPartition) Partition(org.apache.samza.Partition) SystemConsumer(org.apache.samza.system.SystemConsumer) HashMap(java.util.HashMap) LinkedHashMap(java.util.LinkedHashMap) SystemStream(org.apache.samza.system.SystemStream) IncomingMessageEnvelope(org.apache.samza.system.IncomingMessageEnvelope) ArrayList(java.util.ArrayList) CoordinatorStreamMessage(org.apache.samza.coordinator.stream.messages.CoordinatorStreamMessage) SetConfig(org.apache.samza.coordinator.stream.messages.SetConfig) SinglePartitionWithoutOffsetsSystemAdmin(org.apache.samza.util.SinglePartitionWithoutOffsetsSystemAdmin) List(java.util.List) Test(org.junit.Test)
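
The test above expects a rewritten message to move to the end of an ordered set rather than keep its original position. One way to get that ordering with standard collections is to remove the element before re-adding it to a `LinkedHashSet`, since re-inserting an element that is already present does not change its position. The sketch below shows the idea with plain strings; it is an illustration under that assumption, not the consumer's actual implementation, and `RewriteOrderSketch` and `replay` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Stand-alone illustration only; not taken from CoordinatorStreamSystemConsumer. */
final class RewriteOrderSketch {

    /** Replays messages so that each distinct message sits at the position of its latest write. */
    static List<String> replay(List<String> messages) {
        Set<String> seen = new LinkedHashSet<>();
        for (String message : messages) {
            // Remove any earlier occurrence first; a plain add() of an existing
            // element would leave it at its old position in a LinkedHashSet.
            seen.remove(message);
            seen.add(message);
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        List<String> result = replay(Arrays.asList("key1=value1", "key1=value2", "key1=value1"));
        System.out.println(result); // prints [key1=value2, key1=value1]
    }
}
```

Without the `remove` call the result would be `[key1=value1, key1=value2]`, which is the ordering the test is written to rule out.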

Aggregations

SystemStream (org.apache.samza.system.SystemStream): 143
HashMap (java.util.HashMap): 75
Test (org.junit.Test): 74
SystemStreamPartition (org.apache.samza.system.SystemStreamPartition): 72
Partition (org.apache.samza.Partition): 58
Map (java.util.Map): 55
TaskName (org.apache.samza.container.TaskName): 52
MapConfig (org.apache.samza.config.MapConfig): 49
Config (org.apache.samza.config.Config): 46
SystemAdmin (org.apache.samza.system.SystemAdmin): 42
SystemAdmins (org.apache.samza.system.SystemAdmins): 40
TaskModel (org.apache.samza.job.model.TaskModel): 39
Collections (java.util.Collections): 37
Set (java.util.Set): 37
TaskConfig (org.apache.samza.config.TaskConfig): 37
Clock (org.apache.samza.util.Clock): 36
File (java.io.File): 35
ImmutableMap (com.google.common.collect.ImmutableMap): 34
SystemStreamPartitionMetadata (org.apache.samza.system.SystemStreamMetadata.SystemStreamPartitionMetadata): 33
TaskMode (org.apache.samza.job.model.TaskMode): 32