Example 6 with EndOfStreamMessage

use of org.apache.samza.system.EndOfStreamMessage in project beam by apache.

the class TranslationContext method createDummyStreamDescriptor.

/**
 * The dummy stream created will only be used in Beam tests.
 */
private static InputDescriptor<OpMessage<String>, ?> createDummyStreamDescriptor(String id) {
    final GenericSystemDescriptor dummySystem = new GenericSystemDescriptor(id, InMemorySystemFactory.class.getName());
    final GenericInputDescriptor<OpMessage<String>> dummyInput = dummySystem.getInputDescriptor(id, new NoOpSerde<>());
    dummyInput.withOffsetDefault(SystemStreamMetadata.OffsetType.OLDEST);
    final Config config = new MapConfig(dummyInput.toConfig(), dummySystem.toConfig());
    final SystemFactory factory = new InMemorySystemFactory();
    final StreamSpec dummyStreamSpec = new StreamSpec(id, id, id, 1);
    factory.getAdmin(id, config).createStream(dummyStreamSpec);
    final SystemProducer producer = factory.getProducer(id, config, null);
    final SystemStream sysStream = new SystemStream(id, id);
    final Consumer<Object> sendFn = (msg) -> {
        producer.send(id, new OutgoingMessageEnvelope(sysStream, 0, null, msg));
    };
    final WindowedValue<String> windowedValue = WindowedValue.timestampedValueInGlobalWindow("dummy", new Instant());
    sendFn.accept(OpMessage.ofElement(windowedValue));
    sendFn.accept(new WatermarkMessage(BoundedWindow.TIMESTAMP_MAX_VALUE.getMillis()));
    sendFn.accept(new EndOfStreamMessage(null));
    return dummyInput;
}
Also used : InMemorySystemFactory(org.apache.samza.system.inmemory.InMemorySystemFactory) WindowedValue(org.apache.beam.sdk.util.WindowedValue) TableDescriptor(org.apache.samza.table.descriptors.TableDescriptor) GenericSystemDescriptor(org.apache.samza.system.descriptors.GenericSystemDescriptor) LoggerFactory(org.slf4j.LoggerFactory) HashMap(java.util.HashMap) OpMessage(org.apache.beam.runners.samza.runtime.OpMessage) GenericInputDescriptor(org.apache.samza.system.descriptors.GenericInputDescriptor) TransformInputs(org.apache.beam.runners.core.construction.TransformInputs) SystemStreamMetadata(org.apache.samza.system.SystemStreamMetadata) PTransform(org.apache.beam.sdk.transforms.PTransform) HashSet(java.util.HashSet) TupleTag(org.apache.beam.sdk.values.TupleTag) SystemStream(org.apache.samza.system.SystemStream) Map(java.util.Map) Iterables(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables) WatermarkMessage(org.apache.samza.system.WatermarkMessage) MapConfig(org.apache.samza.config.MapConfig) KV(org.apache.samza.operators.KV) NoOpSerde(org.apache.samza.serializers.NoOpSerde) AppliedPTransform(org.apache.beam.sdk.runners.AppliedPTransform) OutputDescriptor(org.apache.samza.system.descriptors.OutputDescriptor) MessageStream(org.apache.samza.operators.MessageStream) Table(org.apache.samza.table.Table) InputDescriptor(org.apache.samza.system.descriptors.InputDescriptor) Logger(org.slf4j.Logger) Set(java.util.Set) SystemFactory(org.apache.samza.system.SystemFactory) StreamSpec(org.apache.samza.system.StreamSpec) UUID(java.util.UUID) PCollection(org.apache.beam.sdk.values.PCollection) HashIdGenerator(org.apache.beam.runners.samza.util.HashIdGenerator) Consumer(java.util.function.Consumer) SamzaPipelineOptions(org.apache.beam.runners.samza.SamzaPipelineOptions) List(java.util.List) PValue(org.apache.beam.sdk.values.PValue) SystemProducer(org.apache.samza.system.SystemProducer) 
StreamApplicationDescriptor(org.apache.samza.application.descriptors.StreamApplicationDescriptor) PCollectionView(org.apache.beam.sdk.values.PCollectionView) BoundedWindow(org.apache.beam.sdk.transforms.windowing.BoundedWindow) Instant(org.joda.time.Instant) OutgoingMessageEnvelope(org.apache.samza.system.OutgoingMessageEnvelope) EndOfStreamMessage(org.apache.samza.system.EndOfStreamMessage) Config(org.apache.samza.config.Config) Collections(java.util.Collections) OutputStream(org.apache.samza.operators.OutputStream)

Example 7 with EndOfStreamMessage

use of org.apache.samza.system.EndOfStreamMessage in project samza by apache.

the class TestEndOfStreamStates method testUpdate.

@Test
public void testUpdate() {
    SystemStream input = new SystemStream("system", "input");
    SystemStream intermediate = new SystemStream("system", "intermediate");
    Set<SystemStreamPartition> ssps = new HashSet<>();
    SystemStreamPartition inputPartition0 = new SystemStreamPartition(input, new Partition(0));
    SystemStreamPartition intPartition0 = new SystemStreamPartition(intermediate, new Partition(0));
    SystemStreamPartition intPartition1 = new SystemStreamPartition(intermediate, new Partition(1));
    ssps.add(inputPartition0);
    ssps.add(intPartition0);
    ssps.add(intPartition1);
    Map<SystemStream, Integer> producerCounts = new HashMap<>();
    producerCounts.put(intermediate, 2);
    EndOfStreamStates endOfStreamStates = new EndOfStreamStates(ssps, producerCounts);
    assertFalse(endOfStreamStates.isEndOfStream(input));
    assertFalse(endOfStreamStates.isEndOfStream(intermediate));
    assertFalse(endOfStreamStates.allEndOfStream());
    IncomingMessageEnvelope envelope = IncomingMessageEnvelope.buildEndOfStreamEnvelope(inputPartition0);
    endOfStreamStates.update((EndOfStreamMessage) envelope.getMessage(), envelope.getSystemStreamPartition());
    assertTrue(endOfStreamStates.isEndOfStream(input));
    assertFalse(endOfStreamStates.isEndOfStream(intermediate));
    assertFalse(endOfStreamStates.allEndOfStream());
    EndOfStreamMessage eos = new EndOfStreamMessage("task 0");
    endOfStreamStates.update(eos, intPartition0);
    endOfStreamStates.update(eos, intPartition1);
    assertFalse(endOfStreamStates.isEndOfStream(intermediate));
    assertFalse(endOfStreamStates.allEndOfStream());
    eos = new EndOfStreamMessage("task 1");
    endOfStreamStates.update(eos, intPartition0);
    endOfStreamStates.update(eos, intPartition1);
    assertTrue(endOfStreamStates.isEndOfStream(intermediate));
    assertTrue(endOfStreamStates.allEndOfStream());
}
Also used : Partition(org.apache.samza.Partition) SystemStreamPartition(org.apache.samza.system.SystemStreamPartition) HashMap(java.util.HashMap) SystemStream(org.apache.samza.system.SystemStream) IncomingMessageEnvelope(org.apache.samza.system.IncomingMessageEnvelope) EndOfStreamMessage(org.apache.samza.system.EndOfStreamMessage) HashSet(java.util.HashSet) Test(org.junit.Test)
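
The counting behaviour that testUpdate asserts can be sketched without any Samza dependency. The class below is a hypothetical stand-in, not the EndOfStreamStates implementation: the name EosTrackerSketch, the "stream#partition" string keys, and the rule that a null task name marks a source-generated end-of-stream are all assumptions made for illustration. A partition counts as ended once the number of distinct task names seen reaches the stream's expected producer count, with streams missing from the count map treated as expecting zero.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of per-stream end-of-stream bookkeeping; not Samza code. */
public final class EosTrackerSketch {
    // stream name -> number of upstream producer tasks expected (absent means 0)
    private final Map<String, Integer> producerCounts;
    // "stream#partition" -> distinct task names whose end-of-stream has arrived
    private final Map<String, Set<String>> tasksSeen = new HashMap<>();
    // partitions that have reached end-of-stream
    private final Set<String> endedPartitions = new HashSet<>();

    public EosTrackerSketch(Map<String, Integer> partitionsPerStream,
                            Map<String, Integer> producerCounts) {
        this.producerCounts = new HashMap<>(producerCounts);
        for (Map.Entry<String, Integer> entry : partitionsPerStream.entrySet()) {
            for (int p = 0; p < entry.getValue(); p++) {
                tasksSeen.put(key(entry.getKey(), p), new HashSet<>());
            }
        }
    }

    private static String key(String stream, int partition) {
        return stream + "#" + partition;
    }

    /** Records one end-of-stream marker; taskName may be null for a source stream. */
    public void update(String taskName, String stream, int partition) {
        String k = key(stream, partition);
        Set<String> tasks = tasksSeen.get(k);
        if (tasks == null) {
            throw new IllegalArgumentException("Unknown partition: " + k);
        }
        if (taskName != null) {
            tasks.add(taskName);
        }
        if (tasks.size() >= producerCounts.getOrDefault(stream, 0)) {
            endedPartitions.add(k);
        }
    }

    /** True once every known partition of the stream has ended. */
    public boolean isEndOfStream(String stream) {
        String prefix = stream + "#";
        boolean found = false;
        for (String k : tasksSeen.keySet()) {
            if (k.startsWith(prefix)) {
                found = true;
                if (!endedPartitions.contains(k)) {
                    return false;
                }
            }
        }
        return found;
    }

    /** True once every known partition of every stream has ended. */
    public boolean allEndOfStream() {
        return endedPartitions.size() == tasksSeen.size();
    }
}
```

Under these assumptions, one marker per partition from "task 0" leaves a two-producer intermediate stream open, and the markers from "task 1" close it, which is the sequence the test above walks through.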

Example 8 with EndOfStreamMessage

use of org.apache.samza.system.EndOfStreamMessage in project samza by apache.

the class TestInMemoryManager method testGetSystemStreamMetadata.

@Test
public void testGetSystemStreamMetadata() {
    this.inMemoryManager.initializeStream(new StreamSpec(STREAM0, STREAM0, SYSTEM, 1));
    this.inMemoryManager.initializeStream(new StreamSpec(STREAM1, STREAM1, SYSTEM, 1));
    // add some other stream which we won't request metadata for
    this.inMemoryManager.initializeStream(new StreamSpec("otherStream", "otherStream", SYSTEM, 1));
    // empty stream
    SystemStreamMetadata systemStreamMetadata0 = new SystemStreamMetadata(STREAM0, ImmutableMap.of(new Partition(0), new SystemStreamMetadata.SystemStreamPartitionMetadata(null, null, "0")));
    assertEquals(ImmutableMap.of(STREAM0, systemStreamMetadata0), this.inMemoryManager.getSystemStreamMetadata(SYSTEM, ImmutableSet.of(STREAM0)));
    // add a message in
    SystemStreamPartition ssp0 = new SystemStreamPartition(SYSTEM, STREAM0, new Partition(0));
    this.inMemoryManager.put(ssp0, "key00", "message00");
    systemStreamMetadata0 = new SystemStreamMetadata(STREAM0, ImmutableMap.of(new Partition(0), new SystemStreamMetadata.SystemStreamPartitionMetadata("0", "0", "1")));
    assertEquals(ImmutableMap.of(STREAM0, systemStreamMetadata0), this.inMemoryManager.getSystemStreamMetadata(SYSTEM, ImmutableSet.of(STREAM0)));
    // add a second message to the first stream and add one message to the second stream
    this.inMemoryManager.put(ssp0, "key01", "message01");
    SystemStreamPartition ssp1 = new SystemStreamPartition(SYSTEM, STREAM1, new Partition(0));
    this.inMemoryManager.put(ssp1, "key10", "message10");
    systemStreamMetadata0 = new SystemStreamMetadata(STREAM0, ImmutableMap.of(new Partition(0), new SystemStreamMetadata.SystemStreamPartitionMetadata("0", "1", "2")));
    SystemStreamMetadata systemStreamMetadata1 = new SystemStreamMetadata(STREAM1, ImmutableMap.of(new Partition(0), new SystemStreamMetadata.SystemStreamPartitionMetadata("0", "0", "1")));
    // also test a batch call for multiple streams here
    assertEquals(ImmutableMap.of(STREAM0, systemStreamMetadata0, STREAM1, systemStreamMetadata1), this.inMemoryManager.getSystemStreamMetadata(SYSTEM, ImmutableSet.of(STREAM0, STREAM1)));
    // test that END_OF_STREAM doesn't alter the newest or upcoming offset
    this.inMemoryManager.put(ssp0, "key02", new EndOfStreamMessage());
    systemStreamMetadata0 = new SystemStreamMetadata(STREAM0, ImmutableMap.of(new Partition(0), new SystemStreamMetadata.SystemStreamPartitionMetadata("0", "1", "2")));
    assertEquals(ImmutableMap.of(STREAM0, systemStreamMetadata0), this.inMemoryManager.getSystemStreamMetadata(SYSTEM, ImmutableSet.of(STREAM0)));
}
Also used : StreamSpec(org.apache.samza.system.StreamSpec) Partition(org.apache.samza.Partition) SystemStreamPartition(org.apache.samza.system.SystemStreamPartition) SystemStreamMetadata(org.apache.samza.system.SystemStreamMetadata) EndOfStreamMessage(org.apache.samza.system.EndOfStreamMessage) Test(org.junit.Test)
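
The offset triples asserted in testGetSystemStreamMetadata (oldest, newest, upcoming) follow a simple pattern, sketched below under stated assumptions. PartitionOffsetsSketch and its END_OF_STREAM sentinel are hypothetical names, and the way InMemoryManager actually stores messages is not shown in the test, so this only reproduces the observable values: an empty partition reports (null, null, "0"), and an end-of-stream marker is stored without advancing any offset.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the offset triple for one in-memory partition; not Samza code. */
public final class PartitionOffsetsSketch {
    // Sentinel standing in for an end-of-stream control message (assumption).
    public static final Object END_OF_STREAM = new Object();

    private final List<Object> messages = new ArrayList<>();
    // Number of data messages, i.e. everything except control markers.
    private int dataCount = 0;

    public void put(Object message) {
        messages.add(message);
        if (message != END_OF_STREAM) {
            dataCount++;
        }
    }

    /** Offset of the first data message, or null when the partition holds none. */
    public String oldest() {
        return dataCount == 0 ? null : "0";
    }

    /** Offset of the last data message, or null when the partition holds none. */
    public String newest() {
        return dataCount == 0 ? null : String.valueOf(dataCount - 1);
    }

    /** Offset the next data message would receive. */
    public String upcoming() {
        return String.valueOf(dataCount);
    }
}
```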

Example 9 with EndOfStreamMessage

use of org.apache.samza.system.EndOfStreamMessage in project samza by apache.

the class OperatorImpl method aggregateEndOfStream.

/**
 * Aggregate {@link EndOfStreamMessage} from each ssp of the stream.
 * Invoke onEndOfStream() if the stream reaches the end.
 * @param eos {@link EndOfStreamMessage} object
 * @param ssp system stream partition
 * @param collector message collector
 * @param coordinator task coordinator
 */
public final CompletionStage<Void> aggregateEndOfStream(EndOfStreamMessage eos, SystemStreamPartition ssp, MessageCollector collector, TaskCoordinator coordinator) {
    LOG.info("Received end-of-stream message from task {} in {}", eos.getTaskName(), ssp);
    eosStates.update(eos, ssp);
    SystemStream stream = ssp.getSystemStream();
    CompletionStage<Void> endOfStreamFuture = CompletableFuture.completedFuture(null);
    if (eosStates.isEndOfStream(stream)) {
        LOG.info("Input {} reaches the end for task {}", stream.toString(), taskName.getTaskName());
        if (eos.getTaskName() != null && shouldTaskBroadcastToOtherPartitions(ssp)) {
            // This is the aggregation task, which has already received all the eos messages from upstream.
            // Broadcast the end-of-stream to all the peer partitions.
            // Additionally, if elasticity is enabled,
            // only one of the elastic tasks of the ssp will broadcast.
            controlMessageSender.broadcastToOtherPartitions(new EndOfStreamMessage(), ssp, collector);
        }
        // propagate the end-of-stream through the DAG
        endOfStreamFuture = onEndOfStream(collector, coordinator).thenAccept(result -> {
            if (eosStates.allEndOfStream()) {
                // all inputs have reached end-of-stream, shut down the task
                LOG.info("All input streams have reached the end for task {}", taskName.getTaskName());
                coordinator.commit(TaskCoordinator.RequestScope.CURRENT_TASK);
                coordinator.shutdown(TaskCoordinator.RequestScope.CURRENT_TASK);
            }
        });
    }
    return endOfStreamFuture;
}
Also used : ScheduledFunction(org.apache.samza.operators.functions.ScheduledFunction) MetricsConfig(org.apache.samza.config.MetricsConfig) LoggerFactory(org.slf4j.LoggerFactory) JobConfig(org.apache.samza.config.JobConfig) CompletableFuture(java.util.concurrent.CompletableFuture) TaskModel(org.apache.samza.job.model.TaskModel) SystemStreamPartition(org.apache.samza.system.SystemStreamPartition) TaskContext(org.apache.samza.context.TaskContext) WatermarkFunction(org.apache.samza.operators.functions.WatermarkFunction) Counter(org.apache.samza.metrics.Counter) OperatorSpec(org.apache.samza.operators.spec.OperatorSpec) CallbackScheduler(org.apache.samza.scheduler.CallbackScheduler) MessageCollector(org.apache.samza.task.MessageCollector) SystemStream(org.apache.samza.system.SystemStream) WatermarkMessage(org.apache.samza.system.WatermarkMessage) HighResolutionClock(org.apache.samza.util.HighResolutionClock) LinkedHashSet(java.util.LinkedHashSet) TaskName(org.apache.samza.container.TaskName) Logger(org.slf4j.Logger) Timer(org.apache.samza.metrics.Timer) Collection(java.util.Collection) ContainerContext(org.apache.samza.context.ContainerContext) Set(java.util.Set) Scheduler(org.apache.samza.operators.Scheduler) MetricsRegistry(org.apache.samza.metrics.MetricsRegistry) SamzaException(org.apache.samza.SamzaException) TaskCoordinator(org.apache.samza.task.TaskCoordinator) Context(org.apache.samza.context.Context) CompletionStage(java.util.concurrent.CompletionStage) EndOfStreamMessage(org.apache.samza.system.EndOfStreamMessage) VisibleForTesting(com.google.common.annotations.VisibleForTesting) Config(org.apache.samza.config.Config) Collections(java.util.Collections) InternalTaskContext(org.apache.samza.context.InternalTaskContext)
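
The control flow of aggregateEndOfStream, reduced to its CompletionStage chaining, might look like the sketch below. Everything here is hypothetical: EosChainSketch, handle, propagate and the recorded action strings stand in for the operator, onEndOfStream and the coordinator calls, and the two booleans replace the eosStates lookups. The point it illustrates is that the commit and shutdown step is attached with thenAccept, so it runs only after the downstream propagation stage has completed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;

/** Hypothetical sketch of the end-of-stream future chaining; not Samza code. */
public final class EosChainSketch {
    // Records the order of side effects so the chaining can be observed.
    private final List<String> actions = new ArrayList<>();

    public List<String> actions() {
        return actions;
    }

    /**
     * streamEnded: the stream of the incoming marker has fully ended.
     * allEnded: every input stream of the task has ended.
     */
    public CompletionStage<Void> handle(boolean streamEnded, boolean allEnded) {
        // Default result: nothing to do, already complete.
        CompletionStage<Void> result = CompletableFuture.completedFuture(null);
        if (streamEnded) {
            // Propagate first, then decide whether to shut down.
            result = propagate().thenAccept(ignored -> {
                if (allEnded) {
                    actions.add("commit");
                    actions.add("shutdown");
                }
            });
        }
        return result;
    }

    // Stand-in for propagating end-of-stream to downstream operators.
    private CompletionStage<Void> propagate() {
        actions.add("propagate");
        return CompletableFuture.completedFuture(null);
    }
}
```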

Example 10 with EndOfStreamMessage

use of org.apache.samza.system.EndOfStreamMessage in project samza by apache.

the class TestIntermediateMessageSerde method testEndOfStreamMessageSerde.

@Test
public void testEndOfStreamMessageSerde() {
    IntermediateMessageSerde imserde = new IntermediateMessageSerde(new ObjectSerde());
    String streamId = "test-stream";
    String taskName = "task-1";
    EndOfStreamMessage eos = new EndOfStreamMessage(taskName);
    byte[] bytes = imserde.toBytes(eos);
    EndOfStreamMessage de = (EndOfStreamMessage) imserde.fromBytes(bytes);
    // JUnit's assertEquals takes the expected value first, then the actual value
    assertEquals(MessageType.END_OF_STREAM, MessageType.of(de));
    assertEquals(taskName, de.getTaskName());
    assertEquals(1, de.getVersion());
}
Also used : IntermediateMessageSerde(org.apache.samza.serializers.IntermediateMessageSerde) EndOfStreamMessage(org.apache.samza.system.EndOfStreamMessage) Test(org.junit.Test)
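
The round trip checked by testEndOfStreamMessageSerde can be illustrated with a type-tagged byte layout. This is a sketch under assumptions, not the IntermediateMessageSerde wire format: the class TaggedSerdeSketch, the tag value 2, and the field order (tag byte, version, task name) are all invented for the example. It shows only the general technique of prefixing a payload with a message-type byte so the reader can tell control messages apart before decoding the rest.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical type-tagged serde for an end-of-stream marker; not Samza's format. */
public final class TaggedSerdeSketch {
    // Assumed tag value; the real encoding is not shown in the source.
    public static final byte TYPE_END_OF_STREAM = 2;

    /** Minimal stand-in for an end-of-stream message. */
    public static final class Eos {
        public final String taskName;
        public final int version;

        public Eos(String taskName, int version) {
            this.taskName = taskName;
            this.version = version;
        }
    }

    /** Writes tag byte, version, then task name. taskName must be non-null here. */
    public static byte[] toBytes(Eos eos) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            out.writeByte(TYPE_END_OF_STREAM);
            out.writeInt(eos.version);
            out.writeUTF(eos.taskName);
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Checks the tag byte first, then decodes the remaining fields. */
    public static Eos fromBytes(byte[] bytes) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
            byte type = in.readByte();
            if (type != TYPE_END_OF_STREAM) {
                throw new IllegalArgumentException("Unexpected type tag: " + type);
            }
            int version = in.readInt();
            String taskName = in.readUTF();
            return new Eos(taskName, version);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```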

Aggregations

EndOfStreamMessage (org.apache.samza.system.EndOfStreamMessage) 10
SystemStream (org.apache.samza.system.SystemStream) 7
Config (org.apache.samza.config.Config) 5
IncomingMessageEnvelope (org.apache.samza.system.IncomingMessageEnvelope) 5
StreamSpec (org.apache.samza.system.StreamSpec) 5
SystemStreamPartition (org.apache.samza.system.SystemStreamPartition) 5
Test (org.junit.Test) 5
Set (java.util.Set) 4
Partition (org.apache.samza.Partition) 4
MapConfig (org.apache.samza.config.MapConfig) 4
OutgoingMessageEnvelope (org.apache.samza.system.OutgoingMessageEnvelope) 4
WatermarkMessage (org.apache.samza.system.WatermarkMessage) 4
List (java.util.List) 3
Map (java.util.Map) 3
SamzaException (org.apache.samza.SamzaException) 3
MetricsRegistry (org.apache.samza.metrics.MetricsRegistry) 3
SystemProducer (org.apache.samza.system.SystemProducer) 3
VisibleForTesting (com.google.common.annotations.VisibleForTesting) 2
ArrayList (java.util.ArrayList) 2
Collections (java.util.Collections) 2