Example 1 with WatermarkMessage

use of org.apache.samza.system.WatermarkMessage in project samza by apache.

the class StreamOperatorTask method processAsync.

/**
 * Passes the incoming message envelopes along to the {@link InputOperatorImpl} node
 * for the input {@link SystemStream}. It is non-blocking and dispatches the message to the container thread
 * pool. The thread pool size is configured through job.container.thread.pool.size. In the absence of the config,
 * the task executes the DAG on the run loop thread.
 * <p>
 * From then on, each {@link org.apache.samza.operators.impl.OperatorImpl} propagates its transformed output to
 * its chained {@link org.apache.samza.operators.impl.OperatorImpl}s itself.
 *
 * @param ime incoming message envelope to process
 * @param collector the collector to send messages with
 * @param coordinator the coordinator to request commits or shutdown
 * @param callback the task callback handle
 */
@Override
public final void processAsync(IncomingMessageEnvelope ime, MessageCollector collector, TaskCoordinator coordinator, TaskCallback callback) {
    Runnable processRunnable = () -> {
        try {
            SystemStream systemStream = ime.getSystemStreamPartition().getSystemStream();
            InputOperatorImpl inputOpImpl = operatorImplGraph.getInputOperator(systemStream);
            if (inputOpImpl != null) {
                CompletionStage<Void> processFuture;
                MessageType messageType = MessageType.of(ime.getMessage());
                switch(messageType) {
                    case USER_MESSAGE:
                        processFuture = inputOpImpl.onMessageAsync(ime, collector, coordinator);
                        break;
                    case END_OF_STREAM:
                        EndOfStreamMessage eosMessage = (EndOfStreamMessage) ime.getMessage();
                        processFuture = inputOpImpl.aggregateEndOfStream(eosMessage, ime.getSystemStreamPartition(), collector, coordinator);
                        break;
                    case WATERMARK:
                        WatermarkMessage watermarkMessage = (WatermarkMessage) ime.getMessage();
                        processFuture = inputOpImpl.aggregateWatermark(watermarkMessage, ime.getSystemStreamPartition(), collector, coordinator);
                        break;
                    default:
                        processFuture = failedFuture(new SamzaException("Unknown message type " + messageType + " encountered."));
                        break;
                }
                processFuture.whenComplete((val, ex) -> {
                    if (ex != null) {
                        callback.failure(ex);
                    } else {
                        callback.complete();
                    }
                });
            } else {
                // If no InputOperator is found in the operator graph for the given SystemStream, fail the callback
                // right away; otherwise the job would only fail later with a TaskCallbackTimeoutException.
                final String errMessage = String.format(
                    "InputOperator not found in OperatorGraph for %s. The available input operators are: %s. "
                        + "Please check SystemStream configuration for the `SystemConsumer` and/or task.inputs "
                        + "task configuration.",
                    systemStream, operatorImplGraph.getAllInputOperators());
                LOG.error(errMessage);
                callback.failure(new SamzaException(errMessage));
            }
        } catch (Exception e) {
            LOG.error("Failed to process the incoming message due to ", e);
            callback.failure(e);
        }
    };
    if (taskThreadPool != null) {
        LOG.debug("Processing message using thread pool.");
        taskThreadPool.submit(processRunnable);
    } else {
        LOG.debug("Processing message on the run loop thread.");
        processRunnable.run();
    }
}
Also used : IncomingMessageEnvelope(org.apache.samza.system.IncomingMessageEnvelope) Logger(org.slf4j.Logger) InputOperatorImpl(org.apache.samza.operators.impl.InputOperatorImpl) LoggerFactory(org.slf4j.LoggerFactory) CompletableFuture(java.util.concurrent.CompletableFuture) Clock(org.apache.samza.util.Clock) MessageType(org.apache.samza.system.MessageType) OperatorSpecGraph(org.apache.samza.operators.OperatorSpecGraph) SamzaException(org.apache.samza.SamzaException) Context(org.apache.samza.context.Context) CompletionStage(java.util.concurrent.CompletionStage) SystemClock(org.apache.samza.util.SystemClock) OperatorImplGraph(org.apache.samza.operators.impl.OperatorImplGraph) SystemStream(org.apache.samza.system.SystemStream) WatermarkMessage(org.apache.samza.system.WatermarkMessage) Preconditions(com.google.common.base.Preconditions) EndOfStreamMessage(org.apache.samza.system.EndOfStreamMessage) VisibleForTesting(com.google.common.annotations.VisibleForTesting) ExecutorService(java.util.concurrent.ExecutorService)
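
The dispatch pattern in processAsync (run the work on a thread pool when one is configured, otherwise inline, and report the outcome of a CompletionStage through a callback) can be sketched in isolation. This is a minimal sketch, not Samza code: SketchCallback and DispatchSketch are hypothetical names, and SketchCallback only mimics the complete/failure shape of the callback used above.

```java
import java.util.concurrent.CompletionStage;
import java.util.concurrent.ExecutorService;
import java.util.function.Supplier;

/** Hypothetical stand-in for a task callback; not the real Samza interface. */
interface SketchCallback {
    void complete();

    void failure(Throwable t);
}

/** Runs asynchronous work on an optional pool and reports the result to a callback. */
final class DispatchSketch {
    // May be null, in which case work runs on the calling thread.
    private final ExecutorService pool;

    DispatchSketch(ExecutorService pool) {
        this.pool = pool;
    }

    void dispatch(Supplier<CompletionStage<Void>> work, SketchCallback callback) {
        Runnable runnable = () -> {
            try {
                // Bridge the stage's outcome to the callback exactly once.
                work.get().whenComplete((val, ex) -> {
                    if (ex != null) {
                        callback.failure(ex);
                    } else {
                        callback.complete();
                    }
                });
            } catch (Exception e) {
                // Synchronous failures while starting the work must also reach the callback,
                // otherwise the caller would wait until a timeout.
                callback.failure(e);
            }
        };
        if (pool != null) {
            pool.submit(runnable);
        } else {
            runnable.run();
        }
    }
}
```

Passing null as the pool, as in `new DispatchSketch(null)`, runs the work on the caller's thread, which corresponds to the run loop thread branch in the example above.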

Example 2 with WatermarkMessage

use of org.apache.samza.system.WatermarkMessage in project samza by apache.

the class TestIntermediateMessageSerde method testWatermarkMessageSerde.

@Test
public void testWatermarkMessageSerde() {
    IntermediateMessageSerde imserde = new IntermediateMessageSerde(new ObjectSerde());
    String taskName = "task-1";
    WatermarkMessage watermark = new WatermarkMessage(System.currentTimeMillis(), taskName);
    byte[] bytes = imserde.toBytes(watermark);
    WatermarkMessage de = (WatermarkMessage) imserde.fromBytes(bytes);
    assertEquals(MessageType.of(de), MessageType.WATERMARK);
    assertEquals(de.getTaskName(), taskName);
    assertTrue(de.getTimestamp() > 0);
}
Also used : WatermarkMessage(org.apache.samza.system.WatermarkMessage) IntermediateMessageSerde(org.apache.samza.serializers.IntermediateMessageSerde) Test(org.junit.Test)

Example 3 with WatermarkMessage

use of org.apache.samza.system.WatermarkMessage in project samza by apache.

the class IntermediateMessageSerde method toBytes.

@Override
public byte[] toBytes(Object object) {
    final byte[] data;
    final MessageType type = MessageType.of(object);
    switch(type) {
        case USER_MESSAGE:
            data = userMessageSerde.toBytes(object);
            break;
        case WATERMARK:
            data = watermarkSerde.toBytes((WatermarkMessage) object);
            break;
        case END_OF_STREAM:
            data = eosSerde.toBytes((EndOfStreamMessage) object);
            break;
        default:
            throw new SamzaException("Unknown message type: " + type.name());
    }
    final byte[] bytes = new byte[data.length + 1];
    bytes[0] = (byte) type.ordinal();
    System.arraycopy(data, 0, bytes, 1, data.length);
    return bytes;
}
Also used : WatermarkMessage(org.apache.samza.system.WatermarkMessage) SamzaException(org.apache.samza.SamzaException) MessageType(org.apache.samza.system.MessageType) EndOfStreamMessage(org.apache.samza.system.EndOfStreamMessage)
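
The framing used by toBytes (one leading byte holding the message type's ordinal, followed by the serialized payload) implies a matching decode step that reads the first byte back. The following is a minimal, self-contained sketch of that round trip. TypePrefixSketch and its Kind enum are hypothetical; the enum order is illustrative and is not claimed to match Samza's MessageType.

```java
import java.util.Arrays;

/** Sketch of type-byte framing: [ordinal][payload...]. Names are illustrative only. */
final class TypePrefixSketch {
    /** Hypothetical type tag used only by this sketch. */
    enum Kind { USER_MESSAGE, WATERMARK, END_OF_STREAM }

    private TypePrefixSketch() {
    }

    /** Prepends the kind's ordinal as a single byte to the payload. */
    static byte[] frame(Kind kind, byte[] payload) {
        byte[] bytes = new byte[payload.length + 1];
        bytes[0] = (byte) kind.ordinal();
        System.arraycopy(payload, 0, bytes, 1, payload.length);
        return bytes;
    }

    /** Reads the kind back from the first byte, rejecting empty or unknown input. */
    static Kind kindOf(byte[] framed) {
        if (framed.length == 0) {
            throw new IllegalArgumentException("Framed message is empty");
        }
        int ordinal = framed[0] & 0xFF;
        Kind[] kinds = Kind.values();
        if (ordinal >= kinds.length) {
            throw new IllegalArgumentException("Unknown message type ordinal: " + ordinal);
        }
        return kinds[ordinal];
    }

    /** Returns the payload without the leading type byte. */
    static byte[] payloadOf(byte[] framed) {
        if (framed.length == 0) {
            throw new IllegalArgumentException("Framed message is empty");
        }
        return Arrays.copyOfRange(framed, 1, framed.length);
    }
}
```

One consequence of this design is that the wire format depends on the enum's declaration order, so reordering the constants would change how previously written bytes are interpreted.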

Example 4 with WatermarkMessage

use of org.apache.samza.system.WatermarkMessage in project samza by apache.

the class TestWatermarkStates method testUpdate.

@Test
public void testUpdate() {
    SystemStream input = new SystemStream("system", "input");
    SystemStream intermediate = new SystemStream("system", "intermediate");
    Set<SystemStreamPartition> ssps = new HashSet<>();
    SystemStreamPartition inputPartition0 = new SystemStreamPartition(input, new Partition(0));
    SystemStreamPartition intPartition0 = new SystemStreamPartition(intermediate, new Partition(0));
    SystemStreamPartition intPartition1 = new SystemStreamPartition(intermediate, new Partition(1));
    ssps.add(inputPartition0);
    ssps.add(intPartition0);
    ssps.add(intPartition1);
    Map<SystemStream, Integer> producerCounts = new HashMap<>();
    producerCounts.put(intermediate, 2);
    WatermarkStates watermarkStates = new WatermarkStates(ssps, producerCounts, new MetricsRegistryMap());
    // advance watermark on input to 5
    IncomingMessageEnvelope envelope = IncomingMessageEnvelope.buildWatermarkEnvelope(inputPartition0, 5L);
    watermarkStates.update((WatermarkMessage) envelope.getMessage(), envelope.getSystemStreamPartition());
    assertEquals(watermarkStates.getWatermark(input), 5L);
    assertEquals(watermarkStates.getWatermark(intermediate), WATERMARK_NOT_EXIST);
    // watermark from task 0 on int p0 to 6
    WatermarkMessage watermarkMessage = new WatermarkMessage(6L, "task 0");
    watermarkStates.update(watermarkMessage, intPartition0);
    assertEquals(watermarkStates.getWatermarkPerSSP(intPartition0), WATERMARK_NOT_EXIST);
    assertEquals(watermarkStates.getWatermark(intermediate), WATERMARK_NOT_EXIST);
    // watermark from task 1 on int p0 to 3
    watermarkMessage = new WatermarkMessage(3L, "task 1");
    watermarkStates.update(watermarkMessage, intPartition0);
    assertEquals(watermarkStates.getWatermarkPerSSP(intPartition0), 3L);
    assertEquals(watermarkStates.getWatermark(intermediate), WATERMARK_NOT_EXIST);
    // watermark from task 0 on int p1 to 10
    watermarkMessage = new WatermarkMessage(10L, "task 0");
    watermarkStates.update(watermarkMessage, intPartition1);
    assertEquals(watermarkStates.getWatermarkPerSSP(intPartition1), WATERMARK_NOT_EXIST);
    assertEquals(watermarkStates.getWatermark(intermediate), WATERMARK_NOT_EXIST);
    // watermark from task 1 on int p1 to 4
    watermarkMessage = new WatermarkMessage(4L, "task 1");
    watermarkStates.update(watermarkMessage, intPartition1);
    assertEquals(watermarkStates.getWatermarkPerSSP(intPartition1), 4L);
    // verify we got a watermark 3 (min) for int stream
    assertEquals(watermarkStates.getWatermark(intermediate), 3L);
    // advance watermark from task 1 on int p0 to 8
    watermarkMessage = new WatermarkMessage(8L, "task 1");
    watermarkStates.update(watermarkMessage, intPartition0);
    assertEquals(watermarkStates.getWatermarkPerSSP(intPartition0), 6L);
    // verify we got a watermark 4 (min) for int stream
    assertEquals(watermarkStates.getWatermark(intermediate), 4L);
    // advance watermark from task 1 on int p1 to 7
    watermarkMessage = new WatermarkMessage(7L, "task 1");
    watermarkStates.update(watermarkMessage, intPartition1);
    assertEquals(watermarkStates.getWatermarkPerSSP(intPartition1), 7L);
    // verify we got a watermark 6 (min) for int stream
    assertEquals(watermarkStates.getWatermark(intermediate), 6L);
}
Also used : Partition(org.apache.samza.Partition) SystemStreamPartition(org.apache.samza.system.SystemStreamPartition) WatermarkMessage(org.apache.samza.system.WatermarkMessage) HashMap(java.util.HashMap) SystemStream(org.apache.samza.system.SystemStream) IncomingMessageEnvelope(org.apache.samza.system.IncomingMessageEnvelope) MetricsRegistryMap(org.apache.samza.metrics.MetricsRegistryMap) HashSet(java.util.HashSet) Test(org.junit.Test)
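
The assertions in testUpdate suggest a two-level minimum: a partition has no watermark until every expected producer has reported, then takes the minimum across producers, and the stream takes the minimum across its partitions. The sketch below models only that behaviour as inferred from the test. WatermarkAggregationSketch, its NOT_EXIST sentinel and its method names are hypothetical, and it is not the WatermarkStates implementation.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified model of watermark aggregation for one stream with several partitions. */
final class WatermarkAggregationSketch {
    // Sentinel chosen for this sketch to mean "no watermark yet".
    static final long NOT_EXIST = -1L;

    private final int producerCount;
    // partition index -> (producer name -> latest timestamp reported by that producer)
    private final Map<Integer, Map<String, Long>> perPartition = new HashMap<>();

    WatermarkAggregationSketch(int partitionCount, int producerCount) {
        this.producerCount = producerCount;
        for (int p = 0; p < partitionCount; p++) {
            perPartition.put(p, new HashMap<>());
        }
    }

    /** Records a watermark from one producer; a producer's own timestamp never moves backwards. */
    void update(int partition, String producer, long timestamp) {
        perPartition.get(partition).merge(producer, timestamp, Math::max);
    }

    /** Minimum across producers, or NOT_EXIST until all expected producers have reported. */
    long partitionWatermark(int partition) {
        Map<String, Long> seen = perPartition.get(partition);
        if (seen.size() < producerCount) {
            return NOT_EXIST;
        }
        return seen.values().stream().mapToLong(Long::longValue).min().orElse(NOT_EXIST);
    }

    /** Minimum across partitions, or NOT_EXIST if any partition has no watermark yet. */
    long streamWatermark() {
        long min = Long.MAX_VALUE;
        for (int partition : perPartition.keySet()) {
            long watermark = partitionWatermark(partition);
            if (watermark == NOT_EXIST) {
                return NOT_EXIST;
            }
            min = Math.min(min, watermark);
        }
        return min == Long.MAX_VALUE ? NOT_EXIST : min;
    }
}
```

With two partitions and two producers, replaying the updates from the test (6 and 3 on partition 0, then 10 and 4 on partition 1) gives partition watermarks 3 and 4 and a stream watermark of 3, the same values the test asserts for the intermediate stream.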

Example 5 with WatermarkMessage

use of org.apache.samza.system.WatermarkMessage in project beam by apache.

the class TranslationContext method createDummyStreamDescriptor.

/**
 * The dummy stream created will only be used in Beam tests.
 */
private static InputDescriptor<OpMessage<String>, ?> createDummyStreamDescriptor(String id) {
    final GenericSystemDescriptor dummySystem = new GenericSystemDescriptor(id, InMemorySystemFactory.class.getName());
    final GenericInputDescriptor<OpMessage<String>> dummyInput = dummySystem.getInputDescriptor(id, new NoOpSerde<>());
    dummyInput.withOffsetDefault(SystemStreamMetadata.OffsetType.OLDEST);
    final Config config = new MapConfig(dummyInput.toConfig(), dummySystem.toConfig());
    final SystemFactory factory = new InMemorySystemFactory();
    final StreamSpec dummyStreamSpec = new StreamSpec(id, id, id, 1);
    factory.getAdmin(id, config).createStream(dummyStreamSpec);
    final SystemProducer producer = factory.getProducer(id, config, null);
    final SystemStream sysStream = new SystemStream(id, id);
    final Consumer<Object> sendFn = (msg) -> {
        producer.send(id, new OutgoingMessageEnvelope(sysStream, 0, null, msg));
    };
    final WindowedValue<String> windowedValue = WindowedValue.timestampedValueInGlobalWindow("dummy", new Instant());
    sendFn.accept(OpMessage.ofElement(windowedValue));
    sendFn.accept(new WatermarkMessage(BoundedWindow.TIMESTAMP_MAX_VALUE.getMillis()));
    sendFn.accept(new EndOfStreamMessage(null));
    return dummyInput;
}
Also used : InMemorySystemFactory(org.apache.samza.system.inmemory.InMemorySystemFactory) WindowedValue(org.apache.beam.sdk.util.WindowedValue) TableDescriptor(org.apache.samza.table.descriptors.TableDescriptor) GenericSystemDescriptor(org.apache.samza.system.descriptors.GenericSystemDescriptor) LoggerFactory(org.slf4j.LoggerFactory) HashMap(java.util.HashMap) OpMessage(org.apache.beam.runners.samza.runtime.OpMessage) GenericInputDescriptor(org.apache.samza.system.descriptors.GenericInputDescriptor) TransformInputs(org.apache.beam.runners.core.construction.TransformInputs) SystemStreamMetadata(org.apache.samza.system.SystemStreamMetadata) PTransform(org.apache.beam.sdk.transforms.PTransform) HashSet(java.util.HashSet) TupleTag(org.apache.beam.sdk.values.TupleTag) SystemStream(org.apache.samza.system.SystemStream) Map(java.util.Map) Iterables(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables) WatermarkMessage(org.apache.samza.system.WatermarkMessage) MapConfig(org.apache.samza.config.MapConfig) KV(org.apache.samza.operators.KV) NoOpSerde(org.apache.samza.serializers.NoOpSerde) AppliedPTransform(org.apache.beam.sdk.runners.AppliedPTransform) OutputDescriptor(org.apache.samza.system.descriptors.OutputDescriptor) MessageStream(org.apache.samza.operators.MessageStream) Table(org.apache.samza.table.Table) InputDescriptor(org.apache.samza.system.descriptors.InputDescriptor) Logger(org.slf4j.Logger) Set(java.util.Set) SystemFactory(org.apache.samza.system.SystemFactory) StreamSpec(org.apache.samza.system.StreamSpec) UUID(java.util.UUID) PCollection(org.apache.beam.sdk.values.PCollection) HashIdGenerator(org.apache.beam.runners.samza.util.HashIdGenerator) Consumer(java.util.function.Consumer) SamzaPipelineOptions(org.apache.beam.runners.samza.SamzaPipelineOptions) List(java.util.List) PValue(org.apache.beam.sdk.values.PValue) SystemProducer(org.apache.samza.system.SystemProducer) StreamApplicationDescriptor(org.apache.samza.application.descriptors.StreamApplicationDescriptor) PCollectionView(org.apache.beam.sdk.values.PCollectionView) BoundedWindow(org.apache.beam.sdk.transforms.windowing.BoundedWindow) Instant(org.joda.time.Instant) OutgoingMessageEnvelope(org.apache.samza.system.OutgoingMessageEnvelope) EndOfStreamMessage(org.apache.samza.system.EndOfStreamMessage) Config(org.apache.samza.config.Config) Collections(java.util.Collections) OutputStream(org.apache.samza.operators.OutputStream)

Aggregations

WatermarkMessage (org.apache.samza.system.WatermarkMessage): 9
SystemStream (org.apache.samza.system.SystemStream): 6
Test (org.junit.Test): 5
HashMap (java.util.HashMap): 4
HashSet (java.util.HashSet): 4
EndOfStreamMessage (org.apache.samza.system.EndOfStreamMessage): 4
SystemStreamPartition (org.apache.samza.system.SystemStreamPartition): 4
Partition (org.apache.samza.Partition): 3
SamzaException (org.apache.samza.SamzaException): 3
MessageCollector (org.apache.samza.task.MessageCollector): 3
VisibleForTesting (com.google.common.annotations.VisibleForTesting): 2
Collections (java.util.Collections): 2
List (java.util.List): 2
Set (java.util.Set): 2
CompletableFuture (java.util.concurrent.CompletableFuture): 2
CompletionStage (java.util.concurrent.CompletionStage): 2
OpMessage (org.apache.beam.runners.samza.runtime.OpMessage): 2
Config (org.apache.samza.config.Config): 2
MapConfig (org.apache.samza.config.MapConfig): 2
Context (org.apache.samza.context.Context): 2