Search in sources :

Example 16 with NoOpSerde

use of org.apache.samza.serializers.NoOpSerde in project beam by apache.

the class SamzaTestStreamTranslator method createInputDescriptor.

@SuppressWarnings("unchecked")
private static <T> GenericInputDescriptor<KV<?, OpMessage<T>>> createInputDescriptor(String id, String encodedTestStream, SerializableFunction<String, TestStream<T>> testStreamDecoder) {
    final Map<String, String> systemConfig = ImmutableMap.of(ENCODED_TEST_STREAM, encodedTestStream, TEST_STREAM_DECODER, Base64Serializer.serializeUnchecked(testStreamDecoder));
    final GenericSystemDescriptor systemDescriptor = new GenericSystemDescriptor(id, SamzaTestStreamSystemFactory.class.getName()).withSystemConfigs(systemConfig);
    // The KvCoder is needed here for Samza not to crop the key.
    final Serde<KV<?, OpMessage<T>>> kvSerde = KVSerde.of(new NoOpSerde(), new NoOpSerde<>());
    return systemDescriptor.getInputDescriptor(id, kvSerde);
}
Also used : NoOpSerde(org.apache.samza.serializers.NoOpSerde) ByteString(org.apache.beam.vendor.grpc.v1p43p2.com.google.protobuf.ByteString) KV(org.apache.samza.operators.KV) GenericSystemDescriptor(org.apache.samza.system.descriptors.GenericSystemDescriptor)

Example 17 with NoOpSerde

use of org.apache.samza.serializers.NoOpSerde in project beam by apache.

the class TranslationContext method createDummyStreamDescriptor.

/**
 * The dummy stream created will only be used in Beam tests.
 */
private static InputDescriptor<OpMessage<String>, ?> createDummyStreamDescriptor(String id) {
    final GenericSystemDescriptor dummySystem = new GenericSystemDescriptor(id, InMemorySystemFactory.class.getName());
    final GenericInputDescriptor<OpMessage<String>> dummyInput = dummySystem.getInputDescriptor(id, new NoOpSerde<>());
    dummyInput.withOffsetDefault(SystemStreamMetadata.OffsetType.OLDEST);
    final Config config = new MapConfig(dummyInput.toConfig(), dummySystem.toConfig());
    final SystemFactory factory = new InMemorySystemFactory();
    final StreamSpec dummyStreamSpec = new StreamSpec(id, id, id, 1);
    factory.getAdmin(id, config).createStream(dummyStreamSpec);
    final SystemProducer producer = factory.getProducer(id, config, null);
    final SystemStream sysStream = new SystemStream(id, id);
    final Consumer<Object> sendFn = (msg) -> {
        producer.send(id, new OutgoingMessageEnvelope(sysStream, 0, null, msg));
    };
    final WindowedValue<String> windowedValue = WindowedValue.timestampedValueInGlobalWindow("dummy", new Instant());
    sendFn.accept(OpMessage.ofElement(windowedValue));
    sendFn.accept(new WatermarkMessage(BoundedWindow.TIMESTAMP_MAX_VALUE.getMillis()));
    sendFn.accept(new EndOfStreamMessage(null));
    return dummyInput;
}
Also used : InMemorySystemFactory(org.apache.samza.system.inmemory.InMemorySystemFactory) WindowedValue(org.apache.beam.sdk.util.WindowedValue) TableDescriptor(org.apache.samza.table.descriptors.TableDescriptor) GenericSystemDescriptor(org.apache.samza.system.descriptors.GenericSystemDescriptor) LoggerFactory(org.slf4j.LoggerFactory) HashMap(java.util.HashMap) OpMessage(org.apache.beam.runners.samza.runtime.OpMessage) GenericInputDescriptor(org.apache.samza.system.descriptors.GenericInputDescriptor) TransformInputs(org.apache.beam.runners.core.construction.TransformInputs) SystemStreamMetadata(org.apache.samza.system.SystemStreamMetadata) PTransform(org.apache.beam.sdk.transforms.PTransform) HashSet(java.util.HashSet) TupleTag(org.apache.beam.sdk.values.TupleTag) SystemStream(org.apache.samza.system.SystemStream) Map(java.util.Map) Iterables(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables) WatermarkMessage(org.apache.samza.system.WatermarkMessage) MapConfig(org.apache.samza.config.MapConfig) KV(org.apache.samza.operators.KV) NoOpSerde(org.apache.samza.serializers.NoOpSerde) AppliedPTransform(org.apache.beam.sdk.runners.AppliedPTransform) OutputDescriptor(org.apache.samza.system.descriptors.OutputDescriptor) MessageStream(org.apache.samza.operators.MessageStream) Table(org.apache.samza.table.Table) InputDescriptor(org.apache.samza.system.descriptors.InputDescriptor) Logger(org.slf4j.Logger) Set(java.util.Set) SystemFactory(org.apache.samza.system.SystemFactory) StreamSpec(org.apache.samza.system.StreamSpec) UUID(java.util.UUID) PCollection(org.apache.beam.sdk.values.PCollection) HashIdGenerator(org.apache.beam.runners.samza.util.HashIdGenerator) Consumer(java.util.function.Consumer) SamzaPipelineOptions(org.apache.beam.runners.samza.SamzaPipelineOptions) List(java.util.List) PValue(org.apache.beam.sdk.values.PValue) SystemProducer(org.apache.samza.system.SystemProducer) StreamApplicationDescriptor(org.apache.samza.application.descriptors.StreamApplicationDescriptor) PCollectionView(org.apache.beam.sdk.values.PCollectionView) BoundedWindow(org.apache.beam.sdk.transforms.windowing.BoundedWindow) Instant(org.joda.time.Instant) OutgoingMessageEnvelope(org.apache.samza.system.OutgoingMessageEnvelope) EndOfStreamMessage(org.apache.samza.system.EndOfStreamMessage) Config(org.apache.samza.config.Config) Collections(java.util.Collections) OutputStream(org.apache.samza.operators.OutputStream) StreamSpec(org.apache.samza.system.StreamSpec) InMemorySystemFactory(org.apache.samza.system.inmemory.InMemorySystemFactory) SystemFactory(org.apache.samza.system.SystemFactory) OpMessage(org.apache.beam.runners.samza.runtime.OpMessage) MapConfig(org.apache.samza.config.MapConfig) Config(org.apache.samza.config.Config) SystemProducer(org.apache.samza.system.SystemProducer) SystemStream(org.apache.samza.system.SystemStream) Instant(org.joda.time.Instant) EndOfStreamMessage(org.apache.samza.system.EndOfStreamMessage) WatermarkMessage(org.apache.samza.system.WatermarkMessage) MapConfig(org.apache.samza.config.MapConfig) OutgoingMessageEnvelope(org.apache.samza.system.OutgoingMessageEnvelope) GenericSystemDescriptor(org.apache.samza.system.descriptors.GenericSystemDescriptor) InMemorySystemFactory(org.apache.samza.system.inmemory.InMemorySystemFactory)

Example 18 with NoOpSerde

use of org.apache.samza.serializers.NoOpSerde in project samza by apache.

the class TestOperatorSpecGraph method setUp.

@Before
public void setUp() {
    this.mockAppDesc = mock(StreamApplicationDescriptorImpl.class);
    /**
     * Setup two linear transformation pipelines:
     * 1) input1 --> filter --> sendTo
     * 2) input2 --> map --> sink
     */
    String inputStreamId1 = "test-input-1";
    String outputStreamId = "test-output-1";
    InputOperatorSpec testInput = new InputOperatorSpec(inputStreamId1, new NoOpSerde(), new NoOpSerde(), null, true, inputStreamId1);
    StreamOperatorSpec filterOp = OperatorSpecs.createFilterOperatorSpec(m -> true, "test-filter-2");
    OutputStreamImpl outputStream1 = new OutputStreamImpl(outputStreamId, null, null, true);
    OutputOperatorSpec outputSpec = OperatorSpecs.createSendToOperatorSpec(outputStream1, "test-output-3");
    testInput.registerNextOperatorSpec(filterOp);
    filterOp.registerNextOperatorSpec(outputSpec);
    String streamId2 = "test-input-2";
    InputOperatorSpec testInput2 = new InputOperatorSpec(streamId2, new NoOpSerde(), new NoOpSerde(), null, true, "test-input-4");
    StreamOperatorSpec testMap = OperatorSpecs.createMapOperatorSpec(m -> m, "test-map-5");
    SinkOperatorSpec testSink = OperatorSpecs.createSinkOperatorSpec((m, mc, tc) -> {
    }, "test-sink-6");
    testInput2.registerNextOperatorSpec(testMap);
    testMap.registerNextOperatorSpec(testSink);
    this.inputOpSpecMap = new LinkedHashMap<>();
    inputOpSpecMap.put(inputStreamId1, testInput);
    inputOpSpecMap.put(streamId2, testInput2);
    this.outputStrmMap = new LinkedHashMap<>();
    outputStrmMap.put(outputStreamId, outputStream1);
    when(mockAppDesc.getInputOperators()).thenReturn(Collections.unmodifiableMap(inputOpSpecMap));
    when(mockAppDesc.getOutputStreams()).thenReturn(Collections.unmodifiableMap(outputStrmMap));
    this.allOpSpecs = new HashSet<OperatorSpec>() {

        {
            this.add(testInput);
            this.add(filterOp);
            this.add(outputSpec);
            this.add(testInput2);
            this.add(testMap);
            this.add(testSink);
        }
    };
}
Also used : StreamOperatorSpec(org.apache.samza.operators.spec.StreamOperatorSpec) OperatorSpec(org.apache.samza.operators.spec.OperatorSpec) SinkOperatorSpec(org.apache.samza.operators.spec.SinkOperatorSpec) OutputOperatorSpec(org.apache.samza.operators.spec.OutputOperatorSpec) InputOperatorSpec(org.apache.samza.operators.spec.InputOperatorSpec) InputOperatorSpec(org.apache.samza.operators.spec.InputOperatorSpec) StreamOperatorSpec(org.apache.samza.operators.spec.StreamOperatorSpec) OutputStreamImpl(org.apache.samza.operators.spec.OutputStreamImpl) StreamApplicationDescriptorImpl(org.apache.samza.application.descriptors.StreamApplicationDescriptorImpl) NoOpSerde(org.apache.samza.serializers.NoOpSerde) OutputOperatorSpec(org.apache.samza.operators.spec.OutputOperatorSpec) SinkOperatorSpec(org.apache.samza.operators.spec.SinkOperatorSpec) Before(org.junit.Before)

Example 19 with NoOpSerde

use of org.apache.samza.serializers.NoOpSerde in project samza by apache.

the class TestPartitionByOperatorSpec method testPartitionBy.

@Test
public void testPartitionBy() {
    MapFunction<Object, String> keyFn = m -> m.toString();
    MapFunction<Object, Object> valueFn = m -> m;
    KVSerde<Object, Object> partitionBySerde = KVSerde.of(new NoOpSerde<>(), new NoOpSerde<>());
    StreamApplicationDescriptorImpl streamAppDesc = new StreamApplicationDescriptorImpl(appDesc -> {
        MessageStream inputStream = appDesc.getInputStream(testInputDescriptor);
        inputStream.partitionBy(keyFn, valueFn, partitionBySerde, testRepartitionedStreamName);
    }, getConfig());
    assertEquals(2, streamAppDesc.getInputOperators().size());
    Map<String, InputOperatorSpec> inputOpSpecs = streamAppDesc.getInputOperators();
    assertTrue(inputOpSpecs.keySet().contains(String.format("%s-%s-partition_by-%s", testJobName, testJobId, testRepartitionedStreamName)));
    InputOperatorSpec inputOpSpec = inputOpSpecs.get(String.format("%s-%s-partition_by-%s", testJobName, testJobId, testRepartitionedStreamName));
    assertEquals(String.format("%s-%s-partition_by-%s", testJobName, testJobId, testRepartitionedStreamName), inputOpSpec.getStreamId());
    assertTrue(inputOpSpec.getKeySerde() instanceof NoOpSerde);
    assertTrue(inputOpSpec.getValueSerde() instanceof NoOpSerde);
    assertTrue(inputOpSpec.isKeyed());
    assertNull(inputOpSpec.getScheduledFn());
    assertNull(inputOpSpec.getWatermarkFn());
    InputOperatorSpec originInputSpec = inputOpSpecs.get(testInputDescriptor.getStreamId());
    assertTrue(originInputSpec.getRegisteredOperatorSpecs().toArray()[0] instanceof PartitionByOperatorSpec);
    PartitionByOperatorSpec reparOpSpec = (PartitionByOperatorSpec) originInputSpec.getRegisteredOperatorSpecs().toArray()[0];
    assertEquals(reparOpSpec.getOpId(), String.format("%s-%s-partition_by-%s", testJobName, testJobId, testRepartitionedStreamName));
    assertEquals(reparOpSpec.getKeyFunction(), keyFn);
    assertEquals(reparOpSpec.getValueFunction(), valueFn);
    assertEquals(reparOpSpec.getOutputStream().getStreamId(), reparOpSpec.getOpId());
    assertNull(reparOpSpec.getScheduledFn());
    assertNull(reparOpSpec.getWatermarkFn());
}
Also used : StreamApplicationDescriptorImpl(org.apache.samza.application.descriptors.StreamApplicationDescriptorImpl) ScheduledFunction(org.apache.samza.operators.functions.ScheduledFunction) Assert.assertNotNull(org.junit.Assert.assertNotNull) Collection(java.util.Collection) GenericSystemDescriptor(org.apache.samza.system.descriptors.GenericSystemDescriptor) JobConfig(org.apache.samza.config.JobConfig) Assert.assertTrue(org.junit.Assert.assertTrue) HashMap(java.util.HashMap) Scheduler(org.apache.samza.operators.Scheduler) Serde(org.apache.samza.serializers.Serde) Test(org.junit.Test) GenericInputDescriptor(org.apache.samza.system.descriptors.GenericInputDescriptor) OperatorSpecGraph(org.apache.samza.operators.OperatorSpecGraph) MapFunction(org.apache.samza.operators.functions.MapFunction) WatermarkFunction(org.apache.samza.operators.functions.WatermarkFunction) Assert.assertNull(org.junit.Assert.assertNull) Map(java.util.Map) Config(org.apache.samza.config.Config) KVSerde(org.apache.samza.serializers.KVSerde) MapConfig(org.apache.samza.config.MapConfig) NoOpSerde(org.apache.samza.serializers.NoOpSerde) Assert.assertEquals(org.junit.Assert.assertEquals) MessageStream(org.apache.samza.operators.MessageStream) Mockito.mock(org.mockito.Mockito.mock) StreamApplicationDescriptorImpl(org.apache.samza.application.descriptors.StreamApplicationDescriptorImpl) MessageStream(org.apache.samza.operators.MessageStream) NoOpSerde(org.apache.samza.serializers.NoOpSerde) Test(org.junit.Test)

Example 20 with NoOpSerde

use of org.apache.samza.serializers.NoOpSerde in project samza by apache.

the class TestRemoteTableWithBatchEndToEnd method doTestStreamTableJoinRemoteTablePartialUpdates.

private void doTestStreamTableJoinRemoteTablePartialUpdates(String testName, boolean isCompactBatch) throws Exception {
    final InMemoryWriteFunction writer = new InMemoryWriteFunction(testName);
    BATCH_READS.put(testName, new AtomicInteger());
    BATCH_WRITES.put(testName, new AtomicInteger());
    WRITTEN_RECORDS.put(testName, new HashMap<>());
    int count = 16;
    int batchSize = 4;
    String profiles = Base64Serializer.serialize(generateProfiles(count));
    final RateLimiter readRateLimiter = mock(RateLimiter.class, withSettings().serializable());
    final RateLimiter writeRateLimiter = mock(RateLimiter.class, withSettings().serializable());
    final TableRateLimiter.CreditFunction creditFunction = (k, v, args) -> 1;
    final StreamApplication app = appDesc -> {
        RemoteTableDescriptor<Integer, Profile, Void> inputTableDesc = new RemoteTableDescriptor<>("profile-table-1");
        inputTableDesc.withReadFunction(InMemoryReadFunction.getInMemoryReadFunction(testName, profiles)).withRateLimiter(readRateLimiter, creditFunction, null);
        // dummy reader
        TableReadFunction<Integer, EnrichedPageView> readFn = new MyReadFunction();
        RemoteTableDescriptor<Integer, EnrichedPageView, EnrichedPageView> outputTableDesc = new RemoteTableDescriptor<>("enriched-page-view-table-1");
        outputTableDesc.withReadFunction(readFn).withWriteFunction(writer).withRateLimiter(writeRateLimiter, creditFunction, creditFunction);
        if (isCompactBatch) {
            outputTableDesc.withBatchProvider(new CompactBatchProvider<Integer, EnrichedPageView, EnrichedPageView>().withMaxBatchSize(batchSize).withMaxBatchDelay(Duration.ofHours(1)));
        } else {
            outputTableDesc.withBatchProvider(new CompleteBatchProvider<Integer, EnrichedPageView, EnrichedPageView>().withMaxBatchSize(batchSize).withMaxBatchDelay(Duration.ofHours(1)));
        }
        Table<KV<Integer, EnrichedPageView>> table = appDesc.getTable(outputTableDesc);
        Table<KV<Integer, Profile>> inputTable = appDesc.getTable(inputTableDesc);
        DelegatingSystemDescriptor ksd = new DelegatingSystemDescriptor("test");
        GenericInputDescriptor<PageView> isd = ksd.getInputDescriptor("PageView", new NoOpSerde<>());
        appDesc.getInputStream(isd).map(pv -> new KV<>(pv.getMemberId(), pv)).join(inputTable, new PageViewToProfileJoinFunction()).map(m -> new KV<>(m.getMemberId(), UpdateMessage.of(m, m))).sendTo(table, UpdateOptions.UPDATE_WITH_DEFAULTS);
    };
    InMemorySystemDescriptor isd = new InMemorySystemDescriptor("test");
    InMemoryInputDescriptor<PageView> inputDescriptor = isd.getInputDescriptor("PageView", new NoOpSerde<>());
    TestRunner.of(app).addInputStream(inputDescriptor, Arrays.asList(generatePageViewsWithDistinctKeys(count))).addConfig("task.max.concurrency", String.valueOf(count)).addConfig("task.async.commit", String.valueOf(true)).run(Duration.ofSeconds(10));
    Assert.assertEquals(count, WRITTEN_RECORDS.get(testName).size());
    Assert.assertNotNull(WRITTEN_RECORDS.get(testName).get(0));
    Assert.assertEquals(count / batchSize, BATCH_WRITES.get(testName).get());
}
Also used : Arrays(java.util.Arrays) UpdateMessage(org.apache.samza.operators.UpdateMessage) InMemorySystemDescriptor(org.apache.samza.test.framework.system.descriptors.InMemorySystemDescriptor) ObjectInputStream(java.io.ObjectInputStream) HashMap(java.util.HashMap) CompletableFuture(java.util.concurrent.CompletableFuture) GenericInputDescriptor(org.apache.samza.system.descriptors.GenericInputDescriptor) Function(java.util.function.Function) TableReadFunction(org.apache.samza.table.remote.TableReadFunction) Base64Serializer(org.apache.samza.test.util.Base64Serializer) InMemoryInputDescriptor(org.apache.samza.test.framework.system.descriptors.InMemoryInputDescriptor) AtomicInteger(java.util.concurrent.atomic.AtomicInteger) TestTableData(org.apache.samza.test.table.TestTableData) DelegatingSystemDescriptor(org.apache.samza.system.descriptors.DelegatingSystemDescriptor) Duration(java.time.Duration) Map(java.util.Map) TableWriteFunction(org.apache.samza.table.remote.TableWriteFunction) KV(org.apache.samza.operators.KV) NoOpSerde(org.apache.samza.serializers.NoOpSerde) Table(org.apache.samza.table.Table) Collection(java.util.Collection) BaseTableFunction(org.apache.samza.table.remote.BaseTableFunction) IOException(java.io.IOException) Test(org.junit.Test) Collectors(java.util.stream.Collectors) SamzaException(org.apache.samza.SamzaException) TestRunner(org.apache.samza.test.framework.TestRunner) Mockito(org.mockito.Mockito) Entry(org.apache.samza.storage.kv.Entry) RateLimiter(org.apache.samza.util.RateLimiter) UpdateOptions(org.apache.samza.operators.UpdateOptions) CompactBatchProvider(org.apache.samza.table.batching.CompactBatchProvider) StreamApplication(org.apache.samza.application.StreamApplication) Assert(org.junit.Assert) CompleteBatchProvider(org.apache.samza.table.batching.CompleteBatchProvider) RemoteTableDescriptor(org.apache.samza.table.descriptors.RemoteTableDescriptor) TableRateLimiter(org.apache.samza.table.remote.TableRateLimiter) TableReadFunction(org.apache.samza.table.remote.TableReadFunction) InMemorySystemDescriptor(org.apache.samza.test.framework.system.descriptors.InMemorySystemDescriptor) CompleteBatchProvider(org.apache.samza.table.batching.CompleteBatchProvider) GenericInputDescriptor(org.apache.samza.system.descriptors.GenericInputDescriptor) Table(org.apache.samza.table.Table) StreamApplication(org.apache.samza.application.StreamApplication) RemoteTableDescriptor(org.apache.samza.table.descriptors.RemoteTableDescriptor) RateLimiter(org.apache.samza.util.RateLimiter) TableRateLimiter(org.apache.samza.table.remote.TableRateLimiter) TableRateLimiter(org.apache.samza.table.remote.TableRateLimiter) AtomicInteger(java.util.concurrent.atomic.AtomicInteger) DelegatingSystemDescriptor(org.apache.samza.system.descriptors.DelegatingSystemDescriptor) AtomicInteger(java.util.concurrent.atomic.AtomicInteger) NoOpSerde(org.apache.samza.serializers.NoOpSerde) CompactBatchProvider(org.apache.samza.table.batching.CompactBatchProvider)

Aggregations

NoOpSerde (org.apache.samza.serializers.NoOpSerde)34 KV (org.apache.samza.operators.KV)23 Test (org.junit.Test)23 KVSerde (org.apache.samza.serializers.KVSerde)17 GenericInputDescriptor (org.apache.samza.system.descriptors.GenericInputDescriptor)17 HashMap (java.util.HashMap)16 Duration (java.time.Duration)14 List (java.util.List)14 Map (java.util.Map)14 StreamApplication (org.apache.samza.application.StreamApplication)13 Table (org.apache.samza.table.Table)13 InMemorySystemDescriptor (org.apache.samza.test.framework.system.descriptors.InMemorySystemDescriptor)13 MapConfig (org.apache.samza.config.MapConfig)12 DelegatingSystemDescriptor (org.apache.samza.system.descriptors.DelegatingSystemDescriptor)12 GenericSystemDescriptor (org.apache.samza.system.descriptors.GenericSystemDescriptor)12 ArrayList (java.util.ArrayList)11 Collectors (java.util.stream.Collectors)11 StreamApplicationDescriptor (org.apache.samza.application.descriptors.StreamApplicationDescriptor)11 InMemoryInputDescriptor (org.apache.samza.test.framework.system.descriptors.InMemoryInputDescriptor)11 Assert (org.junit.Assert)11