Example 1 with StreamEntity

Use of org.apache.gobblin.stream.StreamEntity in the project incubator-gobblin by apache.

Class Converter, method processStream.

/**
 * Apply conversions to the input {@link RecordStreamWithMetadata}.
 */
@Override
public RecordStreamWithMetadata<DO, SO> processStream(RecordStreamWithMetadata<DI, SI> inputStream, WorkUnitState workUnitState) throws SchemaConversionException {
    init(workUnitState);
    this.outputGlobalMetadata = GlobalMetadata.<SI, SO>builderWithInput(inputStream.getGlobalMetadata(), Optional.fromNullable(convertSchema(inputStream.getGlobalMetadata().getSchema(), workUnitState))).build();
    Flowable<StreamEntity<DO>> outputStream = inputStream.getRecordStream().flatMap(in -> {
        if (in instanceof ControlMessage) {
            ControlMessage out = (ControlMessage) in;
            getMessageHandler().handleMessage((ControlMessage) in);
            // update the output schema with the new input schema from the MetadataUpdateControlMessage
            if (in instanceof MetadataUpdateControlMessage) {
                this.outputGlobalMetadata = GlobalMetadata.<SI, SO>builderWithInput(((MetadataUpdateControlMessage) in).getGlobalMetadata(), Optional.fromNullable(convertSchema((SI) ((MetadataUpdateControlMessage) in).getGlobalMetadata().getSchema(), workUnitState))).build();
                out = new MetadataUpdateControlMessage<SO, DO>(this.outputGlobalMetadata);
            }
            return Flowable.just(((ControlMessage<DO>) out));
        } else if (in instanceof RecordEnvelope) {
            RecordEnvelope<DI> recordEnvelope = (RecordEnvelope<DI>) in;
            Iterator<DO> convertedIterable = convertRecord(this.outputGlobalMetadata.getSchema(), recordEnvelope.getRecord(), workUnitState).iterator();
            if (!convertedIterable.hasNext()) {
                // if the iterable is empty, ack the record, return an empty flowable
                in.ack();
                return Flowable.empty();
            }
            DO firstRecord = convertedIterable.next();
            if (!convertedIterable.hasNext()) {
                // if the iterable has only one element, use RecordEnvelope.withRecord, which is more efficient
                return Flowable.just(recordEnvelope.withRecord(firstRecord));
            } else {
                // if the iterable has multiple records, use a ForkRecordBuilder
                RecordEnvelope<DI>.ForkRecordBuilder<DO> forkRecordBuilder = recordEnvelope.forkRecordBuilder();
                return Flowable.just(firstRecord).concatWith(Flowable.fromIterable(() -> convertedIterable)).map(forkRecordBuilder::childRecord).doOnComplete(forkRecordBuilder::close);
            }
        } else {
            throw new UnsupportedOperationException();
        }
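    }, 1); // maxConcurrency = 1: inner Flowables are subscribed one at a time, so output order follows input order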
    outputStream = outputStream.doOnComplete(this::close);
    return inputStream.withRecordStream(outputStream, this.outputGlobalMetadata);
}
Also used : RecordEnvelope(org.apache.gobblin.stream.RecordEnvelope) StreamEntity(org.apache.gobblin.stream.StreamEntity) MetadataUpdateControlMessage(org.apache.gobblin.stream.MetadataUpdateControlMessage) Iterator(java.util.Iterator) ControlMessage(org.apache.gobblin.stream.ControlMessage)
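
The record branch of processStream distinguishes three cases by pulling one element from the converted Iterator and then probing for a second. The following is a minimal plain-Java sketch of only that decision, without RxJava or Gobblin types. The class OutputAritySketch, the enum Arity and the method classify are hypothetical names introduced for illustration; the original returns Flowables rather than labels.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;

public class OutputAritySketch {

    /** Hypothetical labels for the three branches taken in processStream. */
    enum Arity { EMPTY, SINGLE, MULTIPLE }

    /** Classifies an Iterable by consuming at most one element and probing for a second. */
    static <T> Arity classify(Iterable<T> converted) {
        Iterator<T> it = converted.iterator();
        if (!it.hasNext()) {
            // Nothing was produced: in the original, the input record is acked here.
            return Arity.EMPTY;
        }
        it.next(); // consume the first converted record
        // A second element means the input record has to be forked into several outputs.
        return it.hasNext() ? Arity.MULTIPLE : Arity.SINGLE;
    }

    public static void main(String[] args) {
        System.out.println(classify(Collections.<String>emptyList()));   // EMPTY
        System.out.println(classify(Collections.singletonList("a")));    // SINGLE
        System.out.println(classify(Arrays.asList("a", "b")));           // MULTIPLE
    }
}
```

Probing with hasNext() instead of copying the Iterable into a list keeps the single-output case cheap, which matches the efficiency comment in the original code.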

Example 2 with StreamEntity

Use of org.apache.gobblin.stream.StreamEntity in the project incubator-gobblin by apache.

Class ConverterTest, method testMultiOutputIterable.

@Test
public void testMultiOutputIterable() throws Exception {
    MyConverter converter = new MyConverter();
    BasicAckableForTesting ackable = new BasicAckableForTesting();
    RecordStreamWithMetadata<Integer, String> stream = new RecordStreamWithMetadata<>(Flowable.just(new RecordEnvelope<>(2)), GlobalMetadata.<String>builder().schema("schema").build()).mapRecords(r -> {
        r.addCallBack(ackable);
        return r;
    });
    List<StreamEntity<Integer>> outputRecords = Lists.newArrayList();
    converter.processStream(stream, new WorkUnitState()).getRecordStream().subscribe(outputRecords::add);
    Assert.assertEquals(outputRecords.size(), 2);
    // output record has not been acked
    Assert.assertEquals(ackable.acked, 0);
    outputRecords.get(0).ack();
    // only one output record acked, still need to ack another derived record
    Assert.assertEquals(ackable.acked, 0);
    outputRecords.get(1).ack();
    // all output records acked
    Assert.assertEquals(ackable.acked, 1);
}
Also used : WorkUnitState(org.apache.gobblin.configuration.WorkUnitState) StreamEntity(org.apache.gobblin.stream.StreamEntity) BasicAckableForTesting(org.apache.gobblin.ack.BasicAckableForTesting) Test(org.testng.annotations.Test)
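
The assertions in testMultiOutputIterable rely on the parent record being acked only once every derived record has been acked. One way such behaviour could be implemented is a pending counter, sketched below in plain Java. This is an assumption about the mechanism, not Gobblin's actual code: ForkAckSketch, CountingAckable and ForkedParent are hypothetical names, and the sketch does not guard against a child being acked twice.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ForkAckSketch {

    /** Hypothetical counterpart of the test's ackable: counts how often it was acked. */
    static final class CountingAckable {
        int acked = 0;
        void ack() { acked++; }
    }

    /** Acks the parent exactly once: after close() and after every child has acked. */
    static final class ForkedParent {
        private final CountingAckable parent;
        // Starts at 1 so the parent cannot be acked before close() is called.
        private final AtomicInteger pending = new AtomicInteger(1);

        ForkedParent(CountingAckable parent) { this.parent = parent; }

        /** Registers one derived record and returns the action that acks it. */
        Runnable child() {
            pending.incrementAndGet();
            return this::release;
        }

        /** Signals that no further children will be created. */
        void close() { release(); }

        private void release() {
            if (pending.decrementAndGet() == 0) {
                parent.ack();
            }
        }
    }

    public static void main(String[] args) {
        CountingAckable ackable = new CountingAckable();
        ForkedParent fork = new ForkedParent(ackable);
        Runnable first = fork.child();
        Runnable second = fork.child();
        fork.close();

        first.run();
        System.out.println(ackable.acked); // 0: one derived record is still pending
        second.run();
        System.out.println(ackable.acked); // 1: all derived records acked
    }
}
```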

Example 3 with StreamEntity

Use of org.apache.gobblin.stream.StreamEntity in the project incubator-gobblin by apache.

Class ConverterTest, method testMixedStream.

@Test
public void testMixedStream() throws Exception {
    MyConverter converter = new MyConverter();
    BasicAckableForTesting ackable = new BasicAckableForTesting();
    RecordStreamWithMetadata<Integer, String> stream = new RecordStreamWithMetadata<>(Flowable.just(new RecordEnvelope<>(1), new MyControlMessage<>()), GlobalMetadata.<String>builder().schema("schema").build()).mapRecords(r -> {
        r.addCallBack(ackable);
        return r;
    });
    List<StreamEntity<Integer>> outputRecords = Lists.newArrayList();
    converter.processStream(stream, new WorkUnitState()).getRecordStream().subscribe(outputRecords::add);
    Assert.assertEquals(outputRecords.size(), 2);
    // Integer.valueOf replaces the deprecated Integer constructor
    Assert.assertEquals(((RecordEnvelope<Integer>) outputRecords.get(0)).getRecord(), Integer.valueOf(0));
    Assert.assertTrue(outputRecords.get(1) instanceof MyControlMessage);
}
Also used : WorkUnitState(org.apache.gobblin.configuration.WorkUnitState) StreamEntity(org.apache.gobblin.stream.StreamEntity) BasicAckableForTesting(org.apache.gobblin.ack.BasicAckableForTesting) Test(org.testng.annotations.Test)

Example 4 with StreamEntity

Use of org.apache.gobblin.stream.StreamEntity in the project incubator-gobblin by apache.

Class ForkerTest, method test.

@Test
public void test() throws Exception {
    Forker forker = new Forker();
    MyFlowable<StreamEntity<byte[]>> flowable = new MyFlowable<>();
    RecordStreamWithMetadata<byte[], String> stream = new RecordStreamWithMetadata<>(flowable, GlobalMetadata.<String>builder().schema("schema").build());
    WorkUnitState workUnitState = new WorkUnitState();
    workUnitState.setProp(ConfigurationKeys.FORK_BRANCHES_KEY, "3");
    Forker.ForkedStream<byte[], String> forkedStream = forker.forkStream(stream, new MyForkOperator(), workUnitState);
    Assert.assertEquals(forkedStream.getForkedStreams().size(), 3);
    Queue<StreamEntity<byte[]>> output0 = new LinkedList<>();
    forkedStream.getForkedStreams().get(0).getRecordStream().subscribe(output0::add);
    Queue<StreamEntity<byte[]>> output1 = new LinkedList<>();
    forkedStream.getForkedStreams().get(1).getRecordStream().subscribe(output1::add);
    Queue<StreamEntity<byte[]>> output2 = new LinkedList<>();
    forkedStream.getForkedStreams().get(2).getRecordStream().subscribe(output2::add);
    flowable._subscriber.onNext(new RecordEnvelope<>(new byte[] { 1, 1, 1 }));
    Assert.assertTrue(output0.poll() instanceof RecordEnvelope);
    Assert.assertTrue(output1.poll() instanceof RecordEnvelope);
    Assert.assertTrue(output2.poll() instanceof RecordEnvelope);
    flowable._subscriber.onNext(new RecordEnvelope<>(new byte[] { 1, 0, 0 }));
    Assert.assertTrue(output0.poll() instanceof RecordEnvelope);
    Assert.assertNull(output1.poll());
    Assert.assertNull(output2.poll());
    flowable._subscriber.onNext(new RecordEnvelope<>(new byte[] { 0, 1, 1 }));
    Assert.assertNull(output0.poll());
    Assert.assertTrue(output1.poll() instanceof RecordEnvelope);
    Assert.assertTrue(output2.poll() instanceof RecordEnvelope);
    flowable._subscriber.onNext(new BasicTestControlMessage<byte[]>("control"));
    Assert.assertTrue(output0.poll() instanceof BasicTestControlMessage);
    Assert.assertTrue(output1.poll() instanceof BasicTestControlMessage);
    Assert.assertTrue(output2.poll() instanceof BasicTestControlMessage);
    flowable._subscriber.onComplete();
}
Also used : RecordEnvelope(org.apache.gobblin.stream.RecordEnvelope) WorkUnitState(org.apache.gobblin.configuration.WorkUnitState) RecordStreamWithMetadata(org.apache.gobblin.records.RecordStreamWithMetadata) StreamEntity(org.apache.gobblin.stream.StreamEntity) LinkedList(java.util.LinkedList) BasicTestControlMessage(org.apache.gobblin.runtime.BasicTestControlMessage) Test(org.testng.annotations.Test)
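
The assertions in ForkerTest suggest that each data record reaches only a subset of the branches, while a control message reaches all of them. MyForkOperator is not shown, so the sketch below assumes that it treats the record's bytes as a per-branch mask (non-zero means "forward to this branch"). ForkRoutingSketch, newBranches, routeRecord and routeControl are hypothetical names, and plain queues stand in for the forked record streams.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class ForkRoutingSketch {

    /** Creates one output queue per branch. */
    static List<Queue<Object>> newBranches(int count) {
        List<Queue<Object>> branches = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            branches.add(new ArrayDeque<>());
        }
        return branches;
    }

    /** Delivers a data record only to branches whose mask byte is non-zero (assumed rule). */
    static void routeRecord(byte[] mask, List<Queue<Object>> branches) {
        for (int i = 0; i < branches.size() && i < mask.length; i++) {
            if (mask[i] != 0) {
                branches.get(i).add(mask);
            }
        }
    }

    /** Delivers a control message to every branch, regardless of any mask. */
    static void routeControl(String message, List<Queue<Object>> branches) {
        for (Queue<Object> branch : branches) {
            branch.add(message);
        }
    }

    public static void main(String[] args) {
        List<Queue<Object>> branches = newBranches(3);
        routeRecord(new byte[] { 1, 0, 0 }, branches); // branch 0 only
        routeRecord(new byte[] { 0, 1, 1 }, branches); // branches 1 and 2
        routeControl("control", branches);             // all branches
        for (Queue<Object> branch : branches) {
            System.out.println(branch.size());         // 2 for each branch
        }
    }
}
```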

Example 5 with StreamEntity

Use of org.apache.gobblin.stream.StreamEntity in the project incubator-gobblin by apache.

Class TestRecordStream, method testAcks.

@Test
public void testAcks() throws Exception {
    StreamEntity[] entities = new StreamEntity[] { new RecordEnvelope<>("a"), new BasicTestControlMessage("1"), new RecordEnvelope<>("b"), new BasicTestControlMessage("2") };
    BasicAckableForTesting ackable = new BasicAckableForTesting();
    // register the same ackable on every entity so that acks can be counted
    for (StreamEntity entity : entities) {
        entity.addCallBack(ackable);
    }
    MyExtractor extractor = new MyExtractor(entities);
    MyConverter converter = new MyConverter();
    MyDataWriter writer = new MyDataWriter();
    // Create a TaskState
    TaskState taskState = getEmptyTestTaskState("testRetryTaskId");
    taskState.setProp(ConfigurationKeys.TASK_SYNCHRONOUS_EXECUTION_MODEL_KEY, false);
    // Create a mock TaskContext
    TaskContext mockTaskContext = mock(TaskContext.class);
    when(mockTaskContext.getExtractor()).thenReturn(extractor);
    when(mockTaskContext.getForkOperator()).thenReturn(new IdentityForkOperator());
    when(mockTaskContext.getTaskState()).thenReturn(taskState);
    when(mockTaskContext.getConverters()).thenReturn(Lists.newArrayList(converter));
    when(mockTaskContext.getTaskLevelPolicyChecker(any(TaskState.class), anyInt())).thenReturn(mock(TaskLevelPolicyChecker.class));
    when(mockTaskContext.getRowLevelPolicyChecker()).thenReturn(new RowLevelPolicyChecker(Lists.newArrayList(), "ss", FileSystem.getLocal(new Configuration())));
    when(mockTaskContext.getRowLevelPolicyChecker(anyInt())).thenReturn(new RowLevelPolicyChecker(Lists.newArrayList(), "ss", FileSystem.getLocal(new Configuration())));
    when(mockTaskContext.getDataWriterBuilder(anyInt(), anyInt())).thenReturn(writer);
    // Create a mock TaskPublisher
    TaskPublisher mockTaskPublisher = mock(TaskPublisher.class);
    when(mockTaskPublisher.canPublish()).thenReturn(TaskPublisher.PublisherState.SUCCESS);
    when(mockTaskContext.getTaskPublisher(any(TaskState.class), any(TaskLevelPolicyCheckResults.class))).thenReturn(mockTaskPublisher);
    // Create a mock TaskStateTracker
    TaskStateTracker mockTaskStateTracker = mock(TaskStateTracker.class);
    // Create a TaskExecutor - a real TaskExecutor must be created so a Fork is run in a separate thread
    TaskExecutor taskExecutor = new TaskExecutor(new Properties());
    // Create the Task
    Task realTask = new Task(mockTaskContext, mockTaskStateTracker, taskExecutor, Optional.<CountDownLatch>absent());
    Task task = spy(realTask);
    doNothing().when(task).submitTaskCommittedEvent();
    task.run();
    task.commit();
    Assert.assertEquals(task.getTaskState().getWorkingState(), WorkUnitState.WorkingState.SUCCESSFUL);
    Assert.assertEquals(ackable.acked, 4);
}
Also used : TaskPublisher(org.apache.gobblin.publisher.TaskPublisher) Configuration(org.apache.hadoop.conf.Configuration) RecordEnvelope(org.apache.gobblin.stream.RecordEnvelope) TaskLevelPolicyChecker(org.apache.gobblin.qualitychecker.task.TaskLevelPolicyChecker) StreamEntity(org.apache.gobblin.stream.StreamEntity) TaskLevelPolicyCheckResults(org.apache.gobblin.qualitychecker.task.TaskLevelPolicyCheckResults) Properties(java.util.Properties) IdentityForkOperator(org.apache.gobblin.fork.IdentityForkOperator) RowLevelPolicyChecker(org.apache.gobblin.qualitychecker.row.RowLevelPolicyChecker) BasicAckableForTesting(org.apache.gobblin.ack.BasicAckableForTesting) Test(org.testng.annotations.Test)
