Example 1 with WriteInstruction

Use of com.google.api.services.dataflow.model.WriteInstruction in project beam by apache.

From the class BeamFnMapTaskExecutorFactory, method createWriteOperation:

OperationNode createWriteOperation(
        ParallelInstructionNode node,
        PipelineOptions options,
        SinkFactory sinkFactory,
        DataflowExecutionContext executionContext,
        DataflowOperationContext context)
        throws Exception {
    ParallelInstruction instruction = node.getParallelInstruction();
    WriteInstruction write = instruction.getWrite();
    Coder<?> coder = CloudObjects.coderFromCloudObject(CloudObject.fromSpec(write.getSink().getCodec()));
    CloudObject cloudSink = CloudObject.fromSpec(write.getSink().getSpec());
    Sink<?> sink = sinkFactory.create(cloudSink, coder, options, executionContext, context);
    return OperationNode.create(WriteOperation.create(sink, EMPTY_OUTPUT_RECEIVER_ARRAY, context));
}
Also used : ParallelInstruction(com.google.api.services.dataflow.model.ParallelInstruction) CloudObject(org.apache.beam.runners.dataflow.util.CloudObject) WriteInstruction(com.google.api.services.dataflow.model.WriteInstruction)
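
This factory method reconstructs the sink's Coder from the CloudObject stored in the Sink's codec field, then builds the Sink itself from the spec. Below is a minimal sketch of that coder round trip, not taken from the Beam sources, assuming ByteArrayCoder is among the coders with a registered CloudObject translator and that a null sdkComponents argument is acceptable (as it is in the tests later in this listing):

import org.apache.beam.runners.dataflow.util.CloudObject;
import org.apache.beam.runners.dataflow.util.CloudObjects;
import org.apache.beam.sdk.coders.ByteArrayCoder;
import org.apache.beam.sdk.coders.Coder;

static Coder<?> roundTripCoderThroughCloudObject() {
    // Serialize a coder into the CloudObject form that a Sink's codec field carries
    // (ByteArrayCoder is assumed to have a registered CloudObject translator).
    CloudObject cloudCodec = CloudObjects.asCloudObject(ByteArrayCoder.of(), /*sdkComponents=*/ null);
    // Decode it back the same way createWriteOperation does.
    return CloudObjects.coderFromCloudObject(CloudObject.fromSpec(cloudCodec));
}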

Example 2 with WriteInstruction

Use of com.google.api.services.dataflow.model.WriteInstruction in project beam by apache.

From the class LengthPrefixUnknownCodersTest, method testLengthPrefixWriteInstructionCoder:

@Test
public void testLengthPrefixWriteInstructionCoder() throws Exception {
    WriteInstruction writeInstruction = new WriteInstruction();
    writeInstruction.setSink(
        new Sink().setCodec(CloudObjects.asCloudObject(windowedValueCoder, /*sdkComponents=*/ null)));
    // 'instruction', 'windowedValueCoder' and 'prefixedWindowedValueCoder' are fixtures of the
    // surrounding test class.
    instruction.setWrite(writeInstruction);
    ParallelInstruction prefixedInstruction = forParallelInstruction(instruction, false);
    assertEqualsAsJson(
        CloudObjects.asCloudObject(prefixedWindowedValueCoder, /*sdkComponents=*/ null),
        prefixedInstruction.getWrite().getSink().getCodec());
    // Should not mutate the original instruction.
    assertEqualsAsJson(
        CloudObjects.asCloudObject(windowedValueCoder, /*sdkComponents=*/ null),
        writeInstruction.getSink().getCodec());
}
Also used : LengthPrefixUnknownCoders.forParallelInstruction(org.apache.beam.runners.dataflow.worker.graph.LengthPrefixUnknownCoders.forParallelInstruction) ParallelInstruction(com.google.api.services.dataflow.model.ParallelInstruction) Sink(com.google.api.services.dataflow.model.Sink) WriteInstruction(com.google.api.services.dataflow.model.WriteInstruction) Test(org.junit.Test)
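
Here forParallelInstruction is called with false as its second argument; the assertions then check that the returned instruction carries the length-prefixed codec held in the prefixedWindowedValueCoder fixture while the original writeInstruction keeps its codec unchanged. The boolean flag presumably selects a more aggressive mode in which unknown coders are replaced with a byte-array coder rather than merely length-prefixed; that path is not exercised in this test.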

Example 3 with WriteInstruction

Use of com.google.api.services.dataflow.model.WriteInstruction in project beam by apache.

From the class IntrinsicMapTaskExecutorFactoryTest, method createWriteInstruction:

static ParallelInstruction createWriteInstruction(int producerIndex, int producerOutputNum, String systemName) {
    InstructionInput cloudInput = new InstructionInput();
    cloudInput.setProducerInstructionIndex(producerIndex);
    cloudInput.setOutputNum(producerOutputNum);
    CloudObject spec = CloudObject.forClass(IntrinsicMapTaskExecutorFactoryTest.TestSinkFactory.class);
    com.google.api.services.dataflow.model.Sink cloudSink = new com.google.api.services.dataflow.model.Sink();
    cloudSink.setSpec(spec);
    cloudSink.setCodec(windowedStringCoder);
    WriteInstruction writeInstruction = new WriteInstruction();
    writeInstruction.setInput(cloudInput);
    writeInstruction.setSink(cloudSink);
    ParallelInstruction instruction = new ParallelInstruction();
    instruction.setWrite(writeInstruction);
    instruction.setSystemName(systemName);
    instruction.setOriginalName(systemName + "OriginalName");
    return instruction;
}
Also used : ParallelInstruction(com.google.api.services.dataflow.model.ParallelInstruction) CloudObject(org.apache.beam.runners.dataflow.util.CloudObject) Sink(org.apache.beam.runners.dataflow.worker.util.common.worker.Sink) WriteInstruction(com.google.api.services.dataflow.model.WriteInstruction) InstructionInput(com.google.api.services.dataflow.model.InstructionInput)
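
The helper above only builds the model object; windowedStringCoder is a CloudObject fixture defined elsewhere in the test class. A hypothetical call (not taken from the test itself) that attaches a write stage consuming output 0 of the instruction at index 1 in a MapTask's instruction list might look like this:

// Hypothetical usage of the helper above; the producer index and output number
// refer to positions in the enclosing MapTask's instruction list.
ParallelInstruction write = createWriteInstruction(1, 0, "WriteA");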

Example 4 with WriteInstruction

Use of com.google.api.services.dataflow.model.WriteInstruction in project beam by apache.

From the class IntrinsicMapTaskExecutorFactory, method createWriteOperation:

OperationNode createWriteOperation(
        ParallelInstructionNode node,
        PipelineOptions options,
        SinkFactory sinkFactory,
        DataflowExecutionContext executionContext,
        DataflowOperationContext context)
        throws Exception {
    ParallelInstruction instruction = node.getParallelInstruction();
    WriteInstruction write = instruction.getWrite();
    Coder<?> coder = CloudObjects.coderFromCloudObject(CloudObject.fromSpec(write.getSink().getCodec()));
    CloudObject cloudSink = CloudObject.fromSpec(write.getSink().getSpec());
    Sink<?> sink = sinkFactory.create(cloudSink, coder, options, executionContext, context);
    return OperationNode.create(WriteOperation.create(sink, EMPTY_OUTPUT_RECEIVER_ARRAY, context));
}
Also used : ParallelInstruction(com.google.api.services.dataflow.model.ParallelInstruction) CloudObject(org.apache.beam.runners.dataflow.util.CloudObject) WriteInstruction(com.google.api.services.dataflow.model.WriteInstruction)

Example 5 with WriteInstruction

Use of com.google.api.services.dataflow.model.WriteInstruction in project beam by apache.

From the class MapTaskToNetworkFunctionTest, method testMultipleOutput:

@Test
public void testMultipleOutput() {
    //                   /---> WriteA
    // Read ---> ParDo --+
    //                   \---> WriteB
    InstructionOutput readOutput = createInstructionOutput("Read.out");
    ParallelInstruction read = createParallelInstruction("Read", readOutput);
    read.setRead(new ReadInstruction());
    MultiOutputInfo parDoMultiOutput1 = createMultiOutputInfo("output1");
    MultiOutputInfo parDoMultiOutput2 = createMultiOutputInfo("output2");
    ParDoInstruction parDoInstruction = new ParDoInstruction();
    // Read.out
    parDoInstruction.setInput(createInstructionInput(0, 0));
    parDoInstruction.setMultiOutputInfos(ImmutableList.of(parDoMultiOutput1, parDoMultiOutput2));
    InstructionOutput parDoOutput1 = createInstructionOutput("ParDo.out1");
    InstructionOutput parDoOutput2 = createInstructionOutput("ParDo.out2");
    ParallelInstruction parDo = createParallelInstruction("ParDo", parDoOutput1, parDoOutput2);
    parDo.setParDo(parDoInstruction);
    WriteInstruction writeAInstruction = new WriteInstruction();
    // ParDo.out1
    writeAInstruction.setInput(createInstructionInput(1, 0));
    ParallelInstruction writeA = createParallelInstruction("WriteA");
    writeA.setWrite(writeAInstruction);
    WriteInstruction writeBInstruction = new WriteInstruction();
    // ParDo.out2
    writeBInstruction.setInput(createInstructionInput(1, 1));
    ParallelInstruction writeB = createParallelInstruction("WriteB");
    writeB.setWrite(writeBInstruction);
    MapTask mapTask = new MapTask();
    mapTask.setInstructions(ImmutableList.of(read, parDo, writeA, writeB));
    mapTask.setFactory(Transport.getJsonFactory());
    Network<Node, Edge> network = new MapTaskToNetworkFunction(IdGenerators.decrementingLongs()).apply(mapTask);
    assertNetworkProperties(network);
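    // 7 nodes: Read, ParDo, WriteA and WriteB plus the Read.out, ParDo.out1 and ParDo.out2 outputs.
    // 6 edges: Read -> Read.out -> ParDo -> {ParDo.out1 -> WriteA, ParDo.out2 -> WriteB}.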
    assertEquals(7, network.nodes().size());
    assertEquals(6, network.edges().size());
    ParallelInstructionNode parDoNode = get(network, parDo);
    ParallelInstructionNode writeANode = get(network, writeA);
    ParallelInstructionNode writeBNode = get(network, writeB);
    InstructionOutputNode parDoOutput1Node = getOnlyPredecessor(network, writeANode);
    assertEquals(parDoOutput1, parDoOutput1Node.getInstructionOutput());
    InstructionOutputNode parDoOutput2Node = getOnlyPredecessor(network, writeBNode);
    assertEquals(parDoOutput2, parDoOutput2Node.getInstructionOutput());
    assertThat(network.successors(parDoNode), Matchers.<Node>containsInAnyOrder(parDoOutput1Node, parDoOutput2Node));
    assertEquals(parDoMultiOutput1, ((MultiOutputInfoEdge) Iterables.getOnlyElement(network.edgesConnecting(parDoNode, parDoOutput1Node))).getMultiOutputInfo());
    assertEquals(parDoMultiOutput2, ((MultiOutputInfoEdge) Iterables.getOnlyElement(network.edgesConnecting(parDoNode, parDoOutput2Node))).getMultiOutputInfo());
}
Also used : MultiOutputInfo(com.google.api.services.dataflow.model.MultiOutputInfo) InstructionOutputNode(org.apache.beam.runners.dataflow.worker.graph.Nodes.InstructionOutputNode) ParallelInstructionNode(org.apache.beam.runners.dataflow.worker.graph.Nodes.ParallelInstructionNode) Node(org.apache.beam.runners.dataflow.worker.graph.Nodes.Node) InstructionOutput(com.google.api.services.dataflow.model.InstructionOutput) ParallelInstructionNode(org.apache.beam.runners.dataflow.worker.graph.Nodes.ParallelInstructionNode) ReadInstruction(com.google.api.services.dataflow.model.ReadInstruction) ParallelInstruction(com.google.api.services.dataflow.model.ParallelInstruction) ParDoInstruction(com.google.api.services.dataflow.model.ParDoInstruction) InstructionOutputNode(org.apache.beam.runners.dataflow.worker.graph.Nodes.InstructionOutputNode) MapTask(com.google.api.services.dataflow.model.MapTask) WriteInstruction(com.google.api.services.dataflow.model.WriteInstruction) Edge(org.apache.beam.runners.dataflow.worker.graph.Edges.Edge) DefaultEdge(org.apache.beam.runners.dataflow.worker.graph.Edges.DefaultEdge) MultiOutputInfoEdge(org.apache.beam.runners.dataflow.worker.graph.Edges.MultiOutputInfoEdge) Test(org.junit.Test)

Aggregations

ParallelInstruction (com.google.api.services.dataflow.model.ParallelInstruction): 8
WriteInstruction (com.google.api.services.dataflow.model.WriteInstruction): 8
CloudObject (org.apache.beam.runners.dataflow.util.CloudObject): 4
Test (org.junit.Test): 4
InstructionOutput (com.google.api.services.dataflow.model.InstructionOutput): 3
MapTask (com.google.api.services.dataflow.model.MapTask): 3
ReadInstruction (com.google.api.services.dataflow.model.ReadInstruction): 3
DefaultEdge (org.apache.beam.runners.dataflow.worker.graph.Edges.DefaultEdge): 3
Edge (org.apache.beam.runners.dataflow.worker.graph.Edges.Edge): 3
MultiOutputInfoEdge (org.apache.beam.runners.dataflow.worker.graph.Edges.MultiOutputInfoEdge): 3
InstructionOutputNode (org.apache.beam.runners.dataflow.worker.graph.Nodes.InstructionOutputNode): 3
Node (org.apache.beam.runners.dataflow.worker.graph.Nodes.Node): 3
ParallelInstructionNode (org.apache.beam.runners.dataflow.worker.graph.Nodes.ParallelInstructionNode): 3
InstructionInput (com.google.api.services.dataflow.model.InstructionInput): 2
Sink (com.google.api.services.dataflow.model.Sink): 2
MultiOutputInfo (com.google.api.services.dataflow.model.MultiOutputInfo): 1
ParDoInstruction (com.google.api.services.dataflow.model.ParDoInstruction): 1
PartialGroupByKeyInstruction (com.google.api.services.dataflow.model.PartialGroupByKeyInstruction): 1
LengthPrefixUnknownCoders.forParallelInstruction (org.apache.beam.runners.dataflow.worker.graph.LengthPrefixUnknownCoders.forParallelInstruction): 1
Sink (org.apache.beam.runners.dataflow.worker.util.common.worker.Sink): 1