Search in sources :

Example 41 with DiscardingOutputFormat

Use of org.apache.flink.api.java.io.DiscardingOutputFormat in project flink by apache.

The class DistinctCompilationTest, method testDistinctWithSelectorFunctionKey.

@Test
public void testDistinctWithSelectorFunctionKey() {
    try {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(8);
        DataSet<Tuple2<String, Double>> data =
                env.readCsvFile("file:///will/never/be/read")
                        .types(String.class, Double.class)
                        .name("source")
                        .setParallelism(6);
        data.distinct(new KeySelector<Tuple2<String, Double>, String>() {

            @Override
            public String getKey(Tuple2<String, Double> value) {
                return value.f0;
            }
        }).name("reducer")
                .output(new DiscardingOutputFormat<Tuple2<String, Double>>())
                .name("sink");
        Plan p = env.createProgramPlan();
        OptimizedPlan op = compileNoStats(p);
        OptimizerPlanNodeResolver resolver = getOptimizerPlanNodeResolver(op);
        // get the original nodes
        SourcePlanNode sourceNode = resolver.getNode("source");
        SingleInputPlanNode reduceNode = resolver.getNode("reducer");
        SinkPlanNode sinkNode = resolver.getNode("sink");
        // get the combiner
        SingleInputPlanNode combineNode = (SingleInputPlanNode) reduceNode.getInput().getSource();
        // get the key extractors and projectors
        SingleInputPlanNode keyExtractor = (SingleInputPlanNode) combineNode.getInput().getSource();
        SingleInputPlanNode keyProjector = (SingleInputPlanNode) sinkNode.getInput().getSource();
        // check wiring
        assertEquals(sourceNode, keyExtractor.getInput().getSource());
        assertEquals(keyProjector, sinkNode.getInput().getSource());
        // check that both reduce and combiner have the same strategy
        assertEquals(DriverStrategy.SORTED_REDUCE, reduceNode.getDriverStrategy());
        assertEquals(DriverStrategy.SORTED_PARTIAL_REDUCE, combineNode.getDriverStrategy());
        // check the keys
        assertEquals(new FieldList(0), reduceNode.getKeys(0));
        assertEquals(new FieldList(0), combineNode.getKeys(0));
        assertEquals(new FieldList(0), reduceNode.getInput().getLocalStrategyKeys());
        // check parallelism
        assertEquals(6, sourceNode.getParallelism());
        assertEquals(6, keyExtractor.getParallelism());
        assertEquals(6, combineNode.getParallelism());
        assertEquals(8, reduceNode.getParallelism());
        assertEquals(8, keyProjector.getParallelism());
        assertEquals(8, sinkNode.getParallelism());
    } catch (Exception e) {
        System.err.println(e.getMessage());
        e.printStackTrace();
        fail(e.getClass().getSimpleName() + " in test: " + e.getMessage());
    }
}
Also used : ExecutionEnvironment(org.apache.flink.api.java.ExecutionEnvironment) KeySelector(org.apache.flink.api.java.functions.KeySelector) Plan(org.apache.flink.api.common.Plan) OptimizedPlan(org.apache.flink.optimizer.plan.OptimizedPlan) DiscardingOutputFormat(org.apache.flink.api.java.io.DiscardingOutputFormat) FieldList(org.apache.flink.api.common.operators.util.FieldList) SingleInputPlanNode(org.apache.flink.optimizer.plan.SingleInputPlanNode) Tuple2(org.apache.flink.api.java.tuple.Tuple2) SourcePlanNode(org.apache.flink.optimizer.plan.SourcePlanNode) SinkPlanNode(org.apache.flink.optimizer.plan.SinkPlanNode) Test(org.junit.Test)
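For comparison, here is a minimal sketch (not part of the test above) of the same distinct expressed with a field-position key. With a tuple field index the optimizer does not need the key-extractor and key-projector mappers that the KeySelector variant introduces:

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
env.readCsvFile("file:///will/never/be/read")
        .types(String.class, Double.class)
        // key directly on tuple field 0 instead of going through a KeySelector
        .distinct(0)
        .output(new DiscardingOutputFormat<Tuple2<String, Double>>());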

Example 42 with DiscardingOutputFormat

Use of org.apache.flink.api.java.io.DiscardingOutputFormat in project flink by apache.

The class HardPlansCompilationTest, method testTicket158.

/**
 * Source -> Map -> Reduce -> Cross -> Reduce -> Cross -> Reduce ->
 *    |--------------------------/                  /
 *    |--------------------------------------------/
 *
 * <p>First cross has SameKeyFirst output contract
 */
@Test
public void testTicket158() {
    // construct the plan
    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    env.setParallelism(DEFAULT_PARALLELISM);
    DataSet<Long> set1 = env.generateSequence(0, 1);
    set1.map(new IdentityMapper<Long>()).name("Map1")
            .groupBy("*").reduceGroup(new IdentityGroupReducer<Long>()).name("Reduce1")
            .cross(set1).with(new IdentityCrosser<Long>()).withForwardedFieldsFirst("*").name("Cross1")
            .groupBy("*").reduceGroup(new IdentityGroupReducer<Long>()).name("Reduce2")
            .cross(set1).with(new IdentityCrosser<Long>()).name("Cross2")
            .groupBy("*").reduceGroup(new IdentityGroupReducer<Long>()).name("Reduce3")
            .output(new DiscardingOutputFormat<Long>()).name("Sink");
    Plan plan = env.createProgramPlan();
    OptimizedPlan oPlan = compileNoStats(plan);
    JobGraphGenerator jobGen = new JobGraphGenerator();
    jobGen.compileJobGraph(oPlan);
}
Also used : ExecutionEnvironment(org.apache.flink.api.java.ExecutionEnvironment) IdentityMapper(org.apache.flink.optimizer.testfunctions.IdentityMapper) JobGraphGenerator(org.apache.flink.optimizer.plantranslate.JobGraphGenerator) IdentityCrosser(org.apache.flink.optimizer.testfunctions.IdentityCrosser) IdentityGroupReducer(org.apache.flink.optimizer.testfunctions.IdentityGroupReducer) Plan(org.apache.flink.api.common.Plan) OptimizedPlan(org.apache.flink.optimizer.plan.OptimizedPlan) DiscardingOutputFormat(org.apache.flink.api.java.io.DiscardingOutputFormat) Test(org.junit.Test)
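The test above only asserts that compileJobGraph completes without throwing. As a hedged follow-up sketch, the returned JobGraph could be inspected like this (JobGraph and JobVertex are the standard org.apache.flink.runtime.jobgraph classes):

JobGraph jobGraph = new JobGraphGenerator().compileJobGraph(oPlan);
for (JobVertex vertex : jobGraph.getVertices()) {
    // print each generated task vertex and its parallelism
    System.out.println(vertex.getName() + " (parallelism " + vertex.getParallelism() + ")");
}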

Example 43 with DiscardingOutputFormat

Use of org.apache.flink.api.java.io.DiscardingOutputFormat in project flink by apache.

The class UnionTranslationTest, method translateUnion3SortedGroup.

@Test
public void translateUnion3SortedGroup() {
    try {
        final int parallelism = 4;
        ExecutionEnvironment env = ExecutionEnvironment.createLocalEnvironment(parallelism);
        DataSet<Tuple3<Double, StringValue, LongValue>> dataset1 = getSourceDataSet(env, 2);
        DataSet<Tuple3<Double, StringValue, LongValue>> dataset2 = getSourceDataSet(env, 3);
        DataSet<Tuple3<Double, StringValue, LongValue>> dataset3 = getSourceDataSet(env, -1);
        dataset1.union(dataset2).union(dataset3)
                .groupBy((KeySelector<Tuple3<Double, StringValue, LongValue>, String>) value -> "")
                .sortGroup(
                        (KeySelector<Tuple3<Double, StringValue, LongValue>, String>) value -> "",
                        Order.ASCENDING)
                .reduceGroup(
                        (GroupReduceFunction<Tuple3<Double, StringValue, LongValue>, String>)
                                (values, out) -> {})
                .returns(String.class)
                .output(new DiscardingOutputFormat<>());
        Plan p = env.createProgramPlan();
        // The plan should look like the following one.
        //
        // DataSet1(2) - MapOperator(2)-+
        //                              |- Union(-1) -+
        // DataSet2(3) - MapOperator(3)-+             |- Union(-1) - SingleInputOperator - Sink
        //                                            |
        // DataSet3(-1) - MapOperator(-1)-------------+
        GenericDataSinkBase<?> sink = p.getDataSinks().iterator().next();
        Union secondUnionOperator = (Union) ((SingleInputOperator) sink.getInput()).getInput();
        // The first input of the second union should be the first union.
        Union firstUnionOperator = (Union) secondUnionOperator.getFirstInput();
        // The key mapper should be added to the second input stream of the second union.
        assertTrue(secondUnionOperator.getSecondInput() instanceof MapOperatorBase<?, ?, ?>);
        // The key mappers should be added to both of the two input streams for the first union.
        assertTrue(firstUnionOperator.getFirstInput() instanceof MapOperatorBase<?, ?, ?>);
        assertTrue(firstUnionOperator.getSecondInput() instanceof MapOperatorBase<?, ?, ?>);
        // The parallelisms of the key mappers should be equal to those of their inputs.
        assertEquals(2, firstUnionOperator.getFirstInput().getParallelism());
        assertEquals(3, firstUnionOperator.getSecondInput().getParallelism());
        assertEquals(-1, secondUnionOperator.getSecondInput().getParallelism());
        // The union should always have the default parallelism.
        assertEquals(ExecutionConfig.PARALLELISM_DEFAULT, secondUnionOperator.getParallelism());
        assertEquals(ExecutionConfig.PARALLELISM_DEFAULT, firstUnionOperator.getParallelism());
    } catch (Exception e) {
        System.err.println(e.getMessage());
        e.printStackTrace();
        fail("Test caused an error: " + e.getMessage());
    }
}
Also used : KeySelector(org.apache.flink.api.java.functions.KeySelector) Tuple3(org.apache.flink.api.java.tuple.Tuple3) DiscardingOutputFormat(org.apache.flink.api.java.io.DiscardingOutputFormat) LongValue(org.apache.flink.types.LongValue) GroupReduceFunction(org.apache.flink.api.common.functions.GroupReduceFunction) MapOperatorBase(org.apache.flink.api.common.operators.base.MapOperatorBase) Union(org.apache.flink.api.common.operators.Union) Assert.assertTrue(org.junit.Assert.assertTrue) Test(org.junit.Test) SingleInputOperator(org.apache.flink.api.common.operators.SingleInputOperator) DataSet(org.apache.flink.api.java.DataSet) ExecutionEnvironment(org.apache.flink.api.java.ExecutionEnvironment) StringValue(org.apache.flink.types.StringValue) GenericDataSinkBase(org.apache.flink.api.common.operators.GenericDataSinkBase) ExecutionConfig(org.apache.flink.api.common.ExecutionConfig) Plan(org.apache.flink.api.common.Plan) Assert.fail(org.junit.Assert.fail) Order(org.apache.flink.api.common.operators.Order) Assert.assertEquals(org.junit.Assert.assertEquals)
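The getSourceDataSet helper is not shown in this snippet. A plausible sketch, assuming it only needs to create a tiny source with the requested parallelism (the actual body in the Flink test may differ):

private static DataSet<Tuple3<Double, StringValue, LongValue>> getSourceDataSet(
        ExecutionEnvironment env, int parallelism) {
    // a single dummy element is enough; only the parallelism matters for the translation test
    return env.fromElements(new Tuple3<>(0.0, new StringValue("a"), new LongValue(1L)))
            .setParallelism(parallelism);
}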

Example 44 with DiscardingOutputFormat

Use of org.apache.flink.api.java.io.DiscardingOutputFormat in project beam by apache.

The class FlinkBatchPortablePipelineTranslator, method translateExecutableStage.

private static <InputT> void translateExecutableStage(PTransformNode transform, RunnerApi.Pipeline pipeline, BatchTranslationContext context) {
    // TODO: Fail on splittable DoFns.
    // TODO: Special-case single outputs to avoid multiplexing PCollections.
    RunnerApi.Components components = pipeline.getComponents();
    Map<String, String> outputs = transform.getTransform().getOutputsMap();
    // Mapping from PCollection id to coder tag id.
    BiMap<String, Integer> outputMap = createOutputMap(outputs.values());
    // Collect all output Coders and create a UnionCoder for our tagged outputs.
    List<Coder<?>> unionCoders = Lists.newArrayList();
    // Enforce tuple tag sorting by union tag index.
    Map<String, Coder<WindowedValue<?>>> outputCoders = Maps.newHashMap();
    for (String collectionId : new TreeMap<>(outputMap.inverse()).values()) {
        PCollectionNode collectionNode = PipelineNode.pCollection(collectionId, components.getPcollectionsOrThrow(collectionId));
        Coder<WindowedValue<?>> coder;
        try {
            coder = (Coder) WireCoders.instantiateRunnerWireCoder(collectionNode, components);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
        outputCoders.put(collectionId, coder);
        unionCoders.add(coder);
    }
    UnionCoder unionCoder = UnionCoder.of(unionCoders);
    TypeInformation<RawUnionValue> typeInformation = new CoderTypeInformation<>(unionCoder, context.getPipelineOptions());
    RunnerApi.ExecutableStagePayload stagePayload;
    try {
        stagePayload = RunnerApi.ExecutableStagePayload.parseFrom(transform.getTransform().getSpec().getPayload());
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
    String inputPCollectionId = stagePayload.getInput();
    Coder<WindowedValue<InputT>> windowedInputCoder = instantiateCoder(inputPCollectionId, components);
    DataSet<WindowedValue<InputT>> inputDataSet = context.getDataSetOrThrow(inputPCollectionId);
    final FlinkExecutableStageFunction<InputT> function = new FlinkExecutableStageFunction<>(transform.getTransform().getUniqueName(), context.getPipelineOptions(), stagePayload, context.getJobInfo(), outputMap, FlinkExecutableStageContextFactory.getInstance(), getWindowingStrategy(inputPCollectionId, components).getWindowFn().windowCoder(), windowedInputCoder);
    final String operatorName = generateNameFromStagePayload(stagePayload);
    final SingleInputUdfOperator taggedDataset;
    if (stagePayload.getUserStatesCount() > 0 || stagePayload.getTimersCount() > 0) {
        Coder valueCoder = ((WindowedValue.FullWindowedValueCoder) windowedInputCoder).getValueCoder();
        // Stateful stages are only allowed with KV input so that we can group on the key.
        if (!(valueCoder instanceof KvCoder)) {
            throw new IllegalStateException(String.format(Locale.ENGLISH, "The element coder for stateful DoFn '%s' must be KvCoder but is: %s", inputPCollectionId, valueCoder.getClass().getSimpleName()));
        }
        Coder keyCoder = ((KvCoder) valueCoder).getKeyCoder();
        Grouping<WindowedValue<InputT>> groupedInput = inputDataSet.groupBy(new KvKeySelector<>(keyCoder));
        boolean requiresTimeSortedInput = requiresTimeSortedInput(stagePayload, false);
        if (requiresTimeSortedInput) {
            groupedInput = ((UnsortedGrouping<WindowedValue<InputT>>) groupedInput).sortGroup(WindowedValue::getTimestamp, Order.ASCENDING);
        }
        taggedDataset = new GroupReduceOperator<>(groupedInput, typeInformation, function, operatorName);
    } else {
        taggedDataset = new MapPartitionOperator<>(inputDataSet, typeInformation, function, operatorName);
    }
    for (SideInputId sideInputId : stagePayload.getSideInputsList()) {
        String collectionId = stagePayload.getComponents().getTransformsOrThrow(sideInputId.getTransformId()).getInputsOrThrow(sideInputId.getLocalName());
        // Register under the global PCollection name. Only ExecutableStageFunction needs to know the
        // mapping from local name to global name and how to translate the broadcast data to a state
        // API view.
        taggedDataset.withBroadcastSet(context.getDataSetOrThrow(collectionId), collectionId);
    }
    for (String collectionId : outputs.values()) {
        pruneOutput(taggedDataset, context, outputMap.get(collectionId), outputCoders.get(collectionId), collectionId);
    }
    if (outputs.isEmpty()) {
        // NOTE: After pipeline translation, we traverse the set of unconsumed PCollections and add a
        // no-op sink to each to make sure they are materialized by Flink. However, some SDK-executed
        // stages have no runner-visible output after fusion. We handle this case by adding a sink
        // here.
        taggedDataset.output(new DiscardingOutputFormat<>()).name("DiscardingOutput");
    }
}
Also used : DiscardingOutputFormat(org.apache.flink.api.java.io.DiscardingOutputFormat) RunnerApi(org.apache.beam.model.pipeline.v1.RunnerApi) FlinkExecutableStageFunction(org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction) WindowedValue(org.apache.beam.sdk.util.WindowedValue) SideInputId(org.apache.beam.model.pipeline.v1.RunnerApi.ExecutableStagePayload.SideInputId) CoderTypeInformation(org.apache.beam.runners.flink.translation.types.CoderTypeInformation) WindowedValueCoder(org.apache.beam.sdk.util.WindowedValue.WindowedValueCoder) KvCoder(org.apache.beam.sdk.coders.KvCoder) UnionCoder(org.apache.beam.sdk.transforms.join.UnionCoder) PipelineTranslatorUtils.instantiateCoder(org.apache.beam.runners.fnexecution.translation.PipelineTranslatorUtils.instantiateCoder) Coder(org.apache.beam.sdk.coders.Coder) ByteArrayCoder(org.apache.beam.sdk.coders.ByteArrayCoder) VoidCoder(org.apache.beam.sdk.coders.VoidCoder) RawUnionValue(org.apache.beam.sdk.transforms.join.RawUnionValue) SingleInputUdfOperator(org.apache.flink.api.java.operators.SingleInputUdfOperator) IOException(java.io.IOException) PCollectionNode(org.apache.beam.runners.core.construction.graph.PipelineNode.PCollectionNode)
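DiscardingOutputFormat itself is just a no-op sink. For orientation, an illustrative equivalent (a sketch, not the actual Flink source; the class name NoOpOutputFormat is made up, and it assumes the org.apache.flink.api.common.io.OutputFormat and org.apache.flink.configuration.Configuration types):

public class NoOpOutputFormat<T> implements OutputFormat<T> {

    @Override
    public void configure(Configuration parameters) {}

    @Override
    public void open(int taskNumber, int numTasks) {}

    @Override
    public void writeRecord(T record) {
        // intentionally drop every record
    }

    @Override
    public void close() {}
}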

Example 45 with DiscardingOutputFormat

Use of org.apache.flink.api.java.io.DiscardingOutputFormat in project flink by apache.

The class InputOutputFormatContainerTest, method testOnlyOutputFormat.

@Test
public void testOnlyOutputFormat() {
    InputOutputFormatContainer formatContainer = new InputOutputFormatContainer(Thread.currentThread().getContextClassLoader());
    OperatorID operatorID = new OperatorID();
    formatContainer.addOutputFormat(operatorID, new DiscardingOutputFormat<>());
    Configuration parameters = new Configuration();
    parameters.setString("parameter1", "bcd234");
    formatContainer.addParameters(operatorID, parameters);
    TaskConfig taskConfig = new TaskConfig(new Configuration());
    formatContainer.write(taskConfig);
    InputOutputFormatContainer loadedFormatContainer = new InputOutputFormatContainer(taskConfig, getClass().getClassLoader());
    Map<OperatorID, UserCodeWrapper<? extends OutputFormat<?>>> outputFormats = loadedFormatContainer.getOutputFormats();
    assertEquals(1, outputFormats.size());
    assertEquals(0, loadedFormatContainer.getInputFormats().size());
    assertTrue(outputFormats.get(operatorID).getUserCodeObject() instanceof DiscardingOutputFormat);
    Configuration loadedParameters = loadedFormatContainer.getParameters(operatorID);
    assertEquals(1, loadedParameters.keySet().size());
    assertEquals("bcd234", loadedParameters.getString("parameter1", null));
}
Also used : Configuration(org.apache.flink.configuration.Configuration) DiscardingOutputFormat(org.apache.flink.api.java.io.DiscardingOutputFormat) OutputFormat(org.apache.flink.api.common.io.OutputFormat) TaskConfig(org.apache.flink.runtime.operators.util.TaskConfig) UserCodeWrapper(org.apache.flink.api.common.operators.util.UserCodeWrapper) Test(org.junit.Test)
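The input-format side of the container round-trips the same way. A hedged companion sketch, assuming addInputFormat mirrors addOutputFormat on InputOutputFormatContainer and using the standard TextInputFormat and Path classes (the path below is purely illustrative):

InputOutputFormatContainer container =
        new InputOutputFormatContainer(Thread.currentThread().getContextClassLoader());
OperatorID opId = new OperatorID();
// register one input and one output format under the same operator id
container.addInputFormat(opId, new TextInputFormat(new Path("/tmp/never-read")));
container.addOutputFormat(opId, new DiscardingOutputFormat<>());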

Aggregations

DiscardingOutputFormat (org.apache.flink.api.java.io.DiscardingOutputFormat): 49 usages
ExecutionEnvironment (org.apache.flink.api.java.ExecutionEnvironment): 44 usages
Test (org.junit.Test): 39 usages
OptimizedPlan (org.apache.flink.optimizer.plan.OptimizedPlan): 34 usages
Plan (org.apache.flink.api.common.Plan): 33 usages
SingleInputPlanNode (org.apache.flink.optimizer.plan.SingleInputPlanNode): 28 usages
Tuple2 (org.apache.flink.api.java.tuple.Tuple2): 26 usages
SinkPlanNode (org.apache.flink.optimizer.plan.SinkPlanNode): 25 usages
FieldList (org.apache.flink.api.common.operators.util.FieldList): 20 usages
SourcePlanNode (org.apache.flink.optimizer.plan.SourcePlanNode): 19 usages
DataSet (org.apache.flink.api.java.DataSet): 11 usages
Channel (org.apache.flink.optimizer.plan.Channel): 10 usages
DualInputPlanNode (org.apache.flink.optimizer.plan.DualInputPlanNode): 9 usages
MapFunction (org.apache.flink.api.common.functions.MapFunction): 6 usages
Configuration (org.apache.flink.configuration.Configuration): 6 usages
PlanNode (org.apache.flink.optimizer.plan.PlanNode): 5 usages
File (java.io.File): 4 usages
RichReduceFunction (org.apache.flink.api.common.functions.RichReduceFunction): 4 usages
Tuple3 (org.apache.flink.api.java.tuple.Tuple3): 4 usages
NAryUnionPlanNode (org.apache.flink.optimizer.plan.NAryUnionPlanNode): 4 usages