Example 1 with SimpleFunction

Use of org.apache.beam.sdk.transforms.SimpleFunction in the Apache Beam project.

Class PCollectionTupleTest, method testExpandHasMatchingTags.

@Test
public void testExpandHasMatchingTags() {
    TupleTag<Integer> intTag = new TupleTag<>();
    TupleTag<String> strTag = new TupleTag<>();
    TupleTag<Long> longTag = new TupleTag<>();
    Pipeline p = TestPipeline.create();
    PCollection<Long> longs = p.apply(GenerateSequence.from(0).to(100));
    PCollection<String> strs = p.apply(Create.of("foo", "bar", "baz"));
    PCollection<Integer> ints = longs.apply(MapElements.via(new SimpleFunction<Long, Integer>() {

        @Override
        public Integer apply(Long input) {
            return input.intValue();
        }
    }));
    Map<TupleTag<?>, PCollection<?>> pcsByTag = ImmutableMap.<TupleTag<?>, PCollection<?>>builder()
            .put(strTag, strs)
            .put(intTag, ints)
            .put(longTag, longs)
            .build();
    PCollectionTuple tuple = PCollectionTuple.of(intTag, ints).and(longTag, longs).and(strTag, strs);
    assertThat(tuple.getAll(), equalTo(pcsByTag));
    PCollectionTuple reconstructed = PCollectionTuple.empty(p);
    for (Entry<TupleTag<?>, PValue> taggedValue : tuple.expand().entrySet()) {
        TupleTag<?> tag = taggedValue.getKey();
        PValue value = taggedValue.getValue();
        assertThat("The tag should map back to the value", tuple.get(tag), equalTo(value));
        assertThat(value, Matchers.<PValue>equalTo(pcsByTag.get(tag)));
        reconstructed = reconstructed.and(tag, (PCollection) value);
    }
    assertThat(reconstructed, equalTo(tuple));
}
Also used: TestPipeline (org.apache.beam.sdk.testing.TestPipeline), Pipeline (org.apache.beam.sdk.Pipeline), SimpleFunction (org.apache.beam.sdk.transforms.SimpleFunction), Test (org.junit.Test)

Example 2 with SimpleFunction

Use of org.apache.beam.sdk.transforms.SimpleFunction in the Apache Beam project.

Class DirectRunnerTest, method wordCountShouldSucceed.

@Test
public void wordCountShouldSucceed() throws Throwable {
    Pipeline p = getPipeline();
    // Count occurrences of each word; the first MapElements is an identity mapping.
    PCollection<KV<String, Long>> counts = p
            .apply(Create.of("foo", "bar", "foo", "baz", "bar", "foo"))
            .apply(MapElements.via(new SimpleFunction<String, String>() {

                @Override
                public String apply(String input) {
                    return input;
                }
            }))
            .apply(Count.<String>perElement());
    // Format each (word, count) pair as "word: count".
    PCollection<String> countStrs = counts.apply(MapElements.via(new SimpleFunction<KV<String, Long>, String>() {

        @Override
        public String apply(KV<String, Long> input) {
            return String.format("%s: %s", input.getKey(), input.getValue());
        }
    }));
    PAssert.that(countStrs).containsInAnyOrder("baz: 1", "bar: 2", "foo: 3");
    DirectPipelineResult result = ((DirectPipelineResult) p.run());
    result.waitUntilFinish();
}
Also used: SimpleFunction (org.apache.beam.sdk.transforms.SimpleFunction), KV (org.apache.beam.sdk.values.KV), DirectPipelineResult (org.apache.beam.runners.direct.DirectRunner.DirectPipelineResult), TestPipeline (org.apache.beam.sdk.testing.TestPipeline), Pipeline (org.apache.beam.sdk.Pipeline), Test (org.junit.Test)

Example 3 with SimpleFunction

Use of org.apache.beam.sdk.transforms.SimpleFunction in the Apache Beam project.

Class SparkPipelineStateTest, method testFailedPipeline.

private void testFailedPipeline(final SparkPipelineOptions options) throws Exception {
    SparkPipelineResult result = null;
    try {
        final Pipeline pipeline = Pipeline.create(options);
        pipeline
                .apply(getValues(options))
                .setCoder(StringUtf8Coder.of())
                .apply(MapElements.via(new SimpleFunction<String, String>() {

                    @Override
                    public String apply(final String input) {
                        // Throw for every element to force the pipeline to fail.
                        throw new MyCustomException(FAILED_THE_BATCH_INTENTIONALLY);
                    }
                }));
        result = (SparkPipelineResult) pipeline.run();
        result.waitUntilFinish();
    } catch (final Exception e) {
        assertThat(e, instanceOf(Pipeline.PipelineExecutionException.class));
        assertThat(e.getCause(), instanceOf(MyCustomException.class));
        assertThat(e.getCause().getMessage(), is(FAILED_THE_BATCH_INTENTIONALLY));
        assertThat(result.getState(), is(PipelineResult.State.FAILED));
        result.cancel();
        return;
    }
    fail("An injected failure did not affect the pipeline as expected.");
}
Also used: SimpleFunction (org.apache.beam.sdk.transforms.SimpleFunction), Pipeline (org.apache.beam.sdk.Pipeline)

Example 4 with SimpleFunction

Use of org.apache.beam.sdk.transforms.SimpleFunction in the Apache Beam project.

Class DirectRunnerTest, method reusePipelineSucceeds.

@Test
public void reusePipelineSucceeds() throws Throwable {
    Pipeline p = getPipeline();
    changed = new AtomicInteger(0);
    // Count occurrences of each word; the first MapElements is an identity mapping.
    PCollection<KV<String, Long>> counts = p
            .apply(Create.of("foo", "bar", "foo", "baz", "bar", "foo"))
            .apply(MapElements.via(new SimpleFunction<String, String>() {

                @Override
                public String apply(String input) {
                    return input;
                }
            }))
            .apply(Count.<String>perElement());
    // Format each (word, count) pair as "word: count".
    PCollection<String> countStrs = counts.apply(MapElements.via(new SimpleFunction<KV<String, Long>, String>() {

        @Override
        public String apply(KV<String, Long> input) {
            return String.format("%s: %s", input.getKey(), input.getValue());
        }
    }));
    counts.apply(ParDo.of(new DoFn<KV<String, Long>, Void>() {

        @ProcessElement
        public void updateChanged(ProcessContext c) {
            changed.getAndIncrement();
        }
    }));
    PAssert.that(countStrs).containsInAnyOrder("baz: 1", "bar: 2", "foo: 3");
    DirectPipelineResult result = ((DirectPipelineResult) p.run());
    result.waitUntilFinish();
    DirectPipelineResult otherResult = ((DirectPipelineResult) p.run());
    otherResult.waitUntilFinish();
    assertThat("Each element should have been processed twice", changed.get(), equalTo(6));
}
Also used: KV (org.apache.beam.sdk.values.KV), DirectPipelineResult (org.apache.beam.runners.direct.DirectRunner.DirectPipelineResult), TestPipeline (org.apache.beam.sdk.testing.TestPipeline), Pipeline (org.apache.beam.sdk.Pipeline), DoFn (org.apache.beam.sdk.transforms.DoFn), AtomicInteger (java.util.concurrent.atomic.AtomicInteger), SimpleFunction (org.apache.beam.sdk.transforms.SimpleFunction), Test (org.junit.Test)

Example 5 with SimpleFunction

Use of org.apache.beam.sdk.transforms.SimpleFunction in the Apache Beam project.

Class DirectRunnerTest, method byteArrayCountShouldSucceed.

@Test
public void byteArrayCountShouldSucceed() {
    Pipeline p = getPipeline();
    SerializableFunction<Integer, byte[]> getBytes = new SerializableFunction<Integer, byte[]>() {

        @Override
        public byte[] apply(Integer input) {
            try {
                return CoderUtils.encodeToByteArray(VarIntCoder.of(), input);
            } catch (CoderException e) {
                fail("Unexpected Coder Exception " + e);
                throw new AssertionError("Unreachable");
            }
        }
    };
    TypeDescriptor<byte[]> td = new TypeDescriptor<byte[]>() {};
    PCollection<byte[]> foos = p.apply(Create.of(1, 1, 1, 2, 2, 3)).apply(MapElements.into(td).via(getBytes));
    PCollection<byte[]> msync = p.apply(Create.of(1, -2, -8, -16)).apply(MapElements.into(td).via(getBytes));
    PCollection<byte[]> bytes = PCollectionList.of(foos).and(msync).apply(Flatten.<byte[]>pCollections());
    PCollection<KV<byte[], Long>> counts = bytes.apply(Count.<byte[]>perElement());
    PCollection<KV<Integer, Long>> countsBackToString = counts.apply(MapElements.via(new SimpleFunction<KV<byte[], Long>, KV<Integer, Long>>() {

        @Override
        public KV<Integer, Long> apply(KV<byte[], Long> input) {
            try {
                return KV.of(CoderUtils.decodeFromByteArray(VarIntCoder.of(), input.getKey()), input.getValue());
            } catch (CoderException e) {
                fail("Unexpected Coder Exception " + e);
                throw new AssertionError("Unreachable");
            }
        }
    }));
    Map<Integer, Long> expected = ImmutableMap.<Integer, Long>builder()
            .put(1, 4L)
            .put(2, 2L)
            .put(3, 1L)
            .put(-2, 1L)
            .put(-8, 1L)
            .put(-16, 1L)
            .build();
    PAssert.thatMap(countsBackToString).isEqualTo(expected);
}
Also used: SerializableFunction (org.apache.beam.sdk.transforms.SerializableFunction), KV (org.apache.beam.sdk.values.KV), TestPipeline (org.apache.beam.sdk.testing.TestPipeline), Pipeline (org.apache.beam.sdk.Pipeline), AtomicInteger (java.util.concurrent.atomic.AtomicInteger), TypeDescriptor (org.apache.beam.sdk.values.TypeDescriptor), SimpleFunction (org.apache.beam.sdk.transforms.SimpleFunction), CoderException (org.apache.beam.sdk.coders.CoderException), Test (org.junit.Test)

Aggregations

Pipeline (org.apache.beam.sdk.Pipeline): 6
SimpleFunction (org.apache.beam.sdk.transforms.SimpleFunction): 6
TestPipeline (org.apache.beam.sdk.testing.TestPipeline): 5
Test (org.junit.Test): 4
KV (org.apache.beam.sdk.values.KV): 3
AtomicInteger (java.util.concurrent.atomic.AtomicInteger): 2
DirectPipelineResult (org.apache.beam.runners.direct.DirectRunner.DirectPipelineResult): 2
TableReference (com.google.api.services.bigquery.model.TableReference): 1
TableRow (com.google.api.services.bigquery.model.TableRow): 1
TableSchema (com.google.api.services.bigquery.model.TableSchema): 1
CoderException (org.apache.beam.sdk.coders.CoderException): 1
JsonSchemaToTableSchema (org.apache.beam.sdk.io.gcp.bigquery.BigQueryHelpers.JsonSchemaToTableSchema): 1
BigQueryHelpers.createTempTableReference (org.apache.beam.sdk.io.gcp.bigquery.BigQueryHelpers.createTempTableReference): 1
BigQueryHelpers.toJsonString (org.apache.beam.sdk.io.gcp.bigquery.BigQueryHelpers.toJsonString): 1
DoFn (org.apache.beam.sdk.transforms.DoFn): 1
SerializableFunction (org.apache.beam.sdk.transforms.SerializableFunction): 1
TypeDescriptor (org.apache.beam.sdk.values.TypeDescriptor): 1