Example 11 with BoundedSource

use of org.apache.beam.sdk.io.BoundedSource in project beam by apache.

the class WorkerCustomSources method serializeSplitToCloudSource.

/**
 * Version of {@link CustomSources#serializeToCloudSource(Source, PipelineOptions)} intended for
 * use on splits of {@link BoundedSource}.
 */
private static com.google.api.services.dataflow.model.Source serializeSplitToCloudSource(BoundedSource<?> source) throws Exception {
    com.google.api.services.dataflow.model.Source cloudSource = new com.google.api.services.dataflow.model.Source();
    cloudSource.setSpec(CloudObject.forClass(CustomSources.class));
    addString(cloudSource.getSpec(), SERIALIZED_SOURCE, encodeBase64String(serializeToByteArray(source)));
    SourceMetadata metadata = new SourceMetadata();
    // Size estimation is best effort so we continue even if it fails here.
    try {
        long estimatedSize = source.getEstimatedSizeBytes(PipelineOptionsFactory.create());
        if (estimatedSize >= 0) {
            metadata.setEstimatedSizeBytes(estimatedSize);
        } else {
            LOG.warn("Ignoring negative estimated size {} produced by source {}", estimatedSize, source);
        }
    } catch (Exception e) {
        LOG.warn("Size estimation of the source failed: " + source, e);
    }
    cloudSource.setMetadata(metadata);
    return cloudSource;
}
Also used : SourceMetadata(com.google.api.services.dataflow.model.SourceMetadata) CustomSources(org.apache.beam.runners.dataflow.internal.CustomSources) UnboundedSource(org.apache.beam.sdk.io.UnboundedSource) Source(org.apache.beam.sdk.io.Source) DerivedSource(com.google.api.services.dataflow.model.DerivedSource) BoundedSource(org.apache.beam.sdk.io.BoundedSource) NoSuchElementException(java.util.NoSuchElementException) IOException(java.io.IOException)

Example 12 with BoundedSource

use of org.apache.beam.sdk.io.BoundedSource in project beam by apache.

the class ReadTranslationTest method testToFromProtoBounded.

@Test
public void testToFromProtoBounded() throws Exception {
    // TODO: Split into two tests.
    assumeThat(source, instanceOf(BoundedSource.class));
    BoundedSource<?> boundedSource = (BoundedSource<?>) this.source;
    SplittableParDo.PrimitiveBoundedRead<?> boundedRead = new SplittableParDo.PrimitiveBoundedRead<>(Read.from(boundedSource));
    ReadPayload payload = ReadTranslation.toProto(boundedRead);
    assertThat(payload.getIsBounded(), equalTo(RunnerApi.IsBounded.Enum.BOUNDED));
    BoundedSource<?> deserializedSource = ReadTranslation.boundedSourceFromProto(payload);
    assertThat(deserializedSource, equalTo(source));
}
Also used : BoundedSource(org.apache.beam.sdk.io.BoundedSource) ReadPayload(org.apache.beam.model.pipeline.v1.RunnerApi.ReadPayload) Test(org.junit.Test)

Example 13 with BoundedSource

use of org.apache.beam.sdk.io.BoundedSource in project beam by apache.

the class ReadTranslator method translate.

@Override
public void translate(PTransform<PBegin, PCollection<T>> transform, TransformHierarchy.Node node, TranslationContext ctx) {
    final PCollection<T> output = ctx.getOutput(transform);
    final Coder<WindowedValue<T>> coder = SamzaCoders.of(output);
    final Source<?> source = transform instanceof SplittableParDo.PrimitiveBoundedRead ? ((SplittableParDo.PrimitiveBoundedRead) transform).getSource() : ((SplittableParDo.PrimitiveUnboundedRead) transform).getSource();
    final String id = ctx.getIdForPValue(output);
    // Create system descriptor
    final GenericSystemDescriptor systemDescriptor;
    if (source instanceof BoundedSource) {
        systemDescriptor = new GenericSystemDescriptor(id, BoundedSourceSystem.Factory.class.getName());
    } else {
        systemDescriptor = new GenericSystemDescriptor(id, UnboundedSourceSystem.Factory.class.getName());
    }
    final Map<String, String> systemConfig = ImmutableMap.of("source", Base64Serializer.serializeUnchecked(source), "coder", Base64Serializer.serializeUnchecked(coder), "stepName", node.getFullName());
    systemDescriptor.withSystemConfigs(systemConfig);
    // Create stream descriptor
    @SuppressWarnings("unchecked") final Serde<KV<?, OpMessage<T>>> kvSerde = (Serde) KVSerde.of(new NoOpSerde<>(), new NoOpSerde<>());
    final GenericInputDescriptor<KV<?, OpMessage<T>>> inputDescriptor = systemDescriptor.getInputDescriptor(id, kvSerde);
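    if (source instanceof BoundedSource) {
        // Despite its name, Samza's isBounded() acts as a builder-style setter: it marks
        // the descriptor as bounded in place, so the return value is not needed here.
        inputDescriptor.isBounded();
    }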
    ctx.registerInputMessageStream(output, inputDescriptor);
}
Also used : Serde(org.apache.samza.serializers.Serde) KVSerde(org.apache.samza.serializers.KVSerde) NoOpSerde(org.apache.samza.serializers.NoOpSerde) BoundedSource(org.apache.beam.sdk.io.BoundedSource) KV(org.apache.samza.operators.KV) SplittableParDo(org.apache.beam.runners.core.construction.SplittableParDo) WindowedValue(org.apache.beam.sdk.util.WindowedValue) UnboundedSourceSystem(org.apache.beam.runners.samza.adapter.UnboundedSourceSystem) GenericSystemDescriptor(org.apache.samza.system.descriptors.GenericSystemDescriptor) BoundedSourceSystem(org.apache.beam.runners.samza.adapter.BoundedSourceSystem)

Example 14 with BoundedSource

use of org.apache.beam.sdk.io.BoundedSource in project component-runtime by Talend.

the class DIPipeline method wrapTransformIfNeeded.

private <PT extends POutput> PTransform<? super PBegin, PT> wrapTransformIfNeeded(final PTransform<? super PBegin, PT> root) {
    if (Read.Bounded.class.isInstance(root)) {
        final BoundedSource source = Read.Bounded.class.cast(root).getSource();
        final DelegatingBoundedSource boundedSource = new DelegatingBoundedSource(source, null);
        setState(boundedSource);
        return Read.from(boundedSource);
    }
    if (Read.Unbounded.class.isInstance(root)) {
        final UnboundedSource source = Read.Unbounded.class.cast(root).getSource();
        if (InMemoryQueueIO.UnboundedQueuedInput.class.isInstance(source)) {
            return root;
        }
        final DelegatingUnBoundedSource unBoundedSource = new DelegatingUnBoundedSource(source, null);
        setState(unBoundedSource);
        return Read.from(unBoundedSource);
    }
    return root;
}
Also used : Read(org.apache.beam.sdk.io.Read) DelegatingUnBoundedSource(org.talend.sdk.component.runtime.di.beam.DelegatingUnBoundedSource) BoundedSource(org.apache.beam.sdk.io.BoundedSource) DelegatingBoundedSource(org.talend.sdk.component.runtime.di.beam.DelegatingBoundedSource) InMemoryQueueIO(org.talend.sdk.component.runtime.di.beam.InMemoryQueueIO) UnboundedSource(org.apache.beam.sdk.io.UnboundedSource)

Example 15 with BoundedSource

use of org.apache.beam.sdk.io.BoundedSource in project beam by apache.

the class XmlSourceTest method testReadXMLInvalidRecordClassWithCustomEventHandler.

@Test
public void testReadXMLInvalidRecordClassWithCustomEventHandler() throws IOException {
    File file = tempFolder.newFile("trainXMLSmall");
    Files.write(file.toPath(), trainXML.getBytes(StandardCharsets.UTF_8));
    ValidationEventHandler validationEventHandler = event -> {
        throw new RuntimeException("MyCustomValidationEventHandler failure message");
    };
    BoundedSource<WrongTrainType> source = XmlIO.<WrongTrainType>read().from(file.toPath().toString()).withRootElement("trains").withRecordElement("train").withRecordClass(WrongTrainType.class).withValidationEventHandler(validationEventHandler).createSource();
    exception.expect(RuntimeException.class);
    // JAXB internationalizes the error message. So this is all we can match for.
    exception.expectMessage("MyCustomValidationEventHandler failure message");
    try (Reader<WrongTrainType> reader = source.createReader(null)) {
        for (boolean available = reader.start(); available; available = reader.advance()) {
            reader.getCurrent();
        }
    }
}
Also used : RunWith(org.junit.runner.RunWith) Random(java.util.Random) PipelineOptionsFactory(org.apache.beam.sdk.options.PipelineOptionsFactory) ArrayList(java.util.ArrayList) SourceTestUtils.assertSplitAtFractionFails(org.apache.beam.sdk.testing.SourceTestUtils.assertSplitAtFractionFails) TestPipeline(org.apache.beam.sdk.testing.TestPipeline) ValidationEventHandler(javax.xml.bind.ValidationEventHandler) MatcherAssert.assertThat(org.hamcrest.MatcherAssert.assertThat) PipelineOptions(org.apache.beam.sdk.options.PipelineOptions) ExpectedException(org.junit.rules.ExpectedException) Nullable(org.checkerframework.checker.nullness.qual.Nullable) XmlAttribute(javax.xml.bind.annotation.XmlAttribute) Files(java.nio.file.Files) PAssert(org.apache.beam.sdk.testing.PAssert) BufferedWriter(java.io.BufferedWriter) Assert.assertTrue(org.junit.Assert.assertTrue) IOException(java.io.IOException) Test(org.junit.Test) JUnit4(org.junit.runners.JUnit4) XmlRootElement(javax.xml.bind.annotation.XmlRootElement) PCollection(org.apache.beam.sdk.values.PCollection) File(java.io.File) StandardCharsets(java.nio.charset.StandardCharsets) List(java.util.List) BoundedSource(org.apache.beam.sdk.io.BoundedSource) Rule(org.junit.Rule) Ignore(org.junit.Ignore) Matchers.containsInAnyOrder(org.hamcrest.Matchers.containsInAnyOrder) SourceTestUtils.assertSplitAtFractionExhaustive(org.apache.beam.sdk.testing.SourceTestUtils.assertSplitAtFractionExhaustive) ImmutableList(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList) SourceTestUtils.assertSplitAtFractionSucceedsAndConsistent(org.apache.beam.sdk.testing.SourceTestUtils.assertSplitAtFractionSucceedsAndConsistent) Reader(org.apache.beam.sdk.io.Source.Reader) Assert.assertEquals(org.junit.Assert.assertEquals) TemporaryFolder(org.junit.rules.TemporaryFolder)

Aggregations

BoundedSource (org.apache.beam.sdk.io.BoundedSource) 16
ArrayList (java.util.ArrayList) 6
Test (org.junit.Test) 6
List (java.util.List) 3
UnboundedSource (org.apache.beam.sdk.io.UnboundedSource) 3
SourceMetadata (com.google.api.services.dataflow.model.SourceMetadata) 2
ByteString (com.google.protobuf.ByteString) 2
IOException (java.io.IOException) 2
GenericRecord (org.apache.avro.generic.GenericRecord) 2
Source (org.apache.beam.sdk.io.Source) 2
ResourceId (org.apache.beam.sdk.io.fs.ResourceId) 2
SerializableFunction (org.apache.beam.sdk.transforms.SerializableFunction) 2
WindowedValue (org.apache.beam.sdk.util.WindowedValue) 2
KV (org.apache.beam.sdk.values.KV) 2
Base64.encodeBase64String (com.google.api.client.util.Base64.encodeBase64String) 1
TableRow (com.google.api.services.bigquery.model.TableRow) 1
TableSchema (com.google.api.services.bigquery.model.TableSchema) 1
DerivedSource (com.google.api.services.dataflow.model.DerivedSource) 1
SourceOperationResponse (com.google.api.services.dataflow.model.SourceOperationResponse) 1
SourceSplitOptions (com.google.api.services.dataflow.model.SourceSplitOptions) 1