Example 16 with BoundedSource

Use of org.apache.beam.sdk.io.BoundedSource in the Apache Beam project.

Class BigQuerySourceBase, method createSources.

List<BoundedSource<T>> createSources(List<ResourceId> files, TableSchema schema, List<MatchResult.Metadata> metadata) throws IOException, InterruptedException {
    // Capture the schema as a JSON string so the anonymous function below can parse it lazily.
    final String jsonSchema = BigQueryIO.JSON_FACTORY.toString(schema);
    SerializableFunction<GenericRecord, T> fnWrapper = new SerializableFunction<GenericRecord, T>() {

        // Parsed from jsonSchema on the first call to get() and cached for later calls.
        private Supplier<TableSchema> schema = Suppliers.memoize(Suppliers.compose(new TableSchemaFunction(), Suppliers.ofInstance(jsonSchema)));

        @Override
        public T apply(GenericRecord input) {
            return parseFn.apply(new SchemaAndRecord(input, schema.get()));
        }
    };
    List<BoundedSource<T>> avroSources = Lists.newArrayList();
    // Build the sources from file metadata when it is available;
    // otherwise fall back to the resource IDs.
    if (metadata != null) {
        for (MatchResult.Metadata file : metadata) {
            avroSources.add(AvroSource.from(file).withParseFn(fnWrapper, getOutputCoder()));
        }
    } else {
        for (ResourceId file : files) {
            avroSources.add(AvroSource.from(file.toString()).withParseFn(fnWrapper, getOutputCoder()));
        }
    }
    return ImmutableList.copyOf(avroSources);
}
Also used: BoundedSource (org.apache.beam.sdk.io.BoundedSource), SerializableFunction (org.apache.beam.sdk.transforms.SerializableFunction), MatchResult (org.apache.beam.sdk.io.fs.MatchResult), ResourceId (org.apache.beam.sdk.io.fs.ResourceId), Supplier (org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Supplier), GenericRecord (org.apache.avro.generic.GenericRecord)
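
The createSources method above keeps the schema as a JSON string and wraps the parsing step in Suppliers.memoize, so the schema is parsed at most once per function instance rather than once per record. The sketch below illustrates that memoization idea using only the JDK. It is not Guava's implementation: the class name MemoizedSupplierSketch, the memoize helper, and the stand-in "parsing" step are all hypothetical and exist only for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class MemoizedSupplierSketch {

    /**
     * Hypothetical helper: wraps a supplier so the delegate runs at most once.
     * Later calls return the cached value.
     */
    static <T> Supplier<T> memoize(Supplier<T> delegate) {
        return new Supplier<T>() {
            private boolean initialized;
            private T value;

            @Override
            public synchronized T get() {
                if (!initialized) {
                    value = delegate.get();
                    initialized = true;
                }
                return value;
            }
        };
    }

    public static void main(String[] args) {
        AtomicInteger parseCount = new AtomicInteger();
        // Stand-in for the JSON schema string captured by the function.
        String jsonSchema = "{\"fields\":[]}";

        Supplier<Integer> parsed = memoize(() -> {
            parseCount.incrementAndGet();
            // Stand-in for an expensive parse of jsonSchema.
            return jsonSchema.length();
        });

        // Simulate applying the function to several records.
        parsed.get();
        parsed.get();
        parsed.get();

        System.out.println("parse calls: " + parseCount.get());
    }
}
```

Calling get() three times triggers the delegate only once, which is the behavior the per-record apply method in the snippet relies on when it calls schema.get().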

Aggregations

BoundedSource (org.apache.beam.sdk.io.BoundedSource): 16
ArrayList (java.util.ArrayList): 6
Test (org.junit.Test): 6
List (java.util.List): 3
UnboundedSource (org.apache.beam.sdk.io.UnboundedSource): 3
SourceMetadata (com.google.api.services.dataflow.model.SourceMetadata): 2
ByteString (com.google.protobuf.ByteString): 2
IOException (java.io.IOException): 2
GenericRecord (org.apache.avro.generic.GenericRecord): 2
Source (org.apache.beam.sdk.io.Source): 2
ResourceId (org.apache.beam.sdk.io.fs.ResourceId): 2
SerializableFunction (org.apache.beam.sdk.transforms.SerializableFunction): 2
WindowedValue (org.apache.beam.sdk.util.WindowedValue): 2
KV (org.apache.beam.sdk.values.KV): 2
Base64.encodeBase64String (com.google.api.client.util.Base64.encodeBase64String): 1
TableRow (com.google.api.services.bigquery.model.TableRow): 1
TableSchema (com.google.api.services.bigquery.model.TableSchema): 1
DerivedSource (com.google.api.services.dataflow.model.DerivedSource): 1
SourceOperationResponse (com.google.api.services.dataflow.model.SourceOperationResponse): 1
SourceSplitOptions (com.google.api.services.dataflow.model.SourceSplitOptions): 1