
Example 1 with SyntheticSourceOptions

Use of org.apache.beam.sdk.io.synthetic.SyntheticSourceOptions in the Apache Beam project.

Class SyntheticDataPublisher, method main.

public static void main(String[] args) throws IOException {
    // options is a field of the enclosing class; its declaration is not shown here.
    options = PipelineOptionsFactory.fromArgs(args).withValidation().as(Options.class);
    // Parse the JSON-encoded source options into a typed SyntheticSourceOptions object.
    SyntheticSourceOptions sourceOptions = SyntheticOptions.fromJsonString(options.getSourceOptions(), SyntheticSourceOptions.class);
    Pipeline pipeline = Pipeline.create(options);
    // Read synthetic key/value records from a bounded source.
    PCollection<KV<byte[], byte[]>> syntheticData = pipeline.apply("Read synthetic data", Read.from(new SyntheticBoundedSource(sourceOptions)));
    // Each sink is attached only when its options are set; several sinks can be active at once.
    if (options.getKafkaBootstrapServerAddress() != null && options.getKafkaTopic() != null) {
        writeToKafka(syntheticData);
    }
    if (options.getPubSubTopic() != null) {
        writeToPubSub(syntheticData);
    }
    if (allKinesisOptionsConfigured()) {
        writeToKinesis(syntheticData);
    }
    // Run the pipeline and block until it finishes.
    pipeline.run().waitUntilFinish();
}
Also used : SyntheticBoundedSource(org.apache.beam.sdk.io.synthetic.SyntheticBoundedSource) PipelineOptions(org.apache.beam.sdk.options.PipelineOptions) ApplicationNameOptions(org.apache.beam.sdk.options.ApplicationNameOptions) SyntheticSourceOptions(org.apache.beam.sdk.io.synthetic.SyntheticSourceOptions) SyntheticOptions(org.apache.beam.sdk.io.synthetic.SyntheticOptions) KV(org.apache.beam.sdk.values.KV) Pipeline(org.apache.beam.sdk.Pipeline)
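
The sink selection in main can be sketched in plain Java, without any Beam dependency. This is only an illustration under stated assumptions: SinkSelectionSketch, SinkOptions and enabledSinks are hypothetical names, not part of SyntheticDataPublisher or the Beam API, and the Kinesis check is left out because the body of allKinesisOptionsConfigured() is not shown above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, Beam-free sketch of the "attach a sink only if configured" checks.
final class SinkSelectionSketch {

    // Hypothetical holder mirroring the option getters used in main.
    static final class SinkOptions {
        String kafkaBootstrapServerAddress;
        String kafkaTopic;
        String pubSubTopic;
    }

    // Returns the names of the sinks whose options are set.
    // The checks are independent, so more than one sink may be returned.
    static List<String> enabledSinks(SinkOptions options) {
        List<String> sinks = new ArrayList<>();
        // Kafka needs both the bootstrap server address and the topic.
        if (options.kafkaBootstrapServerAddress != null && options.kafkaTopic != null) {
            sinks.add("kafka");
        }
        // Pub/Sub needs only the topic.
        if (options.pubSubTopic != null) {
            sinks.add("pubsub");
        }
        return sinks;
    }

    public static void main(String[] args) {
        SinkOptions options = new SinkOptions();
        options.kafkaTopic = "events"; // topic alone is not enough for Kafka
        options.pubSubTopic = "projects/example/topics/events";
        System.out.println(enabledSinks(options));
    }
}
```

Because the checks are separate if statements rather than an if/else chain, the same synthetic data can be published to several systems in one run.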

Example 2 with SyntheticSourceOptions

Use of org.apache.beam.sdk.io.synthetic.SyntheticSourceOptions in the Apache Beam project.

Class CoGroupByKeyLoadTest, method loadTest.

@Override
void loadTest() throws IOException {
    // sourceOptions, options, pipeline and runtimeMonitor are declared outside this method (not shown here).
    SyntheticSourceOptions coSourceOptions = fromJsonString(options.getCoSourceOptions(), SyntheticSourceOptions.class);
    Optional<SyntheticStep> syntheticStep = createStep(options.getStepOptions());
    // Main input: read, record start time, window, then apply the optional synthetic step.
    PCollection<KV<byte[], byte[]>> input = pipeline.apply("Read input", readFromSource(sourceOptions));
    input = input.apply("Collect start time metrics (input)", ParDo.of(runtimeMonitor));
    input = applyWindowing(input);
    input = applyStepIfPresent(input, "Synthetic step for input", syntheticStep);
    // Co-input: same stages, with its own window duration.
    PCollection<KV<byte[], byte[]>> coInput = pipeline.apply("Read co-input", readFromSource(coSourceOptions));
    coInput = coInput.apply("Collect start time metrics (co-input)", ParDo.of(runtimeMonitor));
    coInput = applyWindowing(coInput, options.getCoInputWindowDurationSec());
    coInput = applyStepIfPresent(coInput, "Synthetic step for co-input", syntheticStep);
    // Join both inputs by key, re-iterate the grouped values, then record byte and end-time metrics.
    KeyedPCollectionTuple.of(INPUT_TAG, input)
        .and(CO_INPUT_TAG, coInput)
        .apply("CoGroupByKey", CoGroupByKey.create())
        .apply("Ungroup and reiterate", ParDo.of(new UngroupAndReiterate(options.getIterations())))
        .apply("Collect total bytes", ParDo.of(new ByteMonitor(METRICS_NAMESPACE, "totalBytes.count")))
        .apply("Collect end time metrics", ParDo.of(runtimeMonitor));
}
Also used : KV(org.apache.beam.sdk.values.KV) ByteMonitor(org.apache.beam.sdk.testutils.metrics.ByteMonitor) SyntheticSourceOptions(org.apache.beam.sdk.io.synthetic.SyntheticSourceOptions) SyntheticStep(org.apache.beam.sdk.io.synthetic.SyntheticStep)
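
For readers unfamiliar with the CoGroupByKey step in loadTest, the following plain-Java sketch shows the general idea of co-grouping: values from two keyed inputs are collected under their shared key, and a key present in only one input still appears, with an empty list for the other side. It is a simplified, in-memory illustration only. CoGroupSketch, Grouped and coGroup are hypothetical names, not Beam API, and the sketch ignores windowing and distributed execution.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory illustration of co-grouping two keyed inputs; not Beam code.
final class CoGroupSketch {

    // Per-key result: the values seen for that key in each of the two inputs.
    static final class Grouped {
        final List<String> input = new ArrayList<>();
        final List<String> coInput = new ArrayList<>();
    }

    // Collects the values of both inputs under their shared key.
    // A key present in only one input gets an empty list for the other side.
    static Map<String, Grouped> coGroup(List<Map.Entry<String, String>> input,
                                        List<Map.Entry<String, String>> coInput) {
        Map<String, Grouped> result = new TreeMap<>(); // sorted keys for stable output
        for (Map.Entry<String, String> e : input) {
            result.computeIfAbsent(e.getKey(), k -> new Grouped()).input.add(e.getValue());
        }
        for (Map.Entry<String, String> e : coInput) {
            result.computeIfAbsent(e.getKey(), k -> new Grouped()).coInput.add(e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Grouped> grouped = coGroup(
            List.of(Map.entry("a", "1"), Map.entry("a", "2"), Map.entry("b", "3")),
            List.of(Map.entry("a", "x"), Map.entry("c", "y")));
        grouped.forEach((key, g) ->
            System.out.println(key + " -> input=" + g.input + ", coInput=" + g.coInput));
    }
}
```

In the load test, the "Ungroup and reiterate" step then walks these grouped values again, which is what the sketch's per-key lists stand in for.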

Aggregations

SyntheticSourceOptions (org.apache.beam.sdk.io.synthetic.SyntheticSourceOptions) 2
KV (org.apache.beam.sdk.values.KV) 2
Pipeline (org.apache.beam.sdk.Pipeline) 1
SyntheticBoundedSource (org.apache.beam.sdk.io.synthetic.SyntheticBoundedSource) 1
SyntheticOptions (org.apache.beam.sdk.io.synthetic.SyntheticOptions) 1
SyntheticStep (org.apache.beam.sdk.io.synthetic.SyntheticStep) 1
ApplicationNameOptions (org.apache.beam.sdk.options.ApplicationNameOptions) 1
PipelineOptions (org.apache.beam.sdk.options.PipelineOptions) 1
ByteMonitor (org.apache.beam.sdk.testutils.metrics.ByteMonitor) 1