Example 6 with FailureCollector

use of io.cdap.cdap.etl.api.FailureCollector in project cdap by caskdata.

the class SparkStreamingPipelineRunner method handleJoin.

@Override
protected SparkCollection<Object> handleJoin(Map<String, SparkCollection<Object>> inputDataCollections, PipelinePhase pipelinePhase, PluginFunctionContext pluginFunctionContext, StageSpec stageSpec, FunctionCache.Factory functionCacheFactory, Object plugin, Integer numPartitions, StageStatisticsCollector collector, Set<String> shufflers) throws Exception {
    String stageName = stageSpec.getName();
    BatchJoiner<?, ?, ?> joiner;
    if (plugin instanceof BatchAutoJoiner) {
        BatchAutoJoiner autoJoiner = (BatchAutoJoiner) plugin;
        Map<String, Schema> inputSchemas = new HashMap<>();
        for (String inputStageName : pipelinePhase.getStageInputs(stageName)) {
            StageSpec inputStageSpec = pipelinePhase.getStage(inputStageName);
            inputSchemas.put(inputStageName, inputStageSpec.getOutputSchema());
        }
        FailureCollector failureCollector = new LoggingFailureCollector(stageName, inputSchemas);
        AutoJoinerContext autoJoinerContext = DefaultAutoJoinerContext.from(inputSchemas, failureCollector);
        failureCollector.getOrThrowException();
        JoinDefinition joinDefinition = autoJoiner.define(autoJoinerContext);
        if (joinDefinition == null) {
            throw new IllegalStateException(String.format(
                "Joiner stage '%s' did not specify a join definition. "
                    + "Check with the plugin developer to ensure it is implemented correctly.",
                stageName));
        }
        joiner = new JoinerBridge(stageName, autoJoiner, joinDefinition);
    } else if (plugin instanceof BatchJoiner) {
        joiner = (BatchJoiner) plugin;
    } else {
        // should never happen unless there is a bug in the code. should have failed during deployment
        throw new IllegalStateException(String.format("Stage '%s' is an unknown joiner type %s", stageName, plugin.getClass().getName()));
    }
    BatchJoinerRuntimeContext joinerRuntimeContext = pluginFunctionContext.createBatchRuntimeContext();
    joiner.initialize(joinerRuntimeContext);
    shufflers.add(stageName);
    return handleJoin(joiner, inputDataCollections, stageSpec, functionCacheFactory, numPartitions, collector);
}
Also used : BatchJoinerRuntimeContext(io.cdap.cdap.etl.api.batch.BatchJoinerRuntimeContext) LoggingFailureCollector(io.cdap.cdap.etl.validation.LoggingFailureCollector) HashMap(java.util.HashMap) Schema(io.cdap.cdap.api.data.schema.Schema) BatchJoiner(io.cdap.cdap.etl.api.batch.BatchJoiner) BatchAutoJoiner(io.cdap.cdap.etl.api.batch.BatchAutoJoiner) DefaultAutoJoinerContext(io.cdap.cdap.etl.common.DefaultAutoJoinerContext) AutoJoinerContext(io.cdap.cdap.etl.api.join.AutoJoinerContext) JoinDefinition(io.cdap.cdap.etl.api.join.JoinDefinition) StageSpec(io.cdap.cdap.etl.proto.v2.spec.StageSpec) FailureCollector(io.cdap.cdap.etl.api.FailureCollector) JoinerBridge(io.cdap.cdap.etl.common.plugin.JoinerBridge)

Example 7 with FailureCollector

use of io.cdap.cdap.etl.api.FailureCollector in project hydrator-plugins by cdapio.

the class FileStreamingSource method getStream.

@Override
public JavaDStream<StructuredRecord> getStream(StreamingContext context) throws Exception {
    FailureCollector collector = context.getFailureCollector();
    conf.validate(collector);
    conf.getSchema(collector);
    collector.getOrThrowException();
    JavaStreamingContext jsc = context.getSparkStreamingContext();
    return FileStreamingSourceUtil.getJavaDStream(jsc, conf);
}
Also used : JavaStreamingContext(org.apache.spark.streaming.api.java.JavaStreamingContext) FailureCollector(io.cdap.cdap.etl.api.FailureCollector)

Example 8 with FailureCollector

use of io.cdap.cdap.etl.api.FailureCollector in project hydrator-plugins by cdapio.

the class HTTPPollerSource method configurePipeline.

@Override
public void configurePipeline(PipelineConfigurer pipelineConfigurer) {
    super.configurePipeline(pipelineConfigurer);
    FailureCollector collector = pipelineConfigurer.getStageConfigurer().getFailureCollector();
    conf.validate(collector);
}
Also used : FailureCollector(io.cdap.cdap.etl.api.FailureCollector)

Example 9 with FailureCollector

use of io.cdap.cdap.etl.api.FailureCollector in project hydrator-plugins by cdapio.

the class HTTPPollerSource method getStream.

@Override
public JavaDStream<StructuredRecord> getStream(StreamingContext streamingContext) {
    FailureCollector collector = streamingContext.getFailureCollector();
    conf.validate(collector);
    collector.getOrThrowException();
    return HTTPPollerSourceUtil.getJavaDStream(streamingContext, conf);
}
Also used : FailureCollector(io.cdap.cdap.etl.api.FailureCollector)

Example 10 with FailureCollector

use of io.cdap.cdap.etl.api.FailureCollector in project hydrator-plugins by cdapio.

the class ReferenceStreamingSource method configurePipeline.

@Override
public void configurePipeline(PipelineConfigurer pipelineConfigurer) throws IllegalArgumentException {
    super.configurePipeline(pipelineConfigurer);
    // Verify that reference name meets dataset id constraints
    FailureCollector collector = pipelineConfigurer.getStageConfigurer().getFailureCollector();
    IdUtils.validateReferenceName(conf.referenceName, collector);
    // if reference name is not valid, throw an exception before creating external dataset
    collector.getOrThrowException();
    pipelineConfigurer.createDataset(conf.referenceName, Constants.EXTERNAL_DATASET_TYPE, DatasetProperties.EMPTY);
}
Also used : FailureCollector(io.cdap.cdap.etl.api.FailureCollector)
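
The examples above share one pattern: validation code records problems on a collector, and a later getOrThrowException() call aborts before any side effect (such as creating a dataset) takes place. The sketch below illustrates that pattern in isolation. It is a hypothetical stand-in, not the CDAP API: SimpleFailureCollector, its addFailure(String) method, and the character rule in validateReferenceName are assumptions made for illustration, and the real IdUtils constraints are not shown in the source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the collect-then-throw pattern; NOT the CDAP FailureCollector API.
public class CollectThenThrowSketch {

    /** Minimal collector: records every problem, then fails once with all of them. */
    static final class SimpleFailureCollector {
        private final List<String> failures = new ArrayList<>();

        void addFailure(String message) {
            failures.add(message);
        }

        List<String> getFailures() {
            return new ArrayList<>(failures);
        }

        // Throws only if at least one failure was recorded; otherwise does nothing.
        void getOrThrowException() {
            if (!failures.isEmpty()) {
                throw new IllegalArgumentException(String.join("; ", failures));
            }
        }
    }

    // Illustrative rule only (assumption): letters, digits, '_', '.', and '-' are allowed.
    static void validateReferenceName(String name, SimpleFailureCollector collector) {
        if (name == null || name.isEmpty()) {
            collector.addFailure("reference name must not be empty");
        } else if (!name.matches("[A-Za-z0-9_.-]+")) {
            collector.addFailure("reference name '" + name + "' contains invalid characters");
        }
    }

    public static void main(String[] args) {
        SimpleFailureCollector collector = new SimpleFailureCollector();
        validateReferenceName("bad name!", collector);
        try {
            // Fail here, before any side effect such as creating a dataset.
            collector.getOrThrowException();
            System.out.println("valid: would create the dataset here");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Collecting first and throwing once lets a caller report every configuration problem in a single pass instead of stopping at the first one.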

Aggregations

FailureCollector (io.cdap.cdap.etl.api.FailureCollector) 156
Schema (io.cdap.cdap.api.data.schema.Schema) 69
Test (org.junit.Test) 57
MockFailureCollector (io.cdap.cdap.etl.mock.validation.MockFailureCollector) 31
MockPipelineConfigurer (io.cdap.cdap.etl.mock.common.MockPipelineConfigurer) 26
HashMap (java.util.HashMap) 20
AutoJoinerContext (io.cdap.cdap.etl.api.join.AutoJoinerContext) 18
JoinDefinition (io.cdap.cdap.etl.api.join.JoinDefinition) 15
IOException (java.io.IOException) 15
StageConfigurer (io.cdap.cdap.etl.api.StageConfigurer) 14
ValidationFailure (io.cdap.cdap.etl.api.validation.ValidationFailure) 12
Cause (io.cdap.cdap.etl.api.validation.ValidationFailure.Cause) 12
ValidationException (io.cdap.cdap.etl.api.validation.ValidationException) 10
Field (io.cdap.cdap.api.data.schema.Schema.Field) 8
Map (java.util.Map) 8
BatchJoiner (io.cdap.cdap.etl.api.batch.BatchJoiner) 7
DefaultAutoJoinerContext (io.cdap.cdap.etl.common.DefaultAutoJoinerContext) 7
LoggingFailureCollector (io.cdap.cdap.etl.validation.LoggingFailureCollector) 7
LineageRecorder (io.cdap.plugin.common.LineageRecorder) 7
StructuredRecord (io.cdap.cdap.api.data.format.StructuredRecord) 6