Example 6 with SparkPipelineRuntime

Use of co.cask.cdap.etl.spark.SparkPipelineRuntime in project cdap by caskdata.

The compute method of the class RDDCollection.

@Override
public <U> SparkCollection<U> compute(StageSpec stageSpec, SparkCompute<T, U> compute) throws Exception {
    String stageName = stageSpec.getName();
    PipelineRuntime pipelineRuntime = new SparkPipelineRuntime(sec);
    SparkExecutionPluginContext sparkPluginContext =
        new BasicSparkExecutionPluginContext(sec, jsc, datasetContext, pipelineRuntime, stageSpec);
    compute.initialize(sparkPluginContext);
    // Count incoming records; cache so the counted input is not recomputed when re-evaluated.
    JavaRDD<T> countedInput =
        rdd.map(new CountingFunction<T>(stageName, sec.getMetrics(), "records.in", null)).cache();
    // Apply the plugin's transform, then count outgoing records with the stage's data tracer attached.
    return wrap(compute.transform(sparkPluginContext, countedInput)
        .map(new CountingFunction<U>(stageName, sec.getMetrics(), "records.out", sec.getDataTracer(stageName))));
}
Also used: SparkExecutionPluginContext(co.cask.cdap.etl.api.batch.SparkExecutionPluginContext), PipelineRuntime(co.cask.cdap.etl.common.PipelineRuntime), SparkPipelineRuntime(co.cask.cdap.etl.spark.SparkPipelineRuntime), CountingFunction(co.cask.cdap.etl.spark.function.CountingFunction)
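
For context, a minimal SparkCompute plugin that this compute method could drive might look like the sketch below. It relies only on the contract visible above (initialize plus a transform from one JavaRDD to another); the class name StringLengthCompute and its logic are hypothetical, not part of the CDAP sources.

// Hypothetical plugin: maps each input string to its length.
// Assumed imports: org.apache.spark.api.java.JavaRDD,
// org.apache.spark.api.java.function.Function,
// co.cask.cdap.etl.api.batch.SparkCompute,
// co.cask.cdap.etl.api.batch.SparkExecutionPluginContext.
public class StringLengthCompute extends SparkCompute<String, Integer> {

    @Override
    public void initialize(SparkExecutionPluginContext context) throws Exception {
        // One-time setup; invoked by RDDCollection.compute() before transform().
    }

    @Override
    public JavaRDD<Integer> transform(SparkExecutionPluginContext context, JavaRDD<String> input) throws Exception {
        return input.map(new Function<String, Integer>() {
            @Override
            public Integer call(String s) {
                return s.length();
            }
        });
    }
}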

Example 7 with SparkPipelineRuntime

Use of co.cask.cdap.etl.spark.SparkPipelineRuntime in project cdap by caskdata.

The lazyInit method of the class DynamicSparkCompute.

// when checkpointing is enabled, and Spark is loading DStream operations from an existing checkpoint,
// delegate will be null and the initialize() method won't have been called. So we need to instantiate
// the delegate and initialize it.
private void lazyInit(final JavaSparkContext jsc) throws Exception {
    if (delegate == null) {
        PluginFunctionContext pluginFunctionContext = dynamicDriverContext.getPluginFunctionContext();
        delegate = pluginFunctionContext.createPlugin();
        final StageSpec stageSpec = pluginFunctionContext.getStageSpec();
        final JavaSparkExecutionContext sec = dynamicDriverContext.getSparkExecutionContext();
        Transactionals.execute(sec, new TxRunnable() {

            @Override
            public void run(DatasetContext datasetContext) throws Exception {
                PipelineRuntime pipelineRuntime = new SparkPipelineRuntime(sec);
                SparkExecutionPluginContext sparkPluginContext =
                    new BasicSparkExecutionPluginContext(sec, jsc, datasetContext, pipelineRuntime, stageSpec);
                delegate.initialize(sparkPluginContext);
            }
        }, Exception.class);
    }
}
Also used: BasicSparkExecutionPluginContext(co.cask.cdap.etl.spark.batch.BasicSparkExecutionPluginContext), PluginFunctionContext(co.cask.cdap.etl.spark.function.PluginFunctionContext), SparkExecutionPluginContext(co.cask.cdap.etl.api.batch.SparkExecutionPluginContext), PipelineRuntime(co.cask.cdap.etl.common.PipelineRuntime), SparkPipelineRuntime(co.cask.cdap.etl.spark.SparkPipelineRuntime), TxRunnable(co.cask.cdap.api.TxRunnable), StageSpec(co.cask.cdap.etl.spec.StageSpec), JavaSparkExecutionContext(co.cask.cdap.api.spark.JavaSparkExecutionContext), DatasetContext(co.cask.cdap.api.data.DatasetContext)
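
The guard above only pays off if every entry point calls lazyInit before touching the delegate. A plausible caller is sketched below; it illustrates the pattern and is not the actual CDAP source, assuming only that transform receives an RDD from which a JavaSparkContext can be recovered.

// Illustrative only: run lazyInit before using the delegate, so a plugin
// deserialized from a Spark checkpoint re-creates and re-initializes it.
@Override
public JavaRDD<U> transform(SparkExecutionPluginContext context, JavaRDD<T> input) throws Exception {
    // After a checkpoint restore, the input RDD is the only handle back to Spark.
    lazyInit(JavaSparkContext.fromSparkContext(input.rdd().context()));
    return delegate.transform(context, input);
}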

Example 8 with SparkPipelineRuntime

Use of co.cask.cdap.etl.spark.SparkPipelineRuntime in project cdap by caskdata.

The call method of the class StreamingAlertPublishFunction.

@Override
public Void call(JavaRDD<Alert> data, Time batchTime) throws Exception {
    // Evaluate macros against the runtime arguments and this batch's logical start time.
    MacroEvaluator evaluator = new DefaultMacroEvaluator(new BasicArguments(sec), batchTime.milliseconds(),
                                                         sec.getSecureStore(), sec.getNamespace());
    PluginContext pluginContext = new SparkPipelinePluginContext(sec.getPluginContext(), sec.getMetrics(),
                                                                 stageSpec.isStageLoggingEnabled(),
                                                                 stageSpec.isProcessTimingEnabled());
    String stageName = stageSpec.getName();
    AlertPublisher alertPublisher = pluginContext.newPluginInstance(stageName, evaluator);
    PipelineRuntime pipelineRuntime = new SparkPipelineRuntime(sec, batchTime.milliseconds());
    AlertPublisherContext alertPublisherContext =
        new DefaultAlertPublisherContext(pipelineRuntime, stageSpec, sec.getMessagingContext(), sec.getAdmin());
    alertPublisher.initialize(alertPublisherContext);
    // collect() brings the batch's alerts to the driver; the tracked iterator
    // records a records.in metric for each alert as the publisher consumes it.
    StageMetrics stageMetrics = new DefaultStageMetrics(sec.getMetrics(), stageName);
    TrackedIterator<Alert> trackedAlerts =
        new TrackedIterator<>(data.collect().iterator(), stageMetrics, Constants.Metrics.RECORDS_IN);
    alertPublisher.publish(trackedAlerts);
    alertPublisher.destroy();
    return null;
}
Also used: MacroEvaluator(co.cask.cdap.api.macro.MacroEvaluator), DefaultMacroEvaluator(co.cask.cdap.etl.common.DefaultMacroEvaluator), AlertPublisher(co.cask.cdap.etl.api.AlertPublisher), PipelineRuntime(co.cask.cdap.etl.common.PipelineRuntime), SparkPipelineRuntime(co.cask.cdap.etl.spark.SparkPipelineRuntime), SparkPipelinePluginContext(co.cask.cdap.etl.spark.plugin.SparkPipelinePluginContext), PluginContext(co.cask.cdap.api.plugin.PluginContext), TrackedIterator(co.cask.cdap.etl.common.TrackedIterator), Alert(co.cask.cdap.etl.api.Alert), BasicArguments(co.cask.cdap.etl.common.BasicArguments), DefaultAlertPublisherContext(co.cask.cdap.etl.common.DefaultAlertPublisherContext), AlertPublisherContext(co.cask.cdap.etl.api.AlertPublisherContext), StageMetrics(co.cask.cdap.etl.api.StageMetrics), DefaultStageMetrics(co.cask.cdap.etl.common.DefaultStageMetrics)
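
A concrete AlertPublisher that the function above could instantiate might look like the sketch below. It relies only on the lifecycle visible in the example (initialize, publish over an Iterator<Alert>, destroy); the class name LoggingAlertPublisher and its body are hypothetical.

// Hypothetical publisher: logs each alert instead of writing to an external system.
// Assumed imports: java.util.Iterator, org.slf4j.Logger, org.slf4j.LoggerFactory,
// co.cask.cdap.etl.api.Alert, co.cask.cdap.etl.api.AlertPublisher,
// co.cask.cdap.etl.api.AlertPublisherContext.
public class LoggingAlertPublisher extends AlertPublisher {
    private static final Logger LOG = LoggerFactory.getLogger(LoggingAlertPublisher.class);

    @Override
    public void initialize(AlertPublisherContext context) throws Exception {
        // Acquire clients or connections here; in the example above this runs once per batch.
    }

    @Override
    public void publish(Iterator<Alert> alerts) throws Exception {
        while (alerts.hasNext()) {
            LOG.info("Received alert: {}", alerts.next());
        }
    }

    @Override
    public void destroy() {
        // Release anything acquired in initialize().
    }
}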

Aggregations

PipelineRuntime (co.cask.cdap.etl.common.PipelineRuntime): 8 usages
SparkPipelineRuntime (co.cask.cdap.etl.spark.SparkPipelineRuntime): 8 usages
SparkExecutionPluginContext (co.cask.cdap.etl.api.batch.SparkExecutionPluginContext): 5 usages
TxRunnable (co.cask.cdap.api.TxRunnable): 4 usages
DatasetContext (co.cask.cdap.api.data.DatasetContext): 4 usages
MacroEvaluator (co.cask.cdap.api.macro.MacroEvaluator): 3 usages
PluginContext (co.cask.cdap.api.plugin.PluginContext): 3 usages
BasicArguments (co.cask.cdap.etl.common.BasicArguments): 3 usages
DefaultMacroEvaluator (co.cask.cdap.etl.common.DefaultMacroEvaluator): 3 usages
CountingFunction (co.cask.cdap.etl.spark.function.CountingFunction): 3 usages
PluginFunctionContext (co.cask.cdap.etl.spark.function.PluginFunctionContext): 3 usages
SparkPipelinePluginContext (co.cask.cdap.etl.spark.plugin.SparkPipelinePluginContext): 3 usages
Alert (co.cask.cdap.etl.api.Alert): 2 usages
AlertPublisher (co.cask.cdap.etl.api.AlertPublisher): 2 usages
AlertPublisherContext (co.cask.cdap.etl.api.AlertPublisherContext): 2 usages
StageMetrics (co.cask.cdap.etl.api.StageMetrics): 2 usages
DefaultAlertPublisherContext (co.cask.cdap.etl.common.DefaultAlertPublisherContext): 2 usages
DefaultStageMetrics (co.cask.cdap.etl.common.DefaultStageMetrics): 2 usages
NoopStageStatisticsCollector (co.cask.cdap.etl.common.NoopStageStatisticsCollector): 2 usages
TrackedIterator (co.cask.cdap.etl.common.TrackedIterator): 2 usages