Search in sources:

Example 1 with PipelinePluginContext

Use of co.cask.cdap.etl.common.plugin.PipelinePluginContext in project cdap by caskdata.

The class SparkStreamingPipelineDriver, method run:

private JavaStreamingContext run(final DataStreamsPipelineSpec pipelineSpec, final PipelinePhase pipelinePhase, final JavaSparkExecutionContext sec, @Nullable final String checkpointDir) throws Exception {
    Function0<JavaStreamingContext> contextFunction = new Function0<JavaStreamingContext>() {

        @Override
        public JavaStreamingContext call() throws Exception {
            JavaStreamingContext jssc = new JavaStreamingContext(new JavaSparkContext(), Durations.milliseconds(pipelineSpec.getBatchIntervalMillis()));
            SparkStreamingPipelineRunner runner = new SparkStreamingPipelineRunner(sec, jssc, pipelineSpec, false);
            PipelinePluginContext pluginContext = new PipelinePluginContext(sec.getPluginContext(), sec.getMetrics(), pipelineSpec.isStageLoggingEnabled(), pipelineSpec.isProcessTimingEnabled());
            // The empty map below is the per-stage partition counts; it seems like they should be set at configure time instead of at runtime, but that requires an API change.
            try {
                runner.runPipeline(pipelinePhase, StreamingSource.PLUGIN_TYPE, sec, new HashMap<String, Integer>(), pluginContext, new HashMap<String, StageStatisticsCollector>());
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
            if (checkpointDir != null) {
                jssc.checkpoint(checkpointDir);
            }
            return jssc;
        }
    };
    // with no checkpoint directory, always build a fresh context; otherwise recover from the checkpoint when one exists
    return checkpointDir == null ? contextFunction.call() : StreamingCompat.getOrCreate(checkpointDir, contextFunction);
}
Also used:
Function0 (org.apache.spark.api.java.function.Function0)
JavaStreamingContext (org.apache.spark.streaming.api.java.JavaStreamingContext)
StageStatisticsCollector (co.cask.cdap.etl.common.StageStatisticsCollector)
JavaSparkContext (org.apache.spark.api.java.JavaSparkContext)
PipelinePluginContext (co.cask.cdap.etl.common.plugin.PipelinePluginContext)
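
The driver defers all stream construction to the Function0 so that, when a checkpoint directory is configured, StreamingCompat.getOrCreate can either build the context fresh or restore it from the checkpoint. A minimal sketch of the underlying Spark pattern, assuming plain Spark APIs (the app name, socket source, and checkpoint path below are illustrative, not taken from the CDAP source):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.function.Function0;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class CheckpointRecoverySketch {

    public static void main(String[] args) throws Exception {
        // hypothetical checkpoint location; any fault-tolerant filesystem path works
        final String checkpointDir = "/tmp/streaming-checkpoint";

        Function0<JavaStreamingContext> factory = new Function0<JavaStreamingContext>() {
            @Override
            public JavaStreamingContext call() {
                SparkConf conf = new SparkConf().setAppName("checkpoint-sketch").setMaster("local[2]");
                JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.milliseconds(1000));
                // the whole DStream graph must be defined inside the factory,
                // otherwise it cannot be restored from the checkpoint
                jssc.socketTextStream("localhost", 9999).print();
                jssc.checkpoint(checkpointDir);
                return jssc;
            }
        };

        // restores the context from the checkpoint if one exists, otherwise calls factory.call()
        JavaStreamingContext jssc = JavaStreamingContext.getOrCreate(checkpointDir, factory);
        jssc.start();
        jssc.awaitTermination();
    }
}

Recovery only works because the entire DStream graph is built inside the factory, which is why the CDAP driver wraps context construction, pipeline execution, and checkpoint setup in the Function0.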

Example 2 with PipelinePluginContext

Use of co.cask.cdap.etl.common.plugin.PipelinePluginContext in project cdap by caskdata.

The class JavaSparkMainWrapper, method run:

@Override
public void run(JavaSparkExecutionContext sec) throws Exception {
    String stageName = sec.getSpecification().getProperty(ExternalSparkProgram.STAGE_NAME);
    BatchPhaseSpec batchPhaseSpec = GSON.fromJson(sec.getSpecification().getProperty(Constants.PIPELINEID), BatchPhaseSpec.class);
    PipelinePluginContext pluginContext = new SparkPipelinePluginContext(sec.getPluginContext(), sec.getMetrics(), batchPhaseSpec.isStageLoggingEnabled(), batchPhaseSpec.isProcessTimingEnabled());
    Class<?> mainClass = pluginContext.loadPluginClass(stageName);
    // if it's a CDAP JavaSparkMain, instantiate it and call the run method
    if (JavaSparkMain.class.isAssignableFrom(mainClass)) {
        MacroEvaluator macroEvaluator = new DefaultMacroEvaluator(new BasicArguments(sec), sec.getLogicalStartTime(), sec.getSecureStore(), sec.getNamespace());
        JavaSparkMain javaSparkMain = pluginContext.newPluginInstance(stageName, macroEvaluator);
        javaSparkMain.run(sec);
    } else {
        // otherwise, assume there is a 'main' method and call it
        String programArgs = getProgramArgs(sec, stageName);
        String[] args = programArgs == null ? RuntimeArguments.toPosixArray(sec.getRuntimeArguments()) : programArgs.split(" ");
        final Method mainMethod = mainClass.getMethod("main", String[].class);
        // wrap the String[] in an Object[] so reflection treats it as a single argument
        final Object[] methodArgs = new Object[] { args };
        Caller caller = pluginContext.getCaller(stageName);
        caller.call(new Callable<Void>() {

            @Override
            public Void call() throws Exception {
                mainMethod.invoke(null, methodArgs);
                return null;
            }
        });
    }
}
Also used:
MacroEvaluator (co.cask.cdap.api.macro.MacroEvaluator)
DefaultMacroEvaluator (co.cask.cdap.etl.common.DefaultMacroEvaluator)
Method (java.lang.reflect.Method)
SparkPipelinePluginContext (co.cask.cdap.etl.spark.plugin.SparkPipelinePluginContext)
Caller (co.cask.cdap.etl.common.plugin.Caller)
BatchPhaseSpec (co.cask.cdap.etl.batch.BatchPhaseSpec)
JavaSparkMain (co.cask.cdap.api.spark.JavaSparkMain)
BasicArguments (co.cask.cdap.etl.common.BasicArguments)
PipelinePluginContext (co.cask.cdap.etl.common.plugin.PipelinePluginContext)
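
The isAssignableFrom check above distinguishes plugins that implement CDAP's JavaSparkMain interface from plain main-method programs. A minimal sketch of a plugin taking the first branch, assuming only the interface shown in the imports (the class name, metric name, and job body are illustrative):

import java.util.Arrays;

import co.cask.cdap.api.spark.JavaSparkExecutionContext;
import co.cask.cdap.api.spark.JavaSparkMain;
import org.apache.spark.api.java.JavaSparkContext;

public class WordCountMain implements JavaSparkMain {

    @Override
    public void run(JavaSparkExecutionContext sec) throws Exception {
        // the wrapper instantiates this class through the plugin context and calls run(sec),
        // so no reflective main(String[]) lookup is needed
        JavaSparkContext jsc = new JavaSparkContext();
        long count = jsc.parallelize(Arrays.asList("a", "b", "c")).count();
        sec.getMetrics().count("records.processed", (int) count);
    }
}

Anything else falls through to the reflective branch, where the program arguments come either from getProgramArgs or from the runtime arguments converted to a POSIX-style array.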

Example 3 with PipelinePluginContext

Use of co.cask.cdap.etl.common.plugin.PipelinePluginContext in project cdap by caskdata.

The class PipelineCondition, method apply:

@Override
public boolean apply(@Nullable WorkflowContext input) {
    if (input == null) {
        // should not happen
        throw new IllegalStateException("WorkflowContext for the Condition cannot be null.");
    }
    Map<String, String> properties = input.getConditionSpecification().getProperties();
    BatchPhaseSpec phaseSpec = GSON.fromJson(properties.get(Constants.PIPELINEID), BatchPhaseSpec.class);
    PipelinePhase phase = phaseSpec.getPhase();
    StageSpec stageSpec = phase.iterator().next();
    PluginContext pluginContext = new PipelinePluginContext(input, metrics, phaseSpec.isStageLoggingEnabled(), phaseSpec.isProcessTimingEnabled());
    MacroEvaluator macroEvaluator = new DefaultMacroEvaluator(new BasicArguments(input.getToken(), input.getRuntimeArguments()), input.getLogicalStartTime(), input, input.getNamespace());
    try {
        Condition condition = pluginContext.newPluginInstance(stageSpec.getName(), macroEvaluator);
        PipelineRuntime pipelineRuntime = new PipelineRuntime(input, metrics);
        ConditionContext conditionContext = new BasicConditionContext(input, pipelineRuntime, stageSpec);
        boolean result = condition.apply(conditionContext);
        WorkflowToken token = input.getToken();
        if (token == null) {
            throw new IllegalStateException("WorkflowToken cannot be null when Condition is executed through Workflow.");
        }
        for (Map.Entry<String, String> entry : pipelineRuntime.getArguments().getAddedArguments().entrySet()) {
            token.put(entry.getKey(), entry.getValue());
        }
        return result;
    } catch (Exception e) {
        String msg = String.format("Error executing condition '%s' in the pipeline.", stageSpec.getName());
        throw new RuntimeException(msg, e);
    }
}
Also used:
Condition (co.cask.cdap.etl.api.condition.Condition)
AbstractCondition (co.cask.cdap.api.workflow.AbstractCondition)
MacroEvaluator (co.cask.cdap.api.macro.MacroEvaluator)
DefaultMacroEvaluator (co.cask.cdap.etl.common.DefaultMacroEvaluator)
PipelineRuntime (co.cask.cdap.etl.common.PipelineRuntime)
PluginContext (co.cask.cdap.api.plugin.PluginContext)
PipelinePluginContext (co.cask.cdap.etl.common.plugin.PipelinePluginContext)
WorkflowToken (co.cask.cdap.api.workflow.WorkflowToken)
ConditionContext (co.cask.cdap.etl.api.condition.ConditionContext)
PipelinePhase (co.cask.cdap.etl.common.PipelinePhase)
StageSpec (co.cask.cdap.etl.spec.StageSpec)
BatchPhaseSpec (co.cask.cdap.etl.batch.BatchPhaseSpec)
BasicArguments (co.cask.cdap.etl.common.BasicArguments)
HashMap (java.util.HashMap)
Map (java.util.Map)
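
For context, here is a minimal sketch of the kind of Condition plugin instantiated above, assuming the etl-api contract visible in the imports; the plugin name and the macro-enabled config field are illustrative:

import co.cask.cdap.api.annotation.Macro;
import co.cask.cdap.api.annotation.Name;
import co.cask.cdap.api.annotation.Plugin;
import co.cask.cdap.api.plugin.PluginConfig;
import co.cask.cdap.etl.api.PipelineConfigurer;
import co.cask.cdap.etl.api.condition.Condition;
import co.cask.cdap.etl.api.condition.ConditionContext;

@Plugin(type = Condition.PLUGIN_TYPE)
@Name("BooleanFlag") // hypothetical plugin name
public class BooleanFlagCondition extends Condition {

    private final Conf conf;

    public BooleanFlagCondition(Conf conf) {
        this.conf = conf;
    }

    public static class Conf extends PluginConfig {
        // macro-enabled, so its value can be supplied through runtime arguments
        @Macro
        private Boolean value;
    }

    @Override
    public void configurePipeline(PipelineConfigurer pipelineConfigurer) {
        // no-op for this sketch
    }

    @Override
    public boolean apply(ConditionContext context) throws Exception {
        // the returned boolean selects the workflow branch that runs next
        return conf.value != null && conf.value;
    }
}

Because the config field is macro-enabled, the DefaultMacroEvaluator passed to newPluginInstance above is what resolves its value before apply() runs.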

Example 4 with PipelinePluginContext

Use of co.cask.cdap.etl.common.plugin.PipelinePluginContext in project cdap by caskdata.

The class PipelineAction, method run:

@Override
public void run() throws Exception {
    CustomActionContext context = getContext();
    Map<String, String> properties = context.getSpecification().getProperties();
    BatchPhaseSpec phaseSpec = GSON.fromJson(properties.get(Constants.PIPELINEID), BatchPhaseSpec.class);
    PipelinePhase phase = phaseSpec.getPhase();
    StageSpec stageSpec = phase.iterator().next();
    PluginContext pluginContext = new PipelinePluginContext(context, metrics, phaseSpec.isStageLoggingEnabled(), phaseSpec.isProcessTimingEnabled());
    PipelineRuntime pipelineRuntime = new PipelineRuntime(context, metrics);
    Action action = pluginContext.newPluginInstance(stageSpec.getName(), new DefaultMacroEvaluator(pipelineRuntime.getArguments(), context.getLogicalStartTime(), context, context.getNamespace()));
    ActionContext actionContext = new BasicActionContext(context, pipelineRuntime, stageSpec);
    // when the data tracer for this stage is enabled (i.e. during preview), skip executing the action
    if (!context.getDataTracer(stageSpec.getName()).isEnabled()) {
        action.run(actionContext);
    }
    WorkflowToken token = context.getWorkflowToken();
    if (token == null) {
        throw new IllegalStateException("WorkflowToken cannot be null when action is executed through Workflow.");
    }
    for (Map.Entry<String, String> entry : pipelineRuntime.getArguments().getAddedArguments().entrySet()) {
        token.put(entry.getKey(), entry.getValue());
    }
}
Also used:
Action (co.cask.cdap.etl.api.action.Action)
CustomAction (co.cask.cdap.api.customaction.CustomAction)
AbstractCustomAction (co.cask.cdap.api.customaction.AbstractCustomAction)
PipelineRuntime (co.cask.cdap.etl.common.PipelineRuntime)
PipelinePluginContext (co.cask.cdap.etl.common.plugin.PipelinePluginContext)
PluginContext (co.cask.cdap.api.plugin.PluginContext)
WorkflowToken (co.cask.cdap.api.workflow.WorkflowToken)
CustomActionContext (co.cask.cdap.api.customaction.CustomActionContext)
ActionContext (co.cask.cdap.etl.api.action.ActionContext)
PipelinePhase (co.cask.cdap.etl.common.PipelinePhase)
StageSpec (co.cask.cdap.etl.spec.StageSpec)
DefaultMacroEvaluator (co.cask.cdap.etl.common.DefaultMacroEvaluator)
BatchPhaseSpec (co.cask.cdap.etl.batch.BatchPhaseSpec)
HashMap (java.util.HashMap)
Map (java.util.Map)
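
A minimal sketch of an Action plugin of the kind instantiated above, assuming ActionContext exposes settable arguments (consistent with the getAddedArguments() call that follows); the plugin name and argument key are illustrative:

import co.cask.cdap.api.annotation.Name;
import co.cask.cdap.api.annotation.Plugin;
import co.cask.cdap.etl.api.PipelineConfigurer;
import co.cask.cdap.etl.api.action.Action;
import co.cask.cdap.etl.api.action.ActionContext;

@Plugin(type = Action.PLUGIN_TYPE)
@Name("RunMarker") // hypothetical plugin name
public class RunMarkerAction extends Action {

    @Override
    public void configurePipeline(PipelineConfigurer pipelineConfigurer) {
        // no-op for this sketch
    }

    @Override
    public void run(ActionContext context) throws Exception {
        // arguments added here surface in pipelineRuntime.getArguments().getAddedArguments()
        // and are then copied into the WorkflowToken by PipelineAction
        context.getArguments().set("marker.run.time", Long.toString(System.currentTimeMillis()));
    }
}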

Example 5 with PipelinePluginContext

Use of co.cask.cdap.etl.common.plugin.PipelinePluginContext in project cdap by caskdata.

The class BatchSparkPipelineDriver, method run:

@Override
public void run(DatasetContext context) throws Exception {
    BatchPhaseSpec phaseSpec = GSON.fromJson(sec.getSpecification().getProperty(Constants.PIPELINEID), BatchPhaseSpec.class);
    Path configFile = sec.getLocalizationContext().getLocalFile("HydratorSpark.config").toPath();
    try (BufferedReader reader = Files.newBufferedReader(configFile, StandardCharsets.UTF_8)) {
        String object = reader.readLine();
        SparkBatchSourceSinkFactoryInfo sourceSinkInfo = GSON.fromJson(object, SparkBatchSourceSinkFactoryInfo.class);
        sourceFactory = sourceSinkInfo.getSparkBatchSourceFactory();
        sinkFactory = sourceSinkInfo.getSparkBatchSinkFactory();
        stagePartitions = sourceSinkInfo.getStagePartitions();
    }
    datasetContext = context;
    numOfRecordsPreview = phaseSpec.getNumOfRecordsPreview();
    PipelinePluginContext pluginContext = new PipelinePluginContext(sec.getPluginContext(), sec.getMetrics(), phaseSpec.isStageLoggingEnabled(), phaseSpec.isProcessTimingEnabled());
    Map<String, StageStatisticsCollector> collectors = new HashMap<>();
    if (phaseSpec.pipelineContainsCondition()) {
        // stage statistics are only gathered when the pipeline contains a condition stage,
        // since conditions are what consume them
        Iterator<StageSpec> iterator = phaseSpec.getPhase().iterator();
        while (iterator.hasNext()) {
            StageSpec spec = iterator.next();
            collectors.put(spec.getName(), new SparkStageStatisticsCollector(jsc));
        }
    }
    try {
        PipelinePluginInstantiator pluginInstantiator = new PipelinePluginInstantiator(pluginContext, sec.getMetrics(), phaseSpec, new SingleConnectorFactory());
        runPipeline(phaseSpec.getPhase(), BatchSource.PLUGIN_TYPE, sec, stagePartitions, pluginInstantiator, collectors);
    } finally {
        updateWorkflowToken(sec.getWorkflowToken(), collectors);
    }
}
Also used:
Path (java.nio.file.Path)
HashMap (java.util.HashMap)
SingleConnectorFactory (co.cask.cdap.etl.batch.connector.SingleConnectorFactory)
SparkStageStatisticsCollector (co.cask.cdap.etl.spark.SparkStageStatisticsCollector)
StageStatisticsCollector (co.cask.cdap.etl.common.StageStatisticsCollector)
StageSpec (co.cask.cdap.etl.spec.StageSpec)
BufferedReader (java.io.BufferedReader)
BatchPhaseSpec (co.cask.cdap.etl.batch.BatchPhaseSpec)
PipelinePluginInstantiator (co.cask.cdap.etl.batch.PipelinePluginInstantiator)
PipelinePluginContext (co.cask.cdap.etl.common.plugin.PipelinePluginContext)
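
The first try-with-resources block performs a simple localization-plus-Gson round-trip: the spec was serialized onto a single JSON line and shipped to the container as HydratorSpark.config. A standalone sketch of that pattern, with a hypothetical file name and POJO standing in for SparkBatchSourceSinkFactoryInfo:

import java.io.BufferedReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import com.google.gson.Gson;

public class LocalizedConfigSketch {

    // hypothetical stand-in for SparkBatchSourceSinkFactoryInfo
    static final class FactoryInfo {
        String name;
        int stageCount;
    }

    public static void main(String[] args) throws Exception {
        Path configFile = Paths.get("factory-info.config"); // hypothetical localized file
        try (BufferedReader reader = Files.newBufferedReader(configFile, StandardCharsets.UTF_8)) {
            // the entire spec fits on one line, so a single readLine() recovers it
            FactoryInfo info = new Gson().fromJson(reader.readLine(), FactoryInfo.class);
            System.out.println(info.name + ": " + info.stageCount + " stages");
        }
    }
}

Keeping the whole spec on one line makes the read side a single readLine(); a multi-line payload would need Gson's fromJson(Reader, Class) overload instead.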

Aggregations

PipelinePluginContext (co.cask.cdap.etl.common.plugin.PipelinePluginContext): 7 usages
StageSpec (co.cask.cdap.etl.spec.StageSpec): 5 usages
BatchPhaseSpec (co.cask.cdap.etl.batch.BatchPhaseSpec): 4 usages
DefaultMacroEvaluator (co.cask.cdap.etl.common.DefaultMacroEvaluator): 4 usages
HashMap (java.util.HashMap): 4 usages
MacroEvaluator (co.cask.cdap.api.macro.MacroEvaluator): 3 usages
PluginContext (co.cask.cdap.api.plugin.PluginContext): 3 usages
PipelineRuntime (co.cask.cdap.etl.common.PipelineRuntime): 3 usages
Map (java.util.Map): 3 usages
WorkflowToken (co.cask.cdap.api.workflow.WorkflowToken): 2 usages
BasicArguments (co.cask.cdap.etl.common.BasicArguments): 2 usages
PipelinePhase (co.cask.cdap.etl.common.PipelinePhase): 2 usages
StageStatisticsCollector (co.cask.cdap.etl.common.StageStatisticsCollector): 2 usages
SparkPipelinePluginContext (co.cask.cdap.etl.spark.plugin.SparkPipelinePluginContext): 2 usages
TransactionPolicy (co.cask.cdap.api.annotation.TransactionPolicy): 1 usage
AbstractCustomAction (co.cask.cdap.api.customaction.AbstractCustomAction): 1 usage
CustomAction (co.cask.cdap.api.customaction.CustomAction): 1 usage
CustomActionContext (co.cask.cdap.api.customaction.CustomActionContext): 1 usage
FileSet (co.cask.cdap.api.dataset.lib.FileSet): 1 usage
TriggeringScheduleInfo (co.cask.cdap.api.schedule.TriggeringScheduleInfo): 1 usage