Example 1 with SparkStageStatisticsCollector

Use of co.cask.cdap.etl.spark.SparkStageStatisticsCollector in project cdap by caskdata.

From the class BatchSparkPipelineDriver, method updateWorkflowToken.

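// Writes each stage's input, output, and error record counts into the workflow token
// under keys of the form <PREFIX>.<stage name>.<statistic>.
private void updateWorkflowToken(WorkflowToken token, Map<String, StageStatisticsCollector> collectors) {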
    for (Map.Entry<String, StageStatisticsCollector> entry : collectors.entrySet()) {
        SparkStageStatisticsCollector collector = (SparkStageStatisticsCollector) entry.getValue();
        String keyPrefix = Constants.StageStatistics.PREFIX + "." + entry.getKey() + ".";
        String inputRecordKey = keyPrefix + Constants.StageStatistics.INPUT_RECORDS;
        token.put(inputRecordKey, String.valueOf(collector.getInputRecordCount()));
        String outputRecordKey = keyPrefix + Constants.StageStatistics.OUTPUT_RECORDS;
        token.put(outputRecordKey, String.valueOf(collector.getOutputRecordCount()));
        String errorRecordKey = keyPrefix + Constants.StageStatistics.ERROR_RECORDS;
        token.put(errorRecordKey, String.valueOf(collector.getErrorRecordCount()));
    }
}
Also used: SparkStageStatisticsCollector (co.cask.cdap.etl.spark.SparkStageStatisticsCollector), StageStatisticsCollector (co.cask.cdap.etl.common.StageStatisticsCollector), HashMap (java.util.HashMap), Map (java.util.Map)
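
The key scheme used by updateWorkflowToken can be tried outside CDAP. The following is a minimal sketch, not the CDAP implementation: StageStatsTokenSketch and StageCounts are hypothetical stand-ins for the driver and the collector, a plain Map stands in for WorkflowToken, and the constant values are placeholders for whatever Constants.StageStatistics actually defines.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class StageStatsTokenSketch {
    // Placeholder values (assumption); the real ones live in Constants.StageStatistics.
    static final String PREFIX = "stage.statistics";
    static final String INPUT_RECORDS = "input.records";
    static final String OUTPUT_RECORDS = "output.records";
    static final String ERROR_RECORDS = "error.records";

    // Hypothetical stand-in for a per-stage statistics collector.
    static final class StageCounts {
        final long input;
        final long output;
        final long errors;

        StageCounts(long input, long output, long errors) {
            this.input = input;
            this.output = output;
            this.errors = errors;
        }
    }

    // Builds the key/value pairs the same way the method above does:
    // <PREFIX>.<stage name>.<statistic> -> count rendered as a string.
    static Map<String, String> toTokenEntries(Map<String, StageCounts> collectors) {
        Map<String, String> token = new LinkedHashMap<>();
        for (Map.Entry<String, StageCounts> entry : collectors.entrySet()) {
            String keyPrefix = PREFIX + "." + entry.getKey() + ".";
            StageCounts counts = entry.getValue();
            token.put(keyPrefix + INPUT_RECORDS, String.valueOf(counts.input));
            token.put(keyPrefix + OUTPUT_RECORDS, String.valueOf(counts.output));
            token.put(keyPrefix + ERROR_RECORDS, String.valueOf(counts.errors));
        }
        return token;
    }
}
```

Each stage contributes three entries, so a reader of the token can look up a single statistic by stage name without parsing a combined value.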

Example 2 with SparkStageStatisticsCollector

Use of co.cask.cdap.etl.spark.SparkStageStatisticsCollector in project cdap by caskdata.

From the class BatchSparkPipelineDriver, method run.

@Override
public void run(DatasetContext context) throws Exception {
    BatchPhaseSpec phaseSpec = GSON.fromJson(sec.getSpecification().getProperty(Constants.PIPELINEID), BatchPhaseSpec.class);
    Path configFile = sec.getLocalizationContext().getLocalFile("HydratorSpark.config").toPath();
    try (BufferedReader reader = Files.newBufferedReader(configFile, StandardCharsets.UTF_8)) {
        String object = reader.readLine();
        SparkBatchSourceSinkFactoryInfo sourceSinkInfo = GSON.fromJson(object, SparkBatchSourceSinkFactoryInfo.class);
        sourceFactory = sourceSinkInfo.getSparkBatchSourceFactory();
        sinkFactory = sourceSinkInfo.getSparkBatchSinkFactory();
        stagePartitions = sourceSinkInfo.getStagePartitions();
    }
    datasetContext = context;
    numOfRecordsPreview = phaseSpec.getNumOfRecordsPreview();
    PipelinePluginContext pluginContext = new PipelinePluginContext(sec.getPluginContext(), sec.getMetrics(), phaseSpec.isStageLoggingEnabled(), phaseSpec.isProcessTimingEnabled());
    Map<String, StageStatisticsCollector> collectors = new HashMap<>();
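    // Collectors are created only when the pipeline contains a condition;
    // otherwise the map stays empty and nothing is written to the token.
    if (phaseSpec.pipelineContainsCondition()) {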
        Iterator<StageSpec> iterator = phaseSpec.getPhase().iterator();
        while (iterator.hasNext()) {
            StageSpec spec = iterator.next();
            collectors.put(spec.getName(), new SparkStageStatisticsCollector(jsc));
        }
    }
    try {
        PipelinePluginInstantiator pluginInstantiator = new PipelinePluginInstantiator(pluginContext, sec.getMetrics(), phaseSpec, new SingleConnectorFactory());
        runPipeline(phaseSpec.getPhase(), BatchSource.PLUGIN_TYPE, sec, stagePartitions, pluginInstantiator, collectors);
    } finally {
        updateWorkflowToken(sec.getWorkflowToken(), collectors);
    }
}
Also used: Path (java.nio.file.Path), HashMap (java.util.HashMap), SingleConnectorFactory (co.cask.cdap.etl.batch.connector.SingleConnectorFactory), SparkStageStatisticsCollector (co.cask.cdap.etl.spark.SparkStageStatisticsCollector), StageStatisticsCollector (co.cask.cdap.etl.common.StageStatisticsCollector), StageSpec (co.cask.cdap.etl.spec.StageSpec), BufferedReader (java.io.BufferedReader), BatchPhaseSpec (co.cask.cdap.etl.batch.BatchPhaseSpec), PipelinePluginInstantiator (co.cask.cdap.etl.batch.PipelinePluginInstantiator), PipelinePluginContext (co.cask.cdap.etl.common.plugin.PipelinePluginContext)
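
The run method calls updateWorkflowToken from a finally block, so the counts gathered so far are published even if runPipeline throws. The following is a minimal sketch of that pattern in isolation: FinallyPublishSketch, runPhase, and the sink list are hypothetical names used only for illustration, and the record counts are made up.

```java
import java.util.ArrayList;
import java.util.List;

final class FinallyPublishSketch {
    // Simulates a phase that processes records and may fail part-way through.
    // The sink list is a hypothetical stand-in for the workflow token.
    static void runPhase(boolean fail, List<String> sink) {
        long processed = 0;
        try {
            processed = 3; // some records are handled before the failure point
            if (fail) {
                throw new IllegalStateException("stage failed");
            }
            processed = 5; // the remaining records are handled on success
        } finally {
            // Runs on both the success and the failure path.
            sink.add("records=" + processed);
        }
    }
}
```

On the failure path the exception still propagates to the caller; the finally block only guarantees that the partial count is recorded first.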

Aggregations

StageStatisticsCollector (co.cask.cdap.etl.common.StageStatisticsCollector): 2
SparkStageStatisticsCollector (co.cask.cdap.etl.spark.SparkStageStatisticsCollector): 2
HashMap (java.util.HashMap): 2
BatchPhaseSpec (co.cask.cdap.etl.batch.BatchPhaseSpec): 1
PipelinePluginInstantiator (co.cask.cdap.etl.batch.PipelinePluginInstantiator): 1
SingleConnectorFactory (co.cask.cdap.etl.batch.connector.SingleConnectorFactory): 1
PipelinePluginContext (co.cask.cdap.etl.common.plugin.PipelinePluginContext): 1
StageSpec (co.cask.cdap.etl.spec.StageSpec): 1
BufferedReader (java.io.BufferedReader): 1
Path (java.nio.file.Path): 1
Map (java.util.Map): 1