
Example 26 with TxRunnable

use of co.cask.cdap.api.TxRunnable in project cdap by caskdata.

the class SparkStreamingPipelineDriver method run.

@Override
public void run(final JavaSparkExecutionContext sec) throws Exception {
    final DataStreamsPipelineSpec pipelineSpec =
        GSON.fromJson(sec.getSpecification().getProperty(Constants.PIPELINEID), DataStreamsPipelineSpec.class);
    PipelinePhase.Builder phaseBuilder =
        PipelinePhase.builder(SUPPORTED_PLUGIN_TYPES).addConnections(pipelineSpec.getConnections());
    for (StageSpec stageSpec : pipelineSpec.getStages()) {
        phaseBuilder.addStage(StageInfo.builder(stageSpec.getName(), stageSpec.getPlugin().getType())
                                  .addInputs(stageSpec.getInputs())
                                  .addOutputs(stageSpec.getOutputs())
                                  .addInputSchemas(stageSpec.getInputSchemas())
                                  .setOutputSchema(stageSpec.getOutputSchema())
                                  .setErrorSchema(stageSpec.getErrorSchema())
                                  .setStageLoggingEnabled(pipelineSpec.isStageLoggingEnabled())
                                  .setProcessTimingEnabled(pipelineSpec.isProcessTimingEnabled())
                                  .build());
    }
    final PipelinePhase pipelinePhase = phaseBuilder.build();
    boolean checkpointsDisabled = pipelineSpec.isCheckpointsDisabled();
    String checkpointDir = null;
    if (!checkpointsDisabled) {
        // Get the location of the checkpoint directory.
        String pipelineName = sec.getApplicationSpecification().getName();
        String relativeCheckpointDir = pipelineSpec.getCheckpointDirectory();
        // there isn't any way to instantiate the fileset except in a TxRunnable, so we need to use a reference.
        final AtomicReference<Location> checkpointBaseRef = new AtomicReference<>();
        Transactionals.execute(sec, new TxRunnable() {

            @Override
            public void run(DatasetContext context) throws Exception {
                FileSet checkpointFileSet = context.getDataset(DataStreamsApp.CHECKPOINT_FILESET);
                checkpointBaseRef.set(checkpointFileSet.getBaseLocation());
            }
        }, Exception.class);
        Location pipelineCheckpointDir = checkpointBaseRef.get().append(pipelineName).append(relativeCheckpointDir);
        checkpointDir = pipelineCheckpointDir.toURI().toString();
    }
    JavaStreamingContext jssc = run(pipelineSpec, pipelinePhase, sec, checkpointDir);
    jssc.start();
    boolean stopped = false;
    try {
        // most programs will just keep running forever.
        // however, when CDAP stops the program, we get an interrupted exception.
        // at that point, we need to call stop on jssc, otherwise the program will hang and never stop.
        stopped = jssc.awaitTerminationOrTimeout(Long.MAX_VALUE);
    } finally {
        if (!stopped) {
            jssc.stop(true, pipelineSpec.isStopGracefully());
        }
    }
}
Also used : FileSet(co.cask.cdap.api.dataset.lib.FileSet) AtomicReference(java.util.concurrent.atomic.AtomicReference) JavaStreamingContext(org.apache.spark.streaming.api.java.JavaStreamingContext) PipelinePhase(co.cask.cdap.etl.common.PipelinePhase) TxRunnable(co.cask.cdap.api.TxRunnable) StageSpec(co.cask.cdap.etl.spec.StageSpec) DatasetContext(co.cask.cdap.api.data.DatasetContext) Location(org.apache.twill.filesystem.Location)
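
The anonymous TxRunnable here exists only to resolve the checkpoint FileSet inside a transaction, which is why the base location has to be carried out through the AtomicReference. As a minimal sketch (assuming a Java 8+ toolchain; TxRunnable has a single abstract method, so the same Transactionals.execute overload accepts a lambda), the lookup can be written more compactly:

final AtomicReference<Location> checkpointBaseRef = new AtomicReference<>();
Transactionals.execute(sec, (DatasetContext context) -> {
    // same body as the anonymous class above: resolve the fileset inside the transaction
    FileSet checkpointFileSet = context.getDataset(DataStreamsApp.CHECKPOINT_FILESET);
    checkpointBaseRef.set(checkpointFileSet.getBaseLocation());
}, Exception.class);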

Example 27 with TxRunnable

use of co.cask.cdap.api.TxRunnable in project cdap by caskdata.

the class PartitionedFileSetDataset method fixPartitions.

/**
   * This method can bring a partitioned file set in sync with explore. It scans the partition table and adds
   * every partition to explore. It will start multiple transactions, processing a batch of partitions in each
   * transaction. Optionally, it can disable and re-enable explore first, that is, drop and recreate the Hive table.
   * @param transactional the Transactional for executing transactions
   * @param datasetName the name of the dataset to fix
   * @param doDisable whether to disable and re-enable explore first
   * @param partitionsPerTx how many partitions to process per transaction
   * @param verbose whether to log verbosely. If true, this will log a message for every partition; otherwise it
   *                will only log a report of how many partitions were added / could not be added.
   */
@Beta
@SuppressWarnings("unused")
public static void fixPartitions(Transactional transactional, final String datasetName, boolean doDisable, final int partitionsPerTx, final boolean verbose) {
    if (doDisable) {
        try {
            transactional.execute(new TxRunnable() {

                @Override
                public void run(co.cask.cdap.api.data.DatasetContext context) throws Exception {
                    PartitionedFileSetDataset pfs = context.getDataset(datasetName);
                    pfs.disableExplore();
                    // truncating = true, because this is like truncating
                    pfs.enableExplore(true);
                }
            });
        } catch (TransactionFailureException e) {
            throw new DataSetException("Unable to disable and enable Explore", e.getCause());
        } catch (RuntimeException e) {
            if (e.getCause() instanceof TransactionFailureException) {
                throw new DataSetException("Unable to disable and enable Explore", e.getCause().getCause());
            }
            throw e;
        }
    }
    final AtomicReference<PartitionKey> startKey = new AtomicReference<>();
    final AtomicLong errorCount = new AtomicLong(0L);
    final AtomicLong successCount = new AtomicLong(0L);
    do {
        try {
            transactional.execute(new TxRunnable() {

                @Override
                public void run(co.cask.cdap.api.data.DatasetContext context) throws Exception {
                    final PartitionedFileSetDataset pfs = context.getDataset(datasetName);
                    // compute start row for the scan, reset remembered start key to null
                    byte[] startRow = startKey.get() == null ? null : generateRowKey(startKey.get(), pfs.getPartitioning());
                    startKey.set(null);
                    PartitionConsumer consumer = new PartitionConsumer() {

                        int count = 0;

                        @Override
                        public void consume(PartitionKey key, String path, @Nullable PartitionMetadata metadata) {
                            if (count >= partitionsPerTx) {
                                // reached the limit: remember this key as the start for the next round
                                startKey.set(key);
                                return;
                            }
                            try {
                                pfs.addPartitionToExplore(key, path);
                                successCount.incrementAndGet();
                                if (verbose) {
                                    LOG.info("Added partition {} with path {}", key, path);
                                }
                            } catch (DataSetException e) {
                                errorCount.incrementAndGet();
                                if (verbose) {
                                    LOG.warn(e.getMessage(), e);
                                }
                            }
                            count++;
                        }
                    };
                    pfs.getPartitions(null, consumer, false, startRow, null, partitionsPerTx + 1);
                }
            });
        } catch (TransactionConflictException e) {
            throw new DataSetException("Transaction conflict while reading partitions. This should never happen. " + "Make sure that no other programs are using this dataset at the same time.");
        } catch (TransactionFailureException e) {
            throw new DataSetException("Transaction failure: " + e.getMessage(), e.getCause());
        } catch (RuntimeException e) {
            // this looks like duplication but is needed in case this is run from a worker: see CDAP-6837
            if (e.getCause() instanceof TransactionConflictException) {
                throw new DataSetException("Transaction conflict while reading partitions. This should never happen. " + "Make sure that no other programs are using this dataset at the same time.");
            } else if (e.getCause() instanceof TransactionFailureException) {
                throw new DataSetException("Transaction failure: " + e.getMessage(), e.getCause().getCause());
            } else {
                throw e;
            }
        }
        // if startKey is null, then we consumed less than the limit in this round -> done
    } while (startKey.get() != null);
    LOG.info("Added {} partitions, failed to add {} partitions.", successCount.get(), errorCount.get());
}
Also used : PartitionMetadata(co.cask.cdap.api.dataset.lib.PartitionMetadata) TransactionConflictException(org.apache.tephra.TransactionConflictException) AtomicReference(java.util.concurrent.atomic.AtomicReference) TransactionFailureException(org.apache.tephra.TransactionFailureException) PartitionNotFoundException(co.cask.cdap.api.dataset.PartitionNotFoundException) TransactionConflictException(org.apache.tephra.TransactionConflictException) IOException(java.io.IOException) DataSetException(co.cask.cdap.api.dataset.DataSetException) TransactionFailureException(org.apache.tephra.TransactionFailureException) AtomicLong(java.util.concurrent.atomic.AtomicLong) DataSetException(co.cask.cdap.api.dataset.DataSetException) TxRunnable(co.cask.cdap.api.TxRunnable) PartitionKey(co.cask.cdap.api.dataset.lib.PartitionKey) PartitionMetadata(co.cask.cdap.api.dataset.lib.PartitionMetadata) Beta(co.cask.cdap.api.annotation.Beta)
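
Because fixPartitions is a public static utility, a caller only needs a Transactional and the dataset name. A hedged usage sketch, not taken from the source: FixPartitionsWorker and the dataset name "myPfs" are hypothetical, and it assumes the worker context implements Transactional (the CDAP-6837 comment above implies workers have an execute() of their own) and that PartitionedFileSetDataset is visible to program code:

public class FixPartitionsWorker extends AbstractWorker {

    @Override
    public void run() {
        // repair in batches of 100 partitions per transaction, dropping and recreating the
        // Hive table first (doDisable = true), without per-partition logging (verbose = false)
        PartitionedFileSetDataset.fixPartitions(getContext(), "myPfs", true, 100, false);
    }
}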

Example 28 with TxRunnable

use of co.cask.cdap.api.TxRunnable in project cdap by caskdata.

the class AppWithCustomTx method attemptNestedTransaction.

/**
   * Attempt to nest transactions. We expect this to fail, and if it does, we write the value "failed"
   * to the table, for the test case to validate.
   */
static void attemptNestedTransaction(Transactional txnl, final String row, final String key) {
    try {
        txnl.execute(new TxRunnable() {

            @Override
            public void run(DatasetContext ctext) throws Exception {
                recordTransaction(ctext, row, key);
            }
        });
        LOG.error("Nested transaction should not have succeeded for {}:{}", row, key);
    } catch (TransactionFailureException e) {
        // expected: starting nested transaction should fail
        LOG.info("Nested transaction failed as expected for {}:{}", row, key);
    } catch (RuntimeException e) {
        // TODO (CDAP-6837): this is needed because worker's execute() propagates the tx failure as a runtime exception
        if (e.getCause() instanceof TransactionFailureException) {
            // expected: starting nested transaction should fail
            LOG.info("Nested transaction failed as expected for {}:{}", row, key);
        } else {
            throw e;
        }
    }
    // we know that the transactional is a program context and hence implements DatasetContext
    TransactionCapturingTable capture = ((DatasetContext) txnl).getDataset(CAPTURE);
    capture.getTable().put(new Put(row, key, FAILED));
}
Also used : TransactionFailureException(org.apache.tephra.TransactionFailureException) TxRunnable(co.cask.cdap.api.TxRunnable) DatasetContext(co.cask.cdap.api.data.DatasetContext) TransactionFailureException(org.apache.tephra.TransactionFailureException) IOException(java.io.IOException) DataSetException(co.cask.cdap.api.dataset.DataSetException) Put(co.cask.cdap.api.dataset.table.Put)
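
The nested attempt is only meaningful while an outer transaction is already in progress, i.e. when the method is invoked from inside another execute(). An illustrative sketch, not from the source, with hypothetical row and key values:

static void runOuterThenNested(final Transactional txnl) throws TransactionFailureException {
    txnl.execute(new TxRunnable() {

        @Override
        public void run(DatasetContext context) throws Exception {
            // a transaction is already open here, so the inner execute() performed by
            // attemptNestedTransaction is expected to fail and be recorded as FAILED
            attemptNestedTransaction(txnl, "myRow", "nested");
        }
    });
}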

Example 29 with TxRunnable

use of co.cask.cdap.api.TxRunnable in project cdap by caskdata.

the class CharCountProgram method run.

@Override
public void run(final JavaSparkExecutionContext sec) throws Exception {
    JavaSparkContext sc = new JavaSparkContext();
    // Verify the codec is being set
    Preconditions.checkArgument("org.apache.spark.io.LZFCompressionCodec".equals(sc.getConf().get("spark.io.compression.codec")));
    // read the dataset
    JavaPairRDD<byte[], String> inputData = sec.fromDataset("keys");
    // create a new RDD with the same key but with a new value which is the length of the string
    final JavaPairRDD<byte[], byte[]> stringLengths = inputData.mapToPair(new PairFunction<Tuple2<byte[], String>, byte[], byte[]>() {

        @Override
        public Tuple2<byte[], byte[]> call(Tuple2<byte[], String> stringTuple2) throws Exception {
            return new Tuple2<>(stringTuple2._1(), Bytes.toBytes(stringTuple2._2().length()));
        }
    });
    // write a total count to a table (that emits a metric we can validate in the test case)
    sec.execute(new TxRunnable() {

        @Override
        public void run(DatasetContext context) throws Exception {
            long count = stringLengths.count();
            Table totals = context.getDataset("totals");
            totals.increment(new Increment("total").add("total", count));
            // write the character count to dataset
            sec.saveAsDataset(stringLengths, "count");
        }
    });
}
Also used : Table(co.cask.cdap.api.dataset.table.Table) Tuple2(scala.Tuple2) TxRunnable(co.cask.cdap.api.TxRunnable) Increment(co.cask.cdap.api.dataset.table.Increment) JavaSparkContext(org.apache.spark.api.java.JavaSparkContext) DatasetContext(co.cask.cdap.api.data.DatasetContext)
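
The Increment stores the running total as an 8-byte counter in the "totals" table. As a hypothetical follow-up (not part of CharCountProgram), a later transaction in the same run method could read the accumulated value back:

sec.execute(new TxRunnable() {

    @Override
    public void run(DatasetContext context) throws Exception {
        Table totals = context.getDataset("totals");
        byte[] raw = totals.get(Bytes.toBytes("total"), Bytes.toBytes("total"));
        long total = raw == null ? 0L : Bytes.toLong(raw);
        // 'total' now holds the accumulated character count; validate or report it as needed
    }
});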

Example 30 with TxRunnable

use of co.cask.cdap.api.TxRunnable in project cdap by caskdata.

the class SparkCSVToSpaceProgram method run.

@Override
public void run(final JavaSparkExecutionContext sec) throws Exception {
    JavaSparkContext jsc = new JavaSparkContext();
    Map<String, String> fileSetArgs = new HashMap<>();
    final Metrics metrics = sec.getMetrics();
    FileSetArguments.addInputPath(fileSetArgs, sec.getRuntimeArguments().get("input.path"));
    JavaPairRDD<LongWritable, Text> input = sec.fromDataset(WorkflowAppWithLocalDatasets.CSV_FILESET_DATASET, fileSetArgs);
    final List<String> converted = input.values().map(new Function<Text, String>() {

        @Override
        public String call(Text input) throws Exception {
            String line = input.toString();
            metrics.count("num.lines", 1);
            return line.replaceAll(",", " ");
        }
    }).collect();
    sec.execute(new TxRunnable() {

        @Override
        public void run(DatasetContext context) throws Exception {
            Map<String, String> args = sec.getRuntimeArguments();
            String outputPath = args.get("output.path");
            Map<String, String> fileSetArgs = new HashMap<>();
            FileSetArguments.setOutputPath(fileSetArgs, outputPath);
            FileSet fileSet = context.getDataset(WorkflowAppWithLocalDatasets.CSV_FILESET_DATASET, fileSetArgs);
            try (PrintWriter writer = new PrintWriter(fileSet.getOutputLocation().getOutputStream())) {
                for (String line : converted) {
                    writer.write(line);
                    writer.println();
                }
            }
        }
    });
}
Also used : FileSet(co.cask.cdap.api.dataset.lib.FileSet) HashMap(java.util.HashMap) Text(org.apache.hadoop.io.Text) Function(org.apache.spark.api.java.function.Function) Metrics(co.cask.cdap.api.metrics.Metrics) TxRunnable(co.cask.cdap.api.TxRunnable) JavaSparkContext(org.apache.spark.api.java.JavaSparkContext) LongWritable(org.apache.hadoop.io.LongWritable) DatasetContext(co.cask.cdap.api.data.DatasetContext) HashMap(java.util.HashMap) Map(java.util.Map) PrintWriter(java.io.PrintWriter)
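
The program takes both its input file and its output location from runtime arguments, so "input.path" and "output.path" must be supplied when the Spark program (or the workflow that wraps it) is started. A hedged sketch of building that argument map; the paths are hypothetical placeholders:

Map<String, String> runtimeArgs = new HashMap<>();
runtimeArgs.put("input.path", "input/people.csv");    // hypothetical file within the CSV fileset
runtimeArgs.put("output.path", "output/converted");   // hypothetical output directory
// pass runtimeArgs when starting the program, e.g. through the CDAP CLI, UI, or REST API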

Aggregations

TxRunnable (co.cask.cdap.api.TxRunnable): 38 usages
DatasetContext (co.cask.cdap.api.data.DatasetContext): 37 usages
IOException (java.io.IOException): 18 usages
TransactionFailureException (org.apache.tephra.TransactionFailureException): 17 usages
TransactionConflictException (org.apache.tephra.TransactionConflictException): 11 usages
DatasetManagementException (co.cask.cdap.api.dataset.DatasetManagementException): 10 usages
TransactionControl (co.cask.cdap.api.annotation.TransactionControl): 8 usages
Table (co.cask.cdap.api.dataset.table.Table): 7 usages
ApplicationNotFoundException (co.cask.cdap.common.ApplicationNotFoundException): 6 usages
ProgramNotFoundException (co.cask.cdap.common.ProgramNotFoundException): 6 usages
NoSuchElementException (java.util.NoSuchElementException): 5 usages
AtomicReference (java.util.concurrent.atomic.AtomicReference): 5 usages
TransactionNotInProgressException (org.apache.tephra.TransactionNotInProgressException): 5 usages
ProgramLifecycle (co.cask.cdap.api.ProgramLifecycle): 4 usages
KeyValueTable (co.cask.cdap.api.dataset.lib.KeyValueTable): 4 usages
Put (co.cask.cdap.api.dataset.table.Put): 3 usages
SparkExecutionPluginContext (co.cask.cdap.etl.api.batch.SparkExecutionPluginContext): 3 usages
ApplicationSpecification (co.cask.cdap.api.app.ApplicationSpecification): 2 usages
DataSetException (co.cask.cdap.api.dataset.DataSetException): 2 usages
CloseableIterator (co.cask.cdap.api.dataset.lib.CloseableIterator): 2 usages