
Example 21 with ApplicationWithPrograms

Use of io.cdap.cdap.internal.app.deploy.pipeline.ApplicationWithPrograms in project cdap by caskdata.

The class MapReduceProgramRunnerTest, method testFailure.

// TODO: this tests failure in Map tasks. We also need to test: failure in Reduce task, kill of a job by user.
private void testFailure(boolean frequentFlushing) throws Exception {
    // We want to verify that when the mapreduce job fails:
    // * things written in initialize() remain and are visible to others
    // * things written in tasks are not visible to others (TODO AAA: do invalidate)
    // * things written in onFinish() remain and are visible to others
    // NOTE: the code of this test is similar to the testTimeSeriesRecordsCount() test. We intentionally put some
    // "bad data" here that the map tasks recognize as a signal to emulate failure.
    final ApplicationWithPrograms app = deployApp(AppWithMapReduce.class);
    // we need to start a tx context and do a "get" on all datasets so that they are in datasetCache
    datasetCache.newTransactionContext();
    final TimeseriesTable table = datasetCache.getDataset("timeSeries");
    final KeyValueTable beforeSubmitTable = datasetCache.getDataset("beforeSubmit");
    final KeyValueTable onFinishTable = datasetCache.getDataset("onFinish");
    final Table counters = datasetCache.getDataset("counters");
    final Table countersFromContext = datasetCache.getDataset("countersFromContext");
    // 1) fill test data
    fillTestInputData(txExecutorFactory, table, true);
    // 2) run job
    final long start = System.currentTimeMillis();
    runProgram(app, AppWithMapReduce.AggregateTimeseriesByTag.class, frequentFlushing, false);
    final long stop = System.currentTimeMillis();
    // 3) verify results
    Transactions.createTransactionExecutor(txExecutorFactory, datasetCache.getTransactionAwares()).execute(new TransactionExecutor.Subroutine() {

        @Override
        public void apply() {
            // data should be rolled back (TODO: test that partially written data is rolled back too)
            Assert.assertFalse(table.read(AggregateMetricsByTag.BY_TAGS, start, stop).hasNext());
            // but data written in beforeSubmit and onFinish is available to others
            Assert.assertArrayEquals(Bytes.toBytes("beforeSubmit:done"), beforeSubmitTable.read(Bytes.toBytes("beforeSubmit")));
            Assert.assertArrayEquals(Bytes.toBytes("onFinish:done"), onFinishTable.read(Bytes.toBytes("onFinish")));
            Assert.assertEquals(0, counters.get(new Get("mapper")).getLong("records", 0));
            Assert.assertEquals(0, counters.get(new Get("reducer")).getLong("records", 0));
            Assert.assertEquals(0, countersFromContext.get(new Get("mapper")).getLong("records", 0));
            Assert.assertEquals(0, countersFromContext.get(new Get("reducer")).getLong("records", 0));
        }
    });
    datasetCache.dismissTransactionContext();
}
Also used: Table (io.cdap.cdap.api.dataset.table.Table), KeyValueTable (io.cdap.cdap.api.dataset.lib.KeyValueTable), TimeseriesTable (io.cdap.cdap.api.dataset.lib.TimeseriesTable), ApplicationWithPrograms (io.cdap.cdap.internal.app.deploy.pipeline.ApplicationWithPrograms), Get (io.cdap.cdap.api.dataset.table.Get), TransactionExecutor (org.apache.tephra.TransactionExecutor)
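
As a side note on the verification step above: TransactionExecutor.Subroutine has a single apply() method, so the same transactional read-and-assert can be written with a lambda. The sketch below is not from the source; it reuses the txExecutorFactory, datasetCache, table, start, and stop names from the test above.

Transactions.createTransactionExecutor(txExecutorFactory, datasetCache.getTransactionAwares()).execute(() -> {
    // rows written by the failed map tasks must have been rolled back
    Assert.assertFalse(table.read(AggregateMetricsByTag.BY_TAGS, start, stop).hasNext());
});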

Example 22 with ApplicationWithPrograms

Use of io.cdap.cdap.internal.app.deploy.pipeline.ApplicationWithPrograms in project cdap by caskdata.

The class MapReduceProgramRunnerTest, method testMapReduceDriverResources.

@Test
public void testMapReduceDriverResources() throws Exception {
    final ApplicationWithPrograms app = deployApp(AppWithMapReduce.class);
    MapReduceSpecification mrSpec = app.getSpecification().getMapReduce().get(AppWithMapReduce.ClassicWordCount.class.getSimpleName());
    Assert.assertEquals(AppWithMapReduce.ClassicWordCount.MEMORY_MB, mrSpec.getDriverResources().getMemoryMB());
}
Also used: ApplicationWithPrograms (io.cdap.cdap.internal.app.deploy.pipeline.ApplicationWithPrograms), MapReduceSpecification (io.cdap.cdap.api.mapreduce.MapReduceSpecification), Test (org.junit.Test)
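
The same pattern could be extended to the mapper and reducer resources. The sketch below is an illustration only, not part of the test: MapReduceSpecification exposes getMapperResources() and getReducerResources() returning io.cdap.cdap.api.Resources, but the MAPPER_MEMORY_MB and REDUCER_MEMORY_MB constants are hypothetical names that would have to exist on AppWithMapReduce.ClassicWordCount.

// Sketch only: the two constants below are hypothetical.
Resources mapperResources = mrSpec.getMapperResources();
Resources reducerResources = mrSpec.getReducerResources();
Assert.assertEquals(AppWithMapReduce.ClassicWordCount.MAPPER_MEMORY_MB, mapperResources.getMemoryMB());
Assert.assertEquals(AppWithMapReduce.ClassicWordCount.REDUCER_MEMORY_MB, reducerResources.getMemoryMB());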

Example 23 with ApplicationWithPrograms

Use of io.cdap.cdap.internal.app.deploy.pipeline.ApplicationWithPrograms in project cdap by caskdata.

The class MapReduceProgramRunnerTest, method testMapReduceMetricsControl.

@Test
public void testMapReduceMetricsControl() throws Exception {
    final ApplicationWithPrograms app = deployApp(Id.Namespace.fromEntityId(new NamespaceId("metrics_ns")), AppWithMapReduce.class);
    Map<String, String> runtimeArguments = Maps.newHashMap();
    // arguments for the aggregation job
    runtimeArguments.put("metric", "metric");
    runtimeArguments.put("startTs", "1");
    runtimeArguments.put("stopTs", "3");
    runtimeArguments.put("tag", "tag1");
    // Do not emit metrics for mapreduce
    runtimeArguments.put(SystemArguments.METRICS_ENABLED, "false");
    runProgram(app, AppWithMapReduce.AggregateTimeseriesByTag.class, new BasicArguments(runtimeArguments));
    Collection<MetricTimeSeries> metrics = getMetricTimeSeries();
    Assert.assertEquals(0, metrics.size());
    // emit metrics for mapreduce
    runtimeArguments.put(SystemArguments.METRICS_ENABLED, "true");
    runProgram(app, AppWithMapReduce.AggregateTimeseriesByTag.class, new BasicArguments(runtimeArguments));
    metrics = getMetricTimeSeries();
    Assert.assertTrue(metrics.size() > 0);
}
Also used: ApplicationWithPrograms (io.cdap.cdap.internal.app.deploy.pipeline.ApplicationWithPrograms), MetricTimeSeries (io.cdap.cdap.api.metrics.MetricTimeSeries), NamespaceId (io.cdap.cdap.proto.id.NamespaceId), BasicArguments (io.cdap.cdap.internal.app.runtime.BasicArguments), Test (org.junit.Test)
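
The only difference between the two runs is the SystemArguments.METRICS_ENABLED runtime argument. A hypothetical helper like the one below (not part of the test class) could factor out the toggle; it assumes the test's runProgram(ApplicationWithPrograms, Class, BasicArguments) overload used above.

// Hypothetical helper: run the aggregation MapReduce with metrics emission turned on or off.
private void runAggregationWithMetrics(ApplicationWithPrograms app, boolean metricsEnabled) throws Exception {
    Map<String, String> args = Maps.newHashMap();
    args.put("metric", "metric");
    args.put("startTs", "1");
    args.put("stopTs", "3");
    args.put("tag", "tag1");
    // SystemArguments.METRICS_ENABLED controls metrics emission for this run
    args.put(SystemArguments.METRICS_ENABLED, Boolean.toString(metricsEnabled));
    runProgram(app, AppWithMapReduce.AggregateTimeseriesByTag.class, new BasicArguments(args));
}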

Example 24 with ApplicationWithPrograms

Use of io.cdap.cdap.internal.app.deploy.pipeline.ApplicationWithPrograms in project cdap by caskdata.

The class MapReduceWithPartitionedTest, method testTimePartitionedWithMR.

@Test
public void testTimePartitionedWithMR() throws Exception {
    final ApplicationWithPrograms app = deployApp(AppWithTimePartitionedFileSet.class);
    // write a value to the input table
    final Table table = datasetCache.getDataset(AppWithTimePartitionedFileSet.INPUT);
    Transactions.createTransactionExecutor(txExecutorFactory, (TransactionAware) table).execute(new TransactionExecutor.Subroutine() {

        @Override
        public void apply() {
            table.put(Bytes.toBytes("x"), AppWithTimePartitionedFileSet.ONLY_COLUMN, Bytes.toBytes("1"));
        }
    });
    final long time = DATE_FORMAT.parse("1/15/15 11:15 am").getTime();
    final long time5 = time + TimeUnit.MINUTES.toMillis(5);
    // run the partition writer m/r with this output partition time
    Map<String, String> runtimeArguments = Maps.newHashMap();
    Map<String, String> outputArgs = Maps.newHashMap();
    TimePartitionedFileSetArguments.setOutputPartitionTime(outputArgs, time);
    final ImmutableMap<String, String> assignedMetadata = ImmutableMap.of("region", "13", "data.source.name", "input", "data.source.type", "table");
    TimePartitionedFileSetArguments.setOutputPartitionMetadata(outputArgs, assignedMetadata);
    runtimeArguments.putAll(RuntimeArguments.addScope(Scope.DATASET, TIME_PARTITIONED, outputArgs));
    Assert.assertTrue(runProgram(app, AppWithTimePartitionedFileSet.PartitionWriter.class, new BasicArguments(runtimeArguments)));
    // this should have created a partition in the tpfs
    final TimePartitionedFileSet tpfs = datasetCache.getDataset(TIME_PARTITIONED);
    Transactions.createTransactionExecutor(txExecutorFactory, (TransactionAware) tpfs).execute(new TransactionExecutor.Subroutine() {

        @Override
        public void apply() {
            TimePartitionDetail partition = tpfs.getPartitionByTime(time);
            Assert.assertNotNull(partition);
            String path = partition.getRelativePath();
            Assert.assertNotNull(path);
            Assert.assertTrue(path.contains("2015-01-15/11-15"));
            Assert.assertEquals(assignedMetadata, partition.getMetadata().asMap());
        }
    });
    // delete the data in the input table and write a new row
    Transactions.createTransactionExecutor(txExecutorFactory, (TransactionAware) table).execute(new TransactionExecutor.Subroutine() {

        @Override
        public void apply() {
            table.delete(Bytes.toBytes("x"));
            table.put(Bytes.toBytes("y"), AppWithTimePartitionedFileSet.ONLY_COLUMN, Bytes.toBytes("2"));
        }
    });
    // now run the m/r again with a new partition time, say 5 minutes later
    TimePartitionedFileSetArguments.setOutputPartitionTime(outputArgs, time5);
    runtimeArguments.putAll(RuntimeArguments.addScope(Scope.DATASET, TIME_PARTITIONED, outputArgs));
    // make the mapreduce add the partition in destroy, to validate that this does not fail the job
    runtimeArguments.put(AppWithTimePartitionedFileSet.COMPAT_ADD_PARTITION, "true");
    Assert.assertTrue(runProgram(app, AppWithTimePartitionedFileSet.PartitionWriter.class, new BasicArguments(runtimeArguments)));
    // this should have created a partition in the tpfs
    Transactions.createTransactionExecutor(txExecutorFactory, (TransactionAware) tpfs).execute(new TransactionExecutor.Subroutine() {

        @Override
        public void apply() {
            Partition partition = tpfs.getPartitionByTime(time5);
            Assert.assertNotNull(partition);
            String path = partition.getRelativePath();
            Assert.assertNotNull(path);
            Assert.assertTrue(path.contains("2015-01-15/11-20"));
        }
    });
    // now run a map/reduce that reads all the partitions
    runtimeArguments = Maps.newHashMap();
    Map<String, String> inputArgs = Maps.newHashMap();
    TimePartitionedFileSetArguments.setInputStartTime(inputArgs, time - TimeUnit.MINUTES.toMillis(5));
    TimePartitionedFileSetArguments.setInputEndTime(inputArgs, time5 + TimeUnit.MINUTES.toMillis(5));
    runtimeArguments.putAll(RuntimeArguments.addScope(Scope.DATASET, TIME_PARTITIONED, inputArgs));
    runtimeArguments.put(AppWithTimePartitionedFileSet.ROW_TO_WRITE, "a");
    Assert.assertTrue(runProgram(app, AppWithTimePartitionedFileSet.PartitionReader.class, new BasicArguments(runtimeArguments)));
    // this should have read both partitions - and written both x and y to row a
    final Table output = datasetCache.getDataset(AppWithTimePartitionedFileSet.OUTPUT);
    Transactions.createTransactionExecutor(txExecutorFactory, (TransactionAware) output).execute(new TransactionExecutor.Subroutine() {

        @Override
        public void apply() {
            Row row = output.get(Bytes.toBytes("a"));
            Assert.assertEquals("1", row.getString("x"));
            Assert.assertEquals("2", row.getString("y"));
        }
    });
    // now run a map/reduce that reads a range of the partitions, namely the first one
    TimePartitionedFileSetArguments.setInputStartTime(inputArgs, time - TimeUnit.MINUTES.toMillis(5));
    TimePartitionedFileSetArguments.setInputEndTime(inputArgs, time + TimeUnit.MINUTES.toMillis(2));
    runtimeArguments.putAll(RuntimeArguments.addScope(Scope.DATASET, TIME_PARTITIONED, inputArgs));
    runtimeArguments.put(AppWithTimePartitionedFileSet.ROW_TO_WRITE, "b");
    Assert.assertTrue(runProgram(app, AppWithTimePartitionedFileSet.PartitionReader.class, new BasicArguments(runtimeArguments)));
    // this should have read the first partition only - and written only x to row b
    Transactions.createTransactionExecutor(txExecutorFactory, (TransactionAware) output).execute(new TransactionExecutor.Subroutine() {

        @Override
        public void apply() {
            Row row = output.get(Bytes.toBytes("b"));
            Assert.assertEquals("1", row.getString("x"));
            Assert.assertNull(row.get("y"));
        }
    });
    // now run a map/reduce that reads no partitions (because the range matches nothing)
    TimePartitionedFileSetArguments.setInputStartTime(inputArgs, time - TimeUnit.MINUTES.toMillis(10));
    TimePartitionedFileSetArguments.setInputEndTime(inputArgs, time - TimeUnit.MINUTES.toMillis(9));
    runtimeArguments.putAll(RuntimeArguments.addScope(Scope.DATASET, TIME_PARTITIONED, inputArgs));
    runtimeArguments.put(AppWithTimePartitionedFileSet.ROW_TO_WRITE, "n");
    Assert.assertTrue(runProgram(app, AppWithTimePartitionedFileSet.PartitionReader.class, new BasicArguments(runtimeArguments)));
    // this should have read no partitions - and written nothing to row n
    Transactions.createTransactionExecutor(txExecutorFactory, (TransactionAware) output).execute(new TransactionExecutor.Subroutine() {

        @Override
        public void apply() {
            Row row = output.get(Bytes.toBytes("n"));
            Assert.assertTrue(row.isEmpty());
        }
    });
}
Also used: Partition (io.cdap.cdap.api.dataset.lib.Partition), Table (io.cdap.cdap.api.dataset.table.Table), TransactionExecutor (org.apache.tephra.TransactionExecutor), ApplicationWithPrograms (io.cdap.cdap.internal.app.deploy.pipeline.ApplicationWithPrograms), TransactionAware (org.apache.tephra.TransactionAware), BasicArguments (io.cdap.cdap.internal.app.runtime.BasicArguments), TimePartitionDetail (io.cdap.cdap.api.dataset.lib.TimePartitionDetail), Row (io.cdap.cdap.api.dataset.table.Row), TimePartitionedFileSet (io.cdap.cdap.api.dataset.lib.TimePartitionedFileSet), Test (org.junit.Test)
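
The scoped input arguments are rebuilt three times in the test above. A hypothetical helper (not in the source) could wrap that setup; it assumes the TIME_PARTITIONED constant and the Scope.DATASET scoping already used in the test.

// Hypothetical helper: scoped runtime arguments for reading partitions of the
// TIME_PARTITIONED dataset within the given time range.
private Map<String, String> timeRangeInputArgs(long startTime, long endTime) {
    Map<String, String> inputArgs = Maps.newHashMap();
    TimePartitionedFileSetArguments.setInputStartTime(inputArgs, startTime);
    TimePartitionedFileSetArguments.setInputEndTime(inputArgs, endTime);
    // addScope prefixes each key so that it only applies to the TIME_PARTITIONED dataset
    return RuntimeArguments.addScope(Scope.DATASET, TIME_PARTITIONED, inputArgs);
}

With it, the last run above could build its arguments as runtimeArguments.putAll(timeRangeInputArgs(time - TimeUnit.MINUTES.toMillis(10), time - TimeUnit.MINUTES.toMillis(9))).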

Example 25 with ApplicationWithPrograms

Use of io.cdap.cdap.internal.app.deploy.pipeline.ApplicationWithPrograms in project cdap by caskdata.

The class MapReduceWithMultipleInputsTest, method testSimpleJoin.

@Test
public void testSimpleJoin() throws Exception {
    ApplicationWithPrograms app = deployApp(AppWithMapReduceUsingMultipleInputs.class);
    FileSet fileSet = datasetCache.getDataset(AppWithMapReduceUsingMultipleInputs.PURCHASES);
    Location inputFile = fileSet.getBaseLocation().append("inputFile");
    inputFile.createNew();
    try (PrintWriter writer = new PrintWriter(inputFile.getOutputStream())) {
        // the PURCHASES dataset consists of purchase records in the format: <customerId> <spend>
        writer.println("1 20");
        writer.println("1 25");
        writer.println("1 30");
        writer.println("2 5");
    }
    // write some of the purchases to the second input file set
    fileSet = datasetCache.getDataset(AppWithMapReduceUsingMultipleInputs.PURCHASES2);
    inputFile = fileSet.getBaseLocation().append("inputFile");
    inputFile.createNew();
    try (PrintWriter writer = new PrintWriter(inputFile.getOutputStream())) {
        // the PURCHASES dataset consists of purchase records in the format: <customerId> <spend>
        writer.println("2 13");
        writer.println("3 60");
    }
    FileSet fileSet2 = datasetCache.getDataset(AppWithMapReduceUsingMultipleInputs.CUSTOMERS);
    inputFile = fileSet2.getBaseLocation().append("inputFile");
    inputFile.createNew();
    // the CUSTOMERS dataset consists of records in the format: <customerId> <customerName>
    try (PrintWriter writer = new PrintWriter(inputFile.getOutputStream())) {
        writer.println("1 Bob");
        writer.println("2 Samuel");
        writer.println("3 Joe");
    }
    // Using multiple inputs, this MapReduce will join on the two above datasets to get aggregate results.
    // The records are expected to be in the form: <customerId> <customerName> <totalSpend>
    runProgram(app, AppWithMapReduceUsingMultipleInputs.ComputeSum.class, new BasicArguments());
    FileSet outputFileSet = datasetCache.getDataset(AppWithMapReduceUsingMultipleInputs.OUTPUT_DATASET);
    // there will only be 1 part file, due to the small amount of data
    Location outputLocation = outputFileSet.getBaseLocation().append("output").append("part-r-00000");
    List<String> lines = CharStreams.readLines(CharStreams.newReaderSupplier(Locations.newInputSupplier(outputLocation), Charsets.UTF_8));
    Assert.assertEquals(ImmutableList.of("1 Bob 75", "2 Samuel 18", "3 Joe 60"), lines);
    // assert that the mapper was initialized and destroyed (this doesn't happen when using hadoop's MultipleOutputs).
    Assert.assertEquals("true", System.getProperty("mapper.initialized"));
    Assert.assertEquals("true", System.getProperty("mapper.destroyed"));
}
Also used: FileSet (io.cdap.cdap.api.dataset.lib.FileSet), ApplicationWithPrograms (io.cdap.cdap.internal.app.deploy.pipeline.ApplicationWithPrograms), BasicArguments (io.cdap.cdap.internal.app.runtime.BasicArguments), Location (org.apache.twill.filesystem.Location), PrintWriter (java.io.PrintWriter), Test (org.junit.Test)
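
The three input files are written with the same pattern, so a hypothetical helper (not part of the test) could remove the repetition; it only relies on the FileSet, Location, and PrintWriter APIs already used above.

// Hypothetical helper: create a file inside the given FileSet and write one record per line.
private void writeInputFile(FileSet fileSet, String fileName, String... records) throws IOException {
    Location inputFile = fileSet.getBaseLocation().append(fileName);
    inputFile.createNew();
    try (PrintWriter writer = new PrintWriter(inputFile.getOutputStream())) {
        for (String record : records) {
            writer.println(record);
        }
    }
}

For example, writeInputFile(fileSet, "inputFile", "1 20", "1 25", "1 30", "2 5") would replace the first try-with-resources block.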

Aggregations

ApplicationWithPrograms (io.cdap.cdap.internal.app.deploy.pipeline.ApplicationWithPrograms): 32
Test (org.junit.Test): 23
BasicArguments (io.cdap.cdap.internal.app.runtime.BasicArguments): 16
TransactionExecutor (org.apache.tephra.TransactionExecutor): 11
KeyValueTable (io.cdap.cdap.api.dataset.lib.KeyValueTable): 10
IOException (java.io.IOException): 8
File (java.io.File): 7
Location (org.apache.twill.filesystem.Location): 7
NamespaceId (io.cdap.cdap.proto.id.NamespaceId): 6
Table (io.cdap.cdap.api.dataset.table.Table): 5
ProgramDescriptor (io.cdap.cdap.app.program.ProgramDescriptor): 5
ProgramController (io.cdap.cdap.app.runtime.ProgramController): 5
ProgramId (io.cdap.cdap.proto.id.ProgramId): 5
ImmutableMap (com.google.common.collect.ImmutableMap): 4
AppDeploymentInfo (io.cdap.cdap.internal.app.deploy.pipeline.AppDeploymentInfo): 4
ProgramType (io.cdap.cdap.proto.ProgramType): 4
ApplicationClass (io.cdap.cdap.api.artifact.ApplicationClass): 3
Id (io.cdap.cdap.common.id.Id): 3
ApplicationId (io.cdap.cdap.proto.id.ApplicationId): 3
ExecutionException (java.util.concurrent.ExecutionException): 3
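
The aggregation list reflects the flow these tests share: deploy an application to obtain an ApplicationWithPrograms, then inspect its specification or iterate its ProgramDescriptors to run individual programs. The sketch below illustrates that flow; it assumes ApplicationWithPrograms exposes getApplicationId(), getSpecification(), and getPrograms(), and that ProgramDescriptor exposes getProgramId(), so treat the exact signatures as assumptions rather than a definitive reference.

// Sketch: find the MapReduce programs in a freshly deployed application.
ApplicationWithPrograms app = deployApp(AppWithMapReduce.class);
ApplicationId appId = app.getApplicationId();
for (ProgramDescriptor descriptor : app.getPrograms()) {
    ProgramId programId = descriptor.getProgramId();
    if (programId.getType() == ProgramType.MAPREDUCE) {
        System.out.println(appId.getApplication() + " has MapReduce program " + programId.getProgram());
    }
}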