
Example 16 with TimePartitionedFileSet

Use of co.cask.cdap.api.dataset.lib.TimePartitionedFileSet in project cdap by caskdata.

Class PartitionCorrectorTestRun, method testPartitionCorrector.

@Test
public void testPartitionCorrector() throws Exception {
    ApplicationManager appManager = deployApplication(PartitionExploreCorrectorTestApp.class);
    final int numPartitions = 10;
    addDatasetInstance(TimePartitionedFileSet.class.getName(), "tpfs",
        PartitionedFileSetProperties.builder()
            .setExploreFormat("csv")
            .setExploreSchema("key int, value string")
            .setEnableExploreOnCreate(true)
            .build());
    DataSetManager<TimePartitionedFileSet> tpfsManager = getDataset("tpfs");
    Date date = DATE_FORMAT.parse("6/4/12 10:00 am");
    long baseTime = date.getTime();
    for (int i = 0; i < numPartitions; i++) {
        createPartition(tpfsManager, baseTime + TimeUnit.MINUTES.toMillis(1) * i, i);
    }
    validateAllPartitions(numPartitions);
    dropAllPartitions();
    validateAllPartitions(0);
    // All partitions are missing: drop/recreate the Hive table and add all partitions.
    WorkerManager workerManager = appManager.getWorkerManager("PartitionWorker")
        .start(ImmutableMap.of("dataset.name", "tpfs", "batch.size", "5", "verbose", "true"));
    workerManager.waitForRun(ProgramRunStatus.COMPLETED, 60, TimeUnit.SECONDS);
    validateAllPartitions(numPartitions);
    dropAllPartitions();
    for (int i = numPartitions; i < 2 * numPartitions; i++) {
        createPartition(tpfsManager, baseTime + TimeUnit.MINUTES.toMillis(1) * i, i);
    }
    validateAllPartitions(numPartitions);
    // Some partitions are missing and some are present: keep the Hive table and try to add all partitions.
    workerManager = appManager.getWorkerManager("PartitionWorker")
        .start(ImmutableMap.of("dataset.name", "tpfs", "batch.size", "8", "verbose", "false", "disable.explore", "false"));
    workerManager.waitForRuns(ProgramRunStatus.COMPLETED, 2, 60, TimeUnit.SECONDS);
    validateAllPartitions(2 * numPartitions);
}
Also used: WorkerManager (co.cask.cdap.test.WorkerManager), ApplicationManager (co.cask.cdap.test.ApplicationManager), TimePartitionedFileSet (co.cask.cdap.api.dataset.lib.TimePartitionedFileSet), Date (java.util.Date), Test (org.junit.Test)
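
The test above places each partition one minute after the previous one by computing baseTime + TimeUnit.MINUTES.toMillis(1) * i. The following is a minimal, CDAP-free sketch of only that arithmetic. The class name PartitionTimes, the helper partitionTimes, and the base time of 0 are hypothetical and chosen for illustration; the real test derives baseTime from a parsed date and passes each timestamp to its own createPartition helper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class PartitionTimes {

    /**
     * Returns one timestamp (epoch millis) per partition index in [from, to),
     * spaced exactly one minute apart, mirroring the loop in the test.
     * This helper is hypothetical and not part of CDAP.
     */
    static List<Long> partitionTimes(long baseTime, int from, int to) {
        List<Long> times = new ArrayList<>();
        for (int i = from; i < to; i++) {
            times.add(baseTime + TimeUnit.MINUTES.toMillis(1) * i);
        }
        return times;
    }

    public static void main(String[] args) {
        long baseTime = 0L; // assumed base; the test uses a parsed date instead

        // First batch: indices 0..2, like the first loop in the test.
        System.out.println(partitionTimes(baseTime, 0, 3)); // [0, 60000, 120000]

        // Second batch continues where the first stopped, like the second loop.
        System.out.println(partitionTimes(baseTime, 3, 6)); // [180000, 240000, 300000]
    }
}
```

Because the second loop in the test starts at numPartitions, its timestamps never collide with those of the first batch, which is why the final validation expects 2 * numPartitions distinct partitions.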

Aggregations

TimePartitionedFileSet (co.cask.cdap.api.dataset.lib.TimePartitionedFileSet) 16
TransactionAware (org.apache.tephra.TransactionAware) 9
Test (org.junit.Test) 9
DataSetException (co.cask.cdap.api.dataset.DataSetException) 7
TransactionExecutor (org.apache.tephra.TransactionExecutor) 7
DatasetManagementException (co.cask.cdap.api.dataset.DatasetManagementException) 6
IOException (java.io.IOException) 6
TransactionFailureException (org.apache.tephra.TransactionFailureException) 6
ImmutableMap (com.google.common.collect.ImmutableMap) 4
Date (java.util.Date) 4
Map (java.util.Map) 4
TimePartitionDetail (co.cask.cdap.api.dataset.lib.TimePartitionDetail) 2
ApplicationManager (co.cask.cdap.test.ApplicationManager) 2
Location (org.apache.twill.filesystem.Location) 2
Partition (co.cask.cdap.api.dataset.lib.Partition) 1
PartitionDetail (co.cask.cdap.api.dataset.lib.PartitionDetail) 1
PartitionFilter (co.cask.cdap.api.dataset.lib.PartitionFilter) 1
PartitionKey (co.cask.cdap.api.dataset.lib.PartitionKey) 1
PartitionOutput (co.cask.cdap.api.dataset.lib.PartitionOutput) 1
TimePartitionOutput (co.cask.cdap.api.dataset.lib.TimePartitionOutput) 1