Example 1 with PartitionAlreadyExistsException

use of co.cask.cdap.api.dataset.lib.PartitionAlreadyExistsException in project cdap by caskdata.

the class PartitionedFileSetDataset method assertNotExists.

// Throws PartitionAlreadyExistsException if the partition key already exists.
// Otherwise, returns the rowkey corresponding to the PartitionKey.
@ReadOnly
byte[] assertNotExists(PartitionKey key, boolean supportNonTransactional) {
    byte[] rowKey = generateRowKey(key, partitioning);
    if (tx == null && supportNonTransactional) {
        if (LOG.isWarnEnabled()) {
            StringBuilder sb = new StringBuilder();
            for (StackTraceElement stackTraceElement : Thread.currentThread().getStackTrace()) {
                sb.append("\n\tat ").append(stackTraceElement.toString());
            }
            SAMPLING_LOG.warn("Operation should be performed within a transaction. This operation may require a transaction in the future. {}", sb);
        }
        // To handle backwards compatibility (the user might have called PartitionedFileSet#getPartitionOutput outside
        // of a transaction), we can't check partition existence via the partitionsTable. As a fallback approach,
        // check the filesystem.
        Location partitionLocation = files.getLocation(getOutputPath(key));
        if (exists(partitionLocation)) {
            throw new DataSetException(String.format("Location %s for partition key %s already exists.", partitionLocation, key));
        }
    } else {
        Row row = partitionsTable.get(rowKey);
        if (!row.isEmpty()) {
            throw new PartitionAlreadyExistsException(getName(), key);
        }
    }
    return rowKey;
}
Also used: DataSetException (co.cask.cdap.api.dataset.DataSetException), Row (co.cask.cdap.api.dataset.table.Row), PartitionAlreadyExistsException (co.cask.cdap.api.dataset.lib.PartitionAlreadyExistsException), Location (org.apache.twill.filesystem.Location), ReadOnly (co.cask.cdap.api.annotation.ReadOnly)

Example 2 with PartitionAlreadyExistsException

use of co.cask.cdap.api.dataset.lib.PartitionAlreadyExistsException in project cdap by caskdata.

the class PartitionedFileSetDataset method addPartition.

public void addPartition(PartitionKey key, String path, Map<String, String> metadata, boolean filesCreated, boolean allowAppend) {
    byte[] rowKey = generateRowKey(key, partitioning);
    Row row = partitionsTable.get(rowKey);
    boolean appending = !row.isEmpty();
    if (appending && !allowAppend) {
        throw new PartitionAlreadyExistsException(getName(), key);
    }
    if (appending) {
        // this can happen if user originally created the partition with a custom relative path
        String existingPath = Bytes.toString(row.get(RELATIVE_PATH));
        if (!path.equals(existingPath)) {
            throw new DataSetException(String.format("Attempting to append to Dataset '%s', to partition '%s' with a different path. Original path: '%s'. New path: '%s'", getName(), key, existingPath, path));
        }
    }
    LOG.debug("{} partition with key {} and path {} to dataset {}", appending ? "Appending to" : "Creating", key, path, getName());
    AddPartitionOperation operation = new AddPartitionOperation(key, path, filesCreated);
    operationsInThisTx.add(operation);
    Put put = new Put(rowKey);
    byte[] nowInMillis = Bytes.toBytes(System.currentTimeMillis());
    if (!appending) {
        put.add(RELATIVE_PATH, Bytes.toBytes(path));
        put.add(CREATION_TIME_COL, nowInMillis);
    }
    put.add(LAST_MODIFICATION_TIME_COL, nowInMillis);
    // we allow updates, because an update will only happen if it's an append
    addMetadataToPut(row, metadata, put, true);
    // index each row by its transaction's write pointer
    put.add(WRITE_PTR_COL, tx.getWritePointer());
    partitionsTable.put(put);
    if (!appending) {
        addPartitionToExplore(key, path);
        operation.setExplorePartitionCreated();
    }
}
Also used: DataSetException (co.cask.cdap.api.dataset.DataSetException), Row (co.cask.cdap.api.dataset.table.Row), PartitionAlreadyExistsException (co.cask.cdap.api.dataset.lib.PartitionAlreadyExistsException), Put (co.cask.cdap.api.dataset.table.Put)
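
The create-versus-append decision in addPartition can be tried out in isolation with a minimal sketch. It assumes a plain in-memory map in place of the partitions table, and the names DuplicatePartitionException and PartitionIndexSketch are hypothetical stand-ins invented for illustration; they are not part of the CDAP API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for CDAP's PartitionAlreadyExistsException; not the real class.
class DuplicatePartitionException extends RuntimeException {
    DuplicatePartitionException(String dataset, String key) {
        super(String.format("Dataset '%s' already has a partition with key '%s'", dataset, key));
    }
}

// Hypothetical in-memory model of the partitions table: partition key -> relative path.
class PartitionIndexSketch {
    private final String datasetName;
    private final Map<String, String> pathsByKey = new HashMap<>();

    PartitionIndexSketch(String datasetName) {
        this.datasetName = datasetName;
    }

    // Returns true if the call appended to an existing partition, false if it created a new one.
    boolean addPartition(String key, String path, boolean allowAppend) {
        String existingPath = pathsByKey.get(key);
        boolean appending = existingPath != null;
        if (appending && !allowAppend) {
            // Same rule as the example above: an existing key without allowAppend is an error.
            throw new DuplicatePartitionException(datasetName, key);
        }
        if (appending && !path.equals(existingPath)) {
            // Appending is only permitted to the path the partition was created with.
            throw new IllegalStateException(String.format(
                "Cannot append to partition '%s' with a different path. Original path: '%s'. New path: '%s'",
                key, existingPath, path));
        }
        if (!appending) {
            // The path is recorded only on creation; an append leaves it unchanged.
            pathsByKey.put(key, path);
        }
        return appending;
    }
}
```

In this sketch the first call for a key creates the partition, a later call with allowAppend set to true and the same path appends, and a later call with allowAppend set to false throws the duplicate-partition exception.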

Example 3 with PartitionAlreadyExistsException

use of co.cask.cdap.api.dataset.lib.PartitionAlreadyExistsException in project cdap by caskdata.

the class PartitionedFileSetTest method testRollbackOfPartitionCreateWhereItAlreadyExisted.

@Test
public void testRollbackOfPartitionCreateWhereItAlreadyExisted() throws Exception {
    PartitionedFileSet pfs = dsFrameworkUtil.getInstance(pfsInstance);
    TransactionContext txContext = new TransactionContext(txClient, (TransactionAware) pfs);
    txContext.start();
    Assert.assertNull(pfs.getPartition(PARTITION_KEY));
    Location file1Location = createPartition(pfs, PARTITION_KEY, "file1");
    Assert.assertNotNull(pfs.getPartition(PARTITION_KEY));
    txContext.finish();
    // the file should exist because the transaction completed successfully
    Assert.assertTrue(file1Location.exists());
    // if you attempt to add a partition X, and it already existed, the transaction rollback should not remove
    // the files of the original, existing partition X.
    txContext.start();
    Assert.assertNotNull(pfs.getPartition(PARTITION_KEY));
    try {
        // PartitionedFileSet#getPartitionOutput should fail
        createPartition(pfs, PARTITION_KEY, "file2");
        Assert.fail("Expected PartitionAlreadyExistsException");
    } catch (PartitionAlreadyExistsException expected) {
        // expected: the partition was already created in the first transaction
    }
    // because of the above failure, we want to abort and roll back the transaction
    txContext.abort();
    // the file should still exist because the aborted transaction should have failed before even needing to roll back
    // the partition's files
    Assert.assertTrue(file1Location.exists());
    // file2 shouldn't exist
    txContext.start();
    Assert.assertFalse(pfs.getPartition(PARTITION_KEY).getLocation().append("file2").exists());
    txContext.finish();
}
Also used: TransactionContext (org.apache.tephra.TransactionContext), PartitionedFileSet (co.cask.cdap.api.dataset.lib.PartitionedFileSet), PartitionAlreadyExistsException (co.cask.cdap.api.dataset.lib.PartitionAlreadyExistsException), Location (org.apache.twill.filesystem.Location), Test (org.junit.Test)

Aggregations

PartitionAlreadyExistsException (co.cask.cdap.api.dataset.lib.PartitionAlreadyExistsException): 3
DataSetException (co.cask.cdap.api.dataset.DataSetException): 2
Row (co.cask.cdap.api.dataset.table.Row): 2
Location (org.apache.twill.filesystem.Location): 2
ReadOnly (co.cask.cdap.api.annotation.ReadOnly): 1
PartitionedFileSet (co.cask.cdap.api.dataset.lib.PartitionedFileSet): 1
Put (co.cask.cdap.api.dataset.table.Put): 1
TransactionContext (org.apache.tephra.TransactionContext): 1
Test (org.junit.Test): 1