Search in sources :

Example 11 with Location

use of org.apache.twill.filesystem.Location in project cdap by caskdata.

the class FileStreamAdmin method create.

@Override
@Nullable
public StreamConfig create(final StreamId streamId, @Nullable final Properties props) throws Exception {
    final Properties properties = (props == null) ? new Properties() : props;
    String specifiedOwnerPrincipal = properties.containsKey(Constants.Security.PRINCIPAL) ? properties.getProperty(Constants.Security.PRINCIPAL) : null;
    if (exists(streamId)) {
        // if stream exists then make sure owner for this create request is same
        SecurityUtil.verifyOwnerPrincipal(streamId, specifiedOwnerPrincipal, ownerAdmin);
        // stream create is an idempotent operation as of now so just return null and don't do anything
        return null;
    }
    // if the stream didn't exist then add the owner information
    if (specifiedOwnerPrincipal != null) {
        ownerAdmin.add(streamId, new KerberosPrincipalId(specifiedOwnerPrincipal));
    }
    try {
        final Location streamLocation = impersonator.doAs(streamId, new Callable<Location>() {

            @Override
            public Location call() throws Exception {
                assertNamespaceHomeExists(streamId.getParent());
                Location streamLocation = getStreamLocation(streamId);
                Locations.mkdirsIfNotExists(streamLocation);
                return streamLocation;
            }
        });
        return createStream(streamId, properties, streamLocation);
    } catch (Exception e) {
        // there was a problem creating the stream so delete owner information
        // safe to call even if entry doesn't exists
        ownerAdmin.delete(streamId);
        throw e;
    }
}
Also used : StreamProperties(co.cask.cdap.proto.StreamProperties) CoordinatorStreamProperties(co.cask.cdap.data.stream.CoordinatorStreamProperties) Properties(java.util.Properties) KerberosPrincipalId(co.cask.cdap.proto.id.KerberosPrincipalId) NotificationFeedException(co.cask.cdap.notifications.feeds.NotificationFeedException) FileNotFoundException(java.io.FileNotFoundException) IOException(java.io.IOException) NotFoundException(co.cask.cdap.common.NotFoundException) StreamNotFoundException(co.cask.cdap.common.StreamNotFoundException) Location(org.apache.twill.filesystem.Location) Nullable(javax.annotation.Nullable)

Example 12 with Location

use of org.apache.twill.filesystem.Location in project cdap by caskdata.

the class FileStreamAdmin method dropAllInNamespace.

@Override
public void dropAllInNamespace(final NamespaceId namespace) throws Exception {
    Iterable<Location> locations = impersonator.doAs(namespace, new Callable<Iterable<Location>>() {

        @Override
        public Iterable<Location> call() throws Exception {
            try {
                return StreamUtils.listAllStreams(getStreamBaseLocation(namespace));
            } catch (FileNotFoundException e) {
                // If the stream base doesn't exists, nothing need to be deleted
                return ImmutableList.of();
            }
        }
    });
    for (Location location : locations) {
        StreamId streamId = namespace.stream(StreamUtils.getStreamNameFromLocation(location));
        doDrop(streamId, location);
    }
    // Also drop the state table if all the streams are deleted
    impersonator.doAs(namespace, new Callable<Void>() {

        @Override
        public Void call() throws Exception {
            stateStoreFactory.dropAllInNamespace(namespace);
            return null;
        }
    });
}
Also used : StreamId(co.cask.cdap.proto.id.StreamId) FileNotFoundException(java.io.FileNotFoundException) NotificationFeedException(co.cask.cdap.notifications.feeds.NotificationFeedException) FileNotFoundException(java.io.FileNotFoundException) IOException(java.io.IOException) NotFoundException(co.cask.cdap.common.NotFoundException) StreamNotFoundException(co.cask.cdap.common.StreamNotFoundException) Location(org.apache.twill.filesystem.Location)

Example 13 with Location

use of org.apache.twill.filesystem.Location in project cdap by caskdata.

the class FileStreamAdmin method assertNamespaceHomeExists.

private void assertNamespaceHomeExists(NamespaceId namespaceId) throws IOException {
    Location namespaceHomeLocation = Locations.getParent(getStreamBaseLocation(namespaceId));
    Preconditions.checkArgument(namespaceHomeLocation != null && namespaceHomeLocation.exists(), "Home directory %s for namespace %s not found", namespaceHomeLocation, namespaceId);
}
Also used : Location(org.apache.twill.filesystem.Location)

Example 14 with Location

use of org.apache.twill.filesystem.Location in project cdap by caskdata.

the class PartitionRollbackTestRun method testPFSRollback.

/*
   * This tests all the following cases:
   *
   *  1. addPartition(location) fails because partition already exists
   *  2. addPartition(location) fails because Hive partition already exists
   *  3. addPartition(location) succeeds but transaction fails
   *  4. getPartitionOutput() fails because partition already exists
   *  5. partitionOutput.addPartition() fails because Hive partition already exists
   *  6. partitionOutput.addPartition() succeeds but transaction fails
   *  7. mapreduce writing partition fails because location already exists
   *  8. mapreduce writing partition fails because partition already exists
   *  9. mapreduce writing partition fails because Hive partition already exists
   *  10. mapreduce writing dynamic partition fails because location already exists
   *  11. mapreduce writing dynamic partition fails because partition already exists
   *  12. mapreduce writing dynamic partition fails because Hive partition already exists
   *  13. multi-output mapreduce writing partition fails because location already exists
   *  13a. first output fails, other output must rollback 0 and 5
   *  13b. second output fails, first output must rollback 0 and 5
   *  14. multi-output mapreduce writing partition fails because partition already exists
   *  14a. first output fails, other output must rollback partition 5
   *  14b. second output fails, first output must rollback partition 5
   *  15. multi-output mapreduce writing partition fails because Hive partition already exists
   *  15a. first output fails, other output must rollback partitions 0 and 5
   *  15b. second output fails, first output must rollback partitions 0 and 5
   *
   * For all these cases, we validate that existing files and partitions are preserved, and newly
   * added files and partitions are rolled back.
   */
@Test
public void testPFSRollback() throws Exception {
    ApplicationManager appManager = deployApplication(AppWritingToPartitioned.class);
    MapReduceManager mrManager = appManager.getMapReduceManager(MAPREDUCE);
    int numRuns = 0;
    Validator pfsValidator = new Validator(PFS);
    Validator otherValidator = new Validator(OTHER);
    final UnitTestManager.UnitTestDatasetManager<PartitionedFileSet> pfsManager = pfsValidator.getPfsManager();
    final PartitionedFileSet pfs = pfsManager.get();
    final PartitionedFileSet other = otherValidator.getPfsManager().get();
    final String path3 = pfsValidator.getRelativePath3();
    // 1. addPartition(location) fails because partition already exists
    try {
        pfsManager.execute(new Runnable() {

            @Override
            public void run() {
                pfs.addPartition(KEY_1, path3);
            }
        });
        Assert.fail("Expected tx to fail because partition for number=1 already exists");
    } catch (TransactionFailureException e) {
    // expected
    }
    pfsValidator.validate();
    // 2. addPartition(location) fails because Hive partition already exists
    try {
        pfsManager.execute(new Runnable() {

            @Override
            public void run() {
                pfs.addPartition(KEY_4, path3);
            }
        });
        Assert.fail("Expected tx to fail because hive partition for number=1 already exists");
    } catch (TransactionFailureException e) {
    // expected
    }
    pfsValidator.validate();
    // 3. addPartition(location) succeeds but transaction fails
    try {
        pfsManager.execute(new Runnable() {

            @Override
            public void run() {
                pfs.addPartition(KEY_3, path3);
                throw new RuntimeException("fail the tx");
            }
        });
        Assert.fail("Expected tx to fail because it threw a runtime exception");
    } catch (TransactionFailureException e) {
    // expected
    }
    pfsValidator.validate();
    // 4. partitionOutput.getPartitionOutput() fails because partition already exists
    try {
        pfs.getPartitionOutput(KEY_1);
        Assert.fail("Expected getPartitionOutput to fail, because the partition already exists.");
    } catch (DataSetException expected) {
    }
    pfsValidator.validate();
    // 5. partitionOutput.addPartition() fails because Hive partition already exists
    final PartitionOutput output4x = pfs.getPartitionOutput(KEY_4);
    final Location location4x = output4x.getLocation();
    try (Writer writer = new OutputStreamWriter(location4x.append("file").getOutputStream())) {
        writer.write("4x,4x\n");
    }
    try {
        pfsManager.execute(new Runnable() {

            @Override
            public void run() {
                output4x.addPartition();
            }
        });
        Assert.fail("Expected tx to fail because hive partition for number=4 already exists");
    } catch (TransactionFailureException e) {
    // expected
    }
    pfsValidator.validate();
    Assert.assertFalse(location4x.exists());
    // 6. partitionOutput.addPartition() succeeds but transaction fails
    final PartitionOutput output5x = pfs.getPartitionOutput(KEY_5);
    final Location location5x = output5x.getLocation();
    try (Writer writer = new OutputStreamWriter(location5x.append("file").getOutputStream())) {
        writer.write("5x,5x\n");
    }
    try {
        pfsManager.execute(new Runnable() {

            @Override
            public void run() {
                output5x.addPartition();
                throw new RuntimeException("fail the tx");
            }
        });
        Assert.fail("Expected tx to fail because it threw a runtime exception");
    } catch (TransactionFailureException e) {
    // expected
    }
    pfsValidator.validate();
    Assert.assertFalse(location5x.exists());
    // 7. mapreduce writing partition fails because location already exists
    mrManager.start(ImmutableMap.of(PFS_OUT, "1", "input.text", "1x"));
    mrManager.waitForRuns(ProgramRunStatus.FAILED, ++numRuns, 2, TimeUnit.MINUTES);
    pfsValidator.validate();
    // 8. mapreduce writing partition fails because partition already exists
    mrManager.start(ImmutableMap.of(PFS_OUT, "2", "input.text", "2x"));
    mrManager.waitForRuns(ProgramRunStatus.FAILED, ++numRuns, 2, TimeUnit.MINUTES);
    pfsValidator.validate();
    Assert.assertFalse(pfs.getPartitionOutput(KEY_2).getLocation().exists());
    // 9. mapreduce writing partition fails because Hive partition already exists
    mrManager.start(ImmutableMap.of(PFS_OUT, "4", "input.text", "4x"));
    mrManager.waitForRuns(ProgramRunStatus.FAILED, ++numRuns, 2, TimeUnit.MINUTES);
    pfsValidator.validate();
    Assert.assertFalse(pfs.getPartitionOutput(KEY_4).getLocation().exists());
    // 10. mapreduce writing dynamic partition fails because location already exists
    mrManager.start(ImmutableMap.of("input.text", "3x 5x"));
    mrManager.waitForRuns(ProgramRunStatus.FAILED, ++numRuns, 2, TimeUnit.MINUTES);
    pfsValidator.validate();
    Assert.assertFalse(pfs.getPartitionOutput(KEY_5).getLocation().exists());
    // 11. mapreduce writing dynamic partition fails because partition already exists
    mrManager.start(ImmutableMap.of("input.text", "2x 5x"));
    mrManager.waitForRuns(ProgramRunStatus.FAILED, ++numRuns, 2, TimeUnit.MINUTES);
    pfsValidator.validate();
    Assert.assertFalse(pfs.getPartitionOutput(KEY_2).getLocation().exists());
    Assert.assertFalse(pfs.getPartitionOutput(KEY_5).getLocation().exists());
    // 12. mapreduce writing dynamic partition fails because Hive partition already exists
    mrManager.start(ImmutableMap.of("input.text", "0x 4x 5x"));
    mrManager.waitForRuns(ProgramRunStatus.FAILED, ++numRuns, 2, TimeUnit.MINUTES);
    pfsValidator.validate();
    Assert.assertFalse(pfs.getPartitionOutput(KEY_0).getLocation().exists());
    Assert.assertFalse(pfs.getPartitionOutput(KEY_4).getLocation().exists());
    Assert.assertFalse(pfs.getPartitionOutput(KEY_5).getLocation().exists());
    // 13. multi-output mapreduce writing partition fails because location already exists
    // 13a. first output fails, other output must rollback 0 and 5
    mrManager.start(ImmutableMap.of("output.datasets", BOTH, PFS_OUT, "1", "input.text", "0x 5x"));
    mrManager.waitForRuns(ProgramRunStatus.FAILED, ++numRuns, 2, TimeUnit.MINUTES);
    pfsValidator.validate();
    otherValidator.validate();
    Assert.assertFalse(other.getPartitionOutput(KEY_0).getLocation().exists());
    Assert.assertFalse(other.getPartitionOutput(KEY_5).getLocation().exists());
    // 13b. second output fails, first output must rollback 0 and 5
    mrManager.start(ImmutableMap.of("output.datasets", BOTH, OTHER_OUT, "1", "input.text", "0x 5x"));
    mrManager.waitForRuns(ProgramRunStatus.FAILED, ++numRuns, 2, TimeUnit.MINUTES);
    pfsValidator.validate();
    otherValidator.validate();
    Assert.assertFalse(pfs.getPartitionOutput(KEY_0).getLocation().exists());
    Assert.assertFalse(pfs.getPartitionOutput(KEY_5).getLocation().exists());
    // 14. multi-output mapreduce writing partition fails because partition already exists
    // 14a. first output fails, other output must rollback partition 5
    mrManager.start(ImmutableMap.of("output.datasets", BOTH, PFS_OUT, "2", OTHER_OUT, "5", "input.text", "2x 5x"));
    mrManager.waitForRuns(ProgramRunStatus.FAILED, ++numRuns, 2, TimeUnit.MINUTES);
    pfsValidator.validate();
    otherValidator.validate();
    Assert.assertFalse(other.getPartitionOutput(KEY_5).getLocation().exists());
    // 14b. second output fails, first output must rollback partition 5
    mrManager.start(ImmutableMap.of("output.datasets", BOTH, PFS_OUT, "5", OTHER_OUT, "2", "input.text", "2x 5x"));
    mrManager.waitForRuns(ProgramRunStatus.FAILED, ++numRuns, 2, TimeUnit.MINUTES);
    pfsValidator.validate();
    otherValidator.validate();
    Assert.assertFalse(pfs.getPartitionOutput(KEY_5).getLocation().exists());
    // 15. multi-output mapreduce writing partition fails because Hive partition already exists
    // 15a. first output fails, other output must rollback partitions 0 and 5
    mrManager.start(ImmutableMap.of("output.datasets", BOTH, PFS_OUT, "4", "input.text", "0x 5x"));
    mrManager.waitForRuns(ProgramRunStatus.FAILED, ++numRuns, 2, TimeUnit.MINUTES);
    pfsValidator.validate();
    otherValidator.validate();
    Assert.assertFalse(pfs.getPartitionOutput(KEY_4).getLocation().exists());
    Assert.assertFalse(other.getPartitionOutput(KEY_0).getLocation().exists());
    Assert.assertFalse(other.getPartitionOutput(KEY_5).getLocation().exists());
    // 15b. second output fails, first output must rollback partitions 0 and 5
    mrManager.start(ImmutableMap.of("output.datasets", BOTH, OTHER_OUT, "4", "input.text", "0x 5x"));
    mrManager.waitForRuns(ProgramRunStatus.FAILED, ++numRuns, 2, TimeUnit.MINUTES);
    pfsValidator.validate();
    otherValidator.validate();
    Assert.assertFalse(other.getPartitionOutput(KEY_4).getLocation().exists());
    Assert.assertFalse(pfs.getPartitionOutput(KEY_0).getLocation().exists());
    Assert.assertFalse(pfs.getPartitionOutput(KEY_5).getLocation().exists());
}
Also used : ApplicationManager(co.cask.cdap.test.ApplicationManager) MapReduceManager(co.cask.cdap.test.MapReduceManager) PartitionedFileSet(co.cask.cdap.api.dataset.lib.PartitionedFileSet) TransactionFailureException(org.apache.tephra.TransactionFailureException) DataSetException(co.cask.cdap.api.dataset.DataSetException) PartitionOutput(co.cask.cdap.api.dataset.lib.PartitionOutput) UnitTestManager(co.cask.cdap.test.UnitTestManager) OutputStreamWriter(java.io.OutputStreamWriter) OutputStreamWriter(java.io.OutputStreamWriter) Writer(java.io.Writer) Location(org.apache.twill.filesystem.Location) Test(org.junit.Test)

Example 15 with Location

use of org.apache.twill.filesystem.Location in project cdap by caskdata.

the class SparkFileSetTestRun method addTimePartition.

private void addTimePartition(DataSetManager<TimePartitionedFileSet> tpfsManager, long inputTime) throws IOException, TransactionFailureException, InterruptedException {
    TimePartitionedFileSet tpfs = tpfsManager.get();
    PartitionOutput partitionOutput = tpfs.getPartitionOutput(inputTime);
    Location location = partitionOutput.getLocation();
    prepareFileInput(location);
    partitionOutput.addPartition();
    tpfsManager.flush();
}
Also used : PartitionOutput(co.cask.cdap.api.dataset.lib.PartitionOutput) TimePartitionedFileSet(co.cask.cdap.api.dataset.lib.TimePartitionedFileSet) Location(org.apache.twill.filesystem.Location)

Aggregations

Location (org.apache.twill.filesystem.Location)272 Test (org.junit.Test)110 IOException (java.io.IOException)67 File (java.io.File)45 FileSet (co.cask.cdap.api.dataset.lib.FileSet)32 LocationFactory (org.apache.twill.filesystem.LocationFactory)32 LocalLocationFactory (org.apache.twill.filesystem.LocalLocationFactory)31 PartitionedFileSet (co.cask.cdap.api.dataset.lib.PartitionedFileSet)27 StreamEvent (co.cask.cdap.api.flow.flowlet.StreamEvent)27 CConfiguration (co.cask.cdap.common.conf.CConfiguration)20 HashMap (java.util.HashMap)20 NamespaceId (co.cask.cdap.proto.id.NamespaceId)19 Manifest (java.util.jar.Manifest)18 StreamId (co.cask.cdap.proto.id.StreamId)17 ArrayList (java.util.ArrayList)15 DatasetFramework (co.cask.cdap.data2.dataset2.DatasetFramework)13 OutputStream (java.io.OutputStream)13 TimePartitionedFileSet (co.cask.cdap.api.dataset.lib.TimePartitionedFileSet)11 ApplicationManager (co.cask.cdap.test.ApplicationManager)11 HashSet (java.util.HashSet)11