
Example 1 with CopyableDatasetMetadata

Use of org.apache.gobblin.data.management.copy.CopyableDatasetMetadata in project incubator-gobblin by Apache.

From class DeletingCopyDataPublisherTest, method testDeleteOnSource.

@Test
public void testDeleteOnSource() throws Exception {
    State state = getTestState("testDeleteOnSource");
    Path testMethodTempPath = new Path(testClassTempPath, "testDeleteOnSource");
    DeletingCopyDataPublisher copyDataPublisher = closer.register(new DeletingCopyDataPublisher(state));
    File outputDir = new File(testMethodTempPath.toString(), "task-output/jobid/1f042f494d1fe2198e0e71a17faa233f33b5099b");
    outputDir.mkdirs();
    outputDir.deleteOnExit();
    WorkUnitState wus = new WorkUnitState();
    CopyableDataset copyableDataset = new TestCopyableDataset(new Path("origin"));
    CopyableDatasetMetadata metadata = new CopyableDatasetMetadata(copyableDataset);
    CopyEntity cf = CopyableFileUtils.createTestCopyableFile(new Path(testMethodTempPath, "test.txt").toString());
    CopySource.serializeCopyableDataset(wus, metadata);
    CopySource.serializeCopyEntity(wus, cf);
    Assert.assertTrue(fs.exists(new Path(testMethodTempPath, "test.txt")));
    wus.setWorkingState(WorkingState.SUCCESSFUL);
    copyDataPublisher.publishData(ImmutableList.of(wus));
    Assert.assertFalse(fs.exists(new Path(testMethodTempPath, "test.txt")));
}
Also used : Path(org.apache.hadoop.fs.Path) TestCopyableDataset(org.apache.gobblin.data.management.copy.TestCopyableDataset) CopyableDataset(org.apache.gobblin.data.management.copy.CopyableDataset) CopyEntity(org.apache.gobblin.data.management.copy.CopyEntity) WorkUnitState(org.apache.gobblin.configuration.WorkUnitState) State(org.apache.gobblin.configuration.State) WorkingState(org.apache.gobblin.configuration.WorkUnitState.WorkingState) CopyableDatasetMetadata(org.apache.gobblin.data.management.copy.CopyableDatasetMetadata) File(java.io.File) Test(org.testng.annotations.Test)
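Example 1 asserts that once a work unit marked SUCCESSFUL is published, its source file no longer exists. The sketch below models only that contract with plain java.nio.file calls, as an assumption-laden illustration: `UnitResult` and `DeleteOnSourcePublisher` are hypothetical names, not Gobblin classes. The real DeletingCopyDataPublisher presumably works through Hadoop's FileSystem and the CopyEntity serialized into each WorkUnitState, as the test's setup suggests.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

/** Hypothetical stand-in for a work unit: the file it copied and whether it succeeded. */
final class UnitResult {
    final Path source;
    final boolean successful;

    UnitResult(Path source, boolean successful) {
        this.source = source;
        this.successful = successful;
    }
}

/** Sketch of "publish, then delete on source"; not Gobblin's DeletingCopyDataPublisher. */
final class DeleteOnSourcePublisher {
    private final Path publishDir;

    DeleteOnSourcePublisher(Path publishDir) {
        this.publishDir = publishDir;
    }

    /** Copies each successful unit's file into publishDir, then deletes its source. */
    void publish(List<UnitResult> units) throws IOException {
        Files.createDirectories(publishDir);
        for (UnitResult unit : units) {
            if (!unit.successful) {
                continue; // failed units keep their source so they can be retried
            }
            Path target = publishDir.resolve(unit.source.getFileName());
            Files.copy(unit.source, target, StandardCopyOption.REPLACE_EXISTING);
            Files.delete(unit.source); // reached only if the copy did not throw
        }
    }
}
```

Deleting strictly after the copy returns normally means a failed publish never loses the only copy of the data.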

Example 2 with CopyableDatasetMetadata

Use of org.apache.gobblin.data.management.copy.CopyableDatasetMetadata in project incubator-gobblin by Apache.

From class FileAwareInputStreamDataWriterTest, method testWrite.

@Test
public void testWrite() throws Exception {
    String streamString = "testContents";
    FileStatus status = fs.getFileStatus(testTempPath);
    OwnerAndPermission ownerAndPermission = new OwnerAndPermission(status.getOwner(), status.getGroup(), new FsPermission(FsAction.ALL, FsAction.ALL, FsAction.ALL));
    CopyableFile cf = CopyableFileUtils.getTestCopyableFile(ownerAndPermission);
    CopyableDatasetMetadata metadata = new CopyableDatasetMetadata(new TestCopyableDataset(new Path("/source")));
    WorkUnitState state = TestUtils.createTestWorkUnitState();
    state.setProp(ConfigurationKeys.WRITER_STAGING_DIR, new Path(testTempPath, "staging").toString());
    state.setProp(ConfigurationKeys.WRITER_OUTPUT_DIR, new Path(testTempPath, "output").toString());
    state.setProp(ConfigurationKeys.WRITER_FILE_PATH, RandomStringUtils.randomAlphabetic(5));
    CopySource.serializeCopyEntity(state, cf);
    CopySource.serializeCopyableDataset(state, metadata);
    FileAwareInputStreamDataWriter dataWriter = new FileAwareInputStreamDataWriter(state, 1, 0);
    FileAwareInputStream fileAwareInputStream = new FileAwareInputStream(cf, StreamUtils.convertStream(IOUtils.toInputStream(streamString)));
    dataWriter.write(fileAwareInputStream);
    dataWriter.commit();
    Path writtenFilePath = new Path(new Path(state.getProp(ConfigurationKeys.WRITER_OUTPUT_DIR), cf.getDatasetAndPartition(metadata).identifier()), cf.getDestination());
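    // close the stream after reading so the file handle is not leaked
    try (FileInputStream writtenStream = new FileInputStream(writtenFilePath.toString())) {
        Assert.assertEquals(IOUtils.toString(writtenStream), streamString);
    }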
}
Also used : TestCopyableDataset(org.apache.gobblin.data.management.copy.TestCopyableDataset) Path(org.apache.hadoop.fs.Path) FileStatus(org.apache.hadoop.fs.FileStatus) CopyableDatasetMetadata(org.apache.gobblin.data.management.copy.CopyableDatasetMetadata) WorkUnitState(org.apache.gobblin.configuration.WorkUnitState) OwnerAndPermission(org.apache.gobblin.data.management.copy.OwnerAndPermission) FileAwareInputStream(org.apache.gobblin.data.management.copy.FileAwareInputStream) CopyableFile(org.apache.gobblin.data.management.copy.CopyableFile) FsPermission(org.apache.hadoop.fs.permission.FsPermission) FileInputStream(java.io.FileInputStream) Test(org.testng.annotations.Test)
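Examples 2 to 5 all locate the writer's output the same way: the writer output directory, then the dataset-and-partition identifier, then the file's destination; Examples 3 and 4 first drop the destination's leading separator. A minimal sketch of that layout using java.nio.file instead of Hadoop's Path; `TaskOutputPaths`, `taskOutputPath`, and the identifier value `ds_part` are hypothetical. Stripping the separator matters in java.nio because `Path.resolve` returns its argument unchanged when that argument is absolute, silently discarding the base directory.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper mirroring the path arithmetic in the tests, using java.nio instead of Hadoop. */
final class TaskOutputPaths {
    private TaskOutputPaths() {}

    /** Removes leading '/' characters so the destination resolves under the output root. */
    static String withoutLeadingSeparator(String path) {
        int i = 0;
        while (i < path.length() && path.charAt(i) == '/') {
            i++;
        }
        return path.substring(i);
    }

    /** Builds outputDir / datasetAndPartitionId / destination (leading separator stripped). */
    static Path taskOutputPath(String outputDir, String datasetAndPartitionId, String destination) {
        return Paths.get(outputDir)
                .resolve(datasetAndPartitionId)
                .resolve(withoutLeadingSeparator(destination));
    }
}
```

For instance, `taskOutputPath("/tmp/out", "ds_part", "/destination/path/file")` yields `/tmp/out/ds_part/destination/path/file`, whereas resolving the unstripped destination would yield `/destination/path/file`.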

Example 3 with CopyableDatasetMetadata

Use of org.apache.gobblin.data.management.copy.CopyableDatasetMetadata in project incubator-gobblin by Apache.

From class TarArchiveInputStreamDataWriterTest, method testWrite.

@Test(dataProvider = "testFileDataProvider")
public void testWrite(final String filePath, final String newFileName, final String expectedText) throws Exception {
    String expectedFileContents = "text";
    String fileNameInArchive = "text.txt";
    WorkUnitState state = TestUtils.createTestWorkUnitState();
    state.setProp(ConfigurationKeys.WRITER_STAGING_DIR, new Path(testTempPath, "staging").toString());
    state.setProp(ConfigurationKeys.WRITER_OUTPUT_DIR, new Path(testTempPath, "output").toString());
    state.setProp(ConfigurationKeys.WRITER_FILE_PATH, "writer_file_path_" + RandomStringUtils.randomAlphabetic(5));
    CopyableDatasetMetadata metadata = new CopyableDatasetMetadata(new TestCopyableDataset(new Path("/source")));
    CopySource.serializeCopyableDataset(state, metadata);
    FileAwareInputStream fileAwareInputStream = getCompressedInputStream(filePath, newFileName);
    CopySource.serializeCopyEntity(state, fileAwareInputStream.getFile());
    TarArchiveInputStreamDataWriter dataWriter = new TarArchiveInputStreamDataWriter(state, 1, 0);
    dataWriter.write(fileAwareInputStream);
    dataWriter.commit();
    // the archive file contains the file text.txt
    Path unArchivedFilePath = new Path(fileAwareInputStream.getFile().getDestination(), fileNameInArchive);
    // Path at which the writer writes text.txt
    Path taskOutputFilePath = new Path(new Path(state.getProp(ConfigurationKeys.WRITER_OUTPUT_DIR), fileAwareInputStream.getFile().getDatasetAndPartition(metadata).identifier()), PathUtils.withoutLeadingSeparator(unArchivedFilePath));
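    // close the stream after reading so the file handle is not leaked
    try (FileInputStream taskOutputStream = new FileInputStream(taskOutputFilePath.toString())) {
        Assert.assertEquals(IOUtils.toString(taskOutputStream).trim(), expectedFileContents);
    }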
}
Also used : Path(org.apache.hadoop.fs.Path) TestCopyableDataset(org.apache.gobblin.data.management.copy.TestCopyableDataset) WorkUnitState(org.apache.gobblin.configuration.WorkUnitState) CopyableDatasetMetadata(org.apache.gobblin.data.management.copy.CopyableDatasetMetadata) FileAwareInputStream(org.apache.gobblin.data.management.copy.FileAwareInputStream) FileInputStream(java.io.FileInputStream) Test(org.testng.annotations.Test)
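In Example 3 the TarArchiveInputStreamDataWriter unpacks the archive so that `text.txt` ends up under the file's destination inside the task output. The JDK has no tar reader, so this sketch swaps in `java.util.zip.ZipInputStream` to show the same "unpack every entry under a root directory" idea; `ArchiveExtractor.extractAll` is hypothetical, and the check that each entry stays inside the root is a common guard against path-traversal entries, not something the test above asserts.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Sketch only: ZIP stands in for tar because the JDK ships no tar reader. */
final class ArchiveExtractor {
    private ArchiveExtractor() {}

    /** Writes every entry under root, closes the archive stream, and returns the number of files written. */
    static int extractAll(InputStream archive, Path root) throws IOException {
        Path normalizedRoot = root.toAbsolutePath().normalize();
        int files = 0;
        try (ZipInputStream zip = new ZipInputStream(archive)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                Path target = normalizedRoot.resolve(entry.getName()).normalize();
                // Reject entries such as "../escape.txt" that would land outside the root.
                if (!target.startsWith(normalizedRoot)) {
                    throw new IOException("Entry escapes output root: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(target);
                } else {
                    Files.createDirectories(target.getParent());
                    // ZipInputStream reports end-of-stream at the entry boundary, so only this entry is copied.
                    Files.copy(zip, target, StandardCopyOption.REPLACE_EXISTING);
                    files++;
                }
                zip.closeEntry();
            }
        }
        return files;
    }
}
```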

Example 4 with CopyableDatasetMetadata

Use of org.apache.gobblin.data.management.copy.CopyableDatasetMetadata in project incubator-gobblin by Apache.

From class FileAwareInputStreamDataWriterTest, method testCommit.

@Test
public void testCommit() throws IOException {
    String destinationExistingToken = "destination";
    String destinationAdditionalTokens = "path";
    String fileName = "file";
    // Assemble destination paths
    Path destination = new Path(new Path(new Path("/", destinationExistingToken), destinationAdditionalTokens), fileName);
    Path destinationWithoutLeadingSeparator = new Path(new Path(destinationExistingToken, destinationAdditionalTokens), fileName);
    // Create temp directory
    File tmpFile = Files.createTempDir();
    tmpFile.deleteOnExit();
    Path tmpPath = new Path(tmpFile.getAbsolutePath());
    // create origin file
    Path originFile = new Path(tmpPath, fileName);
    this.fs.createNewFile(originFile);
    // create staging dir
    Path stagingDir = new Path(tmpPath, "staging");
    this.fs.mkdirs(stagingDir);
    // create output dir
    Path outputDir = new Path(tmpPath, "output");
    this.fs.mkdirs(outputDir);
    // create copyable file
    FileStatus status = this.fs.getFileStatus(originFile);
    FsPermission readWrite = new FsPermission(FsAction.READ_WRITE, FsAction.READ_WRITE, FsAction.READ_WRITE);
    FsPermission dirReadWrite = new FsPermission(FsAction.ALL, FsAction.READ_WRITE, FsAction.READ_WRITE);
    OwnerAndPermission ownerAndPermission = new OwnerAndPermission(status.getOwner(), status.getGroup(), readWrite);
    List<OwnerAndPermission> ancestorOwnerAndPermissions = Lists.newArrayList();
    ancestorOwnerAndPermissions.add(ownerAndPermission);
    ancestorOwnerAndPermissions.add(ownerAndPermission);
    ancestorOwnerAndPermissions.add(ownerAndPermission);
    ancestorOwnerAndPermissions.add(ownerAndPermission);
    Properties properties = new Properties();
    properties.setProperty(ConfigurationKeys.DATA_PUBLISHER_FINAL_DIR, "/publisher");
    CopyableFile cf = CopyableFile.fromOriginAndDestination(this.fs, status, destination,
            CopyConfiguration.builder(FileSystem.getLocal(new Configuration()), properties)
                .publishDir(new Path("/target"))
                .preserve(PreserveAttributes.fromMnemonicString(""))
                .build())
        .destinationOwnerAndPermission(ownerAndPermission)
        .ancestorsOwnerAndPermission(ancestorOwnerAndPermissions)
        .build();
    // create work unit state
    WorkUnitState state = TestUtils.createTestWorkUnitState();
    state.setProp(ConfigurationKeys.WRITER_STAGING_DIR, stagingDir.toUri().getPath());
    state.setProp(ConfigurationKeys.WRITER_OUTPUT_DIR, outputDir.toUri().getPath());
    state.setProp(ConfigurationKeys.WRITER_FILE_PATH, RandomStringUtils.randomAlphabetic(5));
    CopyableDatasetMetadata metadata = new CopyableDatasetMetadata(new TestCopyableDataset(new Path("/source")));
    CopySource.serializeCopyEntity(state, cf);
    CopySource.serializeCopyableDataset(state, metadata);
    // create writer
    FileAwareInputStreamDataWriter writer = new FileAwareInputStreamDataWriter(state, 1, 0);
    // simulate the output of writer.write() by creating the staged file directly
    Path writtenFile = writer.getStagingFilePath(cf);
    this.fs.mkdirs(writtenFile.getParent());
    this.fs.createNewFile(writtenFile);
    // create existing directories in writer output
    Path outputRoot = FileAwareInputStreamDataWriter.getPartitionOutputRoot(outputDir, cf.getDatasetAndPartition(metadata));
    Path existingOutputPath = new Path(outputRoot, destinationExistingToken);
    this.fs.mkdirs(existingOutputPath);
    FileStatus fileStatus = this.fs.getFileStatus(existingOutputPath);
    FsPermission existingPathPermission = fileStatus.getPermission();
    // check initial state of the relevant directories
    Assert.assertTrue(this.fs.exists(existingOutputPath));
    Assert.assertEquals(this.fs.listStatus(existingOutputPath).length, 0);
    writer.actualProcessedCopyableFile = Optional.of(cf);
    // commit
    writer.commit();
    // check state of relevant paths after commit
    Path expectedOutputPath = new Path(outputRoot, destinationWithoutLeadingSeparator);
    Assert.assertTrue(this.fs.exists(expectedOutputPath));
    fileStatus = this.fs.getFileStatus(expectedOutputPath);
    Assert.assertEquals(fileStatus.getOwner(), ownerAndPermission.getOwner());
    Assert.assertEquals(fileStatus.getGroup(), ownerAndPermission.getGroup());
    Assert.assertEquals(fileStatus.getPermission(), readWrite);
    // parent should have permissions set correctly
    fileStatus = this.fs.getFileStatus(expectedOutputPath.getParent());
    Assert.assertEquals(fileStatus.getPermission(), dirReadWrite);
    // previously existing paths should not have permissions changed
    fileStatus = this.fs.getFileStatus(existingOutputPath);
    Assert.assertEquals(fileStatus.getPermission(), existingPathPermission);
    Assert.assertFalse(this.fs.exists(writer.stagingDir));
}
Also used : Path(org.apache.hadoop.fs.Path) TestCopyableDataset(org.apache.gobblin.data.management.copy.TestCopyableDataset) FileStatus(org.apache.hadoop.fs.FileStatus) Configuration(org.apache.hadoop.conf.Configuration) CopyConfiguration(org.apache.gobblin.data.management.copy.CopyConfiguration) WorkUnitState(org.apache.gobblin.configuration.WorkUnitState) CopyableDatasetMetadata(org.apache.gobblin.data.management.copy.CopyableDatasetMetadata) Properties(java.util.Properties) OwnerAndPermission(org.apache.gobblin.data.management.copy.OwnerAndPermission) CopyableFile(org.apache.gobblin.data.management.copy.CopyableFile) FsPermission(org.apache.hadoop.fs.permission.FsPermission) File(java.io.File) Test(org.testng.annotations.Test)
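Example 4 pins down the permission side of commit(): the staged file moves to the output root plus the destination (without its leading separator), the file gets rw-rw-rw-, the newly created parent gets rwxrw-rw-, and the pre-existing `destination` directory keeps its original permissions. A minimal sketch of that behavior with java.nio.file, assuming a POSIX file system; `StagedCommit.commit` is a hypothetical helper, not Gobblin's FileAwareInputStreamDataWriter, and it leaves out ownership changes and staging-directory cleanup. The test passes rw-rw-rw- for every ancestor yet expects rwxrw-rw- on the new parent, which suggests the writer adds the owner execute bit for directories; the sketch simply takes the directory permission as a parameter.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Set;

/** Sketch of the commit contract asserted in testCommit; assumes a POSIX file system. */
final class StagedCommit {
    private StagedCommit() {}

    static void commit(Path stagedFile, Path finalFile, String dirPerms, String filePerms) throws IOException {
        Set<PosixFilePermission> dirPermissions = PosixFilePermissions.fromString(dirPerms);
        // Collect only the missing ancestors, deepest first, so existing directories are never touched.
        Deque<Path> missing = new ArrayDeque<>();
        for (Path dir = finalFile.getParent(); dir != null && !Files.exists(dir); dir = dir.getParent()) {
            missing.push(dir);
        }
        // The shallowest missing ancestor is now on top, so directories are created top-down.
        while (!missing.isEmpty()) {
            Path dir = missing.pop();
            Files.createDirectory(dir);
            Files.setPosixFilePermissions(dir, dirPermissions);
        }
        Files.move(stagedFile, finalFile, StandardCopyOption.REPLACE_EXISTING);
        Files.setPosixFilePermissions(finalFile, PosixFilePermissions.fromString(filePerms));
    }
}
```

Stopping the upward walk at the first existing ancestor is what keeps the permissions of pre-existing directories unchanged.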

Example 5 with CopyableDatasetMetadata

Use of org.apache.gobblin.data.management.copy.CopyableDatasetMetadata in project incubator-gobblin by Apache.

From class FileAwareInputStreamDataWriterTest, method testWriteWithEncryption.

@Test
public void testWriteWithEncryption() throws Exception {
    byte[] streamString = "testEncryptedContents".getBytes("UTF-8");
    byte[] expectedContents = new byte[streamString.length];
    for (int i = 0; i < streamString.length; i++) {
        expectedContents[i] = (byte) ((streamString[i] + 1) % 256);
    }
    FileStatus status = fs.getFileStatus(testTempPath);
    OwnerAndPermission ownerAndPermission = new OwnerAndPermission(status.getOwner(), status.getGroup(), new FsPermission(FsAction.ALL, FsAction.ALL, FsAction.ALL));
    CopyableFile cf = CopyableFileUtils.getTestCopyableFile(ownerAndPermission);
    CopyableDatasetMetadata metadata = new CopyableDatasetMetadata(new TestCopyableDataset(new Path("/source")));
    WorkUnitState state = TestUtils.createTestWorkUnitState();
    state.setProp(ConfigurationKeys.WRITER_STAGING_DIR, new Path(testTempPath, "staging").toString());
    state.setProp(ConfigurationKeys.WRITER_OUTPUT_DIR, new Path(testTempPath, "output").toString());
    state.setProp(ConfigurationKeys.WRITER_FILE_PATH, RandomStringUtils.randomAlphabetic(5));
    state.setProp("writer.encrypt." + EncryptionConfigParser.ENCRYPTION_ALGORITHM_KEY, "insecure_shift");
    CopySource.serializeCopyEntity(state, cf);
    CopySource.serializeCopyableDataset(state, metadata);
    FileAwareInputStreamDataWriter dataWriter = new FileAwareInputStreamDataWriter(state, 1, 0);
    FileAwareInputStream fileAwareInputStream = new FileAwareInputStream(cf, StreamUtils.convertStream(new ByteArrayInputStream(streamString)));
    dataWriter.write(fileAwareInputStream);
    dataWriter.commit();
    Path writtenFilePath = new Path(new Path(state.getProp(ConfigurationKeys.WRITER_OUTPUT_DIR), cf.getDatasetAndPartition(metadata).identifier()), cf.getDestination());
    Assert.assertTrue(writtenFilePath.getName().endsWith("insecure_shift"), "Expected encryption name to be appended to destination");
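    // close the stream after reading so the file handle is not leaked
    try (FileInputStream writtenStream = new FileInputStream(writtenFilePath.toString())) {
        Assert.assertEquals(IOUtils.toByteArray(writtenStream), expectedContents);
    }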
}
Also used : TestCopyableDataset(org.apache.gobblin.data.management.copy.TestCopyableDataset) Path(org.apache.hadoop.fs.Path) FileStatus(org.apache.hadoop.fs.FileStatus) CopyableDatasetMetadata(org.apache.gobblin.data.management.copy.CopyableDatasetMetadata) WorkUnitState(org.apache.gobblin.configuration.WorkUnitState) FileInputStream(java.io.FileInputStream) ByteArrayInputStream(java.io.ByteArrayInputStream) OwnerAndPermission(org.apache.gobblin.data.management.copy.OwnerAndPermission) FileAwareInputStream(org.apache.gobblin.data.management.copy.FileAwareInputStream) CopyableFile(org.apache.gobblin.data.management.copy.CopyableFile) FsPermission(org.apache.hadoop.fs.permission.FsPermission) Test(org.testng.annotations.Test)
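Example 5 expects the `insecure_shift` algorithm to write every byte b as (b + 1) mod 256 and to append the algorithm name to the destination file name. A minimal sketch of such a byte-shifting stream wrapper; `InsecureShiftOutputStream` is a hypothetical name and not Gobblin's implementation, and like the name says, this is a test toy, not encryption.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/**
 * Toy byte-shift stream: writes every byte b as (b + 1) mod 256.
 * Hypothetical sketch of the behavior testWriteWithEncryption expects; NOT real encryption.
 */
final class InsecureShiftOutputStream extends FilterOutputStream {
    InsecureShiftOutputStream(OutputStream out) {
        super(out);
    }

    @Override
    public void write(int b) throws IOException {
        // Keep only the low 8 bits, so 0xFF wraps around to 0x00.
        out.write((b + 1) & 0xFF);
    }
}
```

FilterOutputStream's inherited `write(byte[], int, int)` forwards each byte to `write(int)`, so wrapping, for example, a `FileOutputStream` is enough to shift a whole file.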

Aggregations

WorkUnitState (org.apache.gobblin.configuration.WorkUnitState) 7
CopyableDatasetMetadata (org.apache.gobblin.data.management.copy.CopyableDatasetMetadata) 7
Path (org.apache.hadoop.fs.Path) 7
CopyableFile (org.apache.gobblin.data.management.copy.CopyableFile) 5
TestCopyableDataset (org.apache.gobblin.data.management.copy.TestCopyableDataset) 5
Test (org.testng.annotations.Test) 5
FileInputStream (java.io.FileInputStream) 3
CopyEntity (org.apache.gobblin.data.management.copy.CopyEntity) 3
FileAwareInputStream (org.apache.gobblin.data.management.copy.FileAwareInputStream) 3
OwnerAndPermission (org.apache.gobblin.data.management.copy.OwnerAndPermission) 3
FileStatus (org.apache.hadoop.fs.FileStatus) 3
FsPermission (org.apache.hadoop.fs.permission.FsPermission) 3
File (java.io.File) 2
CommitStepCopyEntity (org.apache.gobblin.data.management.copy.entities.CommitStepCopyEntity) 2
ByteArrayInputStream (java.io.ByteArrayInputStream) 1
Properties (java.util.Properties) 1
CommitStep (org.apache.gobblin.commit.CommitStep) 1
State (org.apache.gobblin.configuration.State) 1
WorkingState (org.apache.gobblin.configuration.WorkUnitState.WorkingState) 1
CopyConfiguration (org.apache.gobblin.data.management.copy.CopyConfiguration) 1