Example 1 with GobblinOrcWriter

Use of org.apache.gobblin.writer.GobblinOrcWriter in project incubator-gobblin by apache.

Class GobblinMCEPublisherTest, method setUp.

@BeforeClass
public void setUp() throws Exception {
    tmpDir = Files.createTempDir();
    datasetDir = new File(tmpDir, "/data/tracking/testTable");
    dataFile = new File(datasetDir, "/hourly/2020/03/17/08/data.avro");
    Files.createParentDirs(dataFile);
    dataDir = new File(dataFile.getParent());
    Assert.assertTrue(dataDir.exists());
    writeRecord();
    _avroPartitionSchema = SchemaBuilder.record("partitionTest").fields().name("ds").type().optional().stringType().endRecord();
    // Write ORC file for test
    Schema schema = new Schema.Parser().parse(this.getClass().getClassLoader().getResourceAsStream("publisherTest/schema.avsc"));
    orcSchema = schema.toString();
    List<GenericRecord> recordList = deserializeAvroRecords(this.getClass(), schema, "publisherTest/data.json");
    // Mock the WriterBuilder; the stubbed calls below work around precondition checks in the writer builder.
    FsDataWriterBuilder<Schema, GenericRecord> mockBuilder = (FsDataWriterBuilder<Schema, GenericRecord>) Mockito.mock(FsDataWriterBuilder.class);
    when(mockBuilder.getSchema()).thenReturn(schema);
    State dummyState = new WorkUnit();
    String stagingDir = new File(tmpDir, "/orc/staging").getAbsolutePath();
    String outputDir = new File(tmpDir, "/orc/output").getAbsolutePath();
    dummyState.setProp(ConfigurationKeys.WRITER_STAGING_DIR, stagingDir);
    dummyState.setProp(ConfigurationKeys.WRITER_FILE_PATH, "simple");
    dummyState.setProp(ConfigurationKeys.WRITER_OUTPUT_DIR, outputDir);
    when(mockBuilder.getFileName(dummyState)).thenReturn("file.orc");
    orcFilePath = new Path(outputDir, "simple/file.orc");
    // Register the writer with a Closer to manage its life-cycle.
    // Note: only orcWriter.close() is called below; closer.close() would also be needed to exercise a double-close.
    Closer closer = Closer.create();
    GobblinOrcWriter orcWriter = closer.register(new GobblinOrcWriter(mockBuilder, dummyState));
    for (GenericRecord record : recordList) {
        orcWriter.write(record);
    }
    orcWriter.commit();
    orcWriter.close();
    // Verify that the ORC file exists at the expected output path.
    FileSystem fs = FileSystem.getLocal(new Configuration());
    Assert.assertTrue(fs.exists(orcFilePath));
}
Also used: Path (org.apache.hadoop.fs.Path), Closer (com.google.common.io.Closer), FsDataWriterBuilder (org.apache.gobblin.writer.FsDataWriterBuilder), Configuration (org.apache.hadoop.conf.Configuration), GobblinOrcWriter (org.apache.gobblin.writer.GobblinOrcWriter), Schema (org.apache.avro.Schema), WorkUnitState (gobblin.configuration.WorkUnitState), State (org.apache.gobblin.configuration.State), FileSystem (org.apache.hadoop.fs.FileSystem), WorkUnit (org.apache.gobblin.source.workunit.WorkUnit), GenericRecord (org.apache.avro.generic.GenericRecord), File (java.io.File), BeforeClass (org.testng.annotations.BeforeClass)
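
The setUp method above configures a staging directory and an output directory, then calls write, commit and close in that order. The sketch below illustrates that life-cycle using only the Java standard library. It is not Gobblin code: StagedWriterSketch and all of its methods are hypothetical names, and it assumes (as the two directory settings suggest) that records are written to staging first and moved to the output directory on commit. It also shows one way a close method can tolerate being called more than once.

```java
import java.io.BufferedWriter;
import java.io.Closeable;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical stand-in for a file-system writer; not part of Gobblin. */
public class StagedWriterSketch implements Closeable {
    private final Path stagingFile;
    private final Path outputFile;
    private final BufferedWriter out;
    private boolean closed = false;
    private int closeCalls = 0;

    public StagedWriterSketch(Path stagingDir, Path outputDir, String fileName) throws IOException {
        Files.createDirectories(stagingDir);
        this.stagingFile = stagingDir.resolve(fileName);
        this.outputFile = outputDir.resolve(fileName);
        this.out = Files.newBufferedWriter(stagingFile, StandardCharsets.UTF_8);
    }

    /** Appends one record as a line of text to the staging file. */
    public void write(String record) throws IOException {
        out.write(record);
        out.newLine();
    }

    /** Closes the stream, then moves the staging file to its output location. */
    public void commit() throws IOException {
        close();
        Files.createDirectories(outputFile.getParent());
        Files.move(stagingFile, outputFile, StandardCopyOption.REPLACE_EXISTING);
    }

    /** Idempotent close: the second and later calls do nothing. */
    @Override
    public void close() throws IOException {
        closeCalls++;
        if (closed) {
            return;
        }
        closed = true;
        out.close();
    }

    public int closeCalls() {
        return closeCalls;
    }

    public Path outputFile() {
        return outputFile;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("staged-writer");
        StagedWriterSketch writer =
            new StagedWriterSketch(tmp.resolve("staging"), tmp.resolve("output"), "file.txt");
        writer.write("first");
        writer.write("second");
        writer.commit(); // closes the stream and publishes the file
        writer.close();  // explicit close, as in setUp
        writer.close();  // a further close must not throw
        System.out.println("output exists: " + Files.exists(writer.outputFile()));
    }
}
```

The closed flag is what makes repeated close calls safe here; whether GobblinOrcWriter guards its own close the same way is not shown in the example above and should be checked against its source.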
