Example 1 with OrcMapreduceRecordWriter

Usage of org.apache.orc.mapreduce.OrcMapreduceRecordWriter in the Apache project incubator-gobblin.

From the class OrcCompactionTaskTest, method writeOrcRecordsInFile:

private void writeOrcRecordsInFile(Path path, TypeDescription schema, List<OrcStruct> orcStructs) throws Exception {
    Configuration configuration = new Configuration();
    OrcFile.WriterOptions options = OrcFile.writerOptions(configuration).setSchema(schema);
    Writer writer = OrcFile.createWriter(path, options);
    OrcMapreduceRecordWriter recordWriter = new OrcMapreduceRecordWriter(writer);
    for (OrcStruct orcRecord : orcStructs) {
        recordWriter.write(NullWritable.get(), orcRecord);
    }
    recordWriter.close(new TaskAttemptContextImpl(configuration, new TaskAttemptID()));
}
Also used:
OrcStruct (org.apache.orc.mapred.OrcStruct)
Configuration (org.apache.hadoop.conf.Configuration)
OrcMapreduceRecordWriter (org.apache.orc.mapreduce.OrcMapreduceRecordWriter)
OrcFile (org.apache.orc.OrcFile)
TaskAttemptID (org.apache.hadoop.mapreduce.TaskAttemptID)
TaskAttemptContextImpl (org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl)
Writer (org.apache.orc.Writer)

Example 2 with OrcMapreduceRecordWriter

Usage of org.apache.orc.mapreduce.OrcMapreduceRecordWriter in the Apache project incubator-gobblin.

From the class OrcKeyCompactorOutputFormat, method getRecordWriter:

/**
 * Required override: the super method hard-codes the file extension as ".orc". To keep the
 * extension name flexible, this implementation makes it configuration-driven.
 * @param taskAttemptContext The source of the configuration that determines the file extension
 * @return The {@link RecordWriter} that writes out ORC objects
 * @throws IOException
 */
@Override
public RecordWriter getRecordWriter(TaskAttemptContext taskAttemptContext) throws IOException {
    Configuration conf = taskAttemptContext.getConfiguration();
    String extension = "." + conf.get(COMPACTION_OUTPUT_EXTENSION, "orc");
    Path filename = getDefaultWorkFile(taskAttemptContext, extension);
    Writer writer = OrcFile.createWriter(filename, org.apache.orc.mapred.OrcOutputFormat.buildOptions(conf).memory(new GobblinOrcMemoryManager(conf)));
    int rowBatchSize = conf.getInt(GobblinOrcWriter.ORC_WRITER_BATCH_SIZE, GobblinOrcWriter.DEFAULT_ORC_WRITER_BATCH_SIZE);
    log.info("Creating OrcMapreduceRecordWriter with row batch size = {}", rowBatchSize);
    return new OrcMapreduceRecordWriter(writer, rowBatchSize);
}
Also used:
Path (org.apache.hadoop.fs.Path)
Configuration (org.apache.hadoop.conf.Configuration)
OrcMapreduceRecordWriter (org.apache.orc.mapreduce.OrcMapreduceRecordWriter)
GobblinOrcMemoryManager (org.apache.gobblin.writer.GobblinOrcMemoryManager)
RecordWriter (org.apache.hadoop.mapreduce.RecordWriter)
Writer (org.apache.orc.Writer)
GobblinOrcWriter (org.apache.gobblin.writer.GobblinOrcWriter)
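The configuration-driven extension lookup in getRecordWriter can be sketched without Hadoop on the classpath. Below, a plain Map stands in for Hadoop's Configuration, and the key string is a hypothetical stand-in for the real COMPACTION_OUTPUT_EXTENSION constant (whose actual value is not shown in the snippet above):

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of the extension logic in OrcKeyCompactorOutputFormat.getRecordWriter:
// read the extension name from configuration, fall back to "orc" when the key is
// absent, and always prefix a dot before handing it to getDefaultWorkFile.
public class ExtensionSketch {

    // Hypothetical key name; the real constant lives in the Gobblin codebase.
    static final String COMPACTION_OUTPUT_EXTENSION = "compaction.output.extension";

    static String outputExtension(Map<String, String> conf) {
        // Mirrors: "." + conf.get(COMPACTION_OUTPUT_EXTENSION, "orc")
        return "." + conf.getOrDefault(COMPACTION_OUTPUT_EXTENSION, "orc");
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(outputExtension(conf));           // default extension
        conf.put(COMPACTION_OUTPUT_EXTENSION, "snappy.orc");
        System.out.println(outputExtension(conf));           // configured extension
    }
}
```

This is why the override exists at all: the parent OrcOutputFormat fixes the suffix to ".orc", while this variant lets a job choose its own suffix through configuration alone.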

Aggregations

Configuration (org.apache.hadoop.conf.Configuration): 2 uses
Writer (org.apache.orc.Writer): 2 uses
OrcMapreduceRecordWriter (org.apache.orc.mapreduce.OrcMapreduceRecordWriter): 2 uses
GobblinOrcMemoryManager (org.apache.gobblin.writer.GobblinOrcMemoryManager): 1 use
GobblinOrcWriter (org.apache.gobblin.writer.GobblinOrcWriter): 1 use
Path (org.apache.hadoop.fs.Path): 1 use
RecordWriter (org.apache.hadoop.mapreduce.RecordWriter): 1 use
TaskAttemptID (org.apache.hadoop.mapreduce.TaskAttemptID): 1 use
TaskAttemptContextImpl (org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl): 1 use
OrcFile (org.apache.orc.OrcFile): 1 use
OrcStruct (org.apache.orc.mapred.OrcStruct): 1 use