Example 1 with GobblinOrcMemoryManager

Use of org.apache.gobblin.writer.GobblinOrcMemoryManager in the project incubator-gobblin by apache.

In the class OrcKeyCompactorOutputFormat, method getRecordWriter.

/**
 * Overridden because the super method hard-codes the file extension as ".orc". To keep the
 * extension name flexible, it is read from configuration instead.
 * @param taskAttemptContext the source of the configuration that determines the file extension
 * @return the {@link RecordWriter} that writes out ORC objects
 * @throws IOException if the work file or the ORC writer cannot be created
 */
@Override
public RecordWriter getRecordWriter(TaskAttemptContext taskAttemptContext) throws IOException {
    Configuration conf = taskAttemptContext.getConfiguration();
    // Read the output file extension from configuration, defaulting to "orc".
    String extension = "." + conf.get(COMPACTION_OUTPUT_EXTENSION, "orc");
    Path filename = getDefaultWorkFile(taskAttemptContext, extension);
    // Build the ORC writer options from the configuration and plug in Gobblin's memory manager.
    Writer writer = OrcFile.createWriter(filename,
        org.apache.orc.mapred.OrcOutputFormat.buildOptions(conf).memory(new GobblinOrcMemoryManager(conf)));
    // The row batch size is configurable and falls back to the GobblinOrcWriter default.
    int rowBatchSize = conf.getInt(GobblinOrcWriter.ORC_WRITER_BATCH_SIZE, GobblinOrcWriter.DEFAULT_ORC_WRITER_BATCH_SIZE);
    log.info("Creating OrcMapreduceRecordWriter with row batch size = {}", rowBatchSize);
    return new OrcMapreduceRecordWriter(writer, rowBatchSize);
}
Also used : Path(org.apache.hadoop.fs.Path) Configuration(org.apache.hadoop.conf.Configuration) OrcMapreduceRecordWriter(org.apache.orc.mapreduce.OrcMapreduceRecordWriter) GobblinOrcMemoryManager(org.apache.gobblin.writer.GobblinOrcMemoryManager) RecordWriter(org.apache.hadoop.mapreduce.RecordWriter) Writer(org.apache.orc.Writer) GobblinOrcWriter(org.apache.gobblin.writer.GobblinOrcWriter)
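
The configuration-driven lookups in getRecordWriter (file extension with an "orc" fallback, row batch size with a default) can be tried in isolation with a minimal sketch like the one below. It is not the Gobblin implementation: java.util.Properties stands in for the Hadoop Configuration, and the class name, the key strings, and the default batch size of 1000 are all hypothetical, since the real constant values are not shown in the snippet above.

```java
import java.util.Properties;

public class ConfigDrivenExtensionSketch {
    // Hypothetical key; the real value of COMPACTION_OUTPUT_EXTENSION is not shown above.
    static final String COMPACTION_OUTPUT_EXTENSION = "example.compaction.output.extension";
    // Hypothetical key and default standing in for the GobblinOrcWriter constants.
    static final String ORC_WRITER_BATCH_SIZE = "example.orc.writer.batch.size";
    static final int DEFAULT_ORC_WRITER_BATCH_SIZE = 1000;

    // Mirrors: "." + conf.get(COMPACTION_OUTPUT_EXTENSION, "orc")
    static String resolveExtension(Properties conf) {
        return "." + conf.getProperty(COMPACTION_OUTPUT_EXTENSION, "orc");
    }

    // Rough stand-in for conf.getInt(key, default): use the default when the key is unset.
    static int resolveBatchSize(Properties conf) {
        String raw = conf.getProperty(ORC_WRITER_BATCH_SIZE);
        return raw == null ? DEFAULT_ORC_WRITER_BATCH_SIZE : Integer.parseInt(raw.trim());
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(resolveExtension(conf));   // prints ".orc"

        conf.setProperty(COMPACTION_OUTPUT_EXTENSION, "zlib.orc");
        System.out.println(resolveExtension(conf));   // prints ".zlib.orc"

        System.out.println(resolveBatchSize(conf));   // prints "1000"
    }
}
```

Because the leading dot is prepended by the code, the configured value is expected without it; a value such as ".orc" would yield "..orc" under this scheme.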

Aggregations

GobblinOrcMemoryManager (org.apache.gobblin.writer.GobblinOrcMemoryManager): 1
GobblinOrcWriter (org.apache.gobblin.writer.GobblinOrcWriter): 1
Configuration (org.apache.hadoop.conf.Configuration): 1
Path (org.apache.hadoop.fs.Path): 1
RecordWriter (org.apache.hadoop.mapreduce.RecordWriter): 1
Writer (org.apache.orc.Writer): 1
OrcMapreduceRecordWriter (org.apache.orc.mapreduce.OrcMapreduceRecordWriter): 1