
Example 1 with OrcOutputFormat

Use of org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat in project presto by prestodb.

From class OrcTester, method createOrcRecordWriter.

static RecordWriter createOrcRecordWriter(File outputFile, Format format, Compression compression, ObjectInspector columnObjectInspector) throws IOException {
    JobConf jobConf = new JobConf();
    jobConf.set("hive.exec.orc.write.format", format == ORC_12 ? "0.12" : "0.11");
    jobConf.set("hive.exec.orc.default.compress", compression.name());
    return new OrcOutputFormat().getHiveRecordWriter(jobConf, new Path(outputFile.toURI()), Text.class, compression != NONE, createTableProperties("test", columnObjectInspector.getTypeName()), () -> {
    });
}
Also used: Path(org.apache.hadoop.fs.Path) JobConf(org.apache.hadoop.mapred.JobConf) OrcOutputFormat(org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat)
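The createTableProperties helper referenced above is not shown in this snippet. Judging from the inline Properties built in Example 2 below, it presumably looks roughly like the following sketch; the property names "columns" and "columns.types" are the standard Hive table properties, but the real OrcTester helper may set additional keys.

// Hypothetical reconstruction of OrcTester.createTableProperties, inferred from
// the inline Properties in Example 2; the actual helper may differ.
private static Properties createTableProperties(String name, String type) {
    Properties tableProperties = new Properties();
    tableProperties.setProperty("columns", name);
    tableProperties.setProperty("columns.types", type);
    return tableProperties;
}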

Example 2 with OrcOutputFormat

Use of org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat in project presto by prestodb.

From class TestCachingOrcDataSource, method createOrcRecordWriter.

private static FileSinkOperator.RecordWriter createOrcRecordWriter(File outputFile, Format format, Compression compression, ObjectInspector columnObjectInspector) throws IOException {
    JobConf jobConf = new JobConf();
    jobConf.set("hive.exec.orc.write.format", format == ORC_12 ? "0.12" : "0.11");
    jobConf.set("hive.exec.orc.default.compress", compression.name());
    Properties tableProperties = new Properties();
    tableProperties.setProperty("columns", "test");
    tableProperties.setProperty("columns.types", columnObjectInspector.getTypeName());
    tableProperties.setProperty("orc.stripe.size", "1200000");
    return new OrcOutputFormat().getHiveRecordWriter(jobConf, new Path(outputFile.toURI()), Text.class, compression != NONE, tableProperties, () -> {
    });
}
Also used: Path(org.apache.hadoop.fs.Path) Properties(java.util.Properties) JobConf(org.apache.hadoop.mapred.JobConf) OrcOutputFormat(org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat)
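The returned FileSinkOperator.RecordWriter accepts Writables produced by OrcSerde and is committed by closing it with abort = false. A minimal usage sketch, not taken from the quoted test; outputFile, compression, and the usual Hive serde imports are assumed to be in scope:

// Sketch: write one row through the writer returned above, then commit it.
ObjectInspector columnInspector = PrimitiveObjectInspectorFactory.javaStringObjectInspector;
FileSinkOperator.RecordWriter writer = createOrcRecordWriter(outputFile, ORC_12, compression, columnInspector);
// Rows are serialized against a single-column struct matching the "test" column declared above.
ObjectInspector rowInspector = ObjectInspectorFactory.getStandardStructObjectInspector(
        Collections.singletonList("test"), Collections.singletonList(columnInspector));
OrcSerde orcSerde = new OrcSerde();
writer.write(orcSerde.serialize(Collections.singletonList("hello orc"), rowInspector));
// close(false) finalizes the ORC file; close(true) would signal an aborted write.
writer.close(false);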

Example 3 with OrcOutputFormat

Use of org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat in project presto by prestodb.

From class OrcTester, method createOrcRecordWriter.

static RecordWriter createOrcRecordWriter(File outputFile, Format format, CompressionKind compression, List<Type> types) throws IOException {
    JobConf jobConf = new JobConf();
    OrcConf.WRITE_FORMAT.setString(jobConf, format == ORC_12 ? "0.12" : "0.11");
    OrcConf.COMPRESS.setString(jobConf, compression.name());
    return new OrcOutputFormat().getHiveRecordWriter(jobConf, new Path(outputFile.toURI()), Text.class, compression != NONE, createTableProperties(types), () -> {
    });
}
Also used: Path(org.apache.hadoop.fs.Path) JobConf(org.apache.hadoop.mapred.JobConf) OrcOutputFormat(org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat)
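This variant configures the same two settings as Example 1 through the OrcConf enum rather than hard-coded keys. Assuming OrcConf here is org.apache.orc.OrcConf, each constant carries both the ORC property name and a legacy Hive name, so the calls above are roughly equivalent to setting those keys directly, as in this sketch:

// Rough equivalence sketch, assuming OrcConf is org.apache.orc.OrcConf.
JobConf jobConf = new JobConf();
jobConf.set(OrcConf.WRITE_FORMAT.getAttribute(), "0.12"); // e.g. "orc.write.format"
jobConf.set(OrcConf.COMPRESS.getAttribute(), "ZLIB");     // e.g. "orc.compress"
// The older keys used in Example 1 correspond to the legacy names, e.g.
// OrcConf.WRITE_FORMAT.getHiveConfName() -> "hive.exec.orc.write.format".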

Example 4 with OrcOutputFormat

Use of org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat in project presto by prestodb.

From class TestCachingOrcDataSource, method createOrcRecordWriter.

private static FileSinkOperator.RecordWriter createOrcRecordWriter(File outputFile, Format format, CompressionKind compression, ObjectInspector columnObjectInspector) throws IOException {
    JobConf jobConf = new JobConf();
    OrcConf.WRITE_FORMAT.setString(jobConf, format == ORC_12 ? "0.12" : "0.11");
    OrcConf.COMPRESS.setString(jobConf, compression.name());
    Properties tableProperties = new Properties();
    tableProperties.setProperty(IOConstants.COLUMNS, "test");
    tableProperties.setProperty(IOConstants.COLUMNS_TYPES, columnObjectInspector.getTypeName());
    tableProperties.setProperty(OrcConf.STRIPE_SIZE.getAttribute(), "120000");
    return new OrcOutputFormat().getHiveRecordWriter(jobConf, new Path(outputFile.toURI()), Text.class, compression != NONE, tableProperties, () -> {
    });
}
Also used: Path(org.apache.hadoop.fs.Path) Properties(java.util.Properties) JobConf(org.apache.hadoop.mapred.JobConf) OrcOutputFormat(org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat)
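TestCachingOrcDataSource shrinks the stripe size so that even a modest file ends up with many small stripes. One way to confirm the resulting stripe layout is to read the file back with the Hive ORC reader API; this is a verification sketch, not part of the quoted test, and assumes outputFile from the method above:

// Sketch: inspect the stripes of the file written above.
Reader reader = OrcFile.createReader(
        new Path(outputFile.toURI()),
        OrcFile.readerOptions(new org.apache.hadoop.conf.Configuration()));
for (StripeInformation stripe : reader.getStripes()) {
    System.out.printf("stripe at offset %d, %d rows%n", stripe.getOffset(), stripe.getNumberOfRows());
}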

Example 5 with OrcOutputFormat

Use of org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat in project DataX by alibaba.

From class HdfsHelper, method orcFileStartWrite.

/**
 * Writes records to a file in ORC format.
 * @param lineReceiver source of records handed over from the reader
 * @param config writer task configuration
 * @param fileName full path of the target file
 * @param taskPluginCollector collector used to report dirty records
 */
public void orcFileStartWrite(RecordReceiver lineReceiver, Configuration config, String fileName, TaskPluginCollector taskPluginCollector) {
    // Column definitions and optional compression from the writer configuration.
    List<Configuration> columns = config.getListConfiguration(Key.COLUMN);
    String compress = config.getString(Key.COMPRESS, null);
    // Build a struct ObjectInspector describing one output row.
    List<String> columnNames = getColumnNames(columns);
    List<ObjectInspector> columnTypeInspectors = getColumnTypeInspectors(columns);
    StructObjectInspector inspector = (StructObjectInspector) ObjectInspectorFactory.getStandardStructObjectInspector(columnNames, columnTypeInspectors);
    OrcSerde orcSerde = new OrcSerde();
    FileOutputFormat outFormat = new OrcOutputFormat();
    if (!"NONE".equalsIgnoreCase(compress) && null != compress) {
        Class<? extends CompressionCodec> codecClass = getCompressCodec(compress);
        if (null != codecClass) {
            outFormat.setOutputCompressorClass(conf, codecClass);
        }
    }
    try {
        RecordWriter writer = outFormat.getRecordWriter(fileSystem, conf, fileName, Reporter.NULL);
        Record record = null;
        while ((record = lineReceiver.getFromReader()) != null) {
            // Convert the DataX record to a row of Hive objects; the right-hand flag marks
            // a record that failed conversion, which is skipped and reported via the collector.
            MutablePair<List<Object>, Boolean> transportResult = transportOneRecord(record, columns, taskPluginCollector);
            if (!transportResult.getRight()) {
                writer.write(NullWritable.get(), orcSerde.serialize(transportResult.getLeft(), inspector));
            }
        }
        writer.close(Reporter.NULL);
    } catch (Exception e) {
        String message = String.format("An IO exception occurred while writing file [%s]; please check that your network is working properly!", fileName);
        LOG.error(message);
        // Remove the partially written output before surfacing the failure.
        Path path = new Path(fileName);
        deleteDir(path.getParent());
        throw DataXException.asDataXException(HdfsWriterErrorCode.Write_FILE_IO_ERROR, e);
    }
}
Also used: ObjectInspector(org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector) StructObjectInspector(org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector) Configuration(com.alibaba.datax.common.util.Configuration) OrcOutputFormat(org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat) IOException(java.io.IOException) DataXException(com.alibaba.datax.common.exception.DataXException) OrcSerde(org.apache.hadoop.hive.ql.io.orc.OrcSerde) Record(com.alibaba.datax.common.element.Record)
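The getCompressCodec helper called above is not shown. A plausible sketch of the name-to-codec mapping it performs follows; this is hypothetical, and the actual DataX helper supports its own set of codec names and error handling.

// Hypothetical sketch of getCompressCodec: map a compression name from the job
// configuration to a Hadoop CompressionCodec class; returns null for unknown names.
private Class<? extends CompressionCodec> getCompressCodec(String compress) {
    if ("GZIP".equalsIgnoreCase(compress)) {
        return org.apache.hadoop.io.compress.GzipCodec.class;
    }
    if ("SNAPPY".equalsIgnoreCase(compress)) {
        return org.apache.hadoop.io.compress.SnappyCodec.class;
    }
    if ("BZIP2".equalsIgnoreCase(compress)) {
        return org.apache.hadoop.io.compress.BZip2Codec.class;
    }
    return null;
}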

Aggregations

OrcOutputFormat (org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat): 5 uses
Path (org.apache.hadoop.fs.Path): 4 uses
JobConf (org.apache.hadoop.mapred.JobConf): 4 uses
Properties (java.util.Properties): 2 uses
Record (com.alibaba.datax.common.element.Record): 1 use
DataXException (com.alibaba.datax.common.exception.DataXException): 1 use
Configuration (com.alibaba.datax.common.util.Configuration): 1 use
IOException (java.io.IOException): 1 use
OrcSerde (org.apache.hadoop.hive.ql.io.orc.OrcSerde): 1 use
ObjectInspector (org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector): 1 use
StructObjectInspector (org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector): 1 use