
Example 1 with MapredParquetOutputFormat

Use of org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat in project presto by prestodb.

Source: class ParquetRecordWriterUtil, method createParquetWriter.

public static RecordWriter createParquetWriter(Path target, JobConf conf, Properties properties, boolean compress, ConnectorSession session) throws IOException, ReflectiveOperationException {
    conf.setLong(ParquetOutputFormat.BLOCK_SIZE, getParquetWriterBlockSize(session).toBytes());
    conf.setLong(ParquetOutputFormat.PAGE_SIZE, getParquetWriterPageSize(session).toBytes());
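    RecordWriter recordWriter = new MapredParquetOutputFormat().getHiveRecordWriter(conf, target, Text.class, compress, properties, Reporter.NULL);
    // Unwrap Hive writer -> ParquetRecordWriter -> internal writer -> ParquetFileWriter.
    // REAL_WRITER_FIELD, INTERNAL_WRITER_FIELD and FILE_WRITER_FIELD are reflective
    // java.lang.reflect.Field handles declared elsewhere in this class.
    Object realWriter = REAL_WRITER_FIELD.get(recordWriter);
    Object internalWriter = INTERNAL_WRITER_FIELD.get(realWriter);
    ParquetFileWriter fileWriter = (ParquetFileWriter) FILE_WRITER_FIELD.get(internalWriter);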
    return new ExtendedRecordWriter() {

        private long length;

        @Override
        public long getWrittenBytes() {
            return length;
        }

        @Override
        public void write(Writable value) throws IOException {
            recordWriter.write(value);
            // Bytes already flushed to the file; rows still buffered in memory are not counted
            length = fileWriter.getPos();
        }

        @Override
        public void close(boolean abort) throws IOException {
            recordWriter.close(abort);
            if (!abort) {
                // close() flushes the last row group and the footer, so read the final size from the file system
                length = target.getFileSystem(conf).getFileStatus(target).getLen();
            }
        }
    };
}
Also used : RecordWriter(org.apache.hadoop.hive.ql.exec.FileSinkOperator.RecordWriter) ParquetRecordWriter(org.apache.parquet.hadoop.ParquetRecordWriter) ExtendedRecordWriter(com.facebook.presto.hive.RecordFileWriter.ExtendedRecordWriter) MapredParquetOutputFormat(org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat) ParquetFileWriter(org.apache.parquet.hadoop.ParquetFileWriter) Writable(org.apache.hadoop.io.Writable)
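The three *_FIELD constants used in Example 1 are not part of the excerpt. They are presumably `java.lang.reflect.Field` handles to private fields of the Hive and Parquet writer classes, looked up once and made accessible. The sketch below shows that technique in isolation; the `Wrapper` class and its `realWriter` field are hypothetical stand-ins, not the real Hive or Parquet types.

```java
import java.lang.reflect.Field;

public class ReflectiveFieldAccess {
    // Hypothetical stand-in for a library class that keeps its delegate in a private field.
    static class Wrapper {
        private final Object realWriter;

        Wrapper(Object realWriter) {
            this.realWriter = realWriter;
        }
    }

    // Cached, accessible handle to Wrapper.realWriter (looked up once, reused on every call).
    static final Field REAL_WRITER_FIELD;

    static {
        try {
            REAL_WRITER_FIELD = Wrapper.class.getDeclaredField("realWriter");
            // Lift the private access check so Field.get() can read the value.
            REAL_WRITER_FIELD.setAccessible(true);
        } catch (NoSuchFieldException e) {
            // Fail fast at class load if the field was renamed in another library version.
            throw new ExceptionInInitializerError(e);
        }
    }

    // Reads the private delegate, converting the checked exception into an unchecked one.
    static Object readRealWriter(Wrapper wrapper) {
        try {
            return REAL_WRITER_FIELD.get(wrapper);
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Object inner = readRealWriter(new Wrapper("inner-writer"));
        System.out.println(inner); // prints inner-writer
    }
}
```

Caching the `Field` in a static initializer keeps the lookup cost out of the per-file path, and a renamed field in a newer library version surfaces immediately as a class-initialization error rather than on the first write.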

Example 2 with MapredParquetOutputFormat

Use of org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat in project presto by prestodb.

Source: class ParquetTester, method writeParquetColumn.

private static DataSize writeParquetColumn(JobConf jobConf, File outputFile, CompressionCodecName compressionCodecName, ObjectInspector columnObjectInspector, Iterator<?> values) throws Exception {
    Properties tableProperties = createTableProperties("test", columnObjectInspector.getTypeName());
    // The trailing lambda is a no-op Progressable
    RecordWriter recordWriter = new MapredParquetOutputFormat().getHiveRecordWriter(jobConf, new Path(outputFile.toURI()), Text.class, compressionCodecName != UNCOMPRESSED, tableProperties, () -> {
    });
    SettableStructObjectInspector objectInspector = createSettableStructObjectInspector("test", columnObjectInspector);
    Object row = objectInspector.create();
    List<StructField> fields = ImmutableList.copyOf(objectInspector.getAllStructFieldRefs());
    // Initialize the SerDe once and reuse it for every row instead of re-creating it per value
    ParquetHiveSerDe serde = new ParquetHiveSerDe();
    serde.initialize(jobConf, tableProperties, null);
    while (values.hasNext()) {
        Object value = values.next();
        // Reuse the single-column row object: overwrite its only field with the next value
        objectInspector.setStructFieldData(row, fields.get(0), value);
        Writable record = serde.serialize(row, objectInspector);
        recordWriter.write(record);
    }
    // The file size is final only after close() has written the Parquet footer
    recordWriter.close(false);
    return succinctBytes(outputFile.length());
}
Also used : Path(org.apache.hadoop.fs.Path) SettableStructObjectInspector(org.apache.hadoop.hive.serde2.objectinspector.SettableStructObjectInspector) RecordWriter(org.apache.hadoop.hive.ql.exec.FileSinkOperator.RecordWriter) StructField(org.apache.hadoop.hive.serde2.objectinspector.StructField) MapredParquetOutputFormat(org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat) ParquetHiveSerDe(org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe) Writable(org.apache.hadoop.io.Writable)
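Both examples read the output size only after `close()`: the Parquet writer buffers data in memory and writes the footer at close, so the on-disk length is incomplete until then. The sketch below reproduces the same effect with a plain `BufferedOutputStream` as an analogous stand-in for that buffering; it does not use any Parquet classes, and the class and method names are hypothetical.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

public class SizeAfterClose {
    // Writes `count` bytes through a buffered stream and returns the file length
    // observed before close() and after close().
    static long[] lengthsBeforeAndAfterClose(File file, int count) throws IOException {
        long before;
        // 8 KiB buffer: small writes stay in memory until flush() or close().
        try (OutputStream out = new BufferedOutputStream(new FileOutputStream(file), 8192)) {
            for (int i = 0; i < count; i++) {
                out.write(i & 0xFF);
            }
            before = file.length(); // buffered bytes have not reached the file yet
        }
        long after = file.length(); // close() flushed the buffer to the file
        return new long[] {before, after};
    }

    // Runs the measurement against a fresh temporary file.
    static long[] demo(int count) {
        try {
            File file = File.createTempFile("size-after-close", ".bin");
            file.deleteOnExit();
            return lengthsBeforeAndAfterClose(file, count);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[] lengths = demo(100);
        System.out.println("before close: " + lengths[0] + ", after close: " + lengths[1]);
    }
}
```

Running `main` prints `before close: 0, after close: 100`, which is why a size computed before `close()` (like `fileWriter.getPos()` in Example 1) can only be a lower bound.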

Aggregations

RecordWriter (org.apache.hadoop.hive.ql.exec.FileSinkOperator.RecordWriter): 2
MapredParquetOutputFormat (org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat): 2
Writable (org.apache.hadoop.io.Writable): 2
ExtendedRecordWriter (com.facebook.presto.hive.RecordFileWriter.ExtendedRecordWriter): 1
Path (org.apache.hadoop.fs.Path): 1
ParquetHiveSerDe (org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe): 1
SettableStructObjectInspector (org.apache.hadoop.hive.serde2.objectinspector.SettableStructObjectInspector): 1
StructField (org.apache.hadoop.hive.serde2.objectinspector.StructField): 1
ParquetFileWriter (org.apache.parquet.hadoop.ParquetFileWriter): 1
ParquetRecordWriter (org.apache.parquet.hadoop.ParquetRecordWriter): 1