Example 1 with TestMapredParquetOutputFormat

Use of com.facebook.presto.hive.parquet.write.TestMapredParquetOutputFormat in project presto by prestodb.

From the class ParquetTester, method writeParquetColumn.

private static DataSize writeParquetColumn(JobConf jobConf, File outputFile, CompressionCodecName compressionCodecName, Properties tableProperties, SettableStructObjectInspector objectInspector, Iterator<?>[] valuesByField, Optional<MessageType> parquetSchema, boolean singleLevelArray) throws Exception {
    // Open a Hive record writer on the output file; the boolean argument is the "compressed" flag
    // and the empty lambda is a no-op progress callback.
    RecordWriter recordWriter = new TestMapredParquetOutputFormat(parquetSchema, singleLevelArray).getHiveRecordWriter(jobConf, new Path(outputFile.toURI()), Text.class, compressionCodecName != UNCOMPRESSED, tableProperties, () -> {});
    // A single row object is reused and overwritten for every record.
    Object row = objectInspector.create();
    List<StructField> fields = ImmutableList.copyOf(objectInspector.getAllStructFieldRefs());
    // Advance all column iterators in lockstep; stop as soon as any column runs out of values.
    while (stream(valuesByField).allMatch(Iterator::hasNext)) {
        for (int field = 0; field < fields.size(); field++) {
            Object value = valuesByField[field].next();
            objectInspector.setStructFieldData(row, fields.get(field), value);
        }
        // Serialize the populated row into a Writable and hand it to the writer.
        ParquetHiveSerDe serde = new ParquetHiveSerDe();
        serde.initialize(jobConf, tableProperties, null);
        Writable record = serde.serialize(row, objectInspector);
        recordWriter.write(record);
    }
    // close(false) finishes the file normally (abort = false).
    recordWriter.close(false);
    // Report the size of the written file.
    return succinctBytes(outputFile.length());
}
Also used:
Path (org.apache.hadoop.fs.Path)
RecordWriter (org.apache.hadoop.hive.ql.exec.FileSinkOperator.RecordWriter)
TestMapredParquetOutputFormat (com.facebook.presto.hive.parquet.write.TestMapredParquetOutputFormat)
StructField (org.apache.hadoop.hive.serde2.objectinspector.StructField)
ParquetHiveSerDe (org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe)
AbstractIterator (com.google.common.collect.AbstractIterator)
Iterator (java.util.Iterator)
Writable (org.apache.hadoop.io.Writable)
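
The row-assembly loop in writeParquetColumn pulls one value from each per-field iterator and stops as soon as any of them is exhausted. A minimal, dependency-free sketch of that lockstep pattern follows. The class LockstepRows and the helper zipColumns are hypothetical names introduced only for illustration and are not part of Presto or Hive. The sketch also adds a length guard, because Stream.allMatch returns true for an empty stream and the loop would otherwise never terminate when there are no columns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration class; not part of Presto or Hive.
class LockstepRows {
    // Builds rows by taking one value from every column iterator per pass.
    // Stops when any column has no more values, so trailing values in
    // longer columns are dropped.
    static List<List<Object>> zipColumns(Iterator<?>[] valuesByField) {
        List<List<Object>> rows = new ArrayList<>();
        // Guard: allMatch on an empty stream is true, which would loop forever.
        while (valuesByField.length > 0
                && Arrays.stream(valuesByField).allMatch(Iterator::hasNext)) {
            List<Object> row = new ArrayList<>(valuesByField.length);
            for (Iterator<?> column : valuesByField) {
                row.add(column.next());
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        Iterator<?>[] columns = {
            Arrays.asList(1, 2, 3).iterator(),
            Arrays.asList("a", "b").iterator()
        };
        // The second column has only two values, so two rows are produced.
        System.out.println(zipColumns(columns)); // prints [[1, a], [2, b]]
    }
}
```

Unlike this sketch, the original method reuses one row object and writes each record immediately, so it never holds all rows in memory.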

Aggregations

TestMapredParquetOutputFormat (com.facebook.presto.hive.parquet.write.TestMapredParquetOutputFormat): 1
AbstractIterator (com.google.common.collect.AbstractIterator): 1
Iterator (java.util.Iterator): 1
Path (org.apache.hadoop.fs.Path): 1
RecordWriter (org.apache.hadoop.hive.ql.exec.FileSinkOperator.RecordWriter): 1
ParquetHiveSerDe (org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe): 1
StructField (org.apache.hadoop.hive.serde2.objectinspector.StructField): 1
Writable (org.apache.hadoop.io.Writable): 1