Example 1 with ExtendedRecordWriter

Use of com.facebook.presto.hive.RecordFileWriter.ExtendedRecordWriter in project presto by prestodb.

From the class ParquetRecordWriterUtil, method createParquetWriter.

public static RecordWriter createParquetWriter(Path target, JobConf conf, Properties properties, boolean compress, ConnectorSession session) throws IOException, ReflectiveOperationException {
    conf.setLong(ParquetOutputFormat.BLOCK_SIZE, getParquetWriterBlockSize(session).toBytes());
    conf.setLong(ParquetOutputFormat.PAGE_SIZE, getParquetWriterPageSize(session).toBytes());
    RecordWriter recordWriter = new MapredParquetOutputFormat().getHiveRecordWriter(conf, target, Text.class, compress, properties, Reporter.NULL);
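    // Reach through the wrapper's private fields via reflection to obtain the
    // underlying ParquetFileWriter, which exposes the current file position.
    Object realWriter = REAL_WRITER_FIELD.get(recordWriter);
    Object internalWriter = INTERNAL_WRITER_FIELD.get(realWriter);
    ParquetFileWriter fileWriter = (ParquetFileWriter) FILE_WRITER_FIELD.get(internalWriter);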
    return new ExtendedRecordWriter() {

        private long length;

        @Override
        public long getWrittenBytes() {
            return length;
        }

        @Override
        public void write(Writable value) throws IOException {
            recordWriter.write(value);
            length = fileWriter.getPos();
        }

        @Override
        public void close(boolean abort) throws IOException {
            recordWriter.close(abort);
            if (!abort) {
                // After a successful close, replace the running position with the actual file size.
                length = target.getFileSystem(conf).getFileStatus(target).getLen();
            }
        }
    };
}
Also used: RecordWriter(org.apache.hadoop.hive.ql.exec.FileSinkOperator.RecordWriter) ParquetRecordWriter(org.apache.parquet.hadoop.ParquetRecordWriter) ExtendedRecordWriter(com.facebook.presto.hive.RecordFileWriter.ExtendedRecordWriter) MapredParquetOutputFormat(org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat) ParquetFileWriter(org.apache.parquet.hadoop.ParquetFileWriter) Writable(org.apache.hadoop.io.Writable)
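
Example 1 reads private fields through constants such as REAL_WRITER_FIELD, whose definitions are not part of the listing. The following is a minimal sketch of how such a cached reflective accessor could be set up. OuterWriter, InnerWriter, the field name "realWriter" and the helper accessibleField are hypothetical stand-ins for illustration, not the Presto or Parquet classes.

```java
import java.lang.reflect.Field;

public class ReflectiveFieldSketch {
    // Hypothetical stand-in for a library class that hides its delegate in a private field.
    static final class OuterWriter {
        private final InnerWriter realWriter = new InnerWriter();
    }

    // Hypothetical stand-in for the inner object that knows the current position.
    static final class InnerWriter {
        private final long pos = 42;

        long getPos() {
            return pos;
        }
    }

    // Resolved once and cached, in the spirit of REAL_WRITER_FIELD in Example 1.
    private static final Field REAL_WRITER_FIELD = accessibleField(OuterWriter.class, "realWriter");

    // Looks up a declared field and makes it readable despite being private.
    static Field accessibleField(Class<?> type, String name) {
        try {
            Field field = type.getDeclaredField(name);
            field.setAccessible(true);
            return field;
        }
        catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    // Reads the hidden delegate reflectively and asks it for its position.
    static long readPosition(OuterWriter writer) {
        try {
            InnerWriter inner = (InnerWriter) REAL_WRITER_FIELD.get(writer);
            return inner.getPos();
        }
        catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readPosition(new OuterWriter()));
    }
}
```

Resolving the Field once in a static initializer avoids repeating the lookup for every writer. The trade-off is that this approach depends on private field names, which can change between library versions.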

Example 2 with ExtendedRecordWriter

Use of com.facebook.presto.hive.RecordFileWriter.ExtendedRecordWriter in project presto by prestodb.

From the class HiveWriteUtils, method createRcFileWriter.

private static RecordWriter createRcFileWriter(Path target, JobConf conf, Properties properties, boolean compress) throws IOException {
    int columns = properties.getProperty(META_TABLE_COLUMNS).split(",").length;
    RCFileOutputFormat.setColumnNumber(conf, columns);
    CompressionCodec codec = null;
    if (compress) {
        codec = ReflectionUtil.newInstance(getOutputCompressorClass(conf, DefaultCodec.class), conf);
    }
    // The no-op lambda is the Progressable callback; progress reporting is not needed here.
    RCFile.Writer writer = new RCFile.Writer(target.getFileSystem(conf), conf, target, () -> {}, codec);
    return new ExtendedRecordWriter() {

        private long length;

        @Override
        public long getWrittenBytes() {
            return length;
        }

        @Override
        public void write(Writable value) throws IOException {
            writer.append(value);
            length = writer.getLength();
        }

        @Override
        public void close(boolean abort) throws IOException {
            writer.close();
            if (!abort) {
                // After a successful close, replace the running length with the actual file size.
                length = target.getFileSystem(conf).getFileStatus(target).getLen();
            }
        }
    };
}
Also used: RCFile(org.apache.hadoop.hive.ql.io.RCFile) ExtendedRecordWriter(com.facebook.presto.hive.RecordFileWriter.ExtendedRecordWriter) DateWritable(org.apache.hadoop.hive.serde2.io.DateWritable) Writable(org.apache.hadoop.io.Writable) IntWritable(org.apache.hadoop.io.IntWritable) BooleanWritable(org.apache.hadoop.io.BooleanWritable) DoubleWritable(org.apache.hadoop.hive.serde2.io.DoubleWritable) FloatWritable(org.apache.hadoop.io.FloatWritable) LongWritable(org.apache.hadoop.io.LongWritable) ShortWritable(org.apache.hadoop.hive.serde2.io.ShortWritable) ByteWritable(org.apache.hadoop.io.ByteWritable) BytesWritable(org.apache.hadoop.io.BytesWritable) TimestampWritable(org.apache.hadoop.hive.serde2.io.TimestampWritable) HiveDecimalWritable(org.apache.hadoop.hive.serde2.io.HiveDecimalWritable) CompressionCodec(org.apache.hadoop.io.compress.CompressionCodec) RecordWriter(org.apache.hadoop.hive.ql.exec.FileSinkOperator.RecordWriter) ParquetRecordWriterUtil.createParquetWriter(com.facebook.presto.hive.ParquetRecordWriterUtil.createParquetWriter)
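
Both examples follow the same pattern: report a running byte count while writing, then replace it with the real file size after a successful close. The sketch below illustrates that pattern with plain java.nio instead of Hadoop, so it can run on its own. TrackingWriter, create and demo are hypothetical names introduced for illustration and are not part of Presto.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ByteTrackingWriterSketch {
    // Hypothetical stand-in for ExtendedRecordWriter; not the Presto interface.
    interface TrackingWriter {
        long getWrittenBytes();

        void write(byte[] record) throws IOException;

        void close(boolean abort) throws IOException;
    }

    static TrackingWriter create(Path target) throws IOException {
        OutputStream out = Files.newOutputStream(target);
        return new TrackingWriter() {
            private long length;

            @Override
            public long getWrittenBytes() {
                return length;
            }

            @Override
            public void write(byte[] record) throws IOException {
                out.write(record);
                // Running count while the file is still open.
                length += record.length;
            }

            @Override
            public void close(boolean abort) throws IOException {
                out.close();
                if (!abort) {
                    // After a successful close, trust the file system over the running count.
                    length = Files.size(target);
                }
            }
        };
    }

    // Writes two small records to a temporary file and returns the reported size.
    static long demo() {
        try {
            Path target = Files.createTempFile("tracking-writer", ".bin");
            try {
                TrackingWriter writer = create(target);
                writer.write("hello".getBytes(StandardCharsets.UTF_8));
                writer.write("world!".getBytes(StandardCharsets.UTF_8));
                writer.close(false);
                return writer.getWrittenBytes();
            }
            finally {
                Files.deleteIfExists(target);
            }
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this sketch the running count and the final size agree, because the stream is unbuffered and writes no footer. With formats such as Parquet or RCFile the two can differ, which is presumably why the examples re-read the length from the file system at close.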

Aggregations

ExtendedRecordWriter (com.facebook.presto.hive.RecordFileWriter.ExtendedRecordWriter): 2
RecordWriter (org.apache.hadoop.hive.ql.exec.FileSinkOperator.RecordWriter): 2
Writable (org.apache.hadoop.io.Writable): 2
ParquetRecordWriterUtil.createParquetWriter (com.facebook.presto.hive.ParquetRecordWriterUtil.createParquetWriter): 1
RCFile (org.apache.hadoop.hive.ql.io.RCFile): 1
MapredParquetOutputFormat (org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat): 1
DateWritable (org.apache.hadoop.hive.serde2.io.DateWritable): 1
DoubleWritable (org.apache.hadoop.hive.serde2.io.DoubleWritable): 1
HiveDecimalWritable (org.apache.hadoop.hive.serde2.io.HiveDecimalWritable): 1
ShortWritable (org.apache.hadoop.hive.serde2.io.ShortWritable): 1
TimestampWritable (org.apache.hadoop.hive.serde2.io.TimestampWritable): 1
BooleanWritable (org.apache.hadoop.io.BooleanWritable): 1
ByteWritable (org.apache.hadoop.io.ByteWritable): 1
BytesWritable (org.apache.hadoop.io.BytesWritable): 1
FloatWritable (org.apache.hadoop.io.FloatWritable): 1
IntWritable (org.apache.hadoop.io.IntWritable): 1
LongWritable (org.apache.hadoop.io.LongWritable): 1
CompressionCodec (org.apache.hadoop.io.compress.CompressionCodec): 1
ParquetFileWriter (org.apache.parquet.hadoop.ParquetFileWriter): 1
ParquetRecordWriter (org.apache.parquet.hadoop.ParquetRecordWriter): 1