Example 1 with WriterImpl

Use of org.apache.orc.impl.WriterImpl in project flink by apache, in the class OrcNoHiveBulkWriterFactory, method create:

@Override
public BulkWriter<RowData> create(FSDataOutputStream out) throws IOException {
    OrcFile.WriterOptions opts = OrcFile.writerOptions(new Properties(), conf);
    TypeDescription description = TypeDescription.fromString(schema);
    opts.setSchema(description);
    opts.physicalWriter(new NoHivePhysicalWriterImpl(out, opts));
    WriterImpl writer = new WriterImpl(null, new Path("."), opts);
    VectorizedRowBatch rowBatch = description.createRowBatch();
    return new BulkWriter<RowData>() {

        @Override
        public void addElement(RowData row) throws IOException {
            int rowId = rowBatch.size++;
            for (int i = 0; i < row.getArity(); ++i) {
                setColumn(rowId, rowBatch.cols[i], fieldTypes[i], row, i);
            }
            if (rowBatch.size == rowBatch.getMaxSize()) {
                writer.addRowBatch(rowBatch);
                rowBatch.reset();
            }
        }

        @Override
        public void flush() throws IOException {
            if (rowBatch.size != 0) {
                writer.addRowBatch(rowBatch);
                rowBatch.reset();
            }
        }

        @Override
        public void finish() throws IOException {
            flush();
            writer.close();
        }
    };
}
Also used: Path (org.apache.hadoop.fs.Path), VectorizedRowBatch (org.apache.orc.storage.ql.exec.vector.VectorizedRowBatch), RowData (org.apache.flink.table.data.RowData), NoHivePhysicalWriterImpl (org.apache.flink.orc.nohive.writer.NoHivePhysicalWriterImpl), OrcFile (org.apache.orc.OrcFile), BulkWriter (org.apache.flink.api.common.serialization.BulkWriter), TypeDescription (org.apache.orc.TypeDescription), Properties (java.util.Properties), WriterImpl (org.apache.orc.impl.WriterImpl)
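The BulkWriter above follows a common batch-buffering pattern: rows accumulate in a fixed-size batch, which is handed to the underlying writer whenever it fills, on flush(), and once more on finish() so no partial batch is lost. A minimal self-contained sketch of that pattern, with no Flink or ORC dependencies (BatchingWriter and its sink are hypothetical stand-ins for WriterImpl.addRowBatch):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Sketch of the row-batch buffering pattern used by the BulkWriter above.
// The Consumer<List<T>> sink stands in for writer.addRowBatch(rowBatch).
public class BatchingWriter<T> {
    private final int maxSize;
    private final List<T> batch = new ArrayList<>();
    private final Consumer<List<T>> sink;

    public BatchingWriter(int maxSize, Consumer<List<T>> sink) {
        this.maxSize = maxSize;
        this.sink = sink;
    }

    // Mirrors addElement: buffer the row, flush when the batch is full.
    public void addElement(T element) {
        batch.add(element);
        if (batch.size() == maxSize) {
            flush();
        }
    }

    // Mirrors flush: write out a non-empty batch, then reset it.
    public void flush() {
        if (!batch.isEmpty()) {
            sink.accept(new ArrayList<>(batch)); // hand off a copy of the batch
            batch.clear();                       // analogous to rowBatch.reset()
        }
    }

    // Mirrors finish: flush the final partial batch before the writer closes.
    public void finish() {
        flush();
    }
}
```

Note that flush() guards on an empty batch just as the original checks `rowBatch.size != 0`, which keeps finish() safe to call when the last addElement already triggered a flush.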

Example 2 with WriterImpl

Use of org.apache.orc.impl.WriterImpl in project flink by apache, in the class OrcBulkWriterFactory, method create:

@Override
public BulkWriter<T> create(FSDataOutputStream out) throws IOException {
    OrcFile.WriterOptions opts = getWriterOptions();
    opts.physicalWriter(new PhysicalWriterImpl(out, opts));
    // The path of the Writer is not used to indicate the destination file
    // in this case since we have used a dedicated physical writer to write
    // to the given output stream directly. However, the path would be used as
    // the key of writer in the ORC memory manager, thus we need to make it unique.
    Path unusedPath = new Path(UUID.randomUUID().toString());
    return new OrcBulkWriter<>(vectorizer, new WriterImpl(null, unusedPath, opts));
}
Also used: Path (org.apache.hadoop.fs.Path), OrcFile (org.apache.orc.OrcFile), WriterImpl (org.apache.orc.impl.WriterImpl)
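The comment in the example explains why the path must be unique even though it never names a real file: the ORC memory manager uses the path as the writer's registry key, so two writers sharing a key would collide. A minimal sketch of that keying concern, using the same UUID trick (WriterRegistry is a hypothetical stand-in, not the actual org.apache.orc.MemoryManager API):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Sketch of a registry that keys writers by path, as the ORC memory
// manager does. Reusing a key would silently collide, which is why
// OrcBulkWriterFactory builds its unused Path from a random UUID.
public class WriterRegistry {
    private final Map<String, Object> writersByPath = new HashMap<>();

    // Registers a writer under its path key; returns false on collision.
    public boolean register(String path, Object writer) {
        return writersByPath.putIfAbsent(path, writer) == null;
    }

    // Generates a fresh key, as create() does with new Path(UUID.randomUUID().toString()).
    public static String uniquePathKey() {
        return UUID.randomUUID().toString();
    }
}
```

With a fixed path such as ".", every writer created by the factory would register under the same key; a random UUID makes each registration independent.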

Aggregations

- Path (org.apache.hadoop.fs.Path): 2
- OrcFile (org.apache.orc.OrcFile): 2
- WriterImpl (org.apache.orc.impl.WriterImpl): 2
- Properties (java.util.Properties): 1
- BulkWriter (org.apache.flink.api.common.serialization.BulkWriter): 1
- NoHivePhysicalWriterImpl (org.apache.flink.orc.nohive.writer.NoHivePhysicalWriterImpl): 1
- RowData (org.apache.flink.table.data.RowData): 1
- TypeDescription (org.apache.orc.TypeDescription): 1
- VectorizedRowBatch (org.apache.orc.storage.ql.exec.vector.VectorizedRowBatch): 1