
Example 1 with Reader

use of org.apache.orc.Reader in project flink by apache.

The createRecordReader method of the OrcNoHiveShim class:

@Override
public RecordReader createRecordReader(Configuration conf, TypeDescription schema, int[] selectedFields, List<OrcFilters.Predicate> conjunctPredicates, org.apache.flink.core.fs.Path path, long splitStart, long splitLength) throws IOException {
    // open ORC file and create reader
    org.apache.hadoop.fs.Path hPath = new org.apache.hadoop.fs.Path(path.toUri());
    Reader orcReader = OrcFile.createReader(hPath, OrcFile.readerOptions(conf));
    // get offset and length for the stripes that start in the split
    Tuple2<Long, Long> offsetAndLength = getOffsetAndLengthForSplit(splitStart, splitLength, orcReader.getStripes());
    // create ORC row reader configuration
    Reader.Options options =
            new Reader.Options()
                    .schema(schema)
                    .range(offsetAndLength.f0, offsetAndLength.f1)
                    .useZeroCopy(OrcConf.USE_ZEROCOPY.getBoolean(conf))
                    .skipCorruptRecords(OrcConf.SKIP_CORRUPT_DATA.getBoolean(conf))
                    .tolerateMissingSchema(OrcConf.TOLERATE_MISSING_SCHEMA.getBoolean(conf));
    // TODO configure filters
    // configure selected fields
    options.include(computeProjectionMask(schema, selectedFields));
    // create ORC row reader
    RecordReader orcRowsReader = orcReader.rows(options);
    // assign ids
    schema.getId();
    return orcRowsReader;
}
Also used : RecordReader(org.apache.orc.RecordReader) Reader(org.apache.orc.Reader)
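
computeProjectionMask is a Flink-internal helper that turns the selected top-level field indices into the boolean include mask expected by Reader.Options.include. A rough sketch of what such a mask computation could look like, using only public org.apache.orc.TypeDescription methods (an illustration under assumptions, not the exact Flink implementation; java.util.List is assumed imported):

// Hypothetical sketch: mark the root column plus every column id (including
// nested children) belonging to each selected top-level field.
static boolean[] computeProjectionMask(TypeDescription schema, int[] selectedFields) {
    boolean[] include = new boolean[schema.getMaximumId() + 1];
    // column id 0 is the root struct
    include[0] = true;
    List<TypeDescription> children = schema.getChildren();
    for (int field : selectedFields) {
        TypeDescription selected = children.get(field);
        for (int id = selected.getId(); id <= selected.getMaximumId(); id++) {
            include[id] = true;
        }
    }
    return include;
}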

Example 2 with Reader

use of org.apache.orc.Reader in project flink by apache.

The createReader method of the OrcShimV200 class:

protected Reader createReader(Path path, Configuration conf) throws IOException {
    try {
        Class<?> orcFileClass = Class.forName("org.apache.hadoop.hive.ql.io.orc.OrcFile");
        Object readerOptions = invokeStaticMethod(orcFileClass, "readerOptions", conf);
        Class<?> readerClass = Class.forName("org.apache.hadoop.hive.ql.io.orc.ReaderImpl");
        // noinspection unchecked
        return (Reader) invokeConstructor(readerClass, path, readerOptions);
    } catch (ClassNotFoundException | NoSuchMethodException | IllegalAccessException | InstantiationException | InvocationTargetException e) {
        throw new IOException(e);
    }
}
Also used : RecordReader(org.apache.orc.RecordReader) Reader(org.apache.orc.Reader) IOException(java.io.IOException) InvocationTargetException(java.lang.reflect.InvocationTargetException)
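
invokeStaticMethod and invokeConstructor are Flink-internal reflection utilities. A simplified, hypothetical stand-in that conveys the idea (matching members by name and arity only, which the real utilities handle more carefully) could look like this:

// Simplified stand-ins for the shim's reflection helpers (illustrative only).
static Object invokeStaticMethod(Class<?> cls, String name, Object... args) throws Exception {
    for (java.lang.reflect.Method m : cls.getMethods()) {
        if (m.getName().equals(name) && m.getParameterCount() == args.length) {
            // null receiver because the target method is static
            return m.invoke(null, args);
        }
    }
    throw new NoSuchMethodException(cls.getName() + "#" + name);
}

static Object invokeConstructor(Class<?> cls, Object... args) throws Exception {
    for (java.lang.reflect.Constructor<?> c : cls.getConstructors()) {
        if (c.getParameterCount() == args.length) {
            return c.newInstance(args);
        }
    }
    throw new NoSuchMethodException(cls.getName() + "(...)");
}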

Example 3 with Reader

use of org.apache.orc.Reader in project flink by apache.

The testNonPartition method of the OrcFileSystemITCase class:

@Override
public void testNonPartition() {
    super.testNonPartition();
    // test configure success
    File directory = new File(URI.create(resultPath()).getPath());
    File[] files = directory.listFiles((dir, name) -> !name.startsWith(".") && !name.startsWith("_"));
    Assert.assertNotNull(files);
    Path path = new Path(URI.create(files[0].getAbsolutePath()));
    try {
        Reader reader = OrcFile.createReader(path, OrcFile.readerOptions(new Configuration()));
        if (configure) {
            Assert.assertEquals("SNAPPY", reader.getCompressionKind().toString());
        } else {
            Assert.assertEquals("ZLIB", reader.getCompressionKind().toString());
        }
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}
Also used : Path(org.apache.hadoop.fs.Path) Configuration(org.apache.hadoop.conf.Configuration) Reader(org.apache.orc.Reader) IOException(java.io.IOException) OrcFile(org.apache.orc.OrcFile) File(java.io.File)
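
The compression kind asserted above is fixed when the file is written. Outside of the Flink connector, the same effect can be reproduced with the core ORC writer API; a minimal sketch, assuming a throwaway /tmp path, a one-column schema, and imports of org.apache.orc.Writer, CompressionKind, TypeDescription, and the Hive vector classes:

// Minimal sketch: write a small SNAPPY-compressed ORC file with the core ORC API.
static void writeSnappyFile() throws IOException {
    TypeDescription schema = TypeDescription.fromString("struct<x:bigint>");
    Writer writer = OrcFile.createWriter(
            // placeholder output path
            new Path("/tmp/snappy-demo.orc"),
            OrcFile.writerOptions(new Configuration())
                    .setSchema(schema)
                    .compress(CompressionKind.SNAPPY));
    VectorizedRowBatch batch = schema.createRowBatch();
    LongColumnVector x = (LongColumnVector) batch.cols[0];
    for (long i = 0; i < 10; i++) {
        x.vector[batch.size++] = i;
    }
    writer.addRowBatch(batch);
    writer.close();
    // A reader opened on this path should now report CompressionKind.SNAPPY.
}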

Example 4 with Reader

use of org.apache.orc.Reader in project flink by apache.

The createRecordReader method of the OrcShimV200 class:

@Override
public RecordReader createRecordReader(Configuration conf, TypeDescription schema, int[] selectedFields, List<Predicate> conjunctPredicates, org.apache.flink.core.fs.Path path, long splitStart, long splitLength) throws IOException {
    // open ORC file and create reader
    Path hPath = new Path(path.toUri());
    Reader orcReader = createReader(hPath, conf);
    // get offset and length for the stripes that start in the split
    Tuple2<Long, Long> offsetAndLength = getOffsetAndLengthForSplit(splitStart, splitLength, orcReader.getStripes());
    // create ORC row reader configuration
    Reader.Options options =
            readOrcConf(
                    new Reader.Options()
                            .schema(schema)
                            .range(offsetAndLength.f0, offsetAndLength.f1),
                    conf);
    // configure filters
    if (!conjunctPredicates.isEmpty()) {
        SearchArgument.Builder b = SearchArgumentFactory.newBuilder();
        b = b.startAnd();
        for (Predicate predicate : conjunctPredicates) {
            predicate.add(b);
        }
        b = b.end();
        options.searchArgument(b.build(), new String[] {});
    }
    // configure selected fields
    options.include(computeProjectionMask(schema, selectedFields));
    // create ORC row reader
    RecordReader orcRowsReader = createRecordReader(orcReader, options);
    // assign ids
    schema.getId();
    return orcRowsReader;
}
Also used : Path(org.apache.hadoop.fs.Path) RecordReader(org.apache.orc.RecordReader) Reader(org.apache.orc.Reader) SearchArgument(org.apache.hadoop.hive.ql.io.sarg.SearchArgument) Predicate(org.apache.flink.orc.OrcFilters.Predicate)
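
For comparison, the same SearchArgument builder can be driven directly, without Flink's Predicate wrappers. A small sketch of pushing down x < 100 AND y = 'hello' (column names and types are made up for illustration; PredicateLeaf comes from org.apache.hadoop.hive.ql.io.sarg):

// Hypothetical filter pushdown built directly with the SearchArgument API.
SearchArgument sarg = SearchArgumentFactory.newBuilder()
        .startAnd()
        .lessThan("x", PredicateLeaf.Type.LONG, 100L)
        .equals("y", PredicateLeaf.Type.STRING, "hello")
        .end()
        .build();
// The second argument names the columns referenced by the predicate leaves.
options.searchArgument(sarg, new String[] {"x", "y"});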

Example 5 with Reader

use of org.apache.orc.Reader in project flink by apache.

The validate method of the OrcBulkRowDataWriterTest class:

private void validate(File files, List<RowData> expected) throws IOException {
    final File[] buckets = files.listFiles();
    assertNotNull(buckets);
    assertEquals(1, buckets.length);
    final File[] partFiles = buckets[0].listFiles();
    assertNotNull(partFiles);
    for (File partFile : partFiles) {
        assertTrue(partFile.length() > 0);
        OrcFile.ReaderOptions readerOptions = OrcFile.readerOptions(new Configuration());
        Reader reader = OrcFile.createReader(new org.apache.hadoop.fs.Path(partFile.toURI()), readerOptions);
        assertEquals(2, reader.getNumberOfRows());
        assertEquals(4, reader.getSchema().getFieldNames().size());
        assertSame(reader.getCompressionKind(), CompressionKind.LZ4);
        List<RowData> results = getResults(reader);
        assertEquals(2, results.size());
        assertEquals(expected, results);
    }
}
Also used : GenericRowData(org.apache.flink.table.data.GenericRowData) RowData(org.apache.flink.table.data.RowData) Configuration(org.apache.hadoop.conf.Configuration) OrcFile(org.apache.orc.OrcFile) Reader(org.apache.orc.Reader) RecordReader(org.apache.orc.RecordReader) File(java.io.File)
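
getResults is a helper defined elsewhere in the test class. A hypothetical, stripped-down variant that reads only an int column and a string column into GenericRowData gives an idea of what it does (the real helper covers the full four-field schema; StandardCharsets, ArrayList, StringData, and the Hive vector classes are assumed imported):

// Hypothetical simplification of getResults: converts the first two columns
// (int, string) of every batch into GenericRowData instances.
static List<RowData> readIntAndString(Reader reader) throws IOException {
    List<RowData> rows = new ArrayList<>();
    RecordReader recordReader = reader.rows();
    VectorizedRowBatch batch = reader.getSchema().createRowBatch();
    while (recordReader.nextBatch(batch)) {
        LongColumnVector c0 = (LongColumnVector) batch.cols[0];
        BytesColumnVector c1 = (BytesColumnVector) batch.cols[1];
        for (int r = 0; r < batch.size; r++) {
            String s = new String(c1.vector[r], c1.start[r], c1.length[r], StandardCharsets.UTF_8);
            rows.add(GenericRowData.of((int) c0.vector[r], StringData.fromString(s)));
        }
    }
    recordReader.close();
    return rows;
}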

Aggregations

Reader (org.apache.orc.Reader): 10
RecordReader (org.apache.orc.RecordReader): 8
Path (org.apache.hadoop.fs.Path): 5
OrcFile (org.apache.orc.OrcFile): 4
File (java.io.File): 3
IOException (java.io.IOException): 3
Configuration (org.apache.hadoop.conf.Configuration): 3
TypeDescription (org.apache.orc.TypeDescription): 3
ProtoMessageReader (org.apache.tez.dag.history.logging.proto.ProtoMessageReader): 2
InvocationTargetException (java.lang.reflect.InvocationTargetException): 1
NoSuchElementException (java.util.NoSuchElementException): 1
CleanableFile (org.apache.druid.data.input.InputEntity.CleanableFile): 1
IntermediateRowParsingReader (org.apache.druid.data.input.IntermediateRowParsingReader): 1
Closer (org.apache.druid.java.util.common.io.Closer): 1
CloseableIterator (org.apache.druid.java.util.common.parsers.CloseableIterator): 1
Predicate (org.apache.flink.orc.OrcFilters.Predicate): 1
Record (org.apache.flink.orc.data.Record): 1
GenericRowData (org.apache.flink.table.data.GenericRowData): 1
RowData (org.apache.flink.table.data.RowData): 1
FileStatus (org.apache.hadoop.fs.FileStatus): 1