Example 1 with Predicate

use of org.apache.flink.orc.OrcFilters.Predicate in project flink by apache.

the class OrcColumnarRowInputFormatTest method testReadFileAndRestoreWithFilter.

@Test
public void testReadFileAndRestoreWithFilter() throws IOException {
    List<Predicate> filter =
            Collections.singletonList(
                    new Or(
                            new Between("_col0", PredicateLeaf.Type.LONG, 0L, 975000L),
                            new Equals("_col0", PredicateLeaf.Type.LONG, 980001L),
                            new Between("_col0", PredicateLeaf.Type.LONG, 990000L, 1800000L)));
    OrcColumnarRowInputFormat<?, FileSourceSplit> format =
            createFormat(FLAT_FILE_TYPE, new int[] { 0, 1 }, filter);
    // pick a middle split
    FileSourceSplit split = createSplits(flatFile, 1).get(0);
    int breakCnt = 975001;
    int expectedCnt = 1795000;
    long expectedTotalF0 = 1615113397500L;
    innerTestRestore(format, split, breakCnt, expectedCnt, expectedTotalF0);
}
Also used : Equals(org.apache.flink.orc.OrcFilters.Equals) Assert.assertEquals(org.junit.Assert.assertEquals) Or(org.apache.flink.orc.OrcFilters.Or) FileSourceSplit(org.apache.flink.connector.file.src.FileSourceSplit) Between(org.apache.flink.orc.OrcFilters.Between) Predicate(org.apache.flink.orc.OrcFilters.Predicate) Test(org.junit.Test)
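
The filter in this test is a single disjunction pushed down as one conjunct. For reference, the same predicate can be assembled without the test fixture (createFormat, createSplits, and innerTestRestore belong to the test class). Below is a minimal self-contained sketch using only the OrcFilters constructors already shown above; the class name and the println are illustrative only:

import java.util.Collections;
import java.util.List;

import org.apache.flink.orc.OrcFilters.Between;
import org.apache.flink.orc.OrcFilters.Equals;
import org.apache.flink.orc.OrcFilters.Or;
import org.apache.flink.orc.OrcFilters.Predicate;
import org.apache.hadoop.hive.ql.io.sarg.PredicateLeaf;

public class PredicateSketch {
    public static void main(String[] args) {
        // Keep rows whose _col0 lies in [0, 975000], equals 980001,
        // or lies in [990000, 1800000]: the same filter as the test above.
        Predicate filter =
                new Or(
                        new Between("_col0", PredicateLeaf.Type.LONG, 0L, 975000L),
                        new Equals("_col0", PredicateLeaf.Type.LONG, 980001L),
                        new Between("_col0", PredicateLeaf.Type.LONG, 990000L, 1800000L));
        List<Predicate> conjunctPredicates = Collections.singletonList(filter);
        System.out.println(conjunctPredicates);
    }
}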

Example 2 with Predicate

use of org.apache.flink.orc.OrcFilters.Predicate in project flink by apache.

the class OrcShimV200 method createRecordReader.

@Override
public RecordReader createRecordReader(Configuration conf, TypeDescription schema, int[] selectedFields, List<Predicate> conjunctPredicates, org.apache.flink.core.fs.Path path, long splitStart, long splitLength) throws IOException {
    // open ORC file and create reader
    Path hPath = new Path(path.toUri());
    Reader orcReader = createReader(hPath, conf);
    // get offset and length for the stripes that start in the split
    Tuple2<Long, Long> offsetAndLength = getOffsetAndLengthForSplit(splitStart, splitLength, orcReader.getStripes());
    // create ORC row reader configuration
    Reader.Options options = readOrcConf(new Reader.Options().schema(schema).range(offsetAndLength.f0, offsetAndLength.f1), conf);
    // configure filters
    if (!conjunctPredicates.isEmpty()) {
        SearchArgument.Builder b = SearchArgumentFactory.newBuilder();
        b = b.startAnd();
        for (Predicate predicate : conjunctPredicates) {
            predicate.add(b);
        }
        b = b.end();
        options.searchArgument(b.build(), new String[] {});
    }
    // configure selected fields
    options.include(computeProjectionMask(schema, selectedFields));
    // create ORC row reader
    RecordReader orcRowsReader = createRecordReader(orcReader, options);
    // assign ids
    schema.getId();
    return orcRowsReader;
}
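
getOffsetAndLengthForSplit is what turns a Flink split into a byte range the ORC reader can consume: a stripe is read if and only if its first byte falls inside the split, so each stripe belongs to exactly one split. The helper below is a plausible reconstruction for illustration only, assuming nothing beyond StripeInformation's getOffset() and getLength(); it is not copied from the Flink source.

import java.util.List;

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.orc.StripeInformation;

final class SplitRangeSketch {

    // Illustrative reconstruction: cover exactly the stripes whose first byte
    // lies inside [splitStart, splitStart + splitLength).
    static Tuple2<Long, Long> offsetAndLengthForSplit(
            long splitStart, long splitLength, List<StripeInformation> stripes) {
        long splitEnd = splitStart + splitLength;
        long readStart = Long.MAX_VALUE;
        long readEnd = Long.MIN_VALUE;
        for (StripeInformation stripe : stripes) {
            if (splitStart <= stripe.getOffset() && stripe.getOffset() < splitEnd) {
                readStart = Math.min(readStart, stripe.getOffset());
                readEnd = Math.max(readEnd, stripe.getOffset() + stripe.getLength());
            }
        }
        if (readStart == Long.MAX_VALUE) {
            // no stripe starts inside this split: read nothing
            return new Tuple2<>(0L, 0L);
        }
        return new Tuple2<>(readStart, readEnd - readStart);
    }
}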
Also used : Path(org.apache.hadoop.fs.Path) RecordReader(org.apache.orc.RecordReader) Reader(org.apache.orc.Reader) SearchArgument(org.apache.hadoop.hive.ql.io.sarg.SearchArgument) Predicate(org.apache.flink.orc.OrcFilters.Predicate)
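
The filter block in the middle of createRecordReader is the bridge between the two examples: each Flink Predicate adds itself to a Hive SearchArgument.Builder, and the shim ANDs the whole list together with startAnd()/end(). Extracted as a standalone sketch (the helper class and method name are hypothetical; the builder calls are exactly the ones used above):

import java.util.List;

import org.apache.flink.orc.OrcFilters.Predicate;
import org.apache.hadoop.hive.ql.io.sarg.SearchArgument;
import org.apache.hadoop.hive.ql.io.sarg.SearchArgumentFactory;

final class SearchArgumentSketch {

    // Hypothetical helper mirroring the filter block of createRecordReader:
    // AND all conjunct predicates into a single ORC SearchArgument.
    static SearchArgument toSearchArgument(List<Predicate> conjunctPredicates) {
        SearchArgument.Builder builder = SearchArgumentFactory.newBuilder();
        builder = builder.startAnd();
        for (Predicate predicate : conjunctPredicates) {
            // Each Predicate subclass (Or, Between, Equals, ...) appends its
            // own sub-expression to the builder.
            predicate.add(builder);
        }
        builder = builder.end();
        return builder.build();
    }
}

Feeding the singleton list from Example 1 through this helper yields a SearchArgument equivalent to the one the test pushes down to the ORC reader.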

Aggregations

Predicate (org.apache.flink.orc.OrcFilters.Predicate) 2
FileSourceSplit (org.apache.flink.connector.file.src.FileSourceSplit) 1
Between (org.apache.flink.orc.OrcFilters.Between) 1
Equals (org.apache.flink.orc.OrcFilters.Equals) 1
Or (org.apache.flink.orc.OrcFilters.Or) 1
Path (org.apache.hadoop.fs.Path) 1
SearchArgument (org.apache.hadoop.hive.ql.io.sarg.SearchArgument) 1
Reader (org.apache.orc.Reader) 1
RecordReader (org.apache.orc.RecordReader) 1
Assert.assertEquals (org.junit.Assert.assertEquals) 1
Test (org.junit.Test) 1