Example 11 with Filter

Use of org.apache.parquet.filter2.compat.FilterCompat.Filter in project parquet-mr by Apache.

From the class ParquetLoader, method setPushdownPredicate.

@Override
public void setPushdownPredicate(Expression e) throws IOException {
    LOG.info("Pig pushdown expression: {}", e);
    FilterPredicate pred = buildFilter(e);
    LOG.info("Parquet filter predicate expression: {}", pred);
    storeInUDFContext(ParquetInputFormat.FILTER_PREDICATE, pred);
}
Also used : FilterPredicate(org.apache.parquet.filter2.predicate.FilterPredicate)

Example 12 with Filter

Use of org.apache.parquet.filter2.compat.FilterCompat.Filter in project parquet-mr by Apache.

From the class MessageColumnIO, method getRecordReader.

public <T> RecordReader<T> getRecordReader(final PageReadStore columns, final RecordMaterializer<T> recordMaterializer, final Filter filter) {
    checkNotNull(columns, "columns");
    checkNotNull(recordMaterializer, "recordMaterializer");
    checkNotNull(filter, "filter");
    if (leaves.isEmpty()) {
        return new EmptyRecordReader<T>(recordMaterializer);
    }
    return filter.accept(new Visitor<RecordReader<T>>() {

        @Override
        public RecordReader<T> visit(FilterPredicateCompat filterPredicateCompat) {
            FilterPredicate predicate = filterPredicateCompat.getFilterPredicate();
            IncrementallyUpdatedFilterPredicateBuilder builder = new IncrementallyUpdatedFilterPredicateBuilder(leaves);
            IncrementallyUpdatedFilterPredicate streamingPredicate = builder.build(predicate);
            RecordMaterializer<T> filteringRecordMaterializer = new FilteringRecordMaterializer<T>(recordMaterializer, leaves, builder.getValueInspectorsByColumn(), streamingPredicate);
            return new RecordReaderImplementation<T>(MessageColumnIO.this, filteringRecordMaterializer, validating, new ColumnReadStoreImpl(columns, filteringRecordMaterializer.getRootConverter(), getType(), createdBy));
        }

        @Override
        public RecordReader<T> visit(UnboundRecordFilterCompat unboundRecordFilterCompat) {
            return new FilteredRecordReader<T>(MessageColumnIO.this, recordMaterializer, validating, new ColumnReadStoreImpl(columns, recordMaterializer.getRootConverter(), getType(), createdBy), unboundRecordFilterCompat.getUnboundRecordFilter(), columns.getRowCount());
        }

        @Override
        public RecordReader<T> visit(NoOpFilter noOpFilter) {
            return new RecordReaderImplementation<T>(MessageColumnIO.this, recordMaterializer, validating, new ColumnReadStoreImpl(columns, recordMaterializer.getRootConverter(), getType(), createdBy));
        }
    });
}
Also used : ColumnReadStoreImpl(org.apache.parquet.column.impl.ColumnReadStoreImpl), NoOpFilter(org.apache.parquet.filter2.compat.FilterCompat.NoOpFilter), FilteringRecordMaterializer(org.apache.parquet.filter2.recordlevel.FilteringRecordMaterializer), RecordMaterializer(org.apache.parquet.io.api.RecordMaterializer), FilterPredicateCompat(org.apache.parquet.filter2.compat.FilterCompat.FilterPredicateCompat), IncrementallyUpdatedFilterPredicateBuilder(org.apache.parquet.filter2.recordlevel.IncrementallyUpdatedFilterPredicateBuilder), FilterPredicate(org.apache.parquet.filter2.predicate.FilterPredicate), IncrementallyUpdatedFilterPredicate(org.apache.parquet.filter2.recordlevel.IncrementallyUpdatedFilterPredicate), UnboundRecordFilterCompat(org.apache.parquet.filter2.compat.FilterCompat.UnboundRecordFilterCompat)
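
The getRecordReader method above chooses a reader by handing a visitor to filter.accept(...), so the visit overload that runs depends on the concrete filter type. The following is a minimal, self-contained sketch of that double-dispatch pattern. All names in it (FilterVisitorSketch, PredicateFilter, RecordFilter, NoOp, describeReader) are hypothetical stand-ins, not the Parquet classes, and the returned strings only label which branch was taken.

```java
// Hypothetical miniature of the visitor dispatch; NOT the Parquet FilterCompat API.
final class FilterVisitorSketch {

    // One visit overload per concrete filter kind.
    interface Visitor<R> {
        R visit(PredicateFilter filter);
        R visit(RecordFilter filter);
        R visit(NoOp filter);
    }

    // Every filter knows how to hand itself to a visitor.
    interface Filter {
        <R> R accept(Visitor<R> visitor);
    }

    static final class PredicateFilter implements Filter {
        final String predicate;
        PredicateFilter(String predicate) { this.predicate = predicate; }
        @Override public <R> R accept(Visitor<R> visitor) { return visitor.visit(this); }
    }

    static final class RecordFilter implements Filter {
        @Override public <R> R accept(Visitor<R> visitor) { return visitor.visit(this); }
    }

    static final class NoOp implements Filter {
        @Override public <R> R accept(Visitor<R> visitor) { return visitor.visit(this); }
    }

    // Mirrors the shape of getRecordReader: one anonymous visitor, one branch per filter kind.
    static String describeReader(Filter filter) {
        return filter.accept(new Visitor<String>() {
            @Override public String visit(PredicateFilter f) {
                return "predicate-filtering reader for " + f.predicate;
            }
            @Override public String visit(RecordFilter f) {
                return "record-filtering reader";
            }
            @Override public String visit(NoOp f) {
                return "plain reader";
            }
        });
    }
}
```

Because each filter calls visitor.visit(this) with its own static type, adding a new filter kind to such a design forces every visitor to handle it at compile time, which is one plausible reason to prefer this over a chain of instanceof checks.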

Example 13 with Filter

Use of org.apache.parquet.filter2.compat.FilterCompat.Filter in project parquet-mr by Apache.

From the class FilterCompat, method get.

/**
 * Given a FilterPredicate, return a Filter that wraps it.
 * This method also logs the filter being used and rewrites
 * the predicate to not include the not() operator.
 */
public static Filter get(FilterPredicate filterPredicate) {
    checkNotNull(filterPredicate, "filterPredicate");
    LOG.info("Filtering using predicate: {}", filterPredicate);
    // rewrite the predicate to not include the not() operator
    FilterPredicate collapsedPredicate = LogicalInverseRewriter.rewrite(filterPredicate);
    if (!filterPredicate.equals(collapsedPredicate)) {
        LOG.info("Predicate has been collapsed to: {}", collapsedPredicate);
    }
    return new FilterPredicateCompat(collapsedPredicate);
}
Also used : FilterPredicate(org.apache.parquet.filter2.predicate.FilterPredicate)
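
The Javadoc above says the predicate is rewritten so that it no longer contains the not() operator. One common way to do this is to push each negation down to the leaves using De Morgan's laws and to flip each comparison operator. The sketch below illustrates that idea on a tiny hypothetical predicate tree. NotRewriteSketch, Pred, Leaf, And, Or, Not and the operator names are assumptions made for illustration; this is not the LogicalInverseRewriter implementation, which may differ in detail.

```java
// Hypothetical miniature predicate tree; NOT the Parquet FilterPredicate API.
final class NotRewriteSketch {

    interface Pred {}

    // A comparison of one column against an int constant, e.g. eq(a, 1).
    static final class Leaf implements Pred {
        final String column; final String op; final int value;
        Leaf(String column, String op, int value) { this.column = column; this.op = op; this.value = value; }
        @Override public String toString() { return op + "(" + column + ", " + value + ")"; }
    }

    static final class And implements Pred {
        final Pred left; final Pred right;
        And(Pred left, Pred right) { this.left = left; this.right = right; }
        @Override public String toString() { return "and(" + left + ", " + right + ")"; }
    }

    static final class Or implements Pred {
        final Pred left; final Pred right;
        Or(Pred left, Pred right) { this.left = left; this.right = right; }
        @Override public String toString() { return "or(" + left + ", " + right + ")"; }
    }

    static final class Not implements Pred {
        final Pred inner;
        Not(Pred inner) { this.inner = inner; }
        @Override public String toString() { return "not(" + inner + ")"; }
    }

    // Returns an equivalent predicate that contains no Not nodes.
    static Pred rewrite(Pred p) {
        if (p instanceof And) return new And(rewrite(((And) p).left), rewrite(((And) p).right));
        if (p instanceof Or) return new Or(rewrite(((Or) p).left), rewrite(((Or) p).right));
        if (p instanceof Not) return invert(((Not) p).inner);
        return p; // a leaf is already free of not()
    }

    // Returns a Not-free predicate equivalent to not(p).
    static Pred invert(Pred p) {
        if (p instanceof Leaf) {
            Leaf leaf = (Leaf) p;
            return new Leaf(leaf.column, invertOp(leaf.op), leaf.value);
        }
        // De Morgan: not(a and b) == not(a) or not(b), and the dual for or.
        if (p instanceof And) return new Or(invert(((And) p).left), invert(((And) p).right));
        if (p instanceof Or) return new And(invert(((Or) p).left), invert(((Or) p).right));
        // Double negation cancels out.
        if (p instanceof Not) return rewrite(((Not) p).inner);
        throw new IllegalArgumentException("unknown predicate: " + p);
    }

    static String invertOp(String op) {
        switch (op) {
            case "eq": return "notEq";
            case "notEq": return "eq";
            case "lt": return "gtEq";
            case "gtEq": return "lt";
            case "gt": return "ltEq";
            case "ltEq": return "gt";
            default: throw new IllegalArgumentException("unknown operator: " + op);
        }
    }
}
```

In this sketch, rewriting not(and(eq(a, 1), lt(b, 5))) yields or(notEq(a, 1), gtEq(b, 5)). A predicate that contains no negation comes back structurally unchanged, which matches the situation in which the get method above has no collapsed form to log.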

Aggregations

FilterPredicate (org.apache.parquet.filter2.predicate.FilterPredicate): 9
BlockMetaData (org.apache.parquet.hadoop.metadata.BlockMetaData): 5
ArrayList (java.util.ArrayList): 4
ParquetMetadata (org.apache.parquet.hadoop.metadata.ParquetMetadata): 4
Path (org.apache.hadoop.fs.Path): 3
FilterCompat (org.apache.parquet.filter2.compat.FilterCompat): 3
MessageType (org.apache.parquet.schema.MessageType): 3
HashSet (java.util.HashSet): 2
FileSystem (org.apache.hadoop.fs.FileSystem): 2
SearchArgument (org.apache.hadoop.hive.ql.io.sarg.SearchArgument): 2
UnboundRecordFilter (org.apache.parquet.filter.UnboundRecordFilter): 2
Filter (org.apache.parquet.filter2.compat.FilterCompat.Filter): 2
FilterPredicateCompat (org.apache.parquet.filter2.compat.FilterCompat.FilterPredicateCompat): 2
ParquetFileReader (org.apache.parquet.hadoop.ParquetFileReader): 2
ParquetInputSplit (org.apache.parquet.hadoop.ParquetInputSplit): 2
Test (org.junit.Test): 2
Configuration (org.apache.hadoop.conf.Configuration): 1
BlockLocation (org.apache.hadoop.fs.BlockLocation): 1
FileStatus (org.apache.hadoop.fs.FileStatus): 1
PathFilter (org.apache.hadoop.fs.PathFilter): 1