
Example 1 with IntStatistics

Use of org.apache.parquet.column.statistics.IntStatistics in the Apache Drill project.

From class ParquetMetaStatCollector, method getStat.

private ColumnStatistics getStat(Object min, Object max, Long numNull, PrimitiveType.PrimitiveTypeName primitiveType, OriginalType originalType, Integer repetitionLevel) {
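    // Create an empty Parquet Statistics object of the subclass matching the physical type (INT32 -> IntStatistics, ...).
    Statistics stat = Statistics.getStatsBasedOnType(primitiveType);
    // Usually the same object; replaced only when values must be re-encoded (see the DATE case).
    Statistics convertedStat = stat;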
    TypeProtos.MajorType type = ParquetGroupScan.getType(primitiveType, originalType);
    // Change to repeated if repetitionLevel > 0
    if (repetitionLevel != null && repetitionLevel > 0) {
        type = TypeProtos.MajorType.newBuilder().setMinorType(type.getMinorType()).setMode(TypeProtos.DataMode.REPEATED).build();
    }
    if (numNull != null) {
        stat.setNumNulls(numNull.longValue());
    }
    if (min != null && max != null) {
        switch(type.getMinorType()) {
            case INT:
            case TIME:
                ((IntStatistics) stat).setMinMax(Integer.parseInt(min.toString()), Integer.parseInt(max.toString()));
                break;
            case BIGINT:
            case TIMESTAMP:
                ((LongStatistics) stat).setMinMax(Long.parseLong(min.toString()), Long.parseLong(max.toString()));
                break;
            case FLOAT4:
                ((FloatStatistics) stat).setMinMax(Float.parseFloat(min.toString()), Float.parseFloat(max.toString()));
                break;
            case FLOAT8:
                ((DoubleStatistics) stat).setMinMax(Double.parseDouble(min.toString()), Double.parseDouble(max.toString()));
                break;
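            case DATE:
                // Parquet stores DATE as INT32 days since the epoch, while Drill represents dates as
                // milliseconds, so the statistics are rebuilt as a LongStatistics.
                convertedStat = new LongStatistics();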
                convertedStat.setNumNulls(stat.getNumNulls());
                final long minMS = convertToDrillDateValue(Integer.parseInt(min.toString()));
                final long maxMS = convertToDrillDateValue(Integer.parseInt(max.toString()));
                ((LongStatistics) convertedStat).setMinMax(minMS, maxMS);
                break;
            default:
                // Other types keep only the null count; min/max stay unset.
                break;
        }
    }
    return new ColumnStatistics(convertedStat, type);
}
Also used: LongStatistics (org.apache.parquet.column.statistics.LongStatistics), FloatStatistics (org.apache.parquet.column.statistics.FloatStatistics), IntStatistics (org.apache.parquet.column.statistics.IntStatistics), DoubleStatistics (org.apache.parquet.column.statistics.DoubleStatistics), BinaryStatistics (org.apache.parquet.column.statistics.BinaryStatistics), Statistics (org.apache.parquet.column.statistics.Statistics), TypeProtos (org.apache.drill.common.types.TypeProtos)
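The DATE branch is the only case in getStat() where the statistics object changes type: the INT32 day counts become a LongStatistics in milliseconds. The following is a minimal, dependency-free sketch of that conversion. MetaStatSketch, LongRange and daysToMillis are hypothetical names, LongRange only stands in for Parquet's LongStatistics, and the sketch assumes that Drill's convertToDrillDateValue multiplies days by the number of milliseconds per day.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the DATE branch in getStat(): min/max arrive as untyped Objects,
// are parsed as INT32 day counts, and are stored as epoch milliseconds.
// LongRange stands in for Parquet's LongStatistics; it is not the Parquet or Drill API.
final class MetaStatSketch {

    static final class LongRange {
        final long min;
        final long max;
        final long numNulls;

        LongRange(long min, long max, long numNulls) {
            this.min = min;
            this.max = max;
            this.numNulls = numNulls;
        }
    }

    // Assumption: mirrors Drill's convertToDrillDateValue (days * milliseconds per day).
    static long daysToMillis(int days) {
        return TimeUnit.DAYS.toMillis(days);
    }

    // Parses both bounds via toString(), as getStat() does, then converts days to millis.
    static LongRange dateStat(Object min, Object max, long numNulls) {
        long minMs = daysToMillis(Integer.parseInt(min.toString()));
        long maxMs = daysToMillis(Integer.parseInt(max.toString()));
        return new LongRange(minMs, maxMs, numNulls);
    }

    public static void main(String[] args) {
        // 18262 days after 1970-01-01 is 2020-01-01.
        LongRange r = dateStat("18262", 18263, 3L);
        System.out.println(r.min + ".." + r.max + " nulls=" + r.numNulls);
    }
}
```

Parsing through toString() lets the same code accept bounds that were deserialized as either strings or boxed numbers, which matches how getStat() treats its Object parameters.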

Example 2 with IntStatistics

Use of org.apache.parquet.column.statistics.IntStatistics in the Apache Drill project.

From class RangeExprEvaluator, method visitUnknown.

@Override
public Statistics visitUnknown(LogicalExpression e, Void value) throws RuntimeException {
    if (e instanceof TypedFieldExpr) {
        TypedFieldExpr fieldExpr = (TypedFieldExpr) e;
        final ColumnStatistics columnStatistics = columnStatMap.get(fieldExpr.getPath());
        if (columnStatistics != null) {
            return columnStatistics.getStatistics();
        } else {
            // The field does not exist, so every row reads it as NULL.
            Preconditions.checkArgument(fieldExpr.getMajorType().equals(Types.OPTIONAL_INT));
            IntStatistics intStatistics = new IntStatistics();
            // All values are null.
            intStatistics.setNumNulls(rowCount);
            return intStatistics;
        }
    }
    return null;
}
Also used: ColumnStatistics (org.apache.drill.exec.store.parquet.stat.ColumnStatistics), IntStatistics (org.apache.parquet.column.statistics.IntStatistics)
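Returning an IntStatistics whose null count equals the row count lets the filter evaluator reason about a column that is absent from a file: every comparison against NULL is unknown, so a predicate such as `col = 5` cannot select any row. The sketch below illustrates that reasoning for an equality filter. MissingColumnSketch, IntStats and canDropEquals are hypothetical names, and the pruning rule is a simplified illustration, not Drill's actual predicate code.

```java
// Hypothetical sketch: how "all nulls" statistics allow a row group to be skipped.
// IntStats is a simplified stand-in for IntStatistics, not the Parquet API.
final class MissingColumnSketch {

    static final class IntStats {
        final Integer min; // null when no non-null value was seen
        final Integer max;
        final long numNulls;

        IntStats(Integer min, Integer max, long numNulls) {
            this.min = min;
            this.max = max;
            this.numNulls = numNulls;
        }

        boolean hasNonNullValue() {
            return min != null && max != null;
        }
    }

    // Mirrors visitUnknown(): a missing column is described as rowCount nulls and no min/max.
    static IntStats allNulls(long rowCount) {
        return new IntStats(null, null, rowCount);
    }

    // True if "col = value" cannot be true for any of the rowCount rows.
    static boolean canDropEquals(IntStats s, long rowCount, int value) {
        if (s.numNulls == rowCount) {
            return true; // every value is NULL, and NULL = value is never true
        }
        if (!s.hasNonNullValue()) {
            return false; // no min/max known, so nothing can be ruled out
        }
        return value < s.min || value > s.max;
    }

    public static void main(String[] args) {
        System.out.println(canDropEquals(allNulls(100), 100, 5));
    }
}
```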

Example 3 with IntStatistics

Use of org.apache.parquet.column.statistics.IntStatistics in the Apache Drill project.

From class RangeExprEvaluator, method getStatistics.

private IntStatistics getStatistics(int min, int max) {
    final IntStatistics intStatistics = new IntStatistics();
    intStatistics.setMinMax(min, max);
    return intStatistics;
}
Also used: IntStatistics (org.apache.parquet.column.statistics.IntStatistics)
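A helper like getStatistics(min, max) can describe a literal value c as the one-point range [c, c], which can then be compared with a column's [min, max] range. That this helper is used this way for constants in RangeExprEvaluator is an assumption. The sketch below shows such a comparison for `column > literal`; RangeSketch, IntRange, constant and canDropGreaterThan are hypothetical names.

```java
// Hypothetical sketch of range comparison between column statistics and a literal.
// IntRange stands in for an IntStatistics holding only min/max; it is not the Parquet API.
final class RangeSketch {

    static final class IntRange {
        final int min;
        final int max;

        IntRange(int min, int max) {
            if (min > max) {
                throw new IllegalArgumentException("min > max");
            }
            this.min = min;
            this.max = max;
        }
    }

    // A literal c is described as the one-point range [c, c].
    static IntRange constant(int c) {
        return new IntRange(c, c);
    }

    // "column > literal" is false for every non-null row when the largest column value
    // is not greater than the smallest literal value.
    static boolean canDropGreaterThan(IntRange column, IntRange literal) {
        return column.max <= literal.min;
    }

    public static void main(String[] args) {
        IntRange column = new IntRange(1, 10);
        System.out.println(canDropGreaterThan(column, constant(10)));
    }
}
```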

Example 4 with IntStatistics

Use of org.apache.parquet.column.statistics.IntStatistics in the Apache Drill project.

From class RangeExprEvaluator, method evalCastFunc.

private Statistics evalCastFunc(FunctionHolderExpression holderExpr, Statistics input) {
    try {
        DrillSimpleFuncHolder funcHolder = (DrillSimpleFuncHolder) holderExpr.getHolder();
        DrillSimpleFunc interpreter = funcHolder.createInterpreter();
        final ValueHolder minHolder, maxHolder;
        TypeProtos.MinorType srcType = holderExpr.args.get(0).getMajorType().getMinorType();
        TypeProtos.MinorType destType = holderExpr.getMajorType().getMinorType();
        if (srcType.equals(destType)) {
            // A cast to the same type is a no-op.
            return input;
        } else if (!CAST_FUNC.containsKey(srcType) || !CAST_FUNC.get(srcType).contains(destType)) {
            // Casting from srcType to destType is not supported for statistics.
            return null;
        }
        switch(srcType) {
            case INT:
                minHolder = ValueHolderHelper.getIntHolder(((IntStatistics) input).getMin());
                maxHolder = ValueHolderHelper.getIntHolder(((IntStatistics) input).getMax());
                break;
            case BIGINT:
                minHolder = ValueHolderHelper.getBigIntHolder(((LongStatistics) input).getMin());
                maxHolder = ValueHolderHelper.getBigIntHolder(((LongStatistics) input).getMax());
                break;
            case FLOAT4:
                minHolder = ValueHolderHelper.getFloat4Holder(((FloatStatistics) input).getMin());
                maxHolder = ValueHolderHelper.getFloat4Holder(((FloatStatistics) input).getMax());
                break;
            case FLOAT8:
                minHolder = ValueHolderHelper.getFloat8Holder(((DoubleStatistics) input).getMin());
                maxHolder = ValueHolderHelper.getFloat8Holder(((DoubleStatistics) input).getMax());
                break;
            default:
                return null;
        }
        final ValueHolder[] args1 = { minHolder };
        final ValueHolder[] args2 = { maxHolder };
        final ValueHolder minFuncHolder = InterpreterEvaluator.evaluateFunction(interpreter, args1, holderExpr.getName());
        final ValueHolder maxFuncHolder = InterpreterEvaluator.evaluateFunction(interpreter, args2, holderExpr.getName());
        switch(destType) {
            // TODO: handle the number of nulls.
            case INT:
                return getStatistics(((IntHolder) minFuncHolder).value, ((IntHolder) maxFuncHolder).value);
            case BIGINT:
                return getStatistics(((BigIntHolder) minFuncHolder).value, ((BigIntHolder) maxFuncHolder).value);
            case FLOAT4:
                return getStatistics(((Float4Holder) minFuncHolder).value, ((Float4Holder) maxFuncHolder).value);
            case FLOAT8:
                return getStatistics(((Float8Holder) minFuncHolder).value, ((Float8Holder) maxFuncHolder).value);
            default:
                return null;
        }
    } catch (Exception e) {
        // Keep the original exception as the cause so the failure can be diagnosed.
        throw new DrillRuntimeException("Error in evaluating function of " + holderExpr.getName(), e);
    }
}
Also used: LongStatistics (org.apache.parquet.column.statistics.LongStatistics), FloatStatistics (org.apache.parquet.column.statistics.FloatStatistics), IntStatistics (org.apache.parquet.column.statistics.IntStatistics), DoubleStatistics (org.apache.parquet.column.statistics.DoubleStatistics), ValueHolder (org.apache.drill.exec.expr.holders.ValueHolder), DrillRuntimeException (org.apache.drill.common.exceptions.DrillRuntimeException), TypeProtos (org.apache.drill.common.types.TypeProtos), DrillSimpleFuncHolder (org.apache.drill.exec.expr.fn.DrillSimpleFuncHolder), DrillSimpleFunc (org.apache.drill.exec.expr.DrillSimpleFunc)
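evalCastFunc() applies the cast function only to the minimum and the maximum, not to every value. That shortcut is sound only when the cast is non-decreasing (a <= b implies cast(a) <= cast(b)), which presumably is why the allowed conversions are restricted to a whitelist (CAST_FUNC). The sketch below shows the bound-mapping step for an int-to-double cast. CastBoundsSketch, DoubleRange and castBounds are hypothetical names, not Drill's API.

```java
import java.util.function.IntToDoubleFunction;

// Hypothetical sketch: mapping a statistics range through a cast by evaluating the
// cast on the bounds only. DoubleRange stands in for DoubleStatistics.
final class CastBoundsSketch {

    static final class DoubleRange {
        final double min;
        final double max;

        DoubleRange(double min, double max) {
            this.min = min;
            this.max = max;
        }
    }

    // Correct only if cast is non-decreasing; otherwise values inside [min, max] can map
    // outside [cast(min), cast(max)].
    static DoubleRange castBounds(int min, int max, IntToDoubleFunction cast) {
        return new DoubleRange(cast.applyAsDouble(min), cast.applyAsDouble(max));
    }

    public static void main(String[] args) {
        DoubleRange r = castBounds(-3, 7, i -> (double) i);
        System.out.println(r.min + ".." + r.max);
    }
}
```

As a counterexample, a non-monotonic function such as absolute value maps the bounds of [-3, 7] to 3 and 7, although the input value 0 maps to 0, below that computed minimum.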

Example 5 with IntStatistics

Use of org.apache.parquet.column.statistics.IntStatistics in the Apache Drill project.

From class ParquetFooterStatCollector, method convertDateStatIfNecessary.

public static Statistics convertDateStatIfNecessary(Statistics stat, ParquetReaderUtility.DateCorruptionStatus containsCorruptDates) {
    IntStatistics dateStat = (IntStatistics) stat;
    LongStatistics dateMLS = new LongStatistics();
    boolean isDateCorrect = containsCorruptDates == ParquetReaderUtility.DateCorruptionStatus.META_SHOWS_NO_CORRUPTION;
    // Only do conversion when stat is NOT empty.
    if (!dateStat.isEmpty()) {
        dateMLS.setMinMax(convertToDrillDateValue(dateStat.getMin(), isDateCorrect), convertToDrillDateValue(dateStat.getMax(), isDateCorrect));
        dateMLS.setNumNulls(dateStat.getNumNulls());
    }
    return dateMLS;
}
Also used: LongStatistics (org.apache.parquet.column.statistics.LongStatistics), IntStatistics (org.apache.parquet.column.statistics.IntStatistics)
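convertDateStatIfNecessary() turns INT32 day-count statistics into millisecond statistics and, when the footer does not confirm that the dates are correct, undoes a fixed day offset that older Drill writers applied. The sketch below illustrates both steps and leaves empty statistics unconverted. DateStatSketch and its methods are hypothetical names. The shift constant (twice the Julian day number of the Unix epoch, 2 * 2440588 days) is an assumption about Drill's ParquetReaderUtility and should be checked against the source.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of date-statistics conversion with optional corruption correction.
// Not the Drill API; names and the shift constant are assumptions.
final class DateStatSketch {

    // Assumption: the offset written by older Drill versions is twice the Julian day
    // number of the Unix epoch. Verify against ParquetReaderUtility before relying on it.
    static final long CORRUPT_DATE_SHIFT_DAYS = 2L * 2440588L;

    // Converts a stored INT32 day count to epoch milliseconds, undoing the shift when needed.
    static long toEpochMillis(int storedDays, boolean isDateCorrect) {
        long days = isDateCorrect ? storedDays : storedDays - CORRUPT_DATE_SHIFT_DAYS;
        return TimeUnit.DAYS.toMillis(days);
    }

    // Returns {minMillis, maxMillis}, or null when the input statistics are empty.
    static long[] convertMinMax(Integer minDays, Integer maxDays, boolean isDateCorrect) {
        if (minDays == null || maxDays == null) {
            return null; // empty statistics: nothing to convert
        }
        return new long[] {
            toEpochMillis(minDays, isDateCorrect),
            toEpochMillis(maxDays, isDateCorrect)
        };
    }

    public static void main(String[] args) {
        long[] range = convertMinMax(18262, 18263, true);
        System.out.println(range[0] + ".." + range[1]);
    }
}
```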

Aggregations

Classes used across the examples above, with the number of examples that list each one:

IntStatistics (org.apache.parquet.column.statistics.IntStatistics): 5
LongStatistics (org.apache.parquet.column.statistics.LongStatistics): 3
TypeProtos (org.apache.drill.common.types.TypeProtos): 2
DoubleStatistics (org.apache.parquet.column.statistics.DoubleStatistics): 2
FloatStatistics (org.apache.parquet.column.statistics.FloatStatistics): 2
DrillRuntimeException (org.apache.drill.common.exceptions.DrillRuntimeException): 1
DrillSimpleFunc (org.apache.drill.exec.expr.DrillSimpleFunc): 1
DrillSimpleFuncHolder (org.apache.drill.exec.expr.fn.DrillSimpleFuncHolder): 1
ValueHolder (org.apache.drill.exec.expr.holders.ValueHolder): 1
ColumnStatistics (org.apache.drill.exec.store.parquet.stat.ColumnStatistics): 1
BinaryStatistics (org.apache.parquet.column.statistics.BinaryStatistics): 1
Statistics (org.apache.parquet.column.statistics.Statistics): 1