Example 1 with Statistics

Use of org.apache.parquet.format.Statistics in the project parquet-mr by apache.

From the class ParquetMetadataConverter, method toParquetStatistics.

public static Statistics toParquetStatistics(org.apache.parquet.column.statistics.Statistics stats) {
    Statistics formatStats = new Statistics();
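    // Skip statistics larger than MAX_STATS_SIZE instead of truncating them: a reader
    // cannot tell that a value has been truncated and is a lower bound and not in the page.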
    if (!stats.isEmpty() && stats.isSmallerThan(MAX_STATS_SIZE)) {
        formatStats.setNull_count(stats.getNumNulls());
        if (stats.hasNonNullValue()) {
            byte[] min = stats.getMinBytes();
            byte[] max = stats.getMaxBytes();
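            // Write the legacy min/max fields only when the sort order is signed, or when
            // min equals max (any ordering is trivially consistent for equal min-max values).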
            if (sortOrder(stats.type()) == SortOrder.SIGNED || Arrays.equals(min, max)) {
                formatStats.setMin(min);
                formatStats.setMax(max);
            }
            if (isMinMaxStatsSupported(stats.type()) || Arrays.equals(min, max)) {
                formatStats.setMin_value(min);
                formatStats.setMax_value(max);
            }
        }
    }
    return formatStats;
}
Also used : Statistics(org.apache.parquet.format.Statistics) CorruptStatistics(org.apache.parquet.CorruptStatistics)
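
The guard around setMin/setMax in Example 1 depends on how byte arrays are ordered. The following is a minimal, self-contained sketch of why that matters; SortOrderSketch and its methods are hypothetical names for illustration and are not part of parquet-mr. It assumes a plain lexicographic byte comparison and shows that signed and unsigned interpretations can disagree about which value is the minimum, except when min and max are identical.

```java
import java.util.Arrays;

// Hypothetical helper (not part of parquet-mr): compares byte arrays under two orderings.
final class SortOrderSketch {

    // Lexicographic comparison treating each byte as signed (-128..127).
    static int compareSigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int c = Byte.compare(a[i], b[i]);
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    // Lexicographic comparison treating each byte as unsigned (0..255).
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int c = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    // When min and max hold the same bytes, both orderings necessarily agree.
    static boolean orderIndependent(byte[] min, byte[] max) {
        return Arrays.equals(min, max);
    }

    public static void main(String[] args) {
        byte[] low = { 0x01 };
        byte[] high = { (byte) 0x80 };
        // Signed view: 0x80 is -128, so "high" sorts before "low".
        System.out.println(compareSigned(low, high) > 0);   // prints true
        // Unsigned view: 0x80 is 128, so "low" sorts before "high".
        System.out.println(compareUnsigned(low, high) < 0); // prints true
        System.out.println(orderIndependent(high, high.clone())); // prints true
    }
}
```

This is one plausible reading of the Arrays.equals(min, max) condition in Example 1: when both bounds are the same bytes, it does not matter which ordering the writer or reader assumes.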

Example 2 with Statistics

Use of org.apache.parquet.format.Statistics in the project parquet-mr by apache.

From the class TestParquetFileWriter, method testConvertToThriftStatistics.

@Test
public void testConvertToThriftStatistics() throws Exception {
    long[] longArray = new long[] { 39L, 99L, 12L, 1000L, 65L, 542L, 2533461316L, -253346131996L, Long.MAX_VALUE, Long.MIN_VALUE };
    LongStatistics parquetMRstats = new LongStatistics();
    for (long l : longArray) {
        parquetMRstats.updateStats(l);
    }
    final String createdBy = "parquet-mr version 1.8.0 (build d4d5a07ec9bd262ca1e93c309f1d7d4a74ebda4c)";
    Statistics thriftStats = org.apache.parquet.format.converter.ParquetMetadataConverter.toParquetStatistics(parquetMRstats);
    LongStatistics convertedBackStats = (LongStatistics) org.apache.parquet.format.converter.ParquetMetadataConverter.fromParquetStatistics(createdBy, thriftStats, PrimitiveTypeName.INT64);
    assertEquals(parquetMRstats.getMax(), convertedBackStats.getMax());
    assertEquals(parquetMRstats.getMin(), convertedBackStats.getMin());
    assertEquals(parquetMRstats.getNumNulls(), convertedBackStats.getNumNulls());
}
Also used : LongStatistics(org.apache.parquet.column.statistics.LongStatistics) CorruptStatistics.shouldIgnoreStatistics(org.apache.parquet.CorruptStatistics.shouldIgnoreStatistics) BinaryStatistics(org.apache.parquet.column.statistics.BinaryStatistics) Statistics(org.apache.parquet.format.Statistics) Test(org.junit.Test)
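
The round trip tested in Example 2 can be sketched without the Parquet classes. The code below is an illustration only: LongStatsRoundTripSketch and its methods are hypothetical names, and the 8-byte little-endian layout is an assumption about how INT64 min/max values are carried as bytes, not something shown in the snippet above.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical stand-in (not part of parquet-mr) for a long min/max round trip through bytes.
final class LongStatsRoundTripSketch {

    // Encode a long as 8 bytes; little-endian order is an assumption of this sketch.
    static byte[] toBytes(long value) {
        return ByteBuffer.allocate(Long.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putLong(value)
                .array();
    }

    // Decode 8 little-endian bytes back into a long.
    static long fromBytes(byte[] bytes) {
        return ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getLong();
    }

    // Track min and max the way a statistics accumulator would; returns {min, max}.
    static long[] minMax(long[] values) {
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (long v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        return new long[] { min, max };
    }

    public static void main(String[] args) {
        long[] values = { 39L, 99L, 12L, 1000L, 65L, 542L, 2533461316L,
                -253346131996L, Long.MAX_VALUE, Long.MIN_VALUE };
        long[] stats = minMax(values);
        // Serialize both bounds and read them back, as a converter round trip would.
        long minBack = fromBytes(toBytes(stats[0]));
        long maxBack = fromBytes(toBytes(stats[1]));
        System.out.println(minBack == stats[0] && maxBack == stats[1]); // prints true
    }
}
```

The input array is the one from the test, so the extremes Long.MIN_VALUE and Long.MAX_VALUE exercise the full 64-bit range.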

Example 3 with Statistics

Use of org.apache.parquet.format.Statistics in the project presto by prestodb.

From the class AbstractTestParquetReader, method testNullableNullCount.

@Test
public void testNullableNullCount() {
    PrimitiveType primitiveType = new PrimitiveType(OPTIONAL, BINARY, "testColumn");
    Statistics statistics = new Statistics();
    assertEquals(MetadataReader.readStats(statistics, primitiveType.getPrimitiveTypeName()).getNumNulls(), -1);
    statistics.setNull_count(10);
    assertEquals(MetadataReader.readStats(statistics, primitiveType.getPrimitiveTypeName()).getNumNulls(), 10);
}
Also used : PrimitiveType(org.apache.parquet.schema.PrimitiveType) Statistics(org.apache.parquet.format.Statistics) Test(org.testng.annotations.Test)
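
Example 3 asserts that an unset null count is reported as -1 and a set one is passed through. A minimal sketch of that pattern follows; NullCountSketch, ThriftLikeStats, and readNumNulls are hypothetical names that only mimic an optional field with an "is set" flag, and they are not the Presto or Parquet implementation.

```java
// Hypothetical illustration (not Presto or Parquet code) of an optional null count.
final class NullCountSketch {

    // Sentinel returned when the writer did not record a null count.
    static final long UNKNOWN_NULL_COUNT = -1;

    // Stand-in for a generated struct whose optional field tracks whether it was set.
    static final class ThriftLikeStats {
        private long nullCount;
        private boolean nullCountSet;

        void setNullCount(long count) {
            this.nullCount = count;
            this.nullCountSet = true;
        }

        boolean isSetNullCount() {
            return nullCountSet;
        }

        long getNullCount() {
            return nullCount;
        }
    }

    // Map "field absent" to the sentinel so callers can tell it apart from a real zero.
    static long readNumNulls(ThriftLikeStats stats) {
        return stats.isSetNullCount() ? stats.getNullCount() : UNKNOWN_NULL_COUNT;
    }

    public static void main(String[] args) {
        ThriftLikeStats stats = new ThriftLikeStats();
        System.out.println(readNumNulls(stats)); // prints -1
        stats.setNullCount(10);
        System.out.println(readNumNulls(stats)); // prints 10
    }
}
```

Using a sentinel rather than defaulting to 0 keeps "no nulls" distinct from "null count unknown", which is the distinction the two assertions in the test check.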

Aggregations

Statistics (org.apache.parquet.format.Statistics): 3
CorruptStatistics (org.apache.parquet.CorruptStatistics): 1
CorruptStatistics.shouldIgnoreStatistics (org.apache.parquet.CorruptStatistics.shouldIgnoreStatistics): 1
BinaryStatistics (org.apache.parquet.column.statistics.BinaryStatistics): 1
LongStatistics (org.apache.parquet.column.statistics.LongStatistics): 1
PrimitiveType (org.apache.parquet.schema.PrimitiveType): 1
Test (org.junit.Test): 1
Test (org.testng.annotations.Test): 1