Example 1 with PrimitiveColumnIO

Use of parquet.io.PrimitiveColumnIO in project presto by prestodb.

Class ParquetReader, method initializeColumnReaders.

private void initializeColumnReaders() {
    // Create one column reader per primitive (leaf) column returned by getColumns.
    for (PrimitiveColumnIO columnIO : getColumns(fileSchema, requestedSchema)) {
        ColumnDescriptor descriptor = columnIO.getColumnDescriptor();
        // Wrap the descriptor together with the column's primitive type.
        RichColumnDescriptor column = new RichColumnDescriptor(descriptor.getPath(), columnIO.getType().asPrimitiveType(), descriptor.getMaxRepetitionLevel(), descriptor.getMaxDefinitionLevel());
        columnReadersMap.put(column, ParquetColumnReader.createReader(column));
    }
}
Also used: RichColumnDescriptor (com.facebook.presto.hive.parquet.RichColumnDescriptor), ColumnDescriptor (parquet.column.ColumnDescriptor), PrimitiveColumnIO (parquet.io.PrimitiveColumnIO)

Example 2 with PrimitiveColumnIO

Use of parquet.io.PrimitiveColumnIO in project presto by prestodb.

Class ParquetReader, method nextBatch.

public int nextBatch() {
    // Return -1 when the current row group is exhausted and no further group can be advanced to.
    if (nextRowInGroup >= currentGroupRowCount && !advanceToNextRowGroup()) {
        return -1;
    }
    // Cap the batch at MAX_VECTOR_LENGTH or at the rows left in the current row group.
    batchSize = toIntExact(min(MAX_VECTOR_LENGTH, currentGroupRowCount - nextRowInGroup));
    nextRowInGroup += batchSize;
    currentPosition += batchSize;
    // Rebuild each column's descriptor to look up its reader, then prepare it for batchSize rows.
    for (PrimitiveColumnIO columnIO : getColumns(fileSchema, requestedSchema)) {
        ColumnDescriptor descriptor = columnIO.getColumnDescriptor();
        RichColumnDescriptor column = new RichColumnDescriptor(descriptor.getPath(), columnIO.getType().asPrimitiveType(), descriptor.getMaxRepetitionLevel(), descriptor.getMaxDefinitionLevel());
        ParquetColumnReader columnReader = columnReadersMap.get(column);
        columnReader.prepareNextRead(batchSize);
    }
    return batchSize;
}
Also used: RichColumnDescriptor (com.facebook.presto.hive.parquet.RichColumnDescriptor), ColumnDescriptor (parquet.column.ColumnDescriptor), PrimitiveColumnIO (parquet.io.PrimitiveColumnIO)
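
The batch-size arithmetic in nextBatch can be tried in isolation. The following is a minimal sketch, not the Presto class: BatchCursorSketch, its rowGroupSizes constructor argument, and the MAX_VECTOR_LENGTH value of 1024 are assumptions made for illustration, and the column-reader loop is left out.

```java
// Hypothetical stand-in for the row-group cursor in ParquetReader.nextBatch.
final class BatchCursorSketch {
    // Assumed cap for illustration; the real constant's value is not shown in the example.
    static final int MAX_VECTOR_LENGTH = 1024;

    private final long[] rowGroupSizes; // row count of each row group, in file order
    private int currentGroup = -1;
    private long currentGroupRowCount;
    private long nextRowInGroup;

    BatchCursorSketch(long... rowGroupSizes) {
        this.rowGroupSizes = rowGroupSizes.clone();
    }

    // Move to the next row group; false when none is left.
    private boolean advanceToNextRowGroup() {
        if (currentGroup + 1 >= rowGroupSizes.length) {
            return false;
        }
        currentGroup++;
        currentGroupRowCount = rowGroupSizes[currentGroup];
        nextRowInGroup = 0;
        return true;
    }

    // Same control flow as the example: -1 at end of input, otherwise the
    // smaller of the cap and the rows remaining in the current row group.
    int nextBatch() {
        if (nextRowInGroup >= currentGroupRowCount && !advanceToNextRowGroup()) {
            return -1;
        }
        int batchSize = Math.toIntExact(Math.min(MAX_VECTOR_LENGTH, currentGroupRowCount - nextRowInGroup));
        nextRowInGroup += batchSize;
        return batchSize;
    }

    public static void main(String[] args) {
        BatchCursorSketch cursor = new BatchCursorSketch(2500, 10);
        for (int n = cursor.nextBatch(); n != -1; n = cursor.nextBatch()) {
            System.out.println(n);
        }
    }
}
```

With row groups of 2500 and 10 rows, the sketch yields batches of 1024, 1024, 452 and 10 rows before returning -1. A batch never spans two row groups, because the size is bounded by the rows remaining in the current group.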

Example 3 with PrimitiveColumnIO

Use of parquet.io.PrimitiveColumnIO in project presto by prestodb.

Class ParquetTypeUtils, method getDescriptor.

public static Optional<RichColumnDescriptor> getDescriptor(MessageType fileSchema, MessageType requestedSchema, List<String> path) {
    checkArgument(path.size() >= 1, "Parquet nested path should have at least one component");
    int level = path.size();
    for (PrimitiveColumnIO columnIO : getColumns(fileSchema, requestedSchema)) {
        // The loop below compares fields[i + 1] with path.get(i), so fields[0] is never matched.
        ColumnIO[] fields = columnIO.getPath();
        // Skip columns that are not nested deeply enough to contain the requested path.
        if (fields.length <= level) {
            continue;
        }
        // Compare the last path component first, then the leading components, ignoring case.
        if (fields[level].getName().equalsIgnoreCase(path.get(level - 1))) {
            boolean match = true;
            for (int i = 0; i < level - 1; i++) {
                if (!fields[i + 1].getName().equalsIgnoreCase(path.get(i))) {
                    match = false;
                    break;
                }
            }
            if (match) {
                ColumnDescriptor descriptor = columnIO.getColumnDescriptor();
                return Optional.of(new RichColumnDescriptor(descriptor.getPath(), columnIO.getType().asPrimitiveType(), descriptor.getMaxRepetitionLevel(), descriptor.getMaxDefinitionLevel()));
            }
        }
    }
    return empty();
}
Also used: ColumnDescriptor (parquet.column.ColumnDescriptor), ColumnIO (parquet.io.ColumnIO), PrimitiveColumnIO (parquet.io.PrimitiveColumnIO)
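
The path comparison in getDescriptor can be illustrated without the Parquet classes. This is a minimal sketch under stated assumptions: PathMatchSketch and findColumn are hypothetical names, each column is modelled as a plain String[] of field names, and element 0 of each array stands in for the entry that the original skips. The two comparisons of the original are folded into one loop, which should be equivalent.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical, dependency-free model of the matching loop in getDescriptor.
final class PathMatchSketch {
    // Each String[] is one leaf column's field names; index 0 is the skipped leading entry.
    static Optional<String[]> findColumn(List<String[]> columns, List<String> path) {
        if (path.isEmpty()) {
            throw new IllegalArgumentException("Parquet nested path should have at least one component");
        }
        int level = path.size();
        for (String[] fields : columns) {
            // Not nested deeply enough to contain the requested path.
            if (fields.length <= level) {
                continue;
            }
            // Case-insensitive comparison of fields[1..level] with path[0..level-1].
            boolean match = true;
            for (int i = 0; i < level; i++) {
                if (!fields[i + 1].equalsIgnoreCase(path.get(i))) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return Optional.of(fields);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<String[]> columns = Arrays.asList(
                new String[] {"root", "a", "b"},
                new String[] {"root", "a", "c"},
                new String[] {"root", "d"});
        System.out.println(findColumn(columns, Arrays.asList("A", "c")).map(Arrays::toString).orElse("none"));
    }
}
```

Because only the first `level` names after the leading entry are compared, a path that names an enclosing group rather than a leaf matches the first leaf column beneath that group in this sketch.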

Aggregations

ColumnDescriptor (parquet.column.ColumnDescriptor): 3
PrimitiveColumnIO (parquet.io.PrimitiveColumnIO): 3
RichColumnDescriptor (com.facebook.presto.hive.parquet.RichColumnDescriptor): 2
ColumnIO (parquet.io.ColumnIO): 1