Example 1 with IndexReference

Use of org.apache.parquet.internal.hadoop.metadata.IndexReference in project presto by prestodb.

Class HdfsParquetDataSource, method readColumnIndex.

@Override
public Optional<ColumnIndex> readColumnIndex(ColumnChunkMetaData column) throws IOException {
    IndexReference indexRef = column.getColumnIndexReference();
    // A null reference means this column chunk has no column index.
    if (indexRef == null) {
        return Optional.empty();
    }
    // Jump to the start of the serialized column index, then decode it.
    inputStream.seek(indexRef.getOffset());
    return Optional.of(ParquetMetadataConverter.fromParquetColumnIndex(column.getPrimitiveType(), Util.readColumnIndex(inputStream)));
}
Also used : IndexReference(org.apache.parquet.internal.hadoop.metadata.IndexReference)

Example 2 with IndexReference

Use of org.apache.parquet.internal.hadoop.metadata.IndexReference in project parquet-mr by apache.

Class ParquetFileReader, method readColumnIndex.

/**
 * @param column
 *          the column chunk which the column index is to be returned for
 * @return the column index for the specified column chunk or {@code null} if there is no index
 * @throws IOException
 *           if any I/O error occurs during reading the file
 */
@Private
public ColumnIndex readColumnIndex(ColumnChunkMetaData column) throws IOException {
    IndexReference ref = column.getColumnIndexReference();
    if (ref == null) {
        return null;
    }
    f.seek(ref.getOffset());
    BlockCipher.Decryptor columnIndexDecryptor = null;
    byte[] columnIndexAAD = null;
    if (null != fileDecryptor && !fileDecryptor.plaintextFile()) {
        InternalColumnDecryptionSetup columnDecryptionSetup = fileDecryptor.getColumnSetup(column.getPath());
        if (columnDecryptionSetup.isEncrypted()) {
            columnIndexDecryptor = columnDecryptionSetup.getMetaDataDecryptor();
            columnIndexAAD = AesCipher.createModuleAAD(fileDecryptor.getFileAAD(), ModuleType.ColumnIndex, column.getRowGroupOrdinal(), columnDecryptionSetup.getOrdinal(), -1);
        }
    }
    return ParquetMetadataConverter.fromParquetColumnIndex(column.getPrimitiveType(), Util.readColumnIndex(f, columnIndexDecryptor, columnIndexAAD));
}
Also used : BlockCipher(org.apache.parquet.format.BlockCipher) InternalColumnDecryptionSetup(org.apache.parquet.crypto.InternalColumnDecryptionSetup) IndexReference(org.apache.parquet.internal.hadoop.metadata.IndexReference) Private(org.apache.yetus.audience.InterfaceAudience.Private)

Example 3 with IndexReference

Use of org.apache.parquet.internal.hadoop.metadata.IndexReference in project parquet-mr by apache.

Class ParquetFileWriter, method serializeOffsetIndexes.

private static void serializeOffsetIndexes(List<List<OffsetIndex>> offsetIndexes, List<BlockMetaData> blocks, PositionOutputStream out, InternalFileEncryptor fileEncryptor) throws IOException {
    LOG.debug("{}: offset indexes", out.getPos());
    for (int bIndex = 0, bSize = blocks.size(); bIndex < bSize; ++bIndex) {
        BlockMetaData block = blocks.get(bIndex);
        List<ColumnChunkMetaData> columns = block.getColumns();
        List<OffsetIndex> blockOffsetIndexes = offsetIndexes.get(bIndex);
        for (int cIndex = 0, cSize = columns.size(); cIndex < cSize; ++cIndex) {
            OffsetIndex offsetIndex = blockOffsetIndexes.get(cIndex);
            // Skip column chunks that have no offset index.
            if (offsetIndex == null) {
                continue;
            }
            ColumnChunkMetaData column = columns.get(cIndex);
            // Both stay null unless a file encryptor is present and the column is encrypted.
            BlockCipher.Encryptor offsetIndexEncryptor = null;
            byte[] offsetIndexAAD = null;
            if (null != fileEncryptor) {
                InternalColumnEncryptionSetup columnEncryptionSetup = fileEncryptor.getColumnSetup(column.getPath(), false, cIndex);
                if (columnEncryptionSetup.isEncrypted()) {
                    offsetIndexEncryptor = columnEncryptionSetup.getMetaDataEncryptor();
                    offsetIndexAAD = AesCipher.createModuleAAD(fileEncryptor.getFileAAD(), ModuleType.OffsetIndex, block.getOrdinal(), columnEncryptionSetup.getOrdinal(), -1);
                }
            }
            // Remember the start position so the written length can be derived afterwards.
            long offset = out.getPos();
            Util.writeOffsetIndex(ParquetMetadataConverter.toParquetOffsetIndex(offsetIndex), out, offsetIndexEncryptor, offsetIndexAAD);
            column.setOffsetIndexReference(new IndexReference(offset, (int) (out.getPos() - offset)));
        }
    }
}
Also used : BlockMetaData(org.apache.parquet.hadoop.metadata.BlockMetaData) ColumnChunkMetaData(org.apache.parquet.hadoop.metadata.ColumnChunkMetaData) BlockCipher(org.apache.parquet.format.BlockCipher) InternalColumnEncryptionSetup(org.apache.parquet.crypto.InternalColumnEncryptionSetup) IndexReference(org.apache.parquet.internal.hadoop.metadata.IndexReference) OffsetIndex(org.apache.parquet.internal.column.columnindex.OffsetIndex)

Example 4 with IndexReference

Use of org.apache.parquet.internal.hadoop.metadata.IndexReference in project parquet-mr by apache.

Class ParquetFileWriter, method serializeColumnIndexes.

private static void serializeColumnIndexes(List<List<ColumnIndex>> columnIndexes, List<BlockMetaData> blocks, PositionOutputStream out, InternalFileEncryptor fileEncryptor) throws IOException {
    LOG.debug("{}: column indexes", out.getPos());
    for (int bIndex = 0, bSize = blocks.size(); bIndex < bSize; ++bIndex) {
        BlockMetaData block = blocks.get(bIndex);
        List<ColumnChunkMetaData> columns = block.getColumns();
        List<ColumnIndex> blockColumnIndexes = columnIndexes.get(bIndex);
        for (int cIndex = 0, cSize = columns.size(); cIndex < cSize; ++cIndex) {
            ColumnChunkMetaData column = columns.get(cIndex);
            org.apache.parquet.format.ColumnIndex columnIndex = ParquetMetadataConverter.toParquetColumnIndex(column.getPrimitiveType(), blockColumnIndexes.get(cIndex));
            // Skip column chunks for which the conversion yields no column index.
            if (columnIndex == null) {
                continue;
            }
            // Both stay null unless a file encryptor is present and the column is encrypted.
            BlockCipher.Encryptor columnIndexEncryptor = null;
            byte[] columnIndexAAD = null;
            if (null != fileEncryptor) {
                InternalColumnEncryptionSetup columnEncryptionSetup = fileEncryptor.getColumnSetup(column.getPath(), false, cIndex);
                if (columnEncryptionSetup.isEncrypted()) {
                    columnIndexEncryptor = columnEncryptionSetup.getMetaDataEncryptor();
                    columnIndexAAD = AesCipher.createModuleAAD(fileEncryptor.getFileAAD(), ModuleType.ColumnIndex, block.getOrdinal(), columnEncryptionSetup.getOrdinal(), -1);
                }
            }
            // Remember the start position so the written length can be derived afterwards.
            long offset = out.getPos();
            Util.writeColumnIndex(columnIndex, out, columnIndexEncryptor, columnIndexAAD);
            column.setColumnIndexReference(new IndexReference(offset, (int) (out.getPos() - offset)));
        }
    }
}
Also used : BlockMetaData(org.apache.parquet.hadoop.metadata.BlockMetaData) ColumnChunkMetaData(org.apache.parquet.hadoop.metadata.ColumnChunkMetaData) BlockCipher(org.apache.parquet.format.BlockCipher) ColumnIndex(org.apache.parquet.internal.column.columnindex.ColumnIndex) InternalColumnEncryptionSetup(org.apache.parquet.crypto.InternalColumnEncryptionSetup) IndexReference(org.apache.parquet.internal.hadoop.metadata.IndexReference)

Example 5 with IndexReference

Use of org.apache.parquet.internal.hadoop.metadata.IndexReference in project presto by prestodb.

Class HdfsParquetDataSource, method readOffsetIndex.

@Override
public Optional<OffsetIndex> readOffsetIndex(ColumnChunkMetaData column) throws IOException {
    IndexReference indexRef = column.getOffsetIndexReference();
    // A null reference means this column chunk has no offset index.
    if (indexRef == null) {
        return Optional.empty();
    }
    // Jump to the start of the serialized offset index, then decode it.
    inputStream.seek(indexRef.getOffset());
    return Optional.of(ParquetMetadataConverter.fromParquetOffsetIndex(Util.readOffsetIndex(inputStream)));
}
Also used : IndexReference(org.apache.parquet.internal.hadoop.metadata.IndexReference)
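
Sketch: the offset/length pattern behind the examples above

The writer examples above note the stream position before writing an index and build the reference from that position plus the number of bytes written; the reader examples treat a null reference as "no index" and otherwise seek to getOffset(). The following is a minimal, self-contained sketch of that pattern only. IndexRefSketch and its append and read helpers are hypothetical stand-ins invented for illustration, not the parquet-mr IndexReference class, and a plain byte array stands in for the file, so no Parquet serialization or encryption is involved.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical stand-in for an index reference: an offset and a length, nothing more.
final class IndexRefSketch {
    private final long offset;
    private final int length;

    IndexRefSketch(long offset, int length) {
        this.offset = offset;
        this.length = length;
    }

    long getOffset() {
        return offset;
    }

    int getLength() {
        return length;
    }

    // Writer side: remember the position before writing, derive the length afterwards.
    static IndexRefSketch append(ByteArrayOutputStream out, byte[] serializedIndex) {
        long offset = out.size();
        out.write(serializedIndex, 0, serializedIndex.length);
        return new IndexRefSketch(offset, (int) (out.size() - offset));
    }

    // Reader side: a null reference means no index was written for this chunk.
    static byte[] read(byte[] file, IndexRefSketch ref) {
        if (ref == null) {
            return null;
        }
        int start = (int) ref.getOffset();
        return Arrays.copyOfRange(file, start, start + ref.getLength());
    }

    public static void main(String[] args) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(new byte[] {1, 2, 3}, 0, 3); // pretend these are data pages
        IndexRefSketch ref = append(out, new byte[] {9, 8, 7, 6});
        byte[] back = read(out.toByteArray(), ref);
        System.out.println(ref.getOffset() + " " + ref.getLength() + " " + Arrays.toString(back));
    }
}
```

Storing the length alongside the offset lets a reader fetch exactly the index bytes in one ranged read, although the reader examples above use only the offset and let the deserializer consume what it needs.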

Aggregations

IndexReference (org.apache.parquet.internal.hadoop.metadata.IndexReference) 9
BlockCipher (org.apache.parquet.format.BlockCipher) 6
InternalColumnEncryptionSetup (org.apache.parquet.crypto.InternalColumnEncryptionSetup) 5
ColumnChunkMetaData (org.apache.parquet.hadoop.metadata.ColumnChunkMetaData) 5
BlockMetaData (org.apache.parquet.hadoop.metadata.BlockMetaData) 4
InternalColumnDecryptionSetup (org.apache.parquet.crypto.InternalColumnDecryptionSetup) 2
ColumnIndex (org.apache.parquet.internal.column.columnindex.ColumnIndex) 2
OffsetIndex (org.apache.parquet.internal.column.columnindex.OffsetIndex) 2
Private (org.apache.yetus.audience.InterfaceAudience.Private) 2
ByteArrayOutputStream (java.io.ByteArrayOutputStream) 1
IOException (java.io.IOException) 1
ArrayList (java.util.ArrayList) 1
ParquetCryptoRuntimeException (org.apache.parquet.crypto.ParquetCryptoRuntimeException) 1
ColumnChunk (org.apache.parquet.format.ColumnChunk) 1
ColumnMetaData (org.apache.parquet.format.ColumnMetaData) 1
RowGroup (org.apache.parquet.format.RowGroup) 1
Util.writeColumnMetaData (org.apache.parquet.format.Util.writeColumnMetaData) 1
ColumnPath (org.apache.parquet.hadoop.metadata.ColumnPath) 1