
Example 1 with ParquetReadOptions

Use of org.apache.parquet.ParquetReadOptions in the project parquet-mr by apache, from the class ParquetFileReader, method readFooter:

/**
 * Reads the metadata block in the footer of the file
 * @param file an {@link InputFile} to read
 * @param filter the filter to apply to row groups
 * @return the metadata blocks in the footer
 * @throws IOException if an error occurs while reading the file
 * @deprecated will be removed in 2.0.0;
 *             use {@link ParquetFileReader#open(InputFile, ParquetReadOptions)}
 */
@Deprecated
public static final ParquetMetadata readFooter(InputFile file, MetadataFilter filter) throws IOException {
    ParquetReadOptions options;
    if (file instanceof HadoopInputFile) {
        HadoopInputFile hadoopFile = (HadoopInputFile) file;
        options = HadoopReadOptions.builder(hadoopFile.getConfiguration(), hadoopFile.getPath())
                .withMetadataFilter(filter)
                .build();
    } else {
        options = ParquetReadOptions.builder().withMetadataFilter(filter).build();
    }
    try (SeekableInputStream in = file.newStream()) {
        return readFooter(file, options, in);
    }
}
Also used: SeekableInputStream (org.apache.parquet.io.SeekableInputStream), HadoopInputFile (org.apache.parquet.hadoop.util.HadoopInputFile), ParquetReadOptions (org.apache.parquet.ParquetReadOptions)
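The deprecation note above points to ParquetFileReader.open(InputFile, ParquetReadOptions) as the replacement. A minimal sketch of that non-deprecated path; the input path is hypothetical:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.ParquetReadOptions;
import org.apache.parquet.format.converter.ParquetMetadataConverter;
import org.apache.parquet.hadoop.ParquetFileReader;
import org.apache.parquet.hadoop.metadata.ParquetMetadata;
import org.apache.parquet.hadoop.util.HadoopInputFile;
import org.apache.parquet.io.InputFile;

public class FooterReadSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical path; substitute a real Parquet file.
        InputFile file = HadoopInputFile.fromPath(new Path("/tmp/data.parquet"), new Configuration());
        ParquetReadOptions options = ParquetReadOptions.builder()
                .withMetadataFilter(ParquetMetadataConverter.NO_FILTER)
                .build();
        // open(...) replaces the deprecated readFooter(...) overloads;
        // the reader owns the stream and closes it with try-with-resources.
        try (ParquetFileReader reader = ParquetFileReader.open(file, options)) {
            ParquetMetadata footer = reader.getFooter();
            System.out.println("row groups: " + footer.getBlocks().size());
        }
    }
}
```

Unlike the deprecated static readFooter, the open(...) path keeps the footer and the row-group reader behind one closeable object, so the stream lifecycle is handled in one place.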

Example 2 with ParquetReadOptions

Use of org.apache.parquet.ParquetReadOptions in the project parquet-mr by apache, from the class ColumnEncryptorTest, method getParquetMetadata:

private ParquetMetadata getParquetMetadata(FileDecryptionProperties decryptionProperties) throws IOException {
    ParquetMetadata metaData;
    ParquetReadOptions readOptions = ParquetReadOptions.builder().withDecryption(decryptionProperties).build();
    InputFile file = HadoopInputFile.fromPath(new Path(outputFile), conf);
    try (SeekableInputStream in = file.newStream()) {
        metaData = ParquetFileReader.readFooter(file, readOptions, in);
    }
    return metaData;
}
Also used: Path (org.apache.hadoop.fs.Path), SeekableInputStream (org.apache.parquet.io.SeekableInputStream), ParquetMetadata (org.apache.parquet.hadoop.metadata.ParquetMetadata), ParquetReadOptions (org.apache.parquet.ParquetReadOptions), InputFile (org.apache.parquet.io.InputFile)
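The decryptionProperties argument above is an org.apache.parquet.crypto.FileDecryptionProperties. A hedged sketch of constructing one with an explicit footer key and wiring it into ParquetReadOptions; the key bytes are placeholders, not a working key for any real file:

```java
import org.apache.parquet.ParquetReadOptions;
import org.apache.parquet.crypto.FileDecryptionProperties;

public class DecryptionOptionsSketch {
    public static void main(String[] args) {
        // Placeholder 16-byte AES key for illustration only.
        byte[] footerKey = new byte[16];
        FileDecryptionProperties decryption = FileDecryptionProperties.builder()
                .withFooterKey(footerKey)
                .build();
        // Attach the decryption properties exactly as the test above does.
        ParquetReadOptions readOptions = ParquetReadOptions.builder()
                .withDecryption(decryption)
                .build();
        System.out.println("decryption configured: " + (readOptions.getDecryptionProperties() != null));
    }
}
```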

Example 3 with ParquetReadOptions

Use of org.apache.parquet.ParquetReadOptions in the project pxf by greenplum-db, from the class ParquetWriteTest, method validateFooter:

private MessageType validateFooter(Path parquetFile, int numCols, int numRows) throws IOException {
    ParquetReadOptions parquetReadOptions = HadoopReadOptions.builder(configuration).build();
    HadoopInputFile inputFile = HadoopInputFile.fromPath(parquetFile, configuration);
    try (ParquetFileReader parquetFileReader = ParquetFileReader.open(inputFile, parquetReadOptions)) {
        FileMetaData metadata = parquetFileReader.getFileMetaData();
        ParquetMetadata readFooter = parquetFileReader.getFooter();
        // one block
        assertEquals(1, readFooter.getBlocks().size());
        BlockMetaData block0 = readFooter.getBlocks().get(0);
        // expected number of columns
        assertEquals(numCols, block0.getColumns().size());
        // expected number of rows in this block
        assertEquals(numRows, block0.getRowCount());
        ColumnChunkMetaData column0 = block0.getColumns().get(0);
        assertEquals(CompressionCodecName.SNAPPY, column0.getCodec());
        return metadata.getSchema();
    }
}
Also used: BlockMetaData (org.apache.parquet.hadoop.metadata.BlockMetaData), ParquetMetadata (org.apache.parquet.hadoop.metadata.ParquetMetadata), ColumnChunkMetaData (org.apache.parquet.hadoop.metadata.ColumnChunkMetaData), ParquetFileReader (org.apache.parquet.hadoop.ParquetFileReader), HadoopInputFile (org.apache.parquet.hadoop.util.HadoopInputFile), ParquetReadOptions (org.apache.parquet.ParquetReadOptions), FileMetaData (org.apache.parquet.hadoop.metadata.FileMetaData)
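Beyond footer-level assertions, the same ParquetFileReader can iterate the row groups it just validated. A sketch that sums row counts across all row groups; the file path is hypothetical:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.HadoopReadOptions;
import org.apache.parquet.ParquetReadOptions;
import org.apache.parquet.column.page.PageReadStore;
import org.apache.parquet.hadoop.ParquetFileReader;
import org.apache.parquet.hadoop.util.HadoopInputFile;

public class RowGroupCountSketch {
    public static void main(String[] args) throws Exception {
        Configuration configuration = new Configuration();
        // Hypothetical path; substitute a real Parquet file.
        HadoopInputFile inputFile = HadoopInputFile.fromPath(new Path("/tmp/out.parquet"), configuration);
        ParquetReadOptions options = HadoopReadOptions.builder(configuration).build();
        long totalRows = 0;
        try (ParquetFileReader reader = ParquetFileReader.open(inputFile, options)) {
            PageReadStore rowGroup;
            // readNextRowGroup() returns null once all row groups are consumed.
            while ((rowGroup = reader.readNextRowGroup()) != null) {
                totalRows += rowGroup.getRowCount();
            }
        }
        System.out.println("total rows: " + totalRows);
    }
}
```

The sum should match the per-block getRowCount() values asserted in validateFooter above.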

Example 4 with ParquetReadOptions

Use of org.apache.parquet.ParquetReadOptions in the project drill by apache, from the class TestParquetReaderConfig, method testReadOptions:

@Test
public void testReadOptions() {
    // set enableStringsSignedMinMax to true
    ParquetReaderConfig readerConfig = new ParquetReaderConfig(false, false, false, true, true);
    ParquetReadOptions readOptions = readerConfig.toReadOptions();
    assertTrue(readOptions.useSignedStringMinMax());
    // set enableStringsSignedMinMax to false
    readerConfig = new ParquetReaderConfig(false, false, false, true, false);
    readOptions = readerConfig.toReadOptions();
    assertFalse(readOptions.useSignedStringMinMax());
}
Also used: ParquetReadOptions (org.apache.parquet.ParquetReadOptions), ParquetTest (org.apache.drill.categories.ParquetTest), Test (org.junit.Test), BaseTest (org.apache.drill.test.BaseTest), UnlikelyTest (org.apache.drill.categories.UnlikelyTest)
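Drill's ParquetReaderConfig is a wrapper; without it, the same flag can be set directly on the options builder. A sketch, assuming ParquetReadOptions.Builder exposes useSignedStringMinMax(boolean), which it does in recent parquet-mr releases:

```java
import org.apache.parquet.ParquetReadOptions;

public class SignedMinMaxSketch {
    public static void main(String[] args) {
        // Enable signed min/max statistics for binary (string) columns.
        ParquetReadOptions options = ParquetReadOptions.builder()
                .useSignedStringMinMax(true)
                .build();
        // Mirrors the assertion in the Drill test above.
        System.out.println(options.useSignedStringMinMax());
    }
}
```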

Example 5 with ParquetReadOptions

Use of org.apache.parquet.ParquetReadOptions in the project parquet-mr by apache, from the class ColumnEncryptorTest, method verifyOffsetIndexes:

private void verifyOffsetIndexes() throws IOException {
    ParquetReadOptions readOptions = HadoopReadOptions.builder(conf)
            .withDecryption(EncDecProperties.getFileDecryptionProperties())
            .build();
    try (TransParquetFileReader inReader = createFileReader(inputFile.getFileName());
        TransParquetFileReader outReader = createFileReader(outputFile)) {
        ParquetMetadata inMetaData = getMetadata(readOptions, inputFile.getFileName(), inReader);
        ParquetMetadata outMetaData = getMetadata(readOptions, outputFile, outReader);
        compareOffsetIndexes(inReader, outReader, inMetaData, outMetaData);
    }
}
Also used: ParquetMetadata (org.apache.parquet.hadoop.metadata.ParquetMetadata), TransParquetFileReader (org.apache.parquet.hadoop.util.CompressionConverter.TransParquetFileReader), ParquetReadOptions (org.apache.parquet.ParquetReadOptions)

Aggregations

ParquetReadOptions (org.apache.parquet.ParquetReadOptions): 6
ParquetMetadata (org.apache.parquet.hadoop.metadata.ParquetMetadata): 3
HadoopInputFile (org.apache.parquet.hadoop.util.HadoopInputFile): 3
ParquetFileReader (org.apache.parquet.hadoop.ParquetFileReader): 2
FileMetaData (org.apache.parquet.hadoop.metadata.FileMetaData): 2
SeekableInputStream (org.apache.parquet.io.SeekableInputStream): 2
IOException (java.io.IOException): 1
ParquetTest (org.apache.drill.categories.ParquetTest): 1
UnlikelyTest (org.apache.drill.categories.UnlikelyTest): 1
BaseTest (org.apache.drill.test.BaseTest): 1
Path (org.apache.hadoop.fs.Path): 1
ParquetMetadataConverter (org.apache.parquet.format.converter.ParquetMetadataConverter): 1
BlockMetaData (org.apache.parquet.hadoop.metadata.BlockMetaData): 1
ColumnChunkMetaData (org.apache.parquet.hadoop.metadata.ColumnChunkMetaData): 1
TransParquetFileReader (org.apache.parquet.hadoop.util.CompressionConverter.TransParquetFileReader): 1
InputFile (org.apache.parquet.io.InputFile): 1
UnsupportedTypeException (org.greenplum.pxf.api.error.UnsupportedTypeException): 1
Test (org.junit.Test): 1