
Example 6 with PageHeader

Use of org.apache.parquet.format.PageHeader in project parquet-mr by apache.

Class TestParquetMetadataConverter, method testPageHeader.

@Test
public void testPageHeader() throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    PageType type = PageType.DATA_PAGE;
    int compSize = 10;
    int uncSize = 20;
    // Constructor argument order: type, uncompressed size, compressed size.
    PageHeader pageHeader = new PageHeader(type, uncSize, compSize);
    // Serialize the header, then read it back from the same bytes.
    writePageHeader(pageHeader, out);
    PageHeader readPageHeader = readPageHeader(new ByteArrayInputStream(out.toByteArray()));
    // The round trip must reproduce an equal header.
    assertEquals(pageHeader, readPageHeader);
}
Also used: Util.readPageHeader (org.apache.parquet.format.Util.readPageHeader), PageHeader (org.apache.parquet.format.PageHeader), Util.writePageHeader (org.apache.parquet.format.Util.writePageHeader), ByteArrayInputStream (java.io.ByteArrayInputStream), ByteArrayOutputStream (java.io.ByteArrayOutputStream), PageType (org.apache.parquet.format.PageType), ParquetMetadataConverter.filterFileMetaDataByMidpoint (org.apache.parquet.format.converter.ParquetMetadataConverter.filterFileMetaDataByMidpoint), Test (org.junit.Test)
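
The write-then-read round trip in testPageHeader can be tried without the Parquet dependency. The following is a minimal sketch using only java.io. SimpleHeader, write and read are hypothetical stand-ins for PageHeader, Util.writePageHeader and Util.readPageHeader, and the fixed 12-byte encoding is an assumption made for illustration; it is not the Thrift encoding that parquet-format uses.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Objects;

final class HeaderRoundTripDemo {
    /** Hypothetical stand-in for PageHeader: a type tag plus two sizes. */
    static final class SimpleHeader {
        final int type;
        final int uncompressedSize;
        final int compressedSize;

        SimpleHeader(int type, int uncompressedSize, int compressedSize) {
            this.type = type;
            this.uncompressedSize = uncompressedSize;
            this.compressedSize = compressedSize;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof SimpleHeader)) {
                return false;
            }
            SimpleHeader other = (SimpleHeader) o;
            return type == other.type
                    && uncompressedSize == other.uncompressedSize
                    && compressedSize == other.compressedSize;
        }

        @Override
        public int hashCode() {
            return Objects.hash(type, uncompressedSize, compressedSize);
        }
    }

    /** Writes the three fields as big-endian ints (assumed format, 12 bytes). */
    static void write(SimpleHeader header, OutputStream out) throws IOException {
        DataOutputStream data = new DataOutputStream(out);
        data.writeInt(header.type);
        data.writeInt(header.uncompressedSize);
        data.writeInt(header.compressedSize);
        data.flush();
    }

    /** Reads the fields back in the same order they were written. */
    static SimpleHeader read(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int type = data.readInt();
        int uncompressedSize = data.readInt();
        int compressedSize = data.readInt();
        return new SimpleHeader(type, uncompressedSize, compressedSize);
    }

    public static void main(String[] args) throws IOException {
        SimpleHeader original = new SimpleHeader(0, 20, 10);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        write(original, out);
        SimpleHeader copy = read(new ByteArrayInputStream(out.toByteArray()));
        System.out.println(original.equals(copy));
    }
}
```

As in the original test, equality is checked on the whole object rather than field by field, so the check relies on equals covering every serialized field.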

Example 7 with PageHeader

Use of org.apache.parquet.format.PageHeader in project drill by axbaretto.

Class PageReader, method loadDictionaryIfExists.

protected void loadDictionaryIfExists(final org.apache.drill.exec.store.parquet.columnreaders.ColumnReader<?> parentStatus, final ColumnChunkMetaData columnChunkMetaData, final DirectBufInputStream f) throws IOException {
    Stopwatch timer = Stopwatch.createUnstarted();
    if (columnChunkMetaData.getDictionaryPageOffset() > 0) {
        long bytesToSkip = columnChunkMetaData.getDictionaryPageOffset() - dataReader.getPos();
        while (bytesToSkip > 0) {
            long skipped = dataReader.skip(bytesToSkip);
            if (skipped > 0) {
                bytesToSkip -= skipped;
            } else {
                // skip() made no progress, and there is no reliable way to tell
                // whether EOF was reached. Guava checks InputStream.available, but
                // because available is unreliable, it then tries to read the rest
                // of the data. Do the same here by reading the remaining bytes.
                DrillBuf skipBuf = dataReader.getNext((int) bytesToSkip);
                if (skipBuf != null) {
                    skipBuf.release();
                } else {
                    throw new EOFException("End of File reached.");
                }
            }
        }
        long start = dataReader.getPos();
        timer.start();
        final PageHeader pageHeader = Util.readPageHeader(f);
        long timeToRead = timer.elapsed(TimeUnit.NANOSECONDS);
        long pageHeaderBytes = dataReader.getPos() - start;
        this.updateStats(pageHeader, "Page Header", start, timeToRead, pageHeaderBytes, pageHeaderBytes);
        assert pageHeader.type == PageType.DICTIONARY_PAGE;
        readDictionaryPage(pageHeader, parentStatus);
    }
}
Also used: PageHeader (org.apache.parquet.format.PageHeader), Stopwatch (com.google.common.base.Stopwatch), EOFException (java.io.EOFException), DrillBuf (io.netty.buffer.DrillBuf)
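
The skip loop in loadDictionaryIfExists handles the case where skip makes no progress. A minimal sketch of the same idea on a plain java.io.InputStream follows. The helper name skipFully is hypothetical. Unlike the Drill code, which reads the remaining bytes into a DrillBuf, this sketch falls back to a single-byte read to tell a stalled skip apart from end of stream.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

final class SkipFullyDemo {
    /**
     * Skips exactly n bytes, or throws EOFException if the stream ends first.
     * InputStream.skip may return 0 without being at EOF, so a read() is used
     * as the fallback probe when skip() makes no progress.
     */
    static void skipFully(InputStream in, long n) throws IOException {
        long remaining = n;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped > 0) {
                remaining -= skipped;
            } else if (in.read() == -1) {
                // No progress from skip() and read() confirms end of stream.
                throw new EOFException(
                        "End of stream reached with " + remaining + " bytes left to skip");
            } else {
                // read() consumed one byte, which counts toward the skip.
                remaining--;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[] {0, 1, 2, 3, 4, 5, 6, 7});
        skipFully(in, 5);
        System.out.println(in.read()); // prints 5, the first byte after the skipped range
    }
}
```

Probing with read avoids depending on InputStream.available, which the comment in the Drill code describes as unreliable.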

Aggregations

PageHeader (org.apache.parquet.format.PageHeader): 7
DataPageHeader (org.apache.parquet.format.DataPageHeader): 4
DictionaryPageHeader (org.apache.parquet.format.DictionaryPageHeader): 4
Util.writePageHeader (org.apache.parquet.format.Util.writePageHeader): 4
Stopwatch (com.google.common.base.Stopwatch): 2
DrillBuf (io.netty.buffer.DrillBuf): 1
ByteArrayInputStream (java.io.ByteArrayInputStream): 1
ByteArrayOutputStream (java.io.ByteArrayOutputStream): 1
EOFException (java.io.EOFException): 1
DictionaryPage (org.apache.parquet.column.page.DictionaryPage): 1
BytesInputDecompressor (org.apache.parquet.compression.CompressionCodecFactory.BytesInputDecompressor): 1
DataPageHeaderV2 (org.apache.parquet.format.DataPageHeaderV2): 1
PageType (org.apache.parquet.format.PageType): 1
Util.readPageHeader (org.apache.parquet.format.Util.readPageHeader): 1
ParquetMetadataConverter.filterFileMetaDataByMidpoint (org.apache.parquet.format.converter.ParquetMetadataConverter.filterFileMetaDataByMidpoint): 1
Test (org.junit.Test): 1