Search in sources :

Example 6 with OrcFileTail

use of com.facebook.presto.orc.metadata.OrcFileTail in project presto by prestodb.

the class TestStorageOrcFileTailSource method testSkipDwrfStripeCacheIfDisabled.

@Test
public void testSkipDwrfStripeCacheIfDisabled() throws IOException {
    // beef up the file size to make sure the file can fit the 100 byte long stripe cache
    FileOutputStream out = new FileOutputStream(file.getFile());
    out.write(new byte[100 * 1000]);
    // write the footer and post script
    DwrfProto.Footer.Builder footer = DwrfProto.Footer.newBuilder().addAllStripeCacheOffsets(ImmutableList.of(0, 256, 512));
    DwrfProto.PostScript.Builder postScript = DwrfProto.PostScript.newBuilder().setCompression(NONE).setCacheMode(BOTH).setCacheSize(512);
    writeTail(footer, postScript, out);
    out.close();
    int tailReadSizeInBytes = 256;
    // read the file tail with the disabled "read dwrf stripe cache" feature
    StorageOrcFileTailSource src = new StorageOrcFileTailSource(tailReadSizeInBytes, false);
    TestingOrcDataSource orcDataSource = new TestingOrcDataSource(createFileOrcDataSource());
    OrcFileTail orcFileTail = src.getOrcFileTail(orcDataSource, metadataReader, Optional.empty(), false);
    assertEquals(orcFileTail.getMetadataSize(), 0);
    DwrfProto.Footer actualFooter = readFooter(orcFileTail);
    assertEquals(actualFooter, footer.build());
    // make sure the stripe cache has not been read
    assertFalse(orcFileTail.getDwrfStripeCacheData().isPresent());
    assertEquals(orcDataSource.getReadCount(), 1);
    DiskRange lastReadRange = orcDataSource.getLastReadRanges().get(0);
    assertEquals(lastReadRange.getLength(), tailReadSizeInBytes);
}
Also used : FileOutputStream(java.io.FileOutputStream) StorageOrcFileTailSource(com.facebook.presto.orc.cache.StorageOrcFileTailSource) DwrfProto(com.facebook.presto.orc.proto.DwrfProto) OrcFileTail(com.facebook.presto.orc.metadata.OrcFileTail) Test(org.testng.annotations.Test)

Example 7 with OrcFileTail

use of com.facebook.presto.orc.metadata.OrcFileTail in project presto by prestodb.

the class TestStorageOrcFileTailSource method testReadDwrfStripeCacheIfEnabled.

@Test
public void testReadDwrfStripeCacheIfEnabled() throws IOException {
    FileOutputStream out = new FileOutputStream(file.getFile());
    // write a fake stripe cache
    byte[] stripeCache = new byte[100];
    for (int i = 0; i < stripeCache.length; i++) {
        stripeCache[i] = (byte) i;
    }
    out.write(stripeCache);
    // write the footer and post script
    DwrfProto.Footer.Builder footer = DwrfProto.Footer.newBuilder().addAllStripeCacheOffsets(ImmutableList.of(1, 2, 3));
    DwrfProto.PostScript.Builder postScript = DwrfProto.PostScript.newBuilder().setCompression(NONE).setCacheMode(BOTH).setCacheSize(stripeCache.length);
    writeTail(footer, postScript, out);
    out.close();
    // read the file tail with the enabled "read dwrf stripe cache" feature
    StorageOrcFileTailSource src = new StorageOrcFileTailSource(FOOTER_READ_SIZE_IN_BYTES, true);
    OrcDataSource orcDataSource = createFileOrcDataSource();
    OrcFileTail orcFileTail = src.getOrcFileTail(orcDataSource, metadataReader, Optional.empty(), false);
    assertEquals(orcFileTail.getMetadataSize(), 0);
    DwrfProto.Footer actualFooter = readFooter(orcFileTail);
    assertEquals(actualFooter, footer.build());
    // make sure the stripe cache is loaded correctly
    assertTrue(orcFileTail.getDwrfStripeCacheData().isPresent());
    DwrfStripeCacheData dwrfStripeCacheData = orcFileTail.getDwrfStripeCacheData().get();
    assertEquals(dwrfStripeCacheData.getDwrfStripeCacheMode(), INDEX_AND_FOOTER);
    assertEquals(dwrfStripeCacheData.getDwrfStripeCacheSize(), stripeCache.length);
    assertEquals(dwrfStripeCacheData.getDwrfStripeCacheSlice().getBytes(), stripeCache);
}
Also used : StorageOrcFileTailSource(com.facebook.presto.orc.cache.StorageOrcFileTailSource) DwrfProto(com.facebook.presto.orc.proto.DwrfProto) OrcFileTail(com.facebook.presto.orc.metadata.OrcFileTail) FileOutputStream(java.io.FileOutputStream) DwrfStripeCacheData(com.facebook.presto.orc.metadata.DwrfStripeCacheData) Test(org.testng.annotations.Test)

Example 8 with OrcFileTail

use of com.facebook.presto.orc.metadata.OrcFileTail in project presto by prestodb.

the class StorageOrcFileTailSource method getOrcFileTail.

@Override
public OrcFileTail getOrcFileTail(OrcDataSource orcDataSource, MetadataReader metadataReader, Optional<OrcWriteValidation> writeValidation, boolean cacheable) throws IOException {
    long size = orcDataSource.getSize();
    if (size <= MAGIC.length()) {
        throw new OrcCorruptionException(orcDataSource.getId(), "Invalid file size %s", size);
    }
    // Read the tail of the file
    byte[] buffer = new byte[toIntExact(min(size, expectedFooterSizeInBytes))];
    orcDataSource.readFully(size - buffer.length, buffer);
    // get length of PostScript - last byte of the file
    int postScriptSize = buffer[buffer.length - SIZE_OF_BYTE] & 0xff;
    if (postScriptSize >= buffer.length) {
        throw new OrcCorruptionException(orcDataSource.getId(), "Invalid postscript length %s", postScriptSize);
    }
    // decode the post script
    PostScript postScript;
    try {
        postScript = metadataReader.readPostScript(buffer, buffer.length - SIZE_OF_BYTE - postScriptSize, postScriptSize);
    } catch (OrcCorruptionException e) {
        // check if this is an ORC file and not an RCFile or something else
        if (!isValidHeaderMagic(orcDataSource)) {
            throw new OrcCorruptionException(orcDataSource.getId(), "Not an ORC file");
        }
        throw e;
    }
    // verify this is a supported version
    checkOrcVersion(orcDataSource, postScript.getVersion());
    validateWrite(writeValidation, orcDataSource, validation -> validation.getVersion().equals(postScript.getVersion()), "Unexpected version");
    int bufferSize = toIntExact(postScript.getCompressionBlockSize());
    // check compression codec is supported
    CompressionKind compressionKind = postScript.getCompression();
    validateWrite(writeValidation, orcDataSource, validation -> validation.getCompression() == compressionKind, "Unexpected compression");
    PostScript.HiveWriterVersion hiveWriterVersion = postScript.getHiveWriterVersion();
    int footerSize = toIntExact(postScript.getFooterLength());
    int metadataSize = toIntExact(postScript.getMetadataLength());
    if (footerSize < 0) {
        throw new OrcCorruptionException(orcDataSource.getId(), "Invalid footer length %s", footerSize);
    }
    if (metadataSize < 0) {
        throw new OrcCorruptionException(orcDataSource.getId(), "Invalid metadata length %s", metadataSize);
    }
    // read DWRF stripe cache only if this feature is enabled and it has meaningful data
    boolean readDwrfStripeCache = dwrfStripeCacheEnabled && postScript.getDwrfStripeCacheLength().isPresent() && postScript.getDwrfStripeCacheMode().isPresent() && postScript.getDwrfStripeCacheMode().get() != DwrfStripeCacheMode.NONE;
    int dwrfStripeCacheSize = 0;
    if (readDwrfStripeCache) {
        dwrfStripeCacheSize = postScript.getDwrfStripeCacheLength().getAsInt();
        checkSizes(orcDataSource, metadataSize, dwrfStripeCacheSize);
    }
    // check if extra bytes need to be read
    Slice completeFooterSlice;
    int completeFooterSize = dwrfStripeCacheSize + metadataSize + footerSize + postScriptSize + SIZE_OF_BYTE;
    if (completeFooterSize > buffer.length) {
        // allocate a new buffer large enough for the complete footer
        byte[] newBuffer = new byte[completeFooterSize];
        completeFooterSlice = Slices.wrappedBuffer(newBuffer);
        // initial read was not large enough, so read missing section
        orcDataSource.readFully(size - completeFooterSize, newBuffer, 0, completeFooterSize - buffer.length);
        // copy already read bytes into the new buffer
        completeFooterSlice.setBytes(completeFooterSize - buffer.length, buffer);
    } else {
        // footer is already in the bytes in buffer, just adjust position, length
        completeFooterSlice = Slices.wrappedBuffer(buffer, buffer.length - completeFooterSize, completeFooterSize);
    }
    // metadataSize is set only for ORC files, dwrfStripeCacheSize is set only for DWRF files
    // it should be safe to sum them up to find footer offset
    // TAIL: [ ORC_METADATA{0,1} | DWRF_STRIPE_CACHE {0,1} ] + FOOTER + POST_SCRIPT + POST_SCRIPT_SIZE (1 byte)
    int footerSliceOffset = metadataSize + dwrfStripeCacheSize;
    Slice footerSlice = completeFooterSlice.slice(footerSliceOffset, footerSize);
    Slice metadataSlice = completeFooterSlice.slice(0, metadataSize);
    // set DwrfStripeCacheData only if the stripe cache feature is enabled and the file has the stripe cache
    Optional<DwrfStripeCacheData> dwrfStripeCacheData = Optional.empty();
    if (readDwrfStripeCache) {
        Slice dwrfStripeCacheSlice = completeFooterSlice.slice(0, dwrfStripeCacheSize);
        DwrfStripeCacheMode stripeCacheMode = postScript.getDwrfStripeCacheMode().get();
        dwrfStripeCacheData = Optional.of(new DwrfStripeCacheData(dwrfStripeCacheSlice, dwrfStripeCacheSize, stripeCacheMode));
    }
    return new OrcFileTail(hiveWriterVersion, bufferSize, compressionKind, footerSlice, footerSize, metadataSlice, metadataSize, dwrfStripeCacheData);
}
Also used : CompressionKind(com.facebook.presto.orc.metadata.CompressionKind) OrcFileTail(com.facebook.presto.orc.metadata.OrcFileTail) PostScript(com.facebook.presto.orc.metadata.PostScript) DwrfStripeCacheMode(com.facebook.presto.orc.metadata.DwrfStripeCacheMode) Slice(io.airlift.slice.Slice) DwrfStripeCacheData(com.facebook.presto.orc.metadata.DwrfStripeCacheData) OrcCorruptionException(com.facebook.presto.orc.OrcCorruptionException)

Aggregations

OrcFileTail (com.facebook.presto.orc.metadata.OrcFileTail)8 StorageOrcFileTailSource (com.facebook.presto.orc.cache.StorageOrcFileTailSource)7 Slice (io.airlift.slice.Slice)5 CachingOrcFileTailSource (com.facebook.presto.orc.cache.CachingOrcFileTailSource)4 OrcFileTailSource (com.facebook.presto.orc.cache.OrcFileTailSource)4 RowGroupIndex (com.facebook.presto.orc.metadata.RowGroupIndex)4 Cache (com.google.common.cache.Cache)4 CacheBuilder (com.google.common.cache.CacheBuilder)4 Math.toIntExact (java.lang.Math.toIntExact)4 List (java.util.List)4 Optional (java.util.Optional)4 MILLISECONDS (java.util.concurrent.TimeUnit.MILLISECONDS)4 ConfigBinder.configBinder (com.facebook.airlift.configuration.ConfigBinder.configBinder)3 CachingStripeMetadataSource (com.facebook.presto.orc.CachingStripeMetadataSource)3 DwrfAwareStripeMetadataSourceFactory (com.facebook.presto.orc.DwrfAwareStripeMetadataSourceFactory)3 OrcDataSourceId (com.facebook.presto.orc.OrcDataSourceId)3 StorageStripeMetadataSource (com.facebook.presto.orc.StorageStripeMetadataSource)3 StripeMetadataSource (com.facebook.presto.orc.StripeMetadataSource)3 StripeMetadataSourceFactory (com.facebook.presto.orc.StripeMetadataSourceFactory)3 StripeId (com.facebook.presto.orc.StripeReader.StripeId)3