
Example 1 with InvalidArrowFileException

Use of org.apache.arrow.vector.ipc.InvalidArrowFileException in the project tracdap by finos.

From the class ArrowFileDecoder, method decodeChunk.

@Override
protected void decodeChunk(ByteBuf chunk) {
    try (var stream = new ByteSeekableChannel(chunk);
        var reader = new ArrowFileReader(stream, arrowAllocator);
        var root = reader.getVectorSchemaRoot()) {
        var schema = root.getSchema();
        emitBlock(DataBlock.forSchema(schema));
        var unloader = new VectorUnloader(root);
        while (reader.loadNextBatch()) {
            var batch = unloader.getRecordBatch();
            emitBlock(DataBlock.forRecords(batch));
        }
    } catch (InvalidArrowFileException e) {
        // A nice clean validation failure from the Arrow framework
        // E.g. missing / incorrect magic number at the start (or end) of the file
        var errorMessage = "Arrow file decoding failed, file is invalid: " + e.getMessage();
        log.error(errorMessage, e);
        throw new EDataCorruption(errorMessage, e);
    } catch (IllegalArgumentException | IndexOutOfBoundsException | IOException e) {
        // These errors occur if the data stream contains bad values for vector sizes, offsets etc.
        // This may be as a result of a corrupt data stream, or a maliciously crafted message
        // Decoders work on a stream of buffers, "real" IO exceptions should not occur
        var errorMessage = "Arrow file decoding failed, content is garbled";
        log.error(errorMessage, e);
        throw new EDataCorruption(errorMessage, e);
    } catch (Throwable e) {
        // Ensure unexpected errors are still reported to the Flow API
        log.error("Unexpected error in Arrow file decoding", e);
        throw new EUnexpected(e);
    } finally {
        chunk.release();
    }
}
Also used: VectorUnloader (org.apache.arrow.vector.VectorUnloader), ByteSeekableChannel (com.accenture.trac.common.util.ByteSeekableChannel), ArrowFileReader (org.apache.arrow.vector.ipc.ArrowFileReader), InvalidArrowFileException (org.apache.arrow.vector.ipc.InvalidArrowFileException), EDataCorruption (com.accenture.trac.common.exception.EDataCorruption), IOException (java.io.IOException), EUnexpected (com.accenture.trac.common.exception.EUnexpected)
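
The first catch block in the example mentions a missing or incorrect magic number at the start or end of the file. The Arrow IPC file format begins and ends with the ASCII bytes "ARROW1". The sketch below checks only those two markers on an in-memory byte array, using the JDK alone. The class ArrowMagicCheck and the method hasValidMagic are hypothetical names chosen for illustration; they are not part of Arrow or tracdap, and this is not a description of how ArrowFileReader validates a file internally.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of Arrow or tracdap.
final class ArrowMagicCheck {

    // The Arrow IPC file format starts and ends with these ASCII bytes.
    private static final byte[] MAGIC = "ARROW1".getBytes(StandardCharsets.US_ASCII);

    // Returns true if the buffer has the magic marker at both ends.
    // This checks the markers only; it does not parse the footer or any batches.
    static boolean hasValidMagic(byte[] file) {
        // Too short to hold a leading and a trailing marker without overlap
        if (file == null || file.length < 2 * MAGIC.length) {
            return false;
        }
        boolean head = Arrays.equals(
                file, 0, MAGIC.length,
                MAGIC, 0, MAGIC.length);
        boolean tail = Arrays.equals(
                file, file.length - MAGIC.length, file.length,
                MAGIC, 0, MAGIC.length);
        return head && tail;
    }

    public static void main(String[] args) {
        // Placeholder payloads, not real Arrow data
        byte[] wellFramed = "ARROW1\0\0payloadARROW1".getBytes(StandardCharsets.US_ASCII);
        byte[] truncated = "ARROW1\0\0payload".getBytes(StandardCharsets.US_ASCII);

        System.out.println(hasValidMagic(wellFramed)); // prints "true"
        System.out.println(hasValidMagic(truncated));  // prints "false"
    }
}
```

A pre-check like this could reject obviously truncated input early, but a buffer that passes it can still contain bad sizes or offsets, which is the case the second catch block in the example handles.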

Example 2 with InvalidArrowFileException

Use of org.apache.arrow.vector.ipc.InvalidArrowFileException in the project tracdap by finos (the revision that imports the TRAC helper classes from org.finos.tracdap).

From the class ArrowFileDecoder, method decodeChunk.

@Override
protected void decodeChunk(ByteBuf chunk) {
    try (var stream = new ByteSeekableChannel(chunk);
        var reader = new ArrowFileReader(stream, arrowAllocator);
        var root = reader.getVectorSchemaRoot()) {
        var schema = root.getSchema();
        emitBlock(DataBlock.forSchema(schema));
        var unloader = new VectorUnloader(root);
        while (reader.loadNextBatch()) {
            var batch = unloader.getRecordBatch();
            emitBlock(DataBlock.forRecords(batch));
        }
    } catch (InvalidArrowFileException e) {
        // A nice clean validation failure from the Arrow framework
        // E.g. missing / incorrect magic number at the start (or end) of the file
        var errorMessage = "Arrow file decoding failed, file is invalid: " + e.getMessage();
        log.error(errorMessage, e);
        throw new EDataCorruption(errorMessage, e);
    } catch (IllegalArgumentException | IndexOutOfBoundsException | IOException e) {
        // These errors occur if the data stream contains bad values for vector sizes, offsets etc.
        // This may be as a result of a corrupt data stream, or a maliciously crafted message
        // Decoders work on a stream of buffers, "real" IO exceptions should not occur
        var errorMessage = "Arrow file decoding failed, content is garbled";
        log.error(errorMessage, e);
        throw new EDataCorruption(errorMessage, e);
    } catch (Throwable e) {
        // Ensure unexpected errors are still reported to the Flow API
        log.error("Unexpected error in Arrow file decoding", e);
        throw new EUnexpected(e);
    } finally {
        chunk.release();
    }
}
Also used: VectorUnloader (org.apache.arrow.vector.VectorUnloader), ByteSeekableChannel (org.finos.tracdap.common.util.ByteSeekableChannel), ArrowFileReader (org.apache.arrow.vector.ipc.ArrowFileReader), InvalidArrowFileException (org.apache.arrow.vector.ipc.InvalidArrowFileException), EDataCorruption (org.finos.tracdap.common.exception.EDataCorruption), IOException (java.io.IOException), EUnexpected (org.finos.tracdap.common.exception.EUnexpected)
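
The decodeChunk method above combines try-with-resources with catch and finally blocks. In Java, the declared resources are closed in reverse order of declaration before any catch block runs, and the finally block runs last. The sketch below demonstrates that ordering with a stand-in resource, using the JDK alone. CloseOrderDemo and Tracked are hypothetical names for illustration; they do not model the Arrow reader or the ByteBuf in the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo for illustration only; not part of Arrow or tracdap.
final class CloseOrderDemo {

    // Stand-in resource that records when it is closed
    static final class Tracked implements AutoCloseable {
        private final String name;
        private final List<String> events;

        Tracked(String name, List<String> events) {
            this.name = name;
            this.events = events;
        }

        @Override
        public void close() {
            events.add("close " + name);
        }
    }

    // Runs a try-with-resources block and returns the order of events
    static List<String> run(boolean fail) {
        List<String> events = new ArrayList<>();
        try (var stream = new Tracked("stream", events);
             var reader = new Tracked("reader", events)) {
            events.add("body");
            if (fail) {
                // Stands in for a decoding error such as a bad vector size
                throw new IllegalArgumentException("bad vector size");
            }
        } catch (IllegalArgumentException e) {
            // Both resources are already closed by the time this runs
            events.add("catch");
        } finally {
            // Runs last, whether or not the body failed
            events.add("finally");
        }
        return events;
    }

    public static void main(String[] args) {
        // prints "[body, close reader, close stream, catch, finally]"
        System.out.println(run(true));
        // prints "[body, close reader, close stream, finally]"
        System.out.println(run(false));
    }
}
```

This ordering is why a cleanup step placed in finally, like chunk.release() in the example, runs only after the resources opened over that chunk have been closed.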

Aggregations

IOException (java.io.IOException): 2
VectorUnloader (org.apache.arrow.vector.VectorUnloader): 2
ArrowFileReader (org.apache.arrow.vector.ipc.ArrowFileReader): 2
InvalidArrowFileException (org.apache.arrow.vector.ipc.InvalidArrowFileException): 2
EDataCorruption (com.accenture.trac.common.exception.EDataCorruption): 1
EUnexpected (com.accenture.trac.common.exception.EUnexpected): 1
ByteSeekableChannel (com.accenture.trac.common.util.ByteSeekableChannel): 1
EDataCorruption (org.finos.tracdap.common.exception.EDataCorruption): 1
EUnexpected (org.finos.tracdap.common.exception.EUnexpected): 1
ByteSeekableChannel (org.finos.tracdap.common.util.ByteSeekableChannel): 1