
Example 16 with Stream

use of io.trino.orc.metadata.Stream in project trino by trinodb.

the class DoubleColumnWriter method getBloomFilters.

@Override
public List<StreamDataOutput> getBloomFilters(CompressedMetadataWriter metadataWriter) throws IOException {
    List<BloomFilter> bloomFilters = rowGroupColumnStatistics.stream().map(ColumnStatistics::getBloomFilter).filter(Objects::nonNull).collect(toImmutableList());
    if (!bloomFilters.isEmpty()) {
        Slice slice = metadataWriter.writeBloomFilters(bloomFilters);
        Stream stream = new Stream(columnId, StreamKind.BLOOM_FILTER_UTF8, slice.length(), false);
        return ImmutableList.of(new StreamDataOutput(slice, stream));
    }
    return ImmutableList.of();
}
Also used : ColumnStatistics(io.trino.orc.metadata.statistics.ColumnStatistics) Slice(io.airlift.slice.Slice) PresentOutputStream(io.trino.orc.stream.PresentOutputStream) DoubleOutputStream(io.trino.orc.stream.DoubleOutputStream) Stream(io.trino.orc.metadata.Stream) StreamDataOutput(io.trino.orc.stream.StreamDataOutput) BloomFilter(io.trino.orc.metadata.statistics.BloomFilter)
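
The getBloomFilters method above keeps only the row groups that actually carry a bloom filter and returns an empty list when none do. A minimal sketch of that collect-and-skip pattern follows. The class BloomFilterCollectSketch, its nested Stats type and the String payload are hypothetical stand-ins for illustration, not Trino classes.

```java
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

public class BloomFilterCollectSketch {
    // Hypothetical stand-in for ColumnStatistics: a row group may or may not carry a bloom filter.
    public static final class Stats {
        private final String bloomFilter; // null when no bloom filter was built for the row group

        public Stats(String bloomFilter) {
            this.bloomFilter = bloomFilter;
        }

        public String getBloomFilter() {
            return bloomFilter;
        }
    }

    // Collects the non-null bloom filters in row-group order.
    // An empty result means the caller should emit no bloom filter stream at all.
    public static List<String> collectBloomFilters(List<Stats> rowGroupStats) {
        return rowGroupStats.stream()
                .map(Stats::getBloomFilter)
                .filter(Objects::nonNull)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Stats> stats = List.of(new Stats("bf-0"), new Stats(null), new Stats("bf-2"));
        System.out.println(collectBloomFilters(stats));
    }
}
```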

Example 17 with Stream

use of io.trino.orc.metadata.Stream in project trino by trinodb.

the class FloatColumnWriter method getIndexStreams.

@Override
public List<StreamDataOutput> getIndexStreams(CompressedMetadataWriter metadataWriter) throws IOException {
    checkState(closed);
    ImmutableList.Builder<RowGroupIndex> rowGroupIndexes = ImmutableList.builder();
    List<FloatStreamCheckpoint> dataCheckpoints = dataStream.getCheckpoints();
    Optional<List<BooleanStreamCheckpoint>> presentCheckpoints = presentStream.getCheckpoints();
    for (int i = 0; i < rowGroupColumnStatistics.size(); i++) {
        int groupId = i;
        ColumnStatistics columnStatistics = rowGroupColumnStatistics.get(groupId);
        FloatStreamCheckpoint dataCheckpoint = dataCheckpoints.get(groupId);
        Optional<BooleanStreamCheckpoint> presentCheckpoint = presentCheckpoints.map(checkpoints -> checkpoints.get(groupId));
        List<Integer> positions = createFloatColumnPositionList(compressed, dataCheckpoint, presentCheckpoint);
        rowGroupIndexes.add(new RowGroupIndex(positions, columnStatistics));
    }
    Slice slice = metadataWriter.writeRowIndexes(rowGroupIndexes.build());
    Stream stream = new Stream(columnId, StreamKind.ROW_INDEX, slice.length(), false);
    return ImmutableList.of(new StreamDataOutput(slice, stream));
}
Also used : ColumnStatistics(io.trino.orc.metadata.statistics.ColumnStatistics) BooleanStreamCheckpoint(io.trino.orc.checkpoint.BooleanStreamCheckpoint) ImmutableList(com.google.common.collect.ImmutableList) ImmutableList.toImmutableList(com.google.common.collect.ImmutableList.toImmutableList) FloatStreamCheckpoint(io.trino.orc.checkpoint.FloatStreamCheckpoint) StreamDataOutput(io.trino.orc.stream.StreamDataOutput) RowGroupIndex(io.trino.orc.metadata.RowGroupIndex) Slice(io.airlift.slice.Slice) ArrayList(java.util.ArrayList) List(java.util.List) PresentOutputStream(io.trino.orc.stream.PresentOutputStream) Stream(io.trino.orc.metadata.Stream) FloatOutputStream(io.trino.orc.stream.FloatOutputStream)
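
The loop in getIndexStreams copies the loop counter into groupId because a Java lambda can only capture effectively final variables, and the present checkpoints are optional. A minimal sketch of that per-row-group lookup follows. RowGroupLookupSketch and its Integer checkpoints are hypothetical placeholders, not the Trino checkpoint types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class RowGroupLookupSketch {
    // Pairs each data checkpoint with the present checkpoint of the same row group,
    // or with "none" when the column has no present checkpoints.
    public static List<String> describeRowGroups(List<Integer> dataCheckpoints, Optional<List<Integer>> presentCheckpoints) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < dataCheckpoints.size(); i++) {
            // Effectively final copy of the loop counter so the lambda below can capture it.
            int groupId = i;
            Optional<Integer> present = presentCheckpoints.map(checkpoints -> checkpoints.get(groupId));
            String presentText = present.map(value -> "present=" + value).orElse("present=none");
            result.add("data=" + dataCheckpoints.get(groupId) + ", " + presentText);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(describeRowGroups(List.of(10, 20), Optional.of(List.of(1, 2))));
        System.out.println(describeRowGroups(List.of(10), Optional.empty()));
    }
}
```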

Example 18 with Stream

use of io.trino.orc.metadata.Stream in project trino by trinodb.

the class BooleanColumnWriter method getIndexStreams.

@Override
public List<StreamDataOutput> getIndexStreams(CompressedMetadataWriter metadataWriter) throws IOException {
    checkState(closed);
    ImmutableList.Builder<RowGroupIndex> rowGroupIndexes = ImmutableList.builder();
    List<BooleanStreamCheckpoint> dataCheckpoints = dataStream.getCheckpoints();
    Optional<List<BooleanStreamCheckpoint>> presentCheckpoints = presentStream.getCheckpoints();
    for (int i = 0; i < rowGroupColumnStatistics.size(); i++) {
        int groupId = i;
        ColumnStatistics columnStatistics = rowGroupColumnStatistics.get(groupId);
        BooleanStreamCheckpoint dataCheckpoint = dataCheckpoints.get(groupId);
        Optional<BooleanStreamCheckpoint> presentCheckpoint = presentCheckpoints.map(checkpoints -> checkpoints.get(groupId));
        List<Integer> positions = createBooleanColumnPositionList(compressed, dataCheckpoint, presentCheckpoint);
        rowGroupIndexes.add(new RowGroupIndex(positions, columnStatistics));
    }
    Slice slice = metadataWriter.writeRowIndexes(rowGroupIndexes.build());
    Stream stream = new Stream(columnId, StreamKind.ROW_INDEX, slice.length(), false);
    return ImmutableList.of(new StreamDataOutput(slice, stream));
}
Also used : ColumnStatistics(io.trino.orc.metadata.statistics.ColumnStatistics) BooleanStreamCheckpoint(io.trino.orc.checkpoint.BooleanStreamCheckpoint) ImmutableList(com.google.common.collect.ImmutableList) StreamDataOutput(io.trino.orc.stream.StreamDataOutput) RowGroupIndex(io.trino.orc.metadata.RowGroupIndex) Slice(io.airlift.slice.Slice) ArrayList(java.util.ArrayList) List(java.util.List) PresentOutputStream(io.trino.orc.stream.PresentOutputStream) Stream(io.trino.orc.metadata.Stream) BooleanOutputStream(io.trino.orc.stream.BooleanOutputStream)

Example 19 with Stream

use of io.trino.orc.metadata.Stream in project trino by trinodb.

the class ByteColumnWriter method getBloomFilters.

@Override
public List<StreamDataOutput> getBloomFilters(CompressedMetadataWriter metadataWriter) throws IOException {
    List<BloomFilter> bloomFilters = rowGroupColumnStatistics.stream().map(ColumnStatistics::getBloomFilter).filter(Objects::nonNull).collect(toImmutableList());
    if (!bloomFilters.isEmpty()) {
        Slice slice = metadataWriter.writeBloomFilters(bloomFilters);
        Stream stream = new Stream(columnId, StreamKind.BLOOM_FILTER_UTF8, slice.length(), false);
        return ImmutableList.of(new StreamDataOutput(slice, stream));
    }
    return ImmutableList.of();
}
Also used : ColumnStatistics(io.trino.orc.metadata.statistics.ColumnStatistics) Slice(io.airlift.slice.Slice) PresentOutputStream(io.trino.orc.stream.PresentOutputStream) Stream(io.trino.orc.metadata.Stream) ByteOutputStream(io.trino.orc.stream.ByteOutputStream) StreamDataOutput(io.trino.orc.stream.StreamDataOutput) BloomFilter(io.trino.orc.metadata.statistics.BloomFilter)

Example 20 with Stream

use of io.trino.orc.metadata.Stream in project trino by trinodb.

the class TestOrcWriter method testWriteOutputStreamsInOrder.

@Test
public void testWriteOutputStreamsInOrder() throws IOException {
    for (OrcWriteValidationMode validationMode : OrcWriteValidationMode.values()) {
        TempFile tempFile = new TempFile();
        List<String> columnNames = ImmutableList.of("test1", "test2", "test3", "test4", "test5");
        List<Type> types = ImmutableList.of(VARCHAR, VARCHAR, VARCHAR, VARCHAR, VARCHAR);
        OrcWriter writer = new OrcWriter(
                new OutputStreamOrcDataSink(new FileOutputStream(tempFile.getFile())),
                columnNames,
                types,
                OrcType.createRootOrcType(columnNames, types),
                NONE,
                new OrcWriterOptions()
                        .withStripeMinSize(DataSize.of(0, MEGABYTE))
                        .withStripeMaxSize(DataSize.of(32, MEGABYTE))
                        .withStripeMaxRowCount(ORC_STRIPE_SIZE)
                        .withRowGroupMaxRowCount(ORC_ROW_GROUP_SIZE)
                        .withDictionaryMaxMemory(DataSize.of(32, MEGABYTE))
                        .withBloomFilterColumns(ImmutableSet.copyOf(columnNames)),
                ImmutableMap.of(),
                true,
                validationMode,
                new OrcWriterStats());
        // write down some data with unsorted streams
        String[] data = new String[] { "a", "bbbbb", "ccc", "dd", "eeee" };
        Block[] blocks = new Block[data.length];
        int entries = 65536;
        BlockBuilder blockBuilder = VARCHAR.createBlockBuilder(null, entries);
        for (int i = 0; i < data.length; i++) {
            byte[] bytes = data[i].getBytes(UTF_8);
            for (int j = 0; j < entries; j++) {
                // force to write different data
                bytes[0] = (byte) ((bytes[0] + 1) % 128);
                blockBuilder.writeBytes(Slices.wrappedBuffer(bytes, 0, bytes.length), 0, bytes.length);
                blockBuilder.closeEntry();
            }
            blocks[i] = blockBuilder.build();
            blockBuilder = blockBuilder.newBlockBuilderLike(null);
        }
        writer.write(new Page(blocks));
        writer.close();
        // read the footer and verify the streams are ordered by size
        OrcDataSource orcDataSource = new FileOrcDataSource(tempFile.getFile(), READER_OPTIONS);
        Footer footer = OrcReader.createOrcReader(orcDataSource, READER_OPTIONS).orElseThrow(() -> new RuntimeException("File is empty")).getFooter();
        // OrcReader closes the original data source because it buffers the full file, so we need to reopen
        orcDataSource = new FileOrcDataSource(tempFile.getFile(), READER_OPTIONS);
        for (StripeInformation stripe : footer.getStripes()) {
            // read the stripe footer, which follows the stripe's index and data sections
            Slice tailBuffer = orcDataSource.readFully(stripe.getOffset() + stripe.getIndexLength() + stripe.getDataLength(), toIntExact(stripe.getFooterLength()));
            try (InputStream inputStream = new OrcInputStream(OrcChunkLoader.create(orcDataSource.getId(), tailBuffer, Optional.empty(), newSimpleAggregatedMemoryContext()))) {
                StripeFooter stripeFooter = new OrcMetadataReader().readStripeFooter(footer.getTypes(), inputStream, ZoneId.of("UTC"));
                int size = 0;
                boolean dataStreamStarted = false;
                for (Stream stream : stripeFooter.getStreams()) {
                    if (isIndexStream(stream)) {
                        assertFalse(dataStreamStarted);
                        continue;
                    }
                    dataStreamStarted = true;
                    // verify sizes in order
                    assertGreaterThanOrEqual(stream.getLength(), size);
                    size = stream.getLength();
                }
            }
        }
    }
}
Also used : Page(io.trino.spi.Page) OrcWriteValidationMode(io.trino.orc.OrcWriteValidation.OrcWriteValidationMode) StripeReader.isIndexStream(io.trino.orc.StripeReader.isIndexStream) Stream(io.trino.orc.metadata.Stream) OrcInputStream(io.trino.orc.stream.OrcInputStream) FileOutputStream(java.io.FileOutputStream) InputStream(java.io.InputStream) BlockBuilder(io.trino.spi.block.BlockBuilder) OrcMetadataReader(io.trino.orc.metadata.OrcMetadataReader) Type(io.trino.spi.type.Type) OrcType(io.trino.orc.metadata.OrcType) StripeFooter(io.trino.orc.metadata.StripeFooter) Slice(io.airlift.slice.Slice) Footer(io.trino.orc.metadata.Footer) Block(io.trino.spi.block.Block) StripeInformation(io.trino.orc.metadata.StripeInformation) Test(org.testng.annotations.Test)
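
The inner loop of testWriteOutputStreamsInOrder checks two things per stripe: no index stream appears after a data stream, and data stream lengths never decrease. A minimal sketch of that check in isolation follows. StreamOrderSketch and its StreamInfo type are hypothetical stand-ins that carry only an index flag and a length, not the real io.trino.orc.metadata.Stream.

```java
import java.util.List;

public class StreamOrderSketch {
    // Hypothetical stand-in for a stripe stream: only the two properties the check needs.
    public static final class StreamInfo {
        private final boolean index;
        private final int length;

        public StreamInfo(boolean index, int length) {
            this.index = index;
            this.length = length;
        }

        public boolean isIndex() {
            return index;
        }

        public int getLength() {
            return length;
        }
    }

    // Returns true when every index stream precedes all data streams
    // and the data stream lengths are in non-decreasing order.
    public static boolean isOrdered(List<StreamInfo> streams) {
        int previousLength = 0;
        boolean dataStreamStarted = false;
        for (StreamInfo stream : streams) {
            if (stream.isIndex()) {
                if (dataStreamStarted) {
                    return false; // index stream found after a data stream
                }
                continue; // index stream sizes are not compared
            }
            dataStreamStarted = true;
            if (stream.getLength() < previousLength) {
                return false; // data stream is smaller than its predecessor
            }
            previousLength = stream.getLength();
        }
        return true;
    }

    public static void main(String[] args) {
        List<StreamInfo> streams = List.of(
                new StreamInfo(true, 40),
                new StreamInfo(false, 5),
                new StreamInfo(false, 9));
        System.out.println(isOrdered(streams));
    }
}
```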

Aggregations

Stream (io.trino.orc.metadata.Stream) 33
Slice (io.airlift.slice.Slice) 23
ColumnStatistics (io.trino.orc.metadata.statistics.ColumnStatistics) 23
StreamDataOutput (io.trino.orc.stream.StreamDataOutput) 20
ArrayList (java.util.ArrayList) 20
List (java.util.List) 20
ImmutableList (com.google.common.collect.ImmutableList) 19
PresentOutputStream (io.trino.orc.stream.PresentOutputStream) 18
RowGroupIndex (io.trino.orc.metadata.RowGroupIndex) 16
BooleanStreamCheckpoint (io.trino.orc.checkpoint.BooleanStreamCheckpoint) 14
OrcColumnId (io.trino.orc.metadata.OrcColumnId) 11
BloomFilter (io.trino.orc.metadata.statistics.BloomFilter) 9
OrcInputStream (io.trino.orc.stream.OrcInputStream) 9
InputStream (java.io.InputStream) 9
ImmutableMap (com.google.common.collect.ImmutableMap) 8
LongOutputStream (io.trino.orc.stream.LongOutputStream) 8
ValueInputStream (io.trino.orc.stream.ValueInputStream) 8
ImmutableList.toImmutableList (com.google.common.collect.ImmutableList.toImmutableList) 7
ImmutableMap.toImmutableMap (com.google.common.collect.ImmutableMap.toImmutableMap) 7
StreamCheckpoint (io.trino.orc.checkpoint.StreamCheckpoint) 6