Example 1 with IntIterator

Use of org.apache.parquet.column.values.dictionary.IntList.IntIterator in the project parquet-mr by apache.

From the class DictionaryValuesWriter, method getBytes.

@Override
public BytesInput getBytes() {
    int maxDicId = getDictionarySize() - 1;
    LOG.debug("max dic id {}", maxDicId);
    int bitWidth = BytesUtils.getWidthFromMaxInt(maxDicId);
    int initialSlabSize = CapacityByteArrayOutputStream.initialSlabSizeHeuristic(MIN_INITIAL_SLAB_SIZE, maxDictionaryByteSize, 10);
    RunLengthBitPackingHybridEncoder encoder = new RunLengthBitPackingHybridEncoder(bitWidth, initialSlabSize, maxDictionaryByteSize, this.allocator);
    encoders.add(encoder);
    IntIterator iterator = encodedValues.iterator();
    try {
        while (iterator.hasNext()) {
            encoder.writeInt(iterator.next());
        }
        // encodes the bit width
        byte[] bytesHeader = new byte[] { (byte) bitWidth };
        BytesInput rleEncodedBytes = encoder.toBytes();
        LOG.debug("rle encoded bytes {}", rleEncodedBytes.size());
        BytesInput bytes = concat(BytesInput.from(bytesHeader), rleEncodedBytes);
        // remember size of dictionary when we last wrote a page
        lastUsedDictionarySize = getDictionarySize();
        lastUsedDictionaryByteSize = dictionaryByteSize;
        return bytes;
    } catch (IOException e) {
        throw new ParquetEncodingException("could not encode the values", e);
    }
}
Also used: IntIterator (org.apache.parquet.column.values.dictionary.IntList.IntIterator), BytesInput (org.apache.parquet.bytes.BytesInput), ParquetEncodingException (org.apache.parquet.io.ParquetEncodingException), IOException (java.io.IOException), RunLengthBitPackingHybridEncoder (org.apache.parquet.column.values.rle.RunLengthBitPackingHybridEncoder)
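
The method above derives a bit width from the largest dictionary id and writes that width as a one-byte header in front of the encoded ids. The following is a minimal, dependency-free sketch of those two steps, not the parquet-mr implementation. The class DictionaryIdPageSketch and its methods are hypothetical names introduced for illustration, widthFromMaxInt is assumed to compute the same quantity as BytesUtils.getWidthFromMaxInt (the number of bits needed for values in [0, maxInt]), and the payload bytes are placeholders rather than real RLE/bit-packed output.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration class; not part of parquet-mr.
public class DictionaryIdPageSketch {

    // Number of bits needed to represent every value in [0, maxInt].
    // Assumption: this mirrors what BytesUtils.getWidthFromMaxInt returns.
    public static int widthFromMaxInt(int maxInt) {
        return 32 - Integer.numberOfLeadingZeros(maxInt);
    }

    // Prepends a one-byte header holding the bit width to an already
    // encoded payload, matching the header-then-payload layout above.
    public static byte[] withBitWidthHeader(int bitWidth, byte[] encodedIds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(1 + encodedIds.length);
        out.write(bitWidth); // writes the low-order byte only
        out.write(encodedIds, 0, encodedIds.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        int dictionarySize = 5;                              // ids 0..4
        int bitWidth = widthFromMaxInt(dictionarySize - 1);  // max id is 4
        byte[] placeholderPayload = new byte[] { 0x01, 0x02 }; // not real RLE data
        byte[] page = withBitWidthHeader(bitWidth, placeholderPayload);
        System.out.println("bit width = " + bitWidth + ", page length = " + page.length);
    }
}
```

With a dictionary of five entries the largest id is 4, which needs 3 bits, so every id on the page can be packed into 3 bits instead of a full 32-bit int.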

Aggregations

IOException (java.io.IOException): 1
BytesInput (org.apache.parquet.bytes.BytesInput): 1
IntIterator (org.apache.parquet.column.values.dictionary.IntList.IntIterator): 1
RunLengthBitPackingHybridEncoder (org.apache.parquet.column.values.rle.RunLengthBitPackingHybridEncoder): 1
ParquetEncodingException (org.apache.parquet.io.ParquetEncodingException): 1