
Example 1 with ParquetEncodingException

Use of org.apache.parquet.io.ParquetEncodingException in project parquet-mr by Apache.

From the class DictionaryValuesWriter, method getBytes:

@Override
public BytesInput getBytes() {
    int maxDicId = getDictionarySize() - 1;
    LOG.debug("max dic id {}", maxDicId);
    int bitWidth = BytesUtils.getWidthFromMaxInt(maxDicId);
    int initialSlabSize = CapacityByteArrayOutputStream.initialSlabSizeHeuristic(MIN_INITIAL_SLAB_SIZE, maxDictionaryByteSize, 10);
    RunLengthBitPackingHybridEncoder encoder = new RunLengthBitPackingHybridEncoder(bitWidth, initialSlabSize, maxDictionaryByteSize, this.allocator);
    encoders.add(encoder);
    IntIterator iterator = encodedValues.iterator();
    try {
        while (iterator.hasNext()) {
            encoder.writeInt(iterator.next());
        }
        // encodes the bit width
        byte[] bytesHeader = new byte[] { (byte) bitWidth };
        BytesInput rleEncodedBytes = encoder.toBytes();
        LOG.debug("rle encoded bytes {}", rleEncodedBytes.size());
        BytesInput bytes = concat(BytesInput.from(bytesHeader), rleEncodedBytes);
        // remember size of dictionary when we last wrote a page
        lastUsedDictionarySize = getDictionarySize();
        lastUsedDictionaryByteSize = dictionaryByteSize;
        return bytes;
    } catch (IOException e) {
        throw new ParquetEncodingException("could not encode the values", e);
    }
}
Also used: IntIterator (org.apache.parquet.column.values.dictionary.IntList.IntIterator), BytesInput (org.apache.parquet.bytes.BytesInput), ParquetEncodingException (org.apache.parquet.io.ParquetEncodingException), IOException (java.io.IOException), RunLengthBitPackingHybridEncoder (org.apache.parquet.column.values.rle.RunLengthBitPackingHybridEncoder).
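For context, the single header byte that getBytes prepends is the RLE bit width; everything after it is the RLE/bit-packed dictionary ids. Below is a minimal standalone sketch of how a reader might consume that header, using only java.io (the class and method names are illustrative, not from parquet-mr):

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class DictionaryPageHeaderSketch {

    // Reads the single header byte written by getBytes(); the remaining
    // bytes in the stream are the RLE/bit-packed dictionary ids.
    static int readBitWidth(InputStream in) throws IOException {
        int bitWidth = in.read();
        if (bitWidth < 0) {
            throw new IOException("empty dictionary-encoded page");
        }
        return bitWidth;
    }

    public static void main(String[] args) throws IOException {
        byte[] page = new byte[] { 3 };  // header only; RLE payload omitted
        System.out.println(readBitWidth(new ByteArrayInputStream(page)));  // prints 3
    }
}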

Example 2 with ParquetEncodingException

Use of org.apache.parquet.io.ParquetEncodingException in project parquet-mr by Apache.

From the class PlainValuesWriter, method writeBytes:

@Override
public final void writeBytes(Binary v) {
    try {
        out.writeInt(v.length());
        v.writeTo(out);
    } catch (IOException e) {
        throw new ParquetEncodingException("could not write bytes", e);
    }
}
Also used: ParquetEncodingException (org.apache.parquet.io.ParquetEncodingException), IOException (java.io.IOException).
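The writeInt/writeTo pair above produces Parquet's PLAIN layout for BYTE_ARRAY values: a 4-byte little-endian length prefix followed by the raw bytes (PlainValuesWriter writes through a little-endian stream). A standalone sketch of the same layout, assuming only the standard library (PlainByteArraySketch is a hypothetical name):

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class PlainByteArraySketch {

    // Encodes one value the way writeBytes does: length prefix, then payload.
    static byte[] encode(byte[] value) {
        ByteBuffer buf = ByteBuffer.allocate(4 + value.length).order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(value.length);  // as out.writeInt(v.length())
        buf.put(value);            // as v.writeTo(out)
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] encoded = encode("abc".getBytes());
        System.out.println(encoded.length);  // 7: 4-byte length + 3 payload bytes
    }
}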

Example 3 with ParquetEncodingException

Use of org.apache.parquet.io.ParquetEncodingException in project parquet-mr by Apache.

From the class ParquetFileWriter, method mergeFooters:

static ParquetMetadata mergeFooters(Path root, List<Footer> footers) {
    String rootPath = root.toUri().getPath();
    GlobalMetaData fileMetaData = null;
    List<BlockMetaData> blocks = new ArrayList<BlockMetaData>();
    for (Footer footer : footers) {
        String footerPath = footer.getFile().toUri().getPath();
        if (!footerPath.startsWith(rootPath)) {
            throw new ParquetEncodingException(footerPath + " invalid: all the files must be contained in the root " + root);
        }
        footerPath = footerPath.substring(rootPath.length());
        while (footerPath.startsWith("/")) {
            footerPath = footerPath.substring(1);
        }
        fileMetaData = mergeInto(footer.getParquetMetadata().getFileMetaData(), fileMetaData);
        for (BlockMetaData block : footer.getParquetMetadata().getBlocks()) {
            block.setPath(footerPath);
            blocks.add(block);
        }
    }
    return new ParquetMetadata(fileMetaData.merge(), blocks);
}
Also used: BlockMetaData (org.apache.parquet.hadoop.metadata.BlockMetaData), ParquetEncodingException (org.apache.parquet.io.ParquetEncodingException), ParquetMetadata (org.apache.parquet.hadoop.metadata.ParquetMetadata), ArrayList (java.util.ArrayList), GlobalMetaData (org.apache.parquet.hadoop.metadata.GlobalMetaData).
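Here the ParquetEncodingException fires on the containment check: every footer path must live under the root, and is then relativized against it. A minimal sketch of that string manipulation in isolation (plain strings instead of Hadoop Path objects; names are illustrative):

public class FooterPathSketch {

    // Mirrors the check-then-relativize logic in mergeFooters.
    static String relativize(String rootPath, String footerPath) {
        if (!footerPath.startsWith(rootPath)) {
            throw new IllegalArgumentException(
                footerPath + " invalid: all the files must be contained in the root " + rootPath);
        }
        String relative = footerPath.substring(rootPath.length());
        while (relative.startsWith("/")) {
            relative = relative.substring(1);  // strip leading slashes, as mergeFooters does
        }
        return relative;
    }

    public static void main(String[] args) {
        // prints "part-00000.parquet", the per-block path kept in the merged footer
        System.out.println(relativize("/data/table", "/data/table/part-00000.parquet"));
    }
}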

Example 4 with ParquetEncodingException

Use of org.apache.parquet.io.ParquetEncodingException in project parquet-mr by Apache.

From the class MemPageWriter, method writePage:

@Override
public void writePage(BytesInput bytesInput, int valueCount, Statistics statistics, Encoding rlEncoding, Encoding dlEncoding, Encoding valuesEncoding) throws IOException {
    if (valueCount == 0) {
        throw new ParquetEncodingException("illegal page of 0 values");
    }
    memSize += bytesInput.size();
    pages.add(new DataPageV1(BytesInput.copy(bytesInput), valueCount, (int) bytesInput.size(), statistics, rlEncoding, dlEncoding, valuesEncoding));
    totalValueCount += valueCount;
    LOG.debug("page written for {} bytes and {} records", bytesInput.size(), valueCount);
}
Also used: ParquetEncodingException (org.apache.parquet.io.ParquetEncodingException), DataPageV1 (org.apache.parquet.column.page.DataPageV1).
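The guard at the top is the only part that throws: pages with zero values are rejected outright, so callers must flush only once at least one value is buffered. A small standalone illustration of the same check, assuming parquet-column is on the classpath for ParquetEncodingException (the class name below is hypothetical):

import org.apache.parquet.io.ParquetEncodingException;

public class EmptyPageGuardSketch {

    // Mirrors the validation at the top of MemPageWriter.writePage.
    static void checkValueCount(int valueCount) {
        if (valueCount == 0) {
            throw new ParquetEncodingException("illegal page of 0 values");
        }
    }

    public static void main(String[] args) {
        try {
            checkValueCount(0);
        } catch (ParquetEncodingException e) {
            System.out.println("rejected: " + e.getMessage());  // unchecked, so catchable here
        }
    }
}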

Example 5 with ParquetEncodingException

Use of org.apache.parquet.io.ParquetEncodingException in project parquet-mr by Apache.

From the class DataWritableWriter, method writeData:

private void writeData(final ArrayWritable arr, final GroupType type) {
    if (arr == null) {
        return;
    }
    final int fieldCount = type.getFieldCount();
    Writable[] values = arr.get();
    for (int field = 0; field < fieldCount; ++field) {
        final Type fieldType = type.getType(field);
        final String fieldName = fieldType.getName();
        final Writable value = values[field];
        if (value == null) {
            continue;
        }
        recordConsumer.startField(fieldName, field);
        if (fieldType.isPrimitive()) {
            writePrimitive(value);
        } else {
            recordConsumer.startGroup();
            if (value instanceof ArrayWritable) {
                if (fieldType.asGroupType().getRepetition().equals(Type.Repetition.REPEATED)) {
                    writeArray((ArrayWritable) value, fieldType.asGroupType());
                } else {
                    writeData((ArrayWritable) value, fieldType.asGroupType());
                }
            } else {
                // value is known non-null here (nulls were skipped above), so any
                // other type in a group position is unsupported
                throw new ParquetEncodingException("This should be an ArrayWritable or MapWritable: " + value);
            }
            recordConsumer.endGroup();
        }
        recordConsumer.endField(fieldName, field);
    }
}
Also used: GroupType (org.apache.parquet.schema.GroupType), Type (org.apache.parquet.schema.Type), ArrayWritable (org.apache.hadoop.io.ArrayWritable), ParquetEncodingException (org.apache.parquet.io.ParquetEncodingException), ByteWritable (org.apache.hadoop.hive.serde2.io.ByteWritable), BigDecimalWritable (org.apache.hadoop.hive.ql.io.parquet.writable.BigDecimalWritable), BinaryWritable (org.apache.hadoop.hive.ql.io.parquet.writable.BinaryWritable), Writable (org.apache.hadoop.io.Writable), LongWritable (org.apache.hadoop.io.LongWritable), BooleanWritable (org.apache.hadoop.io.BooleanWritable), DoubleWritable (org.apache.hadoop.hive.serde2.io.DoubleWritable), ShortWritable (org.apache.hadoop.hive.serde2.io.ShortWritable), FloatWritable (org.apache.hadoop.io.FloatWritable), IntWritable (org.apache.hadoop.io.IntWritable).
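The control flow in writeData is a three-way dispatch: null fields are skipped, primitives are written directly, and group values recurse (repeated groups via writeArray, other groups via writeData). A simplified standalone sketch of that dispatch, with java.util stand-ins for the Writable and schema types (all names here are illustrative, not from the source):

import java.util.List;

public class WriteDispatchSketch {

    // Mirrors the shape of writeData's per-field decision without the
    // parquet/Hive types: skip nulls, write primitives, recurse into groups.
    static void writeField(String name, Object value, boolean repeated) {
        if (value == null) {
            return;  // as in writeData: a null field produces no output at all
        }
        if (value instanceof List) {
            if (repeated) {
                System.out.println(name + ": repeated group -> writeArray-style recursion");
            } else {
                System.out.println(name + ": group -> writeData-style recursion");
            }
        } else {
            System.out.println(name + ": primitive " + value);
        }
    }

    public static void main(String[] args) {
        writeField("name", "alice", false);
        writeField("tags", List.of("a", "b"), true);
        writeField("missing", null, false);  // prints nothing
    }
}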
