Example 1 with DictionaryPageReadStore

Use of org.apache.parquet.column.page.DictionaryPageReadStore in project parquet-mr by apache.

From class ShowDictionaryCommand, method run.

@Override
@SuppressWarnings("unchecked")
public int run() throws IOException {
    Preconditions.checkArgument(targets != null && targets.size() >= 1, "A Parquet file is required.");
    Preconditions.checkArgument(targets.size() == 1, "Cannot process multiple Parquet files.");
    String source = targets.get(0);
    ParquetFileReader reader = ParquetFileReader.open(getConf(), qualifiedPath(source));
    MessageType schema = reader.getFileMetaData().getSchema();
    ColumnDescriptor descriptor = Util.descriptor(column, schema);
    PrimitiveType type = Util.primitive(column, schema);
    Preconditions.checkNotNull(type);
    DictionaryPageReadStore dictionaryReader;
    int rowGroup = 0;
    while ((dictionaryReader = reader.getNextDictionaryReader()) != null) {
        DictionaryPage page = dictionaryReader.readDictionaryPage(descriptor);
        Dictionary dict = page.getEncoding().initDictionary(descriptor, page);
        console.info("\nRow group {} dictionary for \"{}\":", rowGroup, column);
        for (int i = 0; i <= dict.getMaxId(); i += 1) {
            switch(type.getPrimitiveTypeName()) {
                case BINARY:
                    if (type.getOriginalType() == OriginalType.UTF8) {
                        console.info("{}: {}", String.format("%6d", i), Util.humanReadable(dict.decodeToBinary(i).toStringUsingUTF8(), 70));
                    } else {
                        console.info("{}: {}", String.format("%6d", i), Util.humanReadable(dict.decodeToBinary(i).getBytesUnsafe(), 70));
                    }
                    break;
                case INT32:
                    console.info("{}: {}", String.format("%6d", i), dict.decodeToInt(i));
                    break;
                case INT64:
                    console.info("{}: {}", String.format("%6d", i), dict.decodeToLong(i));
                    break;
                case FLOAT:
                    console.info("{}: {}", String.format("%6d", i), dict.decodeToFloat(i));
                    break;
                case DOUBLE:
                    console.info("{}: {}", String.format("%6d", i), dict.decodeToDouble(i));
                    break;
                default:
                    throw new IllegalArgumentException("Unknown dictionary type: " + type.getPrimitiveTypeName());
            }
        }
        reader.skipNextRowGroup();
        rowGroup += 1;
    }
    console.info("");
    return 0;
}
Also used : Dictionary(org.apache.parquet.column.Dictionary) ParquetFileReader(org.apache.parquet.hadoop.ParquetFileReader) ColumnDescriptor(org.apache.parquet.column.ColumnDescriptor) PrimitiveType(org.apache.parquet.schema.PrimitiveType) DictionaryPageReadStore(org.apache.parquet.column.page.DictionaryPageReadStore) MessageType(org.apache.parquet.schema.MessageType) DictionaryPage(org.apache.parquet.column.page.DictionaryPage)
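
The loop in Example 1 walks dictionary ids from 0 to getMaxId() and decodes each id back to its value. The following is a minimal standalone sketch of that idea, using only the JDK. The class DictionaryEncodingSketch and its methods are hypothetical names made up for illustration; they are not part of parquet-mr and do not reflect how Parquet lays out pages on disk.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not a parquet-mr class.
class DictionaryEncodingSketch {
    // Position in this list is the dictionary id; the element is the value.
    final List<String> dictionary = new ArrayList<>();
    // One dictionary id per row of the original column.
    final List<Integer> ids = new ArrayList<>();

    // Builds a dictionary of distinct values in first-seen order
    // and replaces every row value with its id.
    static DictionaryEncodingSketch encode(List<String> values) {
        DictionaryEncodingSketch out = new DictionaryEncodingSketch();
        Map<String, Integer> idOf = new HashMap<>();
        for (String value : values) {
            Integer id = idOf.get(value);
            if (id == null) {
                id = out.dictionary.size();
                idOf.put(value, id);
                out.dictionary.add(value);
            }
            out.ids.add(id);
        }
        return out;
    }

    // Largest valid id, mirroring the role of getMaxId() in Example 1.
    int getMaxId() {
        return dictionary.size() - 1;
    }

    // Maps an id back to its value.
    String decode(int id) {
        return dictionary.get(id);
    }

    public static void main(String[] args) {
        DictionaryEncodingSketch col = encode(Arrays.asList("a", "b", "a", "c", "b"));
        // Same loop shape as Example 1: print every dictionary entry.
        for (int i = 0; i <= col.getMaxId(); i += 1) {
            System.out.println(String.format("%6d", i) + ": " + col.decode(i));
        }
    }
}
```

Five rows with three distinct values yield a three-entry dictionary, so the loop prints three lines no matter how many rows the column holds.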

Example 2 with DictionaryPageReadStore

Use of org.apache.parquet.column.page.DictionaryPageReadStore in project parquet-mr by apache.

From class DictionaryFilterTest, method testColumnWithoutDictionary.

@Test
public void testColumnWithoutDictionary() throws Exception {
    IntColumn plain = intColumn("plain_int32_field");
    DictionaryPageReadStore dictionaryStore = mock(DictionaryPageReadStore.class);
    assertFalse("Should never drop block using plain encoding", canDrop(eq(plain, -10), ccmd, dictionaryStore));
    assertFalse("Should never drop block using plain encoding", canDrop(lt(plain, -10), ccmd, dictionaryStore));
    assertFalse("Should never drop block using plain encoding", canDrop(ltEq(plain, -10), ccmd, dictionaryStore));
    assertFalse("Should never drop block using plain encoding", canDrop(gt(plain, nElements + 10), ccmd, dictionaryStore));
    assertFalse("Should never drop block using plain encoding", canDrop(gtEq(plain, nElements + 10), ccmd, dictionaryStore));
    assertFalse("Should never drop block using plain encoding", canDrop(notEq(plain, nElements + 10), ccmd, dictionaryStore));
    verifyZeroInteractions(dictionaryStore);
}
Also used : DictionaryPageReadStore(org.apache.parquet.column.page.DictionaryPageReadStore) IntColumn(org.apache.parquet.filter2.predicate.Operators.IntColumn) Test(org.junit.Test)

Example 3 with DictionaryPageReadStore

Use of org.apache.parquet.column.page.DictionaryPageReadStore in project parquet-mr by apache.

From class DictionaryFilterTest, method testColumnWithDictionaryAndPlainEncodings.

@Test
public void testColumnWithDictionaryAndPlainEncodings() throws Exception {
    IntColumn plain = intColumn("fallback_binary_field");
    DictionaryPageReadStore dictionaryStore = mock(DictionaryPageReadStore.class);
    assertFalse("Should never drop block using plain encoding", canDrop(eq(plain, -10), ccmd, dictionaryStore));
    assertFalse("Should never drop block using plain encoding", canDrop(lt(plain, -10), ccmd, dictionaryStore));
    assertFalse("Should never drop block using plain encoding", canDrop(ltEq(plain, -10), ccmd, dictionaryStore));
    assertFalse("Should never drop block using plain encoding", canDrop(gt(plain, nElements + 10), ccmd, dictionaryStore));
    assertFalse("Should never drop block using plain encoding", canDrop(gtEq(plain, nElements + 10), ccmd, dictionaryStore));
    assertFalse("Should never drop block using plain encoding", canDrop(notEq(plain, nElements + 10), ccmd, dictionaryStore));
    verifyZeroInteractions(dictionaryStore);
}
Also used : DictionaryPageReadStore(org.apache.parquet.column.page.DictionaryPageReadStore) IntColumn(org.apache.parquet.filter2.predicate.Operators.IntColumn) Test(org.junit.Test)
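
Examples 2 and 3 both assert that a block is never dropped when plain encoding is involved, and that the dictionary store is never touched in that case. A rough sketch of the reasoning, assuming a simplified equality-only filter: the dictionary can prove a value is absent only if every page of the column chunk is dictionary-encoded. The class DictionaryFilterSketch, the method canDropEq, and its parameters are hypothetical and JDK-only; the real DictionaryFilter in parquet-mr handles more predicates and types.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical illustration only; not the parquet-mr DictionaryFilter.
class DictionaryFilterSketch {
    /**
     * Decides whether a row group can be skipped for the predicate "column == value".
     *
     * allPagesDictionaryEncoded: false if any data page uses plain encoding.
     * dictionaryLoader: lazily reads the distinct values from the dictionary page.
     */
    static boolean canDropEq(boolean allPagesDictionaryEncoded,
                             Supplier<Set<Integer>> dictionaryLoader,
                             int value) {
        if (!allPagesDictionaryEncoded) {
            // Plain-encoded pages may hold values missing from the dictionary,
            // so the dictionary proves nothing and is not read at all.
            return false;
        }
        // Fully dictionary-encoded: the value can only occur if it is in the dictionary.
        return !dictionaryLoader.get().contains(value);
    }

    public static void main(String[] args) {
        Set<Integer> dict = new HashSet<>(Arrays.asList(1, 2, 3));
        // The dictionary shows -10 is absent, so the block can be dropped.
        System.out.println(canDropEq(true, () -> dict, -10));
        // 2 is in the dictionary, so the block must be kept.
        System.out.println(canDropEq(true, () -> dict, 2));
        // Plain encoding: keep the block without ever loading the dictionary.
        System.out.println(canDropEq(false, () -> {
            throw new IllegalStateException("dictionary must not be read");
        }, -10));
    }
}
```

Passing the dictionary as a Supplier makes the "zero interactions" property checkable: in the plain-encoding case the supplier is never invoked, which plays the same role as verifyZeroInteractions(dictionaryStore) in the tests above.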

Aggregations

DictionaryPageReadStore (org.apache.parquet.column.page.DictionaryPageReadStore) 3
IntColumn (org.apache.parquet.filter2.predicate.Operators.IntColumn) 2
Test (org.junit.Test) 2
ColumnDescriptor (org.apache.parquet.column.ColumnDescriptor) 1
Dictionary (org.apache.parquet.column.Dictionary) 1
DictionaryPage (org.apache.parquet.column.page.DictionaryPage) 1
ParquetFileReader (org.apache.parquet.hadoop.ParquetFileReader) 1
MessageType (org.apache.parquet.schema.MessageType) 1
PrimitiveType (org.apache.parquet.schema.PrimitiveType) 1