
Example 1 with GroupType

Use of org.apache.parquet.schema.GroupType in project hive by apache.

From class DataWritableReadSupport, method getProjectedGroupFields.

/**
 * Searches the given Parquet schema for the requested column names (case-insensitively)
 * and returns the corresponding Parquet schema types.
 *
 * @param schema Group schema in which to search for the column names.
 * @param colNames List of column names.
 * @param colTypes List of column types.
 * @return List of Type objects for the projected columns.
 */
private static List<Type> getProjectedGroupFields(GroupType schema, List<String> colNames, List<TypeInfo> colTypes) {
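    List<Type> schemaTypes = new ArrayList<>();
    ListIterator<String> columnIterator = colNames.listIterator();
    while (columnIterator.hasNext()) {
        // nextIndex() is read before next(), so it is the index of the column about to be returned.
        TypeInfo colType = colTypes.get(columnIterator.nextIndex());
        String colName = columnIterator.next();
        Type fieldType = getFieldTypeIgnoreCase(schema, colName);
        if (fieldType == null) {
            // Column is not present in the file schema: project an optional BINARY placeholder.
            schemaTypes.add(Types.optional(PrimitiveTypeName.BINARY).named(colName));
        } else {
            // Column found: derive the projected type from the Hive type and the file type.
            schemaTypes.add(getProjectedType(colType, fieldType));
        }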
    }
    return schemaTypes;
}
Also used : OriginalType(org.apache.parquet.schema.OriginalType) GroupType(org.apache.parquet.schema.GroupType) MessageType(org.apache.parquet.schema.MessageType) Type(org.apache.parquet.schema.Type) ArrayList(java.util.ArrayList) ListTypeInfo(org.apache.hadoop.hive.serde2.typeinfo.ListTypeInfo) StructTypeInfo(org.apache.hadoop.hive.serde2.typeinfo.StructTypeInfo) TypeInfo(org.apache.hadoop.hive.serde2.typeinfo.TypeInfo)
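The projection logic above boils down to a case-insensitive name lookup, with a placeholder for every column the file does not contain. The following is a minimal, stdlib-only sketch of that idea. `ProjectionSketch` and its nested `SimpleField` class are hypothetical stand-ins for Parquet's `Type`; unlike Hive's `getProjectedType`, a found field is returned unchanged rather than pruned against the Hive type.

```java
import java.util.ArrayList;
import java.util.List;

final class ProjectionSketch {
    // Hypothetical stand-in for a Parquet field: name, primitive type label, optionality.
    static final class SimpleField {
        final String name;
        final String type;
        final boolean optional;

        SimpleField(String name, String type, boolean optional) {
            this.name = name;
            this.type = type;
            this.optional = optional;
        }

        @Override
        public String toString() {
            return (optional ? "optional " : "required ") + type + " " + name;
        }
    }

    // Case-insensitive lookup, playing the role of getFieldTypeIgnoreCase.
    static SimpleField findIgnoreCase(List<SimpleField> schema, String name) {
        for (SimpleField field : schema) {
            if (field.name.equalsIgnoreCase(name)) {
                return field;
            }
        }
        return null;
    }

    // Preserves the requested column order; columns missing from the file
    // schema become optional binary placeholders named after the request.
    static List<SimpleField> project(List<SimpleField> schema, List<String> colNames) {
        List<SimpleField> projected = new ArrayList<>();
        for (String colName : colNames) {
            SimpleField field = findIgnoreCase(schema, colName);
            projected.add(field != null ? field : new SimpleField(colName, "binary", true));
        }
        return projected;
    }

    public static void main(String[] args) {
        List<SimpleField> fileSchema = List.of(
                new SimpleField("Id", "int64", false),
                new SimpleField("name", "binary", true));
        // Prints [required int64 Id, optional binary age, optional binary name]
        System.out.println(project(fileSchema, List.of("id", "age", "NAME")));
    }
}
```

Note that a matched field keeps the casing from the file schema ("Id"), while a placeholder takes the casing of the requested column.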

Example 2 with GroupType

Use of org.apache.parquet.schema.GroupType in project drill by apache.

From class Metadata, method getColTypeInfo.

private ColTypeInfo getColTypeInfo(MessageType schema, Type type, String[] path, int depth) {
    if (type.isPrimitive()) {
        PrimitiveType primitiveType = (PrimitiveType) type;
        int precision = 0;
        int scale = 0;
        if (primitiveType.getDecimalMetadata() != null) {
            precision = primitiveType.getDecimalMetadata().getPrecision();
            scale = primitiveType.getDecimalMetadata().getScale();
        }
        int repetitionLevel = schema.getMaxRepetitionLevel(path);
        int definitionLevel = schema.getMaxDefinitionLevel(path);
        return new ColTypeInfo(type.getOriginalType(), precision, scale, repetitionLevel, definitionLevel);
    }
    Type t = ((GroupType) type).getType(path[depth]);
    return getColTypeInfo(schema, t, path, depth + 1);
}
Also used : PrimitiveType(org.apache.parquet.schema.PrimitiveType) GroupType(org.apache.parquet.schema.GroupType) MessageType(org.apache.parquet.schema.MessageType) Type(org.apache.parquet.schema.Type) OriginalType(org.apache.parquet.schema.OriginalType)
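The repetition and definition levels that `getColTypeInfo` reads from `MessageType` follow a simple counting rule: the maximum definition level of a column is the number of non-`REQUIRED` fields on its path below the root, and the maximum repetition level is the number of `REPEATED` fields on that path. A minimal sketch of that rule, assuming a hypothetical `LevelsSketch` class that models each step of the path only by its repetition:

```java
import java.util.List;

final class LevelsSketch {
    enum Repetition { REQUIRED, OPTIONAL, REPEATED }

    // Each OPTIONAL or REPEATED field on the path adds one definition level.
    static int maxDefinitionLevel(List<Repetition> path) {
        int level = 0;
        for (Repetition r : path) {
            if (r != Repetition.REQUIRED) {
                level++;
            }
        }
        return level;
    }

    // Only REPEATED fields on the path add a repetition level.
    static int maxRepetitionLevel(List<Repetition> path) {
        int level = 0;
        for (Repetition r : path) {
            if (r == Repetition.REPEATED) {
                level++;
            }
        }
        return level;
    }

    public static void main(String[] args) {
        // Path a.b.c in: optional group a { repeated group b { required int32 c; } }
        List<Repetition> path = List.of(Repetition.OPTIONAL, Repetition.REPEATED, Repetition.REQUIRED);
        System.out.println("max definition level = " + maxDefinitionLevel(path)); // 2
        System.out.println("max repetition level = " + maxRepetitionLevel(path)); // 1
    }
}
```

With parquet-column on the classpath, parsing the equivalent schema and calling `getMaxDefinitionLevel("a", "b", "c")` on the resulting `MessageType` should report the same value; the sketch itself does not depend on that library.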

Example 3 with GroupType

Use of org.apache.parquet.schema.GroupType in project drill by apache.

From class ParquetSchemaMerge, method main.

public static void main(String[] args) {
    // Schema 1: root { optional a { required b { optional int32 c } } }
    PrimitiveType c = new PrimitiveType(Repetition.OPTIONAL, PrimitiveTypeName.INT32, "c");
    GroupType b = new GroupType(Repetition.REQUIRED, "b", c);
    GroupType a = new GroupType(Repetition.OPTIONAL, "a", b);
    MessageType message1 = new MessageType("root", a);
    // Schema 2: root { optional a { optional b { optional int32 d } } }
    PrimitiveType d = new PrimitiveType(Repetition.OPTIONAL, PrimitiveTypeName.INT32, "d");
    GroupType b2 = new GroupType(Repetition.OPTIONAL, "b", d);
    GroupType a2 = new GroupType(Repetition.OPTIONAL, "a", b2);
    MessageType message2 = new MessageType("root", a2);
    // Merge the two schemas; fields with the same name are merged recursively.
    MessageType message3 = message1.union(message2);
    StringBuilder builder = new StringBuilder();
    message3.writeToStringBuilder(builder, "");
    System.out.println(builder);
}
Also used : GroupType(org.apache.parquet.schema.GroupType) PrimitiveType(org.apache.parquet.schema.PrimitiveType) MessageType(org.apache.parquet.schema.MessageType)
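`union` in the example merges the two schemas field by field. The sketch below is a simplified, stdlib-only model of such a merge, not Parquet's implementation. `UnionSketch` and `Node` are hypothetical; fields are matched by name, fields present only in the second schema are appended, and a merged field takes the less restrictive repetition. That last rule is an assumption of this sketch; Parquet's own handling of repetition conflicts may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class UnionSketch {
    // Ordered from most to least restrictive.
    enum Repetition { REQUIRED, OPTIONAL, REPEATED }

    // Hypothetical schema node: a primitive when `primitive` is non-null, else a group.
    static final class Node {
        final String name;
        final Repetition repetition;
        final String primitive;
        final List<Node> children;

        Node(String name, Repetition repetition, String primitive, List<Node> children) {
            this.name = name;
            this.repetition = repetition;
            this.primitive = primitive;
            this.children = children;
        }

        static Node leaf(Repetition repetition, String primitive, String name) {
            return new Node(name, repetition, primitive, List.of());
        }

        static Node group(Repetition repetition, String name, Node... children) {
            return new Node(name, repetition, null, List.of(children));
        }

        // One-line rendering loosely modeled on Parquet's schema text.
        String render() {
            String head = repetition.name().toLowerCase(Locale.ROOT) + " ";
            if (primitive != null) {
                return head + primitive + " " + name + ";";
            }
            StringBuilder sb = new StringBuilder(head).append("group ").append(name).append(" {");
            for (Node child : children) {
                sb.append(' ').append(child.render());
            }
            return sb.append(" }").toString();
        }
    }

    // Assumption of this sketch: keep the less restrictive repetition.
    static Repetition looser(Repetition a, Repetition b) {
        return a.compareTo(b) >= 0 ? a : b;
    }

    static Node find(List<Node> nodes, String name) {
        for (Node n : nodes) {
            if (n.name.equals(name)) {
                return n;
            }
        }
        return null;
    }

    static Node union(Node left, Node right) {
        if ((left.primitive == null) != (right.primitive == null)) {
            throw new IllegalArgumentException("cannot merge group and primitive: " + left.name);
        }
        Repetition repetition = looser(left.repetition, right.repetition);
        if (left.primitive != null) {
            if (!left.primitive.equals(right.primitive)) {
                throw new IllegalArgumentException("conflicting primitive types for " + left.name);
            }
            return new Node(left.name, repetition, left.primitive, List.of());
        }
        List<Node> merged = new ArrayList<>();
        // Left fields first, merged recursively with same-named right fields.
        for (Node l : left.children) {
            Node r = find(right.children, l.name);
            merged.add(r == null ? l : union(l, r));
        }
        // Then fields that only the right schema has.
        for (Node r : right.children) {
            if (find(left.children, r.name) == null) {
                merged.add(r);
            }
        }
        return new Node(left.name, repetition, null, merged);
    }

    public static void main(String[] args) {
        Node m1 = Node.group(Repetition.REQUIRED, "root",
                Node.group(Repetition.OPTIONAL, "a",
                        Node.group(Repetition.REQUIRED, "b", Node.leaf(Repetition.OPTIONAL, "int32", "c"))));
        Node m2 = Node.group(Repetition.REQUIRED, "root",
                Node.group(Repetition.OPTIONAL, "a",
                        Node.group(Repetition.OPTIONAL, "b", Node.leaf(Repetition.OPTIONAL, "int32", "d"))));
        System.out.println(union(m1, m2).render());
    }
}
```

The `main` method mirrors the shapes in the example, with `c` placed inside `b`, so the merged `b` ends up holding both `c` and `d`.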

Example 4 with GroupType

Use of org.apache.parquet.schema.GroupType in project hive by apache.

From class DataWritableWriter, method createWriter.

/**
 * Creates a writer for the specific object inspector. The returned writer will be used
 * to call the Parquet API for the specific data type.
 * @param inspector The object inspector used to get the correct value type.
 * @param type Type that contains information about the type schema.
 * @return A DataWriter object used to call the Parquet API for the specific data type.
 */
private DataWriter createWriter(ObjectInspector inspector, Type type) {
    if (type.isPrimitive()) {
        checkInspectorCategory(inspector, ObjectInspector.Category.PRIMITIVE);
        PrimitiveObjectInspector primitiveInspector = (PrimitiveObjectInspector) inspector;
        switch(primitiveInspector.getPrimitiveCategory()) {
            case BOOLEAN:
                return new BooleanDataWriter((BooleanObjectInspector) inspector);
            case BYTE:
                return new ByteDataWriter((ByteObjectInspector) inspector);
            case SHORT:
                return new ShortDataWriter((ShortObjectInspector) inspector);
            case INT:
                return new IntDataWriter((IntObjectInspector) inspector);
            case LONG:
                return new LongDataWriter((LongObjectInspector) inspector);
            case FLOAT:
                return new FloatDataWriter((FloatObjectInspector) inspector);
            case DOUBLE:
                return new DoubleDataWriter((DoubleObjectInspector) inspector);
            case STRING:
                return new StringDataWriter((StringObjectInspector) inspector);
            case CHAR:
                return new CharDataWriter((HiveCharObjectInspector) inspector);
            case VARCHAR:
                return new VarcharDataWriter((HiveVarcharObjectInspector) inspector);
            case BINARY:
                return new BinaryDataWriter((BinaryObjectInspector) inspector);
            case TIMESTAMP:
                return new TimestampDataWriter((TimestampObjectInspector) inspector);
            case DECIMAL:
                return new DecimalDataWriter((HiveDecimalObjectInspector) inspector);
            case DATE:
                return new DateDataWriter((DateObjectInspector) inspector);
            default:
                throw new IllegalArgumentException("Unsupported primitive data type: " + primitiveInspector.getPrimitiveCategory());
        }
    } else {
        GroupType groupType = type.asGroupType();
        OriginalType originalType = type.getOriginalType();
        // Groups annotated LIST or MAP get dedicated writers; any other group is written as a struct.
        // Comparing enums with == is null-safe, so no separate null check is needed.
        if (originalType == OriginalType.LIST) {
            checkInspectorCategory(inspector, ObjectInspector.Category.LIST);
            return new ListDataWriter((ListObjectInspector) inspector, groupType);
        } else if (originalType == OriginalType.MAP) {
            checkInspectorCategory(inspector, ObjectInspector.Category.MAP);
            return new MapDataWriter((MapObjectInspector) inspector, groupType);
        } else {
            checkInspectorCategory(inspector, ObjectInspector.Category.STRUCT);
            return new StructDataWriter((StructObjectInspector) inspector, groupType);
        }
    }
}
Also used : OriginalType(org.apache.parquet.schema.OriginalType) GroupType(org.apache.parquet.schema.GroupType) MapObjectInspector(org.apache.hadoop.hive.serde2.objectinspector.MapObjectInspector) PrimitiveObjectInspector(org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector) StructObjectInspector(org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector)
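`createWriter` picks a writer purely from the schema node: primitives dispatch on the inspector's primitive category, and groups dispatch on their `LIST` or `MAP` annotation, falling back to a struct. A minimal sketch of the group-side decision plus the category check, using hypothetical enums in place of Parquet's `OriginalType` and Hive's `ObjectInspector.Category`:

```java
final class WriterDispatchSketch {
    // Hypothetical stand-ins for Parquet's OriginalType and Hive's inspector categories.
    enum Annotation { LIST, MAP }
    enum Category { PRIMITIVE, LIST, MAP, STRUCT }

    // Mirrors the branch order of createWriter: primitives first, then
    // LIST- and MAP-annotated groups, with any other group treated as a struct.
    static Category expectedCategory(boolean isPrimitive, Annotation annotation) {
        if (isPrimitive) {
            return Category.PRIMITIVE;
        }
        if (annotation == Annotation.LIST) { // == is null-safe for enums
            return Category.LIST;
        }
        if (annotation == Annotation.MAP) {
            return Category.MAP;
        }
        return Category.STRUCT;
    }

    // Rejects an inspector whose category does not match the schema node.
    static void checkCategory(Category actual, Category expected) {
        if (actual != expected) {
            throw new IllegalArgumentException("Expected " + expected + " inspector but got " + actual);
        }
    }

    public static void main(String[] args) {
        Category expected = expectedCategory(false, Annotation.MAP);
        checkCategory(Category.MAP, expected);               // matches, no exception
        System.out.println(expectedCategory(false, null));   // STRUCT
    }
}
```

Checking the category at the point where the writer is chosen means a mismatch between the inspector and the Parquet schema surfaces when the writer is built, before any value is written.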

Example 5 with GroupType

Use of org.apache.parquet.schema.GroupType in project hive by apache.

From class HiveSchemaConverter, method convertMapType.

// An optional group containing a repeated anonymous group "map", containing
// 2 elements: "key", "value"
private static GroupType convertMapType(final String name, final MapTypeInfo typeInfo) {
    final Type keyType = convertType(ParquetHiveSerDe.MAP_KEY.toString(), typeInfo.getMapKeyTypeInfo(), Repetition.REQUIRED);
    final Type valueType = convertType(ParquetHiveSerDe.MAP_VALUE.toString(), typeInfo.getMapValueTypeInfo());
    return ConversionPatterns.mapType(Repetition.OPTIONAL, name, keyType, valueType);
}
Also used : GroupType(org.apache.parquet.schema.GroupType) MessageType(org.apache.parquet.schema.MessageType) Type(org.apache.parquet.schema.Type) OriginalType(org.apache.parquet.schema.OriginalType)
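The comment in `convertMapType` describes the three-level layout produced by `ConversionPatterns.mapType`: an outer optional group, a repeated group named "map", and the "key" and "value" fields inside it. The following sketch renders that layout as schema text for given key and value types. `MapSchemaSketch.mapSchema` is hypothetical; it assumes the value is optional (the original passes no explicit repetition for the value, while the key is explicitly `REQUIRED`) and omits any annotation Parquet may place on the inner repeated group.

```java
final class MapSchemaSketch {
    // Renders: optional group <name> (MAP) { repeated group map { required key; optional value; } }
    static String mapSchema(String name, String keyType, String valueType) {
        return "optional group " + name + " (MAP) {\n"
                + "  repeated group map {\n"
                + "    required " + keyType + " key;\n"     // keys are always required
                + "    optional " + valueType + " value;\n" // assumption: values are nullable
                + "  }\n"
                + "}";
    }

    public static void main(String[] args) {
        System.out.println(mapSchema("props", "binary", "int32"));
    }
}
```

The repeated middle group is what lets a single column hold many key/value pairs per row, while the outer optional group distinguishes a null map from an empty one.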
