
Example 6 with TypeDescription

Use of org.apache.orc.TypeDescription in project hive by apache.

Class OrcInputFormat, method genIncludedColumns.

public static boolean[] genIncludedColumns(TypeDescription readerSchema, List<Integer> included) {
    boolean[] result = new boolean[readerSchema.getMaximumId() + 1];
    if (included == null) {
        Arrays.fill(result, true);
        return result;
    }
    result[0] = true;
    List<TypeDescription> children = readerSchema.getChildren();
    for (int columnNumber = 0; columnNumber < children.size(); ++columnNumber) {
        if (included.contains(columnNumber)) {
            TypeDescription child = children.get(columnNumber);
            for (int col = child.getId(); col <= child.getMaximumId(); ++col) {
                result[col] = true;
            }
        }
    }
    return result;
}
Also used : TypeDescription(org.apache.orc.TypeDescription)
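The trick in genIncludedColumns is that ORC assigns every node in the schema tree a pre-order id, so selecting a top-level column means marking the whole contiguous id range of its subtree. A minimal standalone sketch of that id-range logic, using a hypothetical Node class in place of TypeDescription (the field names mirror getId()/getMaximumId(), but the class itself is invented for illustration):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: a tiny stand-in for org.apache.orc.TypeDescription,
// showing how genIncludedColumns expands top-level column indices into the
// flattened boolean array indexed by ORC type id.
public class IncludedColumnsSketch {

    static class Node {
        final int id;            // pre-order id, like TypeDescription.getId()
        final int maximumId;     // largest id in this subtree, like getMaximumId()
        final List<Node> children = new ArrayList<>();

        Node(int id, int maximumId) {
            this.id = id;
            this.maximumId = maximumId;
        }
    }

    // Same shape as OrcInputFormat.genIncludedColumns: mark the root, then for
    // each included top-level column mark every id in that child's subtree.
    static boolean[] genIncluded(Node root, List<Integer> included) {
        boolean[] result = new boolean[root.maximumId + 1];
        if (included == null) {
            Arrays.fill(result, true);
            return result;
        }
        result[0] = true;  // the root struct itself is always read
        List<Node> children = root.children;
        for (int columnNumber = 0; columnNumber < children.size(); ++columnNumber) {
            if (included.contains(columnNumber)) {
                Node child = children.get(columnNumber);
                for (int col = child.id; col <= child.maximumId; ++col) {
                    result[col] = true;
                }
            }
        }
        return result;
    }

    // struct<a:int, b:struct<x:int,y:int>, c:int>
    // pre-order ids: root=0, a=1, b=2, x=3, y=4, c=5
    static Node sampleSchema() {
        Node root = new Node(0, 5);
        root.children.add(new Node(1, 1));  // a
        root.children.add(new Node(2, 4));  // b spans ids 2..4
        root.children.add(new Node(5, 5));  // c
        return root;
    }

    public static void main(String[] args) {
        // Selecting column 1 ("b") marks ids 2, 3 and 4.
        System.out.println(Arrays.toString(genIncluded(sampleSchema(), Arrays.asList(1))));
    }
}
```

Because a nested column's subtree occupies one contiguous id range, the inner loop never has to recurse; a single `for` over `[id, maximumId]` covers every descendant.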

Example 7 with TypeDescription

Use of org.apache.orc.TypeDescription in project hive by apache.

Class OrcInputFormat, method typeDescriptionsFromHiveTypeProperty.

/**
   * Convert a Hive type property string that contains comma-separated type names into a list of
   * TypeDescription objects.
   * @param hiveTypeProperty the desired types from hive
   * @param maxColumns the maximum number of desired columns
   * @return the list of TypeDescription objects.
   */
public static ArrayList<TypeDescription> typeDescriptionsFromHiveTypeProperty(String hiveTypeProperty, int maxColumns) {
    // CONSIDER: We need a type name parser for TypeDescription.
    ArrayList<TypeInfo> typeInfoList = TypeInfoUtils.getTypeInfosFromTypeString(hiveTypeProperty);
    ArrayList<TypeDescription> typeDescrList = new ArrayList<TypeDescription>(typeInfoList.size());
    for (TypeInfo typeInfo : typeInfoList) {
        typeDescrList.add(convertTypeInfo(typeInfo));
        if (typeDescrList.size() >= maxColumns) {
            break;
        }
    }
    return typeDescrList;
}
Also used : ArrayList(java.util.ArrayList) TypeDescription(org.apache.orc.TypeDescription) StructTypeInfo(org.apache.hadoop.hive.serde2.typeinfo.StructTypeInfo) DecimalTypeInfo(org.apache.hadoop.hive.serde2.typeinfo.DecimalTypeInfo) BaseCharTypeInfo(org.apache.hadoop.hive.serde2.typeinfo.BaseCharTypeInfo) MapTypeInfo(org.apache.hadoop.hive.serde2.typeinfo.MapTypeInfo) PrimitiveTypeInfo(org.apache.hadoop.hive.serde2.typeinfo.PrimitiveTypeInfo) ListTypeInfo(org.apache.hadoop.hive.serde2.typeinfo.ListTypeInfo) TypeInfo(org.apache.hadoop.hive.serde2.typeinfo.TypeInfo) UnionTypeInfo(org.apache.hadoop.hive.serde2.typeinfo.UnionTypeInfo)
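Note the early `break`: once `maxColumns` entries have been converted, any remaining type names in the property are silently dropped. A minimal sketch of just that truncation behaviour, with the Hive-dependent convertTypeInfo step replaced by identity:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of the truncation in typeDescriptionsFromHiveTypeProperty: the loop
// stops as soon as maxColumns entries have been collected. The per-element
// conversion (convertTypeInfo) is omitted because it needs the Hive serde
// classes; identity stands in for it here.
public class MaxColumnsSketch {
    static <T> List<T> takeAtMost(List<T> types, int maxColumns) {
        List<T> result = new ArrayList<>(types.size());
        for (T t : types) {
            result.add(t);
            if (result.size() >= maxColumns) {
                break;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> hiveTypes = Arrays.asList("int", "string", "map<string,int>", "double");
        // Only the first two survive when maxColumns is 2.
        System.out.println(takeAtMost(hiveTypes, 2));
    }
}
```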

Example 8 with TypeDescription

Use of org.apache.orc.TypeDescription in project hive by apache.

Class OrcOutputFormat, method getOptions.

private OrcFile.WriterOptions getOptions(JobConf conf, Properties props) {
    OrcFile.WriterOptions result = OrcFile.writerOptions(props, conf);
    if (props != null) {
        final String columnNameProperty = props.getProperty(IOConstants.COLUMNS);
        final String columnTypeProperty = props.getProperty(IOConstants.COLUMNS_TYPES);
        if (columnNameProperty != null && !columnNameProperty.isEmpty() && columnTypeProperty != null && !columnTypeProperty.isEmpty()) {
            List<String> columnNames;
            List<TypeInfo> columnTypes;
            final String columnNameDelimiter = props.containsKey(serdeConstants.COLUMN_NAME_DELIMITER) ? props.getProperty(serdeConstants.COLUMN_NAME_DELIMITER) : String.valueOf(SerDeUtils.COMMA);
            if (columnNameProperty.length() == 0) {
                columnNames = new ArrayList<String>();
            } else {
                columnNames = Arrays.asList(columnNameProperty.split(columnNameDelimiter));
            }
            if (columnTypeProperty.length() == 0) {
                columnTypes = new ArrayList<TypeInfo>();
            } else {
                columnTypes = TypeInfoUtils.getTypeInfosFromTypeString(columnTypeProperty);
            }
            TypeDescription schema = TypeDescription.createStruct();
            for (int i = 0; i < columnNames.size(); ++i) {
                schema.addField(columnNames.get(i), OrcInputFormat.convertTypeInfo(columnTypes.get(i)));
            }
            if (LOG.isDebugEnabled()) {
                LOG.debug("ORC schema = " + schema);
            }
            result.setSchema(schema);
        }
    }
    return result;
}
Also used : TypeDescription(org.apache.orc.TypeDescription) MapTypeInfo(org.apache.hadoop.hive.serde2.typeinfo.MapTypeInfo) ListTypeInfo(org.apache.hadoop.hive.serde2.typeinfo.ListTypeInfo) StructTypeInfo(org.apache.hadoop.hive.serde2.typeinfo.StructTypeInfo) PrimitiveTypeInfo(org.apache.hadoop.hive.serde2.typeinfo.PrimitiveTypeInfo) DecimalTypeInfo(org.apache.hadoop.hive.serde2.typeinfo.DecimalTypeInfo) TypeInfo(org.apache.hadoop.hive.serde2.typeinfo.TypeInfo) UnionTypeInfo(org.apache.hadoop.hive.serde2.typeinfo.UnionTypeInfo) BaseCharTypeInfo(org.apache.hadoop.hive.serde2.typeinfo.BaseCharTypeInfo)
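The core of getOptions is zipping the COLUMNS and COLUMNS_TYPES table properties into a struct schema. One subtlety is that the types string can only be split on top-level commas, since nested types like `map<string,int>` contain commas of their own; this is why the real code delegates to TypeInfoUtils. A standalone sketch of that step, rendering the schema as a TypeDescription-style string instead of using the ORC builder API (the method names here are invented for illustration):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the schema-building step in OrcOutputFormat.getOptions: column
// names are split on the configured delimiter, column types on top-level
// commas, and the pairs are zipped into a struct schema string.
public class SchemaFromPropsSketch {

    // Split a Hive type list on commas that are not nested inside <...>,
    // so "int,map<string,int>" yields ["int", "map<string,int>"].
    static List<String> splitTopLevelTypes(String types) {
        List<String> result = new ArrayList<>();
        int depth = 0;
        int start = 0;
        for (int i = 0; i < types.length(); ++i) {
            char c = types.charAt(i);
            if (c == '<') {
                ++depth;
            } else if (c == '>') {
                --depth;
            } else if (c == ',' && depth == 0) {
                result.add(types.substring(start, i));
                start = i + 1;
            }
        }
        result.add(types.substring(start));
        return result;
    }

    static String buildStructSchema(String names, String types, String delimiter) {
        String[] cols = names.split(delimiter);
        List<String> colTypes = splitTopLevelTypes(types);
        StringBuilder sb = new StringBuilder("struct<");
        for (int i = 0; i < cols.length; ++i) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(cols[i]).append(':').append(colTypes.get(i));
        }
        return sb.append('>').toString();
    }

    public static void main(String[] args) {
        System.out.println(buildStructSchema("id,tags", "bigint,map<string,int>", ","));
    }
}
```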

Example 9 with TypeDescription

Use of org.apache.orc.TypeDescription in project hive by apache.

Class RecordReaderImpl, method nextMap.

static HashMap<Object, Object> nextMap(ColumnVector vector, int row, TypeDescription schema, Object previous) {
    if (vector.isRepeating) {
        row = 0;
    }
    if (vector.noNulls || !vector.isNull[row]) {
        MapColumnVector map = (MapColumnVector) vector;
        int length = (int) map.lengths[row];
        int offset = (int) map.offsets[row];
        TypeDescription keyType = schema.getChildren().get(0);
        TypeDescription valueType = schema.getChildren().get(1);
        HashMap<Object, Object> result;
        if (previous == null || previous.getClass() != HashMap.class) {
            result = new HashMap<Object, Object>(length);
        } else {
            result = (HashMap<Object, Object>) previous;
            // I couldn't think of a good way to reuse the keys and value objects
            // without even more allocations, so take the easy and safe approach.
            result.clear();
        }
        for (int e = 0; e < length; ++e) {
            result.put(nextValue(map.keys, e + offset, keyType, null), nextValue(map.values, e + offset, valueType, null));
        }
        return result;
    } else {
        return null;
    }
}
Also used : MapColumnVector(org.apache.hadoop.hive.ql.exec.vector.MapColumnVector) HashMap(java.util.HashMap) TypeDescription(org.apache.orc.TypeDescription)
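The indexing in nextMap relies on the flat layout of MapColumnVector: all rows' entries live in one shared keys/values region, and row r owns the half-open slice [offsets[r], offsets[r] + lengths[r]). A minimal sketch of that slice arithmetic, with plain arrays standing in for the child ColumnVectors:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the offsets/lengths layout that RecordReaderImpl.nextMap reads:
// row r's entries are the child-array elements at indices
// offsets[r] .. offsets[r] + lengths[r] - 1.
public class MapSliceSketch {
    static Map<String, Long> mapForRow(String[] keys, long[] values,
                                       long[] offsets, long[] lengths, int row) {
        int offset = (int) offsets[row];
        int length = (int) lengths[row];
        Map<String, Long> result = new HashMap<>(length);
        for (int e = 0; e < length; ++e) {
            result.put(keys[e + offset], values[e + offset]);
        }
        return result;
    }

    public static void main(String[] args) {
        // Two rows: row 0 has {a=1, b=2}, row 1 has {c=3}.
        String[] keys = {"a", "b", "c"};
        long[] values = {1, 2, 3};
        long[] offsets = {0, 2};
        long[] lengths = {2, 1};
        System.out.println(mapForRow(keys, values, offsets, lengths, 1));
    }
}
```

In the real method the same `e + offset` index is passed to nextValue for both the key and value child vectors, which is what makes the map materialization a single linear scan.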

Example 10 with TypeDescription

Use of org.apache.orc.TypeDescription in project hive by apache.

Class RecordReaderImpl, method nextStruct.

static OrcStruct nextStruct(ColumnVector vector, int row, TypeDescription schema, Object previous) {
    if (vector.isRepeating) {
        row = 0;
    }
    if (vector.noNulls || !vector.isNull[row]) {
        OrcStruct result;
        List<TypeDescription> childrenTypes = schema.getChildren();
        int numChildren = childrenTypes.size();
        if (previous == null || previous.getClass() != OrcStruct.class) {
            result = new OrcStruct(numChildren);
        } else {
            result = (OrcStruct) previous;
            result.setNumFields(numChildren);
        }
        StructColumnVector struct = (StructColumnVector) vector;
        for (int f = 0; f < numChildren; ++f) {
            result.setFieldValue(f, nextValue(struct.fields[f], row, childrenTypes.get(f), result.getFieldValue(f)));
        }
        return result;
    } else {
        return null;
    }
}
Also used : StructColumnVector(org.apache.hadoop.hive.ql.exec.vector.StructColumnVector) TypeDescription(org.apache.orc.TypeDescription)
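Both nextMap and nextStruct follow the same object-reuse pattern: the previously returned value is recycled when it is already exactly the right class, otherwise a fresh container is allocated. A small sketch of that pattern with a List standing in for OrcStruct (the class and method here are hypothetical, purely to isolate the reuse check):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the reuse pattern in RecordReaderImpl.nextStruct: recycle
// `previous` when its concrete class matches, else allocate. Avoids one
// container allocation per row on the hot read path.
public class ReuseSketch {
    @SuppressWarnings("unchecked")
    static List<Object> reuseOrAllocate(Object previous, int numFields) {
        List<Object> result;
        if (previous == null || previous.getClass() != ArrayList.class) {
            result = new ArrayList<>(numFields);
        } else {
            result = (List<Object>) previous;
            result.clear();
        }
        for (int f = 0; f < numFields; ++f) {
            result.add(null);  // placeholders; the real code fills each field
        }
        return result;
    }

    public static void main(String[] args) {
        List<Object> first = reuseOrAllocate(null, 3);
        // Passing the old container back returns the same object, resized.
        List<Object> second = reuseOrAllocate(first, 2);
        System.out.println(first == second);
    }
}
```

Note the exact-class comparison (`previous.getClass() != OrcStruct.class` in the original) rather than `instanceof`: a subclass could carry state that clear-and-refill would not reset, so only an exact match is safe to recycle.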

Aggregations

TypeDescription (org.apache.orc.TypeDescription): 24
ArrayList (java.util.ArrayList): 6
VectorizedRowBatch (org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch): 5
Test (org.junit.Test): 5
Path (org.apache.hadoop.fs.Path): 4
BytesColumnVector (org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector): 4
LongColumnVector (org.apache.hadoop.hive.ql.exec.vector.LongColumnVector): 4
ListObjectInspector (org.apache.hadoop.hive.serde2.objectinspector.ListObjectInspector): 4
MapObjectInspector (org.apache.hadoop.hive.serde2.objectinspector.MapObjectInspector): 4
ObjectInspector (org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector): 4
StructObjectInspector (org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector): 4
BinaryObjectInspector (org.apache.hadoop.hive.serde2.objectinspector.primitive.BinaryObjectInspector): 4
BooleanObjectInspector (org.apache.hadoop.hive.serde2.objectinspector.primitive.BooleanObjectInspector): 4
ByteObjectInspector (org.apache.hadoop.hive.serde2.objectinspector.primitive.ByteObjectInspector): 4
DoubleObjectInspector (org.apache.hadoop.hive.serde2.objectinspector.primitive.DoubleObjectInspector): 4
FloatObjectInspector (org.apache.hadoop.hive.serde2.objectinspector.primitive.FloatObjectInspector): 4
HiveDecimalObjectInspector (org.apache.hadoop.hive.serde2.objectinspector.primitive.HiveDecimalObjectInspector): 4
StructColumnVector (org.apache.hadoop.hive.ql.exec.vector.StructColumnVector): 3
IntObjectInspector (org.apache.hadoop.hive.serde2.objectinspector.primitive.IntObjectInspector): 3
LongObjectInspector (org.apache.hadoop.hive.serde2.objectinspector.primitive.LongObjectInspector): 3