
Example 1 with Field

Use of org.apache.arrow.vector.types.pojo.Field in project parquet-mr by apache.

In class SchemaConverter, method fromParquet.

/**
 * @param type parquet type
 * @param name overrides parquet.getName()
 * @param repetition overrides parquet.getRepetition()
 * @return the type mapping
 */
private TypeMapping fromParquet(Type type, String name, Repetition repetition) {
    if (repetition == REPEATED) {
        // case where we have a repeated field that is not in a List/Map
        TypeMapping child = fromParquet(type, null, REQUIRED);
        Field arrowField = new Field(name, false, new ArrowType.List(), asList(child.getArrowField()));
        return new RepeatedTypeMapping(arrowField, type, child);
    }
    if (type.isPrimitive()) {
        return fromParquetPrimitive(type.asPrimitiveType(), name);
    } else {
        return fromParquetGroup(type.asGroupType(), name);
    }
}
Also used : Field(org.apache.arrow.vector.types.pojo.Field) ArrowType(org.apache.arrow.vector.types.pojo.ArrowType) PrimitiveTypeMapping(org.apache.parquet.arrow.schema.SchemaMapping.PrimitiveTypeMapping) RepeatedTypeMapping(org.apache.parquet.arrow.schema.SchemaMapping.RepeatedTypeMapping) UnionTypeMapping(org.apache.parquet.arrow.schema.SchemaMapping.UnionTypeMapping) ListTypeMapping(org.apache.parquet.arrow.schema.SchemaMapping.ListTypeMapping) StructTypeMapping(org.apache.parquet.arrow.schema.SchemaMapping.StructTypeMapping) TypeMapping(org.apache.parquet.arrow.schema.SchemaMapping.TypeMapping)
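The REPEATED branch above can be illustrated without the Arrow or Parquet libraries. The sketch below is a minimal model under stated assumptions: `RepeatedFieldSketch`, its `Repetition` enum and its `SimpleField` record are hypothetical stand-ins, not classes from either project, and only primitive leaves are handled. It shows the rule the method applies: a repeated field that is not inside a List/Map becomes a non-nullable list whose single, unnamed child is the same field converted as REQUIRED.

```java
import java.util.List;

// Hypothetical, simplified model: these are NOT the real
// org.apache.parquet.schema or org.apache.arrow.vector.types.pojo classes.
class RepeatedFieldSketch {
    enum Repetition { REQUIRED, OPTIONAL, REPEATED }

    // Arrow-like field: name, nullability, a type label and child fields.
    record SimpleField(String name, boolean nullable, String type, List<SimpleField> children) {}

    // Mirrors the branching of fromParquet(Type, String, Repetition) for primitive leaves only.
    static SimpleField convert(String name, String primitiveType, Repetition repetition) {
        if (repetition == Repetition.REPEATED) {
            // Convert the same field again as REQUIRED and without a name,
            // then wrap it in a non-nullable list.
            SimpleField child = convert(null, primitiveType, Repetition.REQUIRED);
            return new SimpleField(name, false, "List", List.of(child));
        }
        // Non-repeated leaf: nullable exactly when the field is OPTIONAL.
        return new SimpleField(name, repetition == Repetition.OPTIONAL, primitiveType, List.of());
    }

    public static void main(String[] args) {
        // A repeated int32 column named "scores".
        System.out.println(convert("scores", "int32", Repetition.REPEATED));
    }
}
```

The outer list is non-nullable because a bare repeated field has no way to express "absent" as opposed to "zero occurrences".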

Example 2 with Field

Use of org.apache.arrow.vector.types.pojo.Field in project parquet-mr by apache.

In class SchemaConverter, method fromArrow.

/**
 * Creates a Parquet Schema from an Arrow one and returns the mapping
 * @param arrowSchema the provided Arrow Schema
 * @return the mapping between the two schemas
 */
public SchemaMapping fromArrow(Schema arrowSchema) {
    List<Field> fields = arrowSchema.getFields();
    List<TypeMapping> parquetFields = fromArrow(fields);
    MessageType parquetType = addToBuilder(parquetFields, Types.buildMessage()).named("root");
    return new SchemaMapping(arrowSchema, parquetType, parquetFields);
}
Also used : Field(org.apache.arrow.vector.types.pojo.Field) PrimitiveTypeMapping(org.apache.parquet.arrow.schema.SchemaMapping.PrimitiveTypeMapping) RepeatedTypeMapping(org.apache.parquet.arrow.schema.SchemaMapping.RepeatedTypeMapping) UnionTypeMapping(org.apache.parquet.arrow.schema.SchemaMapping.UnionTypeMapping) ListTypeMapping(org.apache.parquet.arrow.schema.SchemaMapping.ListTypeMapping) StructTypeMapping(org.apache.parquet.arrow.schema.SchemaMapping.StructTypeMapping) TypeMapping(org.apache.parquet.arrow.schema.SchemaMapping.TypeMapping) MessageType(org.apache.parquet.schema.MessageType)
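As a rough illustration of the shape fromArrow produces, the sketch below renders a flat list of fields as Parquet message-type text wrapped in a message named `root`. Everything here is an assumption made for the example: the `ArrowLikeField` record, the string type tags and the small type table (including mapping `Utf8` to `binary` annotated with `UTF8`) are hypothetical, and the real method builds `MessageType` objects through `Types.buildMessage()` rather than strings. This sketch only covers a few flat primitive types.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch; not the real SchemaConverter or Arrow Schema API.
class FromArrowSketch {
    // Flat Arrow-like field: name, nullability and a type tag.
    record ArrowLikeField(String name, boolean nullable, String type) {}

    // Maps one field to a line of Parquet message-type text.
    static String toParquetLine(ArrowLikeField f) {
        String repetition = f.nullable() ? "optional" : "required";
        String primitive = switch (f.type()) {
            case "Int32" -> "int32";
            case "Int64" -> "int64";
            case "Float64" -> "double";
            case "Bool" -> "boolean";
            case "Utf8" -> "binary";
            default -> throw new UnsupportedOperationException("Unsupported type " + f.type());
        };
        // Assumed annotation for strings: binary column tagged as UTF8.
        String annotation = f.type().equals("Utf8") ? " (UTF8)" : "";
        return "  " + repetition + " " + primitive + " " + f.name() + annotation + ";";
    }

    // Like fromArrow(Schema): convert every top-level field in order,
    // then wrap the result in a message named "root".
    static String fromArrow(List<ArrowLikeField> fields) {
        return fields.stream()
                .map(FromArrowSketch::toParquetLine)
                .collect(Collectors.joining("\n", "message root {\n", "\n}"));
    }

    public static void main(String[] args) {
        System.out.println(fromArrow(List.of(
                new ArrowLikeField("id", false, "Int32"),
                new ArrowLikeField("name", true, "Utf8"))));
    }
}
```

Arrow nullability maps onto Parquet repetition: a nullable field becomes `optional`, a non-nullable one `required`.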

Example 3 with Field

Use of org.apache.arrow.vector.types.pojo.Field in project parquet-mr by apache.

In class SchemaConverter, method map.

private List<TypeMapping> map(List<Field> arrowFields, List<Type> parquetFields) {
    if (arrowFields.size() != parquetFields.size()) {
        throw new IllegalArgumentException("Can not map schemas as sizes differ: " + arrowFields + " != " + parquetFields);
    }
    List<TypeMapping> result = new ArrayList<>(arrowFields.size());
    for (int i = 0; i < arrowFields.size(); i++) {
        Field arrowField = arrowFields.get(i);
        Type parquetField = parquetFields.get(i);
        result.add(map(arrowField, parquetField));
    }
    return result;
}
Also used : Field(org.apache.arrow.vector.types.pojo.Field) PrimitiveType(org.apache.parquet.schema.PrimitiveType) GroupType(org.apache.parquet.schema.GroupType) MessageType(org.apache.parquet.schema.MessageType) Type(org.apache.parquet.schema.Type) ArrowType(org.apache.arrow.vector.types.pojo.ArrowType) OriginalType(org.apache.parquet.schema.OriginalType) ArrayList(java.util.ArrayList) PrimitiveTypeMapping(org.apache.parquet.arrow.schema.SchemaMapping.PrimitiveTypeMapping) RepeatedTypeMapping(org.apache.parquet.arrow.schema.SchemaMapping.RepeatedTypeMapping) UnionTypeMapping(org.apache.parquet.arrow.schema.SchemaMapping.UnionTypeMapping) ListTypeMapping(org.apache.parquet.arrow.schema.SchemaMapping.ListTypeMapping) StructTypeMapping(org.apache.parquet.arrow.schema.SchemaMapping.StructTypeMapping) TypeMapping(org.apache.parquet.arrow.schema.SchemaMapping.TypeMapping) FloatingPoint(org.apache.arrow.vector.types.pojo.ArrowType.FloatingPoint)
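The map method above is an instance of a strict positional zip: refuse lists of different length, then combine element i of one list with element i of the other. A generic version of that pattern is sketched below; the name `zipStrict` is hypothetical and not part of parquet-mr. Note that positional matching assumes both schemas list their fields in the same order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

// Hypothetical helper illustrating the pattern used by map(List<Field>, List<Type>).
class ZipMapSketch {
    static <A, B, R> List<R> zipStrict(List<A> left, List<B> right, BiFunction<A, B, R> combine) {
        // Fail fast when the two sides cannot be paired one-to-one.
        if (left.size() != right.size()) {
            throw new IllegalArgumentException("Cannot map schemas as sizes differ: " + left + " != " + right);
        }
        // Pre-size the result, then pair elements by index.
        List<R> result = new ArrayList<>(left.size());
        for (int i = 0; i < left.size(); i++) {
            result.add(combine.apply(left.get(i), right.get(i)));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> names = List.of("id", "name");
        List<String> types = List.of("int64", "binary");
        System.out.println(zipStrict(names, types, (n, t) -> n + ":" + t));
    }
}
```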

Example 4 with Field

Use of org.apache.arrow.vector.types.pojo.Field in project parquet-mr by apache.

In class SchemaConverter, method fromParquet.

/**
 * Creates an Arrow Schema from a Parquet one and returns the mapping
 * @param parquetSchema the provided Parquet Schema
 * @return the mapping between the two schemas
 */
public SchemaMapping fromParquet(MessageType parquetSchema) {
    List<Type> fields = parquetSchema.getFields();
    List<TypeMapping> mappings = fromParquet(fields);
    List<Field> arrowFields = fields(mappings);
    return new SchemaMapping(new Schema(arrowFields), parquetSchema, mappings);
}
Also used : Field(org.apache.arrow.vector.types.pojo.Field) PrimitiveType(org.apache.parquet.schema.PrimitiveType) GroupType(org.apache.parquet.schema.GroupType) MessageType(org.apache.parquet.schema.MessageType) Type(org.apache.parquet.schema.Type) ArrowType(org.apache.arrow.vector.types.pojo.ArrowType) OriginalType(org.apache.parquet.schema.OriginalType) Schema(org.apache.arrow.vector.types.pojo.Schema) PrimitiveTypeMapping(org.apache.parquet.arrow.schema.SchemaMapping.PrimitiveTypeMapping) RepeatedTypeMapping(org.apache.parquet.arrow.schema.SchemaMapping.RepeatedTypeMapping) UnionTypeMapping(org.apache.parquet.arrow.schema.SchemaMapping.UnionTypeMapping) ListTypeMapping(org.apache.parquet.arrow.schema.SchemaMapping.ListTypeMapping) StructTypeMapping(org.apache.parquet.arrow.schema.SchemaMapping.StructTypeMapping) TypeMapping(org.apache.parquet.arrow.schema.SchemaMapping.TypeMapping)
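The reverse direction can be sketched the same way: convert each top-level Parquet column in order and collect the results as the Arrow field list. In the hypothetical sketch below, `ParquetLikeColumn`, `ArrowLikeField` and the type labels are assumptions for illustration only (not the real `MessageType`, `Field` or `ArrowType` APIs), and repeated columns and groups are deliberately out of scope.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch; not the real SchemaConverter, MessageType or Arrow Field API.
class FromParquetSketch {
    enum Repetition { REQUIRED, OPTIONAL, REPEATED }

    // Flat Parquet-like primitive column; originalType may be null.
    record ParquetLikeColumn(String name, Repetition repetition, String primitive, String originalType) {}

    // Arrow-like field: name, nullability and a type label.
    record ArrowLikeField(String name, boolean nullable, String type) {}

    static ArrowLikeField toArrow(ParquetLikeColumn c) {
        if (c.repetition() == Repetition.REPEATED) {
            throw new UnsupportedOperationException("Repeated columns are not handled in this sketch: " + c.name());
        }
        // Assumed type table; the labels are illustrative strings only.
        String type = switch (c.primitive()) {
            case "int32" -> "Int32";
            case "int64" -> "Int64";
            case "boolean" -> "Bool";
            case "float" -> "Float32";
            case "double" -> "Float64";
            case "binary" -> "UTF8".equals(c.originalType()) ? "Utf8" : "Binary";
            default -> throw new UnsupportedOperationException("Unsupported type " + c.primitive());
        };
        // OPTIONAL columns become nullable fields; REQUIRED ones do not.
        return new ArrowLikeField(c.name(), c.repetition() == Repetition.OPTIONAL, type);
    }

    // Like fromParquet(MessageType): convert each top-level column, keeping order.
    static List<ArrowLikeField> fromParquet(List<ParquetLikeColumn> columns) {
        return columns.stream().map(FromParquetSketch::toArrow).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(fromParquet(List.of(
                new ParquetLikeColumn("id", Repetition.REQUIRED, "int64", null),
                new ParquetLikeColumn("name", Repetition.OPTIONAL, "binary", "UTF8"))));
    }
}
```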

Example 5 with Field

Use of org.apache.arrow.vector.types.pojo.Field in project parquet-mr by apache.

In class SchemaConverter, method fromParquetGroup.

/**
 * @param type parquet type
 * @param name overrides parquet.getName()
 * @return the mapping
 */
private TypeMapping fromParquetGroup(GroupType type, String name) {
    OriginalType ot = type.getOriginalType();
    if (ot == null) {
        List<TypeMapping> typeMappings = fromParquet(type.getFields());
        Field arrowField = new Field(name, type.isRepetition(OPTIONAL), new Struct_(), fields(typeMappings));
        return new StructTypeMapping(arrowField, type, typeMappings);
    } else {
        switch(ot) {
            case LIST:
                List3Levels list3Levels = new List3Levels(type);
                TypeMapping child = fromParquet(list3Levels.getElement(), null, list3Levels.getElement().getRepetition());
                Field arrowField = new Field(name, type.isRepetition(OPTIONAL), new ArrowType.List(), asList(child.getArrowField()));
                return new ListTypeMapping(arrowField, list3Levels, child);
            default:
                throw new UnsupportedOperationException("Unsupported type " + type);
        }
    }
}
Also used : OriginalType(org.apache.parquet.schema.OriginalType) Field(org.apache.arrow.vector.types.pojo.Field) ListTypeMapping(org.apache.parquet.arrow.schema.SchemaMapping.ListTypeMapping) ArrowType(org.apache.arrow.vector.types.pojo.ArrowType) PrimitiveTypeMapping(org.apache.parquet.arrow.schema.SchemaMapping.PrimitiveTypeMapping) RepeatedTypeMapping(org.apache.parquet.arrow.schema.SchemaMapping.RepeatedTypeMapping) UnionTypeMapping(org.apache.parquet.arrow.schema.SchemaMapping.UnionTypeMapping) StructTypeMapping(org.apache.parquet.arrow.schema.SchemaMapping.StructTypeMapping) TypeMapping(org.apache.parquet.arrow.schema.SchemaMapping.TypeMapping) Struct_(org.apache.arrow.vector.types.pojo.ArrowType.Struct_)
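The group handling above follows Parquet's three-level LIST layout: an outer group annotated LIST, a repeated middle group, and the element inside it. A plain group with no annotation becomes a struct. The sketch below models that decision on a hypothetical schema tree; `GroupSketch`, `Node` and `ArrowLike` are illustrative stand-ins, not the real `GroupType`, `List3Levels` or Arrow classes, and the structural checks are this sketch's own rather than a copy of what `List3Levels` validates.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified schema tree; not the real Parquet GroupType/List3Levels API.
class GroupSketch {
    // repetition is "required", "optional" or "repeated"; primitive is null for groups.
    record Node(String name, String repetition, String originalType, String primitive, List<Node> children) {
        boolean isOptional() { return "optional".equals(repetition); }
    }

    // Arrow-like result: name, nullability, type label and children.
    record ArrowLike(String name, boolean nullable, String type, List<ArrowLike> children) {}

    static ArrowLike convert(Node n, String name) {
        if (n.primitive() != null) {
            return new ArrowLike(name, n.isOptional(), n.primitive(), List.of());
        }
        if (n.originalType() == null) {
            // Plain group: becomes a struct with one child per field, in order.
            List<ArrowLike> children = new ArrayList<>();
            for (Node c : n.children()) {
                children.add(convert(c, c.name()));
            }
            return new ArrowLike(name, n.isOptional(), "Struct", children);
        }
        if ("LIST".equals(n.originalType())) {
            // Three-level layout: outer group (LIST) -> repeated group -> element.
            if (n.children().size() != 1) {
                throw new IllegalArgumentException("LIST group must have exactly one child: " + n);
            }
            Node middle = n.children().get(0);
            if (!"repeated".equals(middle.repetition()) || middle.children().size() != 1) {
                throw new IllegalArgumentException("Expected a repeated group with one element: " + middle);
            }
            // As in fromParquetGroup, the element is converted without a name.
            ArrowLike element = convert(middle.children().get(0), null);
            return new ArrowLike(name, n.isOptional(), "List", List.of(element));
        }
        throw new UnsupportedOperationException("Unsupported type " + n.originalType());
    }

    public static void main(String[] args) {
        Node element = new Node("element", "optional", null, "int32", List.of());
        Node middle = new Node("list", "repeated", null, null, List.of(element));
        Node numbers = new Node("numbers", "optional", "LIST", null, List.of(middle));
        System.out.println(convert(numbers, "numbers"));
    }
}
```

Any other annotation (for example MAP) falls through to the unsupported branch, matching the `default` case of the original switch.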

Aggregations

Field (org.apache.arrow.vector.types.pojo.Field): 17
ArrowType (org.apache.arrow.vector.types.pojo.ArrowType): 9
ArrayList (java.util.ArrayList): 5
ListTypeMapping (org.apache.parquet.arrow.schema.SchemaMapping.ListTypeMapping): 5
PrimitiveTypeMapping (org.apache.parquet.arrow.schema.SchemaMapping.PrimitiveTypeMapping): 5
RepeatedTypeMapping (org.apache.parquet.arrow.schema.SchemaMapping.RepeatedTypeMapping): 5
StructTypeMapping (org.apache.parquet.arrow.schema.SchemaMapping.StructTypeMapping): 5
TypeMapping (org.apache.parquet.arrow.schema.SchemaMapping.TypeMapping): 5
UnionTypeMapping (org.apache.parquet.arrow.schema.SchemaMapping.UnionTypeMapping): 5
Schema (org.apache.arrow.vector.types.pojo.Schema): 4
FieldVector (org.apache.arrow.vector.FieldVector): 3
FieldType (org.apache.arrow.vector.types.pojo.FieldType): 3
MessageType (org.apache.parquet.schema.MessageType): 3
OriginalType (org.apache.parquet.schema.OriginalType): 3
Twister2RuntimeException (edu.iu.dsc.tws.api.exceptions.Twister2RuntimeException): 2
TField (edu.iu.dsc.tws.common.table.TField): 2
Attribute (edu.uci.ics.texera.api.schema.Attribute): 2
Schema (edu.uci.ics.texera.api.schema.Schema): 2
GroupType (org.apache.parquet.schema.GroupType): 2
PrimitiveType (org.apache.parquet.schema.PrimitiveType): 2