Example 1 with SentinelUnion

Use of org.apache.parquet.thrift.ConvertedField.SentinelUnion in project parquet-mr by apache.

From the visit method of the class ThriftSchemaConvertVisitor:

@Override
public ConvertedField visit(StructType structType, State state) {
    // special care is taken when converting unions,
    // because we are actually both converting + projecting in
    // one pass, and unions need special handling when projecting.
    final boolean needsToKeepOneOfEachUnion = keepOneOfEachUnion && isUnion(structType.getStructOrUnionType());
    boolean hasSentinelUnionColumns = false;
    boolean hasNonSentinelUnionColumns = false;
    List<Type> convertedChildren = new ArrayList<Type>();
    for (ThriftField child : structType.getChildren()) {
        State childState = new State(state.path.push(child), getRepetition(child), child.getName());
        ConvertedField converted = child.getType().accept(this, childState);
        if (!converted.isKeep() && needsToKeepOneOfEachUnion) {
            // user is not keeping this "kind" of union, but we still need
            // to keep at least one of the primitives of this union around
            // in order to know what "kind" of union each record is.
            // TODO: in the future, we should just filter these records out instead.
            // Re-do the recursion with a new projection filter that keeps only
            // the first primitive it encounters.
            ConvertedField firstPrimitive = child.getType().accept(
                    new ThriftSchemaConvertVisitor(new KeepOnlyFirstPrimitiveFilter(), true, keepOneOfEachUnion),
                    childState);
            convertedChildren.add(firstPrimitive.asKeep().getType().withId(child.getFieldId()));
            hasSentinelUnionColumns = true;
        }
        if (converted.isSentinelUnion()) {
            // child field is a sentinel union that we should drop if possible
            if (childState.repetition == REQUIRED) {
                // but this field is required, so we may still need it
                convertedChildren.add(converted.asSentinelUnion().getType().withId(child.getFieldId()));
                hasSentinelUnionColumns = true;
            }
        } else if (converted.isKeep()) {
            // user has selected this column, so we keep it.
            convertedChildren.add(converted.asKeep().getType().withId(child.getFieldId()));
            hasNonSentinelUnionColumns = true;
        }
    }
    if (!hasNonSentinelUnionColumns && hasSentinelUnionColumns) {
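        // the user has requested none of the regular children, only sentinel-union
        // columns survived; we should drop this struct if possible, but we may not
        // be able to, so tag it as a sentinel.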
        return new SentinelUnion(state.path, new GroupType(state.repetition, state.name, convertedChildren));
    }
    if (hasNonSentinelUnionColumns) {
        // user requested some of the fields of this struct, so we keep the struct
        return new Keep(state.path, new GroupType(state.repetition, state.name, convertedChildren));
    } else {
        // user requested none of the fields of this struct, so we drop it
        return new Drop(state.path);
    }
}
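The three-way outcome of visit (Keep, Drop or SentinelUnion) depends only on two flags gathered while walking the children. The following is a minimal sketch of that decision logic without any parquet-mr dependency, assuming hypothetical simplified types (Kind, Child, Result and the classify method are illustrative names, not the parquet-mr API). The re-run of the visitor with KeepOnlyFirstPrimitiveFilter is replaced by a hypothetical placeholder column name.

```java
import java.util.ArrayList;
import java.util.List;

class UnionProjectionSketch {

    // Hypothetical stand-ins for parquet-mr's Keep, Drop and SentinelUnion results.
    enum Kind { KEEP, DROP, SENTINEL_UNION }

    // Hypothetical: the already-computed result for one child field,
    // plus whether that field is REQUIRED in the Thrift schema.
    record Child(String name, Kind kind, boolean required) {}

    // Hypothetical: the classification of the enclosing struct and the
    // names of the child columns it retains.
    record Result(Kind kind, List<String> keptChildren) {}

    static Result classify(List<Child> children, boolean needsToKeepOneOfEachUnion) {
        boolean hasSentinelUnionColumns = false;
        boolean hasNonSentinelUnionColumns = false;
        List<String> kept = new ArrayList<>();

        for (Child child : children) {
            if (child.kind() != Kind.KEEP && needsToKeepOneOfEachUnion) {
                // Placeholder for re-running the visitor with KeepOnlyFirstPrimitiveFilter:
                // keep one primitive so each record's union "kind" stays identifiable.
                kept.add(child.name() + ".firstPrimitive");
                hasSentinelUnionColumns = true;
            }
            if (child.kind() == Kind.SENTINEL_UNION) {
                // A sentinel child is dropped unless the field is required.
                if (child.required()) {
                    kept.add(child.name());
                    hasSentinelUnionColumns = true;
                }
            } else if (child.kind() == Kind.KEEP) {
                // The user projected this child, so it is kept.
                kept.add(child.name());
                hasNonSentinelUnionColumns = true;
            }
        }

        if (!hasNonSentinelUnionColumns && hasSentinelUnionColumns) {
            // Only placeholder or sentinel columns survived: tag the struct as a sentinel.
            return new Result(Kind.SENTINEL_UNION, kept);
        }
        return hasNonSentinelUnionColumns
                ? new Result(Kind.KEEP, kept)
                : new Result(Kind.DROP, List.of());
    }

    public static void main(String[] args) {
        // A union whose branches were all dropped by the projection.
        Result r = classify(List.of(
                new Child("a", Kind.DROP, false),
                new Child("b", Kind.DROP, false)), true);
        System.out.println(r.kind() + " " + r.keptChildren());
        // prints: SENTINEL_UNION [a.firstPrimitive, b.firstPrimitive]
    }
}
```

With needsToKeepOneOfEachUnion set to false, the same two dropped children produce DROP instead, which mirrors why the real visitor only forces a first primitive to be kept when keepOneOfEachUnion is enabled and the struct is a union.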
Also used: Keep (org.apache.parquet.thrift.ConvertedField.Keep), ArrayList (java.util.ArrayList), ThriftField (org.apache.parquet.thrift.struct.ThriftField), Drop (org.apache.parquet.thrift.ConvertedField.Drop), PrimitiveType (org.apache.parquet.schema.PrimitiveType), ThriftType (org.apache.parquet.thrift.struct.ThriftType), ConversionPatterns.listType (org.apache.parquet.schema.ConversionPatterns.listType), SetType (org.apache.parquet.thrift.struct.ThriftType.SetType), I64Type (org.apache.parquet.thrift.struct.ThriftType.I64Type), StructOrUnionType (org.apache.parquet.thrift.struct.ThriftType.StructType.StructOrUnionType), BoolType (org.apache.parquet.thrift.struct.ThriftType.BoolType), StringType (org.apache.parquet.thrift.struct.ThriftType.StringType), ByteType (org.apache.parquet.thrift.struct.ThriftType.ByteType), StructType (org.apache.parquet.thrift.struct.ThriftType.StructType), ListType (org.apache.parquet.thrift.struct.ThriftType.ListType), I32Type (org.apache.parquet.thrift.struct.ThriftType.I32Type), DoubleType (org.apache.parquet.thrift.struct.ThriftType.DoubleType), OriginalType (org.apache.parquet.schema.OriginalType), GroupType (org.apache.parquet.schema.GroupType), ConversionPatterns.mapType (org.apache.parquet.schema.ConversionPatterns.mapType), MapType (org.apache.parquet.thrift.struct.ThriftType.MapType), EnumType (org.apache.parquet.thrift.struct.ThriftType.EnumType), MessageType (org.apache.parquet.schema.MessageType), I16Type (org.apache.parquet.thrift.struct.ThriftType.I16Type), Type (org.apache.parquet.schema.Type), SentinelUnion (org.apache.parquet.thrift.ConvertedField.SentinelUnion)
