
Example 1 with ThriftProjectionException

Use of org.apache.parquet.thrift.projection.ThriftProjectionException in the project parquet-mr by apache.

From the class ThriftSchemaConvertVisitor, method visit.

@Override
public ConvertedField visit(MapType mapType, State state) {
    ThriftField keyField = mapType.getKey();
    ThriftField valueField = mapType.getValue();
    State keyState = new State(state.path.push(keyField), REQUIRED, "key");
    // TODO: This is a bug! This should be REQUIRED, but changing it would
    // break the schema compatibility check against old data.
    // Thrift does not support null / missing map values.
    State valueState = new State(state.path.push(valueField), OPTIONAL, "value");
    ConvertedField convertedKey = keyField.getType().accept(this, keyState);
    ConvertedField convertedValue = valueField.getType().accept(this, valueState);
    if (!convertedKey.isKeep()) {
        if (convertedValue.isKeep()) {
            throw new ThriftProjectionException("Cannot select only the values of a map, you must keep the keys as well: " + state.path);
        }
        // neither key nor value was requested
        return new Drop(state.path);
    }
    // NOTE: the doProjection flag prevents infinite recursion here
    // (the nested visitor below is created with doProjection = false).
    if (doProjection) {
        ConvertedField fullConvKey = keyField.getType().accept(new ThriftSchemaConvertVisitor(FieldProjectionFilter.ALL_COLUMNS, false, keepOneOfEachUnion), keyState);
        if (!fullConvKey.asKeep().getType().equals(convertedKey.asKeep().getType())) {
            throw new ThriftProjectionException("Cannot select only a subset of the fields in a map key, " + "for path " + state.path);
        }
    }
    if (convertedValue.isKeep()) {
        // keep both key and value
        Type mapField = mapType(state.repetition, state.name, convertedKey.asKeep().getType(), convertedValue.asKeep().getType());
        return new Keep(state.path, mapField);
    }
    // keep only the key, not the value
    ConvertedField sentinelValue = valueField.getType().accept(new ThriftSchemaConvertVisitor(new KeepOnlyFirstPrimitiveFilter(), true, keepOneOfEachUnion), valueState);
    // passing the sentinel value type signals to the mapType method to project the value
    Type mapField = mapType(state.repetition, state.name, convertedKey.asKeep().getType(), sentinelValue.asKeep().getType());
    return new Keep(state.path, mapField);
}
Also used : PrimitiveType(org.apache.parquet.schema.PrimitiveType) ThriftType(org.apache.parquet.thrift.struct.ThriftType) ConversionPatterns.listType(org.apache.parquet.schema.ConversionPatterns.listType) SetType(org.apache.parquet.thrift.struct.ThriftType.SetType) I64Type(org.apache.parquet.thrift.struct.ThriftType.I64Type) StructOrUnionType(org.apache.parquet.thrift.struct.ThriftType.StructType.StructOrUnionType) BoolType(org.apache.parquet.thrift.struct.ThriftType.BoolType) StringType(org.apache.parquet.thrift.struct.ThriftType.StringType) ByteType(org.apache.parquet.thrift.struct.ThriftType.ByteType) StructType(org.apache.parquet.thrift.struct.ThriftType.StructType) ListType(org.apache.parquet.thrift.struct.ThriftType.ListType) I32Type(org.apache.parquet.thrift.struct.ThriftType.I32Type) DoubleType(org.apache.parquet.thrift.struct.ThriftType.DoubleType) OriginalType(org.apache.parquet.schema.OriginalType) GroupType(org.apache.parquet.schema.GroupType) ConversionPatterns.mapType(org.apache.parquet.schema.ConversionPatterns.mapType) MapType(org.apache.parquet.thrift.struct.ThriftType.MapType) EnumType(org.apache.parquet.thrift.struct.ThriftType.EnumType) MessageType(org.apache.parquet.schema.MessageType) I16Type(org.apache.parquet.thrift.struct.ThriftType.I16Type) Type(org.apache.parquet.schema.Type) Keep(org.apache.parquet.thrift.ConvertedField.Keep) ThriftProjectionException(org.apache.parquet.thrift.projection.ThriftProjectionException) ThriftField(org.apache.parquet.thrift.struct.ThriftField) Drop(org.apache.parquet.thrift.ConvertedField.Drop)
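
The key/value branching in the method above can be hard to follow inside the full visitor. The following is a minimal sketch of just that decision, assuming the only inputs are whether the key and the value survived projection. The class `MapProjectionSketch`, the `Outcome` enum and `ProjectionException` are hypothetical stand-ins, not parquet-mr types, and the check that a map key is not partially projected is left out.

```java
/** Hypothetical stand-in for the map branch of a projection visitor; not parquet-mr code. */
final class MapProjectionSketch {

    /** The three non-failing results the original method can produce for a map. */
    enum Outcome { DROP, KEEP_KEY_AND_VALUE, KEEP_KEY_WITH_SENTINEL_VALUE }

    /** Plays the role of ThriftProjectionException in this sketch. */
    static final class ProjectionException extends RuntimeException {
        ProjectionException(String message) {
            super(message);
        }
    }

    /**
     * Decides what happens to a map field given which of its parts were requested.
     * keyKept / valueKept correspond to convertedKey.isKeep() / convertedValue.isKeep().
     */
    static Outcome project(boolean keyKept, boolean valueKept, String path) {
        if (!keyKept) {
            if (valueKept) {
                // values without keys cannot be represented
                throw new ProjectionException(
                    "Cannot select only the values of a map, you must keep the keys as well: " + path);
            }
            // neither key nor value was requested
            return Outcome.DROP;
        }
        // the key is kept; the value is either kept as requested or replaced by a sentinel
        return valueKept ? Outcome.KEEP_KEY_AND_VALUE : Outcome.KEEP_KEY_WITH_SENTINEL_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(project(true, true, "a.map"));
        System.out.println(project(true, false, "a.map"));
        System.out.println(project(false, false, "a.map"));
        try {
            project(false, true, "a.map");
        } catch (ProjectionException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Only one of the four combinations is an error: selecting values without keys. A key-only selection still yields a map, with the value reduced to a placeholder.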

Example 2 with ThriftProjectionException

Use of org.apache.parquet.thrift.projection.ThriftProjectionException in the project parquet-mr by apache.

From the class ThriftReadSupport, method init.

@Override
public org.apache.parquet.hadoop.api.ReadSupport.ReadContext init(InitContext context) {
    final Configuration configuration = context.getConfiguration();
    final MessageType fileMessageType = context.getFileSchema();
    MessageType requestedProjection = fileMessageType;
    String partialSchemaString = configuration.get(ReadSupport.PARQUET_READ_SCHEMA);
    FieldProjectionFilter projectionFilter = getFieldProjectionFilter(configuration);
    if (partialSchemaString != null && projectionFilter != null) {
        throw new ThriftProjectionException(String.format("You cannot provide both a partial schema and field projection filter. " + "Only one of (%s, %s, %s) should be set.", PARQUET_READ_SCHEMA, STRICT_THRIFT_COLUMN_FILTER_KEY, THRIFT_COLUMN_FILTER_KEY));
    }
    // override requestedProjection only when a projection is specified
    if (partialSchemaString != null) {
        requestedProjection = getSchemaForRead(fileMessageType, partialSchemaString);
    } else if (projectionFilter != null) {
        try {
            initThriftClassFromMultipleFiles(context.getKeyValueMetadata(), configuration);
            requestedProjection = getProjectedSchema(projectionFilter);
        } catch (ClassNotFoundException e) {
            throw new ThriftProjectionException("can not find thriftClass from configuration", e);
        }
    }
    MessageType schemaForRead = getSchemaForRead(fileMessageType, requestedProjection);
    return new ReadContext(schemaForRead);
}
Also used : Configuration(org.apache.hadoop.conf.Configuration) StrictFieldProjectionFilter(org.apache.parquet.thrift.projection.StrictFieldProjectionFilter) DeprecatedFieldProjectionFilter(org.apache.parquet.thrift.projection.deprecated.DeprecatedFieldProjectionFilter) FieldProjectionFilter(org.apache.parquet.thrift.projection.FieldProjectionFilter) ThriftProjectionException(org.apache.parquet.thrift.projection.ThriftProjectionException) MessageType(org.apache.parquet.schema.MessageType)
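
The precedence that init applies (partial schema, else field filter, else the full file schema, and an error when both are set) can be illustrated in isolation. This is a rough sketch that assumes the configuration is a plain `java.util.Map`. The class `ProjectionSourceSketch` and both key strings are hypothetical placeholders, not the real parquet-mr configuration keys.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of "at most one projection mechanism"; not parquet-mr code. */
final class ProjectionSourceSketch {

    /** Where the requested projection comes from. */
    enum Source { FULL_FILE_SCHEMA, PARTIAL_SCHEMA, FIELD_FILTER }

    // Placeholder keys, chosen for this example only.
    static final String READ_SCHEMA_KEY = "example.read.schema";
    static final String COLUMN_FILTER_KEY = "example.column.filter";

    static Source choose(Map<String, String> conf) {
        boolean hasSchema = conf.get(READ_SCHEMA_KEY) != null;
        boolean hasFilter = conf.get(COLUMN_FILTER_KEY) != null;
        if (hasSchema && hasFilter) {
            // the two mechanisms are mutually exclusive
            throw new IllegalStateException(String.format(
                "You cannot provide both a partial schema and field projection filter. "
                    + "Only one of (%s, %s) should be set.", READ_SCHEMA_KEY, COLUMN_FILTER_KEY));
        }
        if (hasSchema) {
            return Source.PARTIAL_SCHEMA;
        }
        if (hasFilter) {
            return Source.FIELD_FILTER;
        }
        // nothing requested: read every column in the file
        return Source.FULL_FILE_SCHEMA;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(choose(conf));
        conf.put(COLUMN_FILTER_KEY, "name;address/*");
        System.out.println(choose(conf));
    }
}
```

Failing fast when both settings are present avoids silently preferring one of them, which is the same choice the original method makes.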

Example 3 with ThriftProjectionException

Use of org.apache.parquet.thrift.projection.ThriftProjectionException in the project parquet-mr by apache.

From the class ThriftSchemaConvertVisitor, method convert.

public static MessageType convert(StructType struct, FieldProjectionFilter filter, boolean keepOneOfEachUnion) {
    State state = new State(new FieldsPath(), REPEATED, "ParquetSchema");
    ConvertedField converted = struct.accept(new ThriftSchemaConvertVisitor(filter, true, keepOneOfEachUnion), state);
    if (!converted.isKeep()) {
        throw new ThriftProjectionException("No columns have been selected");
    }
    return new MessageType(state.name, converted.asKeep().getType().asGroupType().getFields());
}
Also used : FieldsPath(org.apache.parquet.thrift.projection.FieldsPath) ThriftProjectionException(org.apache.parquet.thrift.projection.ThriftProjectionException) MessageType(org.apache.parquet.schema.MessageType)
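
The "No columns have been selected" guard in convert fires when the filter rejects every field. Below is a simplified sketch of that situation, assuming columns are plain path strings and the filter is a `java.util.function.Predicate`. The class `EmptyProjectionSketch` is hypothetical, and `IllegalArgumentException` stands in for ThriftProjectionException.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical illustration of rejecting an empty projection; not parquet-mr code. */
final class EmptyProjectionSketch {

    /** Returns the columns accepted by the filter, or fails if none are left. */
    static List<String> project(List<String> columns, Predicate<String> keep) {
        List<String> kept = new ArrayList<>();
        for (String column : columns) {
            if (keep.test(column)) {
                kept.add(column);
            }
        }
        if (kept.isEmpty()) {
            // an empty schema cannot be read, so report it instead of returning it
            throw new IllegalArgumentException("No columns have been selected");
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> columns = Arrays.asList("name", "address.street", "address.zip");
        System.out.println(project(columns, c -> c.startsWith("address.")));
        try {
            project(columns, c -> c.startsWith("phone"));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```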

Aggregations

MessageType (org.apache.parquet.schema.MessageType) 3
ThriftProjectionException (org.apache.parquet.thrift.projection.ThriftProjectionException) 3
Configuration (org.apache.hadoop.conf.Configuration) 1
ConversionPatterns.listType (org.apache.parquet.schema.ConversionPatterns.listType) 1
ConversionPatterns.mapType (org.apache.parquet.schema.ConversionPatterns.mapType) 1
GroupType (org.apache.parquet.schema.GroupType) 1
OriginalType (org.apache.parquet.schema.OriginalType) 1
PrimitiveType (org.apache.parquet.schema.PrimitiveType) 1
Type (org.apache.parquet.schema.Type) 1
Drop (org.apache.parquet.thrift.ConvertedField.Drop) 1
Keep (org.apache.parquet.thrift.ConvertedField.Keep) 1
FieldProjectionFilter (org.apache.parquet.thrift.projection.FieldProjectionFilter) 1
FieldsPath (org.apache.parquet.thrift.projection.FieldsPath) 1
StrictFieldProjectionFilter (org.apache.parquet.thrift.projection.StrictFieldProjectionFilter) 1
DeprecatedFieldProjectionFilter (org.apache.parquet.thrift.projection.deprecated.DeprecatedFieldProjectionFilter) 1
ThriftField (org.apache.parquet.thrift.struct.ThriftField) 1
ThriftType (org.apache.parquet.thrift.struct.ThriftType) 1
BoolType (org.apache.parquet.thrift.struct.ThriftType.BoolType) 1
ByteType (org.apache.parquet.thrift.struct.ThriftType.ByteType) 1
DoubleType (org.apache.parquet.thrift.struct.ThriftType.DoubleType) 1