
Example 46 with ColumnMetadata

use of org.apache.drill.exec.record.metadata.ColumnMetadata in project drill by apache.

From the class SchemaPathUtils, the method isFieldNestedInDictOrRepeatedMap:

/**
 * Checks if the field identified by the schema path is a child of either {@code DICT} or {@code REPEATED MAP}.
 * For such fields, nested in {@code DICT} or {@code REPEATED MAP},
 * filters can't be removed based on Parquet statistics.
 *
 * <p>The need for the check arises because statistics data is not obtained for such fields as their representation
 * differs from the 'canonical' one. For example, field {@code `a`} in Parquet's {@code STRUCT ARRAY} is represented
 * as {@code `struct_array`.`bag`.`array_element`.`a`} but once it is used in a filter, {@code ... WHERE struct_array[0].a = 1},
 * it has different representation (with indexes stripped): {@code `struct_array`.`a`} which is not present in statistics.
 * The same happens with DICT's {@code value}: for {@code SELECT ... WHERE dict_col['a'] = 0}, statistics exist for
 * {@code `dict_col`.`key_value`.`value`} but the field in filter is translated to {@code `dict_col`.`a`} and hence it is
 * considered not present in statistics. If such a field (like the ones shown in the examples) is {@code OPTIONAL INT},
 * it is treated as absent from the table and hence as {@code NULL}. This method is used to avoid that situation.</p>
 *
 * @param schemaPath schema path used in filter
 * @param schema schema containing all the fields in the file
 * @return {@literal true} if field is nested inside {@code DICT} (is {@code `key`} or {@code `value`})
 *         or inside {@code REPEATED MAP} field, {@literal false} otherwise.
 */
public static boolean isFieldNestedInDictOrRepeatedMap(SchemaPath schemaPath, TupleMetadata schema) {
    PathSegment.NameSegment colPath = schemaPath.getUnIndexed().getRootSegment();
    ColumnMetadata colMetadata = schema.metadata(colPath.getPath());
    while (!colPath.isLastPath() && colMetadata != null) {
        if (colMetadata.isDict() || (colMetadata.isMap() && Types.isRepeated(colMetadata.majorType()))) {
            return true;
        } else if (!colMetadata.isMap()) {
            break;
        }
        colPath = (PathSegment.NameSegment) colPath.getChild();
        colMetadata = colMetadata.tupleSchema().metadata(colPath.getPath());
    }
    return false;
}
Also used : DictColumnMetadata(org.apache.drill.exec.record.metadata.DictColumnMetadata) ColumnMetadata(org.apache.drill.exec.record.metadata.ColumnMetadata) PrimitiveColumnMetadata(org.apache.drill.exec.record.metadata.PrimitiveColumnMetadata) PathSegment(org.apache.drill.common.expression.PathSegment)
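The walk above can be mimicked in plain Java. The `Kind`/`Col` model below is a hypothetical stand-in for Drill's `ColumnMetadata` tree (not Drill's actual API); it is a minimal sketch showing why any non-terminal `DICT` or `REPEATED MAP` segment short-circuits to `true`:

```java
import java.util.*;

public class NestedPathCheck {
    enum Kind { PRIMITIVE, MAP, REPEATED_MAP, DICT }

    // Hypothetical simplified column node: a kind plus named children.
    static final class Col {
        final Kind kind;
        final Map<String, Col> children;
        Col(Kind kind, Map<String, Col> children) {
            this.kind = kind;
            this.children = children;
        }
    }

    // Walks the name segments of a path; returns true as soon as a non-terminal
    // segment resolves to a DICT or REPEATED MAP column, mirroring the loop above.
    static boolean isNestedInDictOrRepeatedMap(List<String> path, Map<String, Col> schema) {
        Map<String, Col> current = schema;
        for (int i = 0; i < path.size() - 1; i++) {   // stop before the last segment
            Col col = current.get(path.get(i));
            if (col == null) {
                break;
            }
            if (col.kind == Kind.DICT || col.kind == Kind.REPEATED_MAP) {
                return true;
            }
            if (col.kind != Kind.MAP) {
                break;
            }
            current = col.children;
        }
        return false;
    }

    public static void main(String[] args) {
        Col leaf = new Col(Kind.PRIMITIVE, Map.of());
        Map<String, Col> schema = Map.of(
            "struct_array", new Col(Kind.REPEATED_MAP, Map.of("a", leaf)),
            "plain_map", new Col(Kind.MAP, Map.of("a", leaf)));
        // `struct_array`.`a` lives inside a REPEATED MAP: statistics cannot be trusted.
        System.out.println(isNestedInDictOrRepeatedMap(List.of("struct_array", "a"), schema)); // true
        // `plain_map`.`a` sits in an ordinary MAP: the path matches statistics directly.
        System.out.println(isNestedInDictOrRepeatedMap(List.of("plain_map", "a"), schema));    // false
    }
}
```

Note the loop only inspects segments that have children below them; the final segment itself is never checked, matching `!colPath.isLastPath()` in the original.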

Example 47 with ColumnMetadata

use of org.apache.drill.exec.record.metadata.ColumnMetadata in project drill by apache.

From the class MapColumnConverter, the method buildMapMembers:

public void buildMapMembers(Record record, TupleMetadata providedSchema, TupleWriter tupleWriter, Map<String, ColumnConverter> converters) {
    // Derive a schema from the Iceberg record, merge it with the provided schema,
    // and materialize every merged column on the writer.
    TupleMetadata readerSchema = IcebergColumnConverterFactory.convertSchema(record.struct());
    TupleMetadata tableSchema = FixedReceiver.Builder.mergeSchemas(providedSchema, readerSchema);
    tableSchema.toMetadataList().forEach(tupleWriter::addColumn);
    for (ColumnMetadata columnMetadata : tableSchema) {
        String name = columnMetadata.name();
        converters.put(name, factory.getConverter(providedSchema, readerSchema.metadata(name), tupleWriter.column(name)));
    }
}
Also used : ColumnMetadata(org.apache.drill.exec.record.metadata.ColumnMetadata) TupleMetadata(org.apache.drill.exec.record.metadata.TupleMetadata)

Example 48 with ColumnMetadata

use of org.apache.drill.exec.record.metadata.ColumnMetadata in project drill by apache.

From the class IcebergColumnConverterFactory, the method convertSchema:

public static TupleSchema convertSchema(Types.StructType structType) {
    TupleSchema schema = new TupleSchema();
    for (Types.NestedField field : structType.fields()) {
        ColumnMetadata columnMetadata = getColumnMetadata(field);
        schema.add(columnMetadata);
    }
    return schema;
}
Also used : Types(org.apache.iceberg.types.Types) DictColumnMetadata(org.apache.drill.exec.record.metadata.DictColumnMetadata) ColumnMetadata(org.apache.drill.exec.record.metadata.ColumnMetadata) TupleSchema(org.apache.drill.exec.record.metadata.TupleSchema)

Example 49 with ColumnMetadata

use of org.apache.drill.exec.record.metadata.ColumnMetadata in project drill by apache.

From the class HttpdLogRecord, the method getColWriter:

private ScalarWriter getColWriter(TupleWriter tupleWriter, String fieldName, TypeProtos.MinorType type) {
    int index = tupleWriter.tupleSchema().index(fieldName);
    if (index == -1) {
        // Column not seen before: add it as an OPTIONAL scalar of the requested type.
        ColumnMetadata colSchema = MetadataUtils.newScalar(fieldName, type, TypeProtos.DataMode.OPTIONAL);
        index = tupleWriter.addColumn(colSchema);
    }
    return tupleWriter.scalar(index);
}
Also used : ColumnMetadata(org.apache.drill.exec.record.metadata.ColumnMetadata)
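The lookup-then-add pattern above (index `-1` means "column not yet in the schema") generalizes beyond Drill. This is a minimal sketch with a hypothetical `DynamicSchema` class, not Drill's `TupleWriter` API:

```java
import java.util.*;

public class DynamicSchema {
    private final List<String> columns = new ArrayList<>();
    private final Map<String, Integer> indexByName = new HashMap<>();

    // Returns the column's index, appending the column on first reference,
    // just as the snippet above treats index(fieldName) == -1 as "not yet added".
    int columnIndex(String name) {
        Integer idx = indexByName.get(name);
        if (idx == null) {
            idx = columns.size();
            columns.add(name);
            indexByName.put(name, idx);
        }
        return idx;
    }

    public static void main(String[] args) {
        DynamicSchema schema = new DynamicSchema();
        System.out.println(schema.columnIndex("referer")); // 0 (added on first use)
        System.out.println(schema.columnIndex("status"));  // 1 (added on first use)
        System.out.println(schema.columnIndex("referer")); // 0 (already present)
    }
}
```

The point of the pattern is that readers with unpredictable fields (HTTP logs, HDF5 datasets) can grow the schema lazily while keeping stable column indexes for writers.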

Example 50 with ColumnMetadata

use of org.apache.drill.exec.record.metadata.ColumnMetadata in project drill by apache.

From the class HDF5BatchReader, the method writeTimestampListColumn:

private void writeTimestampListColumn(TupleWriter rowWriter, String name, long[] list) {
    int index = rowWriter.tupleSchema().index(name);
    if (index == -1) {
        // Column not seen before: add it as a REPEATED TIMESTAMP array column.
        ColumnMetadata colSchema = MetadataUtils.newScalar(name, TypeProtos.MinorType.TIMESTAMP, TypeProtos.DataMode.REPEATED);
        index = rowWriter.addColumn(colSchema);
    }
    ScalarWriter arrayWriter = rowWriter.column(index).array().scalar();
    for (long l : list) {
        arrayWriter.setTimestamp(Instant.ofEpochMilli(l));
    }
}
Also used : ColumnMetadata(org.apache.drill.exec.record.metadata.ColumnMetadata) ScalarWriter(org.apache.drill.exec.vector.accessor.ScalarWriter)
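The conversion step above treats each `long` as epoch milliseconds. A standalone sketch of that interpretation, using a hypothetical `toInstants` helper rather than Drill's `ScalarWriter`:

```java
import java.time.Instant;
import java.util.*;

public class TimestampList {
    // Converts an array of epoch-millisecond values to Instants, the same
    // interpretation the writer above applies via Instant.ofEpochMilli.
    static List<Instant> toInstants(long[] millis) {
        List<Instant> out = new ArrayList<>(millis.length);
        for (long l : millis) {
            out.add(Instant.ofEpochMilli(l));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(toInstants(new long[] { 0L, 1_000L }));
        // [1970-01-01T00:00:00Z, 1970-01-01T00:00:01Z]
    }
}
```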

Aggregations

ColumnMetadata (org.apache.drill.exec.record.metadata.ColumnMetadata): 195 uses
Test (org.junit.Test): 104 uses
TupleMetadata (org.apache.drill.exec.record.metadata.TupleMetadata): 98 uses
SchemaBuilder (org.apache.drill.exec.record.metadata.SchemaBuilder): 50 uses
DrillTest (org.apache.drill.test.DrillTest): 37 uses
PrimitiveColumnMetadata (org.apache.drill.exec.record.metadata.PrimitiveColumnMetadata): 31 uses
SubOperatorTest (org.apache.drill.test.SubOperatorTest): 31 uses
BaseTest (org.apache.drill.test.BaseTest): 26 uses
MapColumnMetadata (org.apache.drill.exec.record.metadata.MapColumnMetadata): 20 uses
SchemaBuilder (org.apache.drill.test.rowSet.schema.SchemaBuilder): 20 uses
VariantColumnMetadata (org.apache.drill.exec.record.metadata.VariantColumnMetadata): 19 uses
VariantMetadata (org.apache.drill.exec.record.metadata.VariantMetadata): 19 uses
AbstractColumnMetadata (org.apache.drill.exec.record.metadata.AbstractColumnMetadata): 17 uses
ScalarWriter (org.apache.drill.exec.vector.accessor.ScalarWriter): 17 uses
DictColumnMetadata (org.apache.drill.exec.record.metadata.DictColumnMetadata): 15 uses
EvfTest (org.apache.drill.categories.EvfTest): 13 uses
ProjectionFilter (org.apache.drill.exec.physical.resultSet.impl.ProjectionFilter): 13 uses
TupleSchema (org.apache.drill.exec.record.metadata.TupleSchema): 11 uses
ArrayList (java.util.ArrayList): 10 uses
ProjResult (org.apache.drill.exec.physical.resultSet.impl.ProjectionFilter.ProjResult): 10 uses