Use of org.apache.drill.exec.record.metadata.ColumnMetadata in project drill by apache.
In the class SchemaPathUtils, the method isFieldNestedInDictOrRepeatedMap:
/**
 * Checks whether the field identified by the schema path is a child of either a {@code DICT}
 * or a {@code REPEATED MAP}. For fields nested in a {@code DICT} or a {@code REPEATED MAP},
 * filters cannot be removed based on Parquet statistics.
 *
 * <p>The check is needed because statistics are not obtained for such fields: their representation
 * differs from the 'canonical' one. For example, field {@code `a`} in Parquet's {@code STRUCT ARRAY}
 * is represented as {@code `struct_array`.`bag`.`array_element`.`a`}, but once it is used in a filter,
 * {@code ... WHERE struct_array[0].a = 1}, it has a different representation (with indexes stripped):
 * {@code `struct_array`.`a`}, which is not present in the statistics. The same happens with
 * {@code DICT}'s {@code value}: for {@code SELECT ... WHERE dict_col['a'] = 0}, statistics exist for
 * {@code `dict_col`.`key_value`.`value`}, but the field in the filter is translated to
 * {@code `dict_col`.`a`} and hence considered absent from the statistics. If such a field (like the
 * ones in the examples) is {@code OPTIONAL INT}, it is then considered absent from the table and
 * treated as {@code NULL}. This method is used to avoid that situation.</p>
 *
 * @param schemaPath schema path used in the filter
 * @param schema     schema containing all the fields in the file
 * @return {@literal true} if the field is nested inside a {@code DICT} (is {@code `key`} or
 *         {@code `value`}) or inside a {@code REPEATED MAP} field, {@literal false} otherwise
 */
public static boolean isFieldNestedInDictOrRepeatedMap(SchemaPath schemaPath, TupleMetadata schema) {
  PathSegment.NameSegment colPath = schemaPath.getUnIndexed().getRootSegment();
  ColumnMetadata colMetadata = schema.metadata(colPath.getPath());
  while (!colPath.isLastPath() && colMetadata != null) {
    if (colMetadata.isDict() || (colMetadata.isMap() && Types.isRepeated(colMetadata.majorType()))) {
      return true;
    } else if (!colMetadata.isMap()) {
      break;
    }
    colPath = (PathSegment.NameSegment) colPath.getChild();
    colMetadata = colMetadata.tupleSchema().metadata(colPath.getPath());
  }
  return false;
}
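A minimal usage sketch, assuming Drill's SchemaBuilder test utility and the SchemaPath.getCompoundPath factory; the column names are hypothetical:

TupleMetadata schema = new SchemaBuilder()
    .addMapArray("struct_array")            // REPEATED MAP
      .add("a", TypeProtos.MinorType.INT)
    .resumeSchema()
    .buildSchema();

// `a` sits inside a REPEATED MAP, so Parquet statistics cannot be used to prune on it.
SchemaPath path = SchemaPath.getCompoundPath("struct_array", "a");
boolean nested = SchemaPathUtils.isFieldNestedInDictOrRepeatedMap(path, schema); // expected: true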
Use of org.apache.drill.exec.record.metadata.ColumnMetadata in project drill by apache.
In the class MapColumnConverter, the method buildMapMembers:
public void buildMapMembers(Record record, TupleMetadata providedSchema, TupleWriter tupleWriter,
    Map<String, ColumnConverter> converters) {
  // Derive the reader schema from the Iceberg record and reconcile it with the provided schema.
  TupleMetadata readerSchema = IcebergColumnConverterFactory.convertSchema(record.struct());
  TupleMetadata tableSchema = FixedReceiver.Builder.mergeSchemas(providedSchema, readerSchema);
  tableSchema.toMetadataList().forEach(tupleWriter::addColumn);
  // Build one converter per column, keyed by column name.
  for (ColumnMetadata columnMetadata : tableSchema) {
    String name = columnMetadata.name();
    converters.put(name,
        factory.getConverter(providedSchema, readerSchema.metadata(name), tupleWriter.column(name)));
  }
}
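The merge step is the crux: when a provided schema exists, its column definitions should win over those inferred from the Iceberg record. A hedged sketch of that expectation, assuming this is indeed the contract of FixedReceiver.Builder.mergeSchemas; the column names are hypothetical:

TupleMetadata provided = new SchemaBuilder()
    .add("id", TypeProtos.MinorType.BIGINT)
    .buildSchema();
TupleMetadata reader = new SchemaBuilder()
    .add("id", TypeProtos.MinorType.INT)
    .add("name", TypeProtos.MinorType.VARCHAR)
    .buildSchema();

// Assumption: `id` keeps the provided BIGINT type, and the reader-only column
// `name` is appended, yielding (id BIGINT, name VARCHAR).
TupleMetadata merged = FixedReceiver.Builder.mergeSchemas(provided, reader);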
Use of org.apache.drill.exec.record.metadata.ColumnMetadata in project drill by apache.
In the class IcebergColumnConverterFactory, the method convertSchema:
public static TupleSchema convertSchema(Types.StructType structType) {
  TupleSchema schema = new TupleSchema();
  for (Types.NestedField field : structType.fields()) {
    ColumnMetadata columnMetadata = getColumnMetadata(field);
    schema.add(columnMetadata);
  }
  return schema;
}
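A short usage sketch with the standard org.apache.iceberg.types.Types factories (here Types is Iceberg's, not Drill's); the field ids and names are hypothetical:

Types.StructType struct = Types.StructType.of(
    Types.NestedField.required(1, "id", Types.LongType.get()),
    Types.NestedField.optional(2, "name", Types.StringType.get()));

TupleSchema schema = IcebergColumnConverterFactory.convertSchema(struct);
// Presumably yields a required BIGINT `id` and a nullable VARCHAR `name`,
// per getColumnMetadata's Iceberg-to-Drill type mapping.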
Use of org.apache.drill.exec.record.metadata.ColumnMetadata in project drill by apache.
In the class HttpdLogRecord, the method getColWriter:
private ScalarWriter getColWriter(TupleWriter tupleWriter, String fieldName, TypeProtos.MinorType type) {
  int index = tupleWriter.tupleSchema().index(fieldName);
  if (index == -1) {
    // Column not seen before: add it to the schema as a nullable scalar.
    ColumnMetadata colSchema = MetadataUtils.newScalar(fieldName, type, TypeProtos.DataMode.OPTIONAL);
    index = tupleWriter.addColumn(colSchema);
  }
  return tupleWriter.scalar(index);
}
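This is the create-on-first-sight pattern common in Drill's EVF-based readers: look the column up by name, add it as OPTIONAL if absent, and return its scalar writer. A hedged usage sketch with a hypothetical field name:

ScalarWriter writer = getColWriter(rowWriter, "response_code", TypeProtos.MinorType.INT);
writer.setInt(200);   // later rows reuse the column created on the first call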
Use of org.apache.drill.exec.record.metadata.ColumnMetadata in project drill by apache.
In the class HDF5BatchReader, the method writeTimestampListColumn:
private void writeTimestampListColumn(TupleWriter rowWriter, String name, long[] list) {
  int index = rowWriter.tupleSchema().index(name);
  if (index == -1) {
    // REPEATED mode turns the TIMESTAMP column into an array column.
    ColumnMetadata colSchema = MetadataUtils.newScalar(name, TypeProtos.MinorType.TIMESTAMP, TypeProtos.DataMode.REPEATED);
    index = rowWriter.addColumn(colSchema);
  }
  // Append each epoch-millisecond value to the array for the current row.
  ScalarWriter arrayWriter = rowWriter.column(index).array().scalar();
  for (long l : list) {
    arrayWriter.setTimestamp(Instant.ofEpochMilli(l));
  }
}
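The same pattern extends to other repeated types. Below is a hedged sketch of an analogous BIGINT variant, written by mirroring this method rather than copied from the reader:

private void writeLongListColumn(TupleWriter rowWriter, String name, long[] list) {
  int index = rowWriter.tupleSchema().index(name);
  if (index == -1) {
    ColumnMetadata colSchema = MetadataUtils.newScalar(name, TypeProtos.MinorType.BIGINT, TypeProtos.DataMode.REPEATED);
    index = rowWriter.addColumn(colSchema);
  }
  ScalarWriter arrayWriter = rowWriter.column(index).array().scalar();
  for (long value : list) {
    arrayWriter.setLong(value);   // append each element to the current row's array
  }
}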