
Example 1 with NamedReference

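Use of org.apache.spark.sql.connector.expressions.NamedReference in project Apache Iceberg.

Class SparkCopyOnWriteOperation, method requiredMetadataAttributes: the copy-on-write operation declares which Iceberg metadata columns Spark must read. The data file path is always required, and DELETE and UPDATE also require the row position.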

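@Override
public NamedReference[] requiredMetadataAttributes() {
    // Iceberg metadata columns exposed to Spark as named column references
    NamedReference file = Expressions.column(MetadataColumns.FILE_PATH.name());
    NamedReference pos = Expressions.column(MetadataColumns.ROW_POSITION.name());
    if (command == DELETE || command == UPDATE) {
        // DELETE and UPDATE need both the data file path and the row position
        return new NamedReference[] { file, pos };
    } else {
        // Any other command (for example MERGE) needs only the data file path
        return new NamedReference[] { file };
    }
}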
Also used: NamedReference(org.apache.spark.sql.connector.expressions.NamedReference)
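The command-dependent choice in requiredMetadataAttributes() can be reproduced without Spark. This is a minimal sketch under stated assumptions: RowLevelCommand, CopyOnWriteColumnsSketch and requiredMetadataColumns are hypothetical names, the enum stands in for the command type the original compares against (defined outside the snippet), and plain column-name strings ("_file" and "_pos", the names behind MetadataColumns.FILE_PATH and ROW_POSITION) replace Spark NamedReference objects.

```java
import java.util.List;

// Hypothetical stand-in for the command type used by the original snippet.
enum RowLevelCommand { DELETE, UPDATE, MERGE }

final class CopyOnWriteColumnsSketch {
    // Column names behind Iceberg's MetadataColumns.FILE_PATH and ROW_POSITION.
    static final String FILE_PATH = "_file";
    static final String ROW_POSITION = "_pos";

    // Same branching as requiredMetadataAttributes(), but it returns plain
    // column names instead of Spark NamedReference objects.
    static List<String> requiredMetadataColumns(RowLevelCommand command) {
        if (command == RowLevelCommand.DELETE || command == RowLevelCommand.UPDATE) {
            return List.of(FILE_PATH, ROW_POSITION);
        }
        return List.of(FILE_PATH);
    }

    public static void main(String[] args) {
        // Print the required columns for each command.
        for (RowLevelCommand command : RowLevelCommand.values()) {
            System.out.println(command + " -> " + requiredMetadataColumns(command));
        }
    }
}
```

Returning an immutable List keeps the sketch side-effect free. The real method builds a fresh NamedReference array on each call.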

Example 2 with NamedReference

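Use of org.apache.spark.sql.connector.expressions.NamedReference in project Apache Iceberg.

Class Spark3Util, method toIcebergTerm: converts a Spark connector Expression into an Iceberg Term. A single-column Transform is mapped by name to the corresponding Iceberg transform. A plain NamedReference becomes an Iceberg reference whose field-name parts are joined with dots. Any other expression is rejected with UnsupportedOperationException.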

public static Term toIcebergTerm(Expression expr) {
    if (expr instanceof Transform) {
        Transform transform = (Transform) expr;
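        // Only transforms over exactly one column reference can be converted
        Preconditions.checkArgument(
            transform.references().length == 1,
            "Cannot convert transform with more than one column reference: %s", transform);
        // DOT (defined elsewhere in Spark3Util) joins multi-part field names, e.g. "a.b"
        String colName = DOT.join(transform.references()[0].fieldNames());
        // Iceberg's Expressions class is written out in full below because
        // Spark's connector API has a class with the same simple name
        switch (transform.name()) {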
            case "identity":
                return org.apache.iceberg.expressions.Expressions.ref(colName);
            case "bucket":
                return org.apache.iceberg.expressions.Expressions.bucket(colName, findWidth(transform));
            case "years":
                return org.apache.iceberg.expressions.Expressions.year(colName);
            case "months":
                return org.apache.iceberg.expressions.Expressions.month(colName);
            case "date":
            case "days":
                return org.apache.iceberg.expressions.Expressions.day(colName);
            case "date_hour":
            case "hours":
                return org.apache.iceberg.expressions.Expressions.hour(colName);
            case "truncate":
                return org.apache.iceberg.expressions.Expressions.truncate(colName, findWidth(transform));
            default:
                throw new UnsupportedOperationException("Transform is not supported: " + transform);
        }
    } else if (expr instanceof NamedReference) {
        NamedReference ref = (NamedReference) expr;
        // A bare column reference: join its field-name parts with dots
        return org.apache.iceberg.expressions.Expressions.ref(DOT.join(ref.fieldNames()));
    } else {
        throw new UnsupportedOperationException("Cannot convert unknown expression: " + expr);
    }
}
Also used: NamedReference(org.apache.spark.sql.connector.expressions.NamedReference) Transform(org.apache.spark.sql.connector.expressions.Transform)
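The name-based dispatch in toIcebergTerm can be illustrated on plain values. This is a sketch only: TransformToTermSketch and toTermString are hypothetical names. The transform is modeled as a name, the field-name parts of its single reference, and a width that only bucket and truncate use. The "bucket[16](data.id)" output notation is invented for readability and is not how Iceberg prints a Term.

```java
import java.util.List;

final class TransformToTermSketch {
    // Mirrors the switch in toIcebergTerm, but returns an illustrative string
    // instead of building an Iceberg Term. `width` is ignored except for
    // "bucket" and "truncate".
    static String toTermString(String transformName, List<String> fieldNames, int width) {
        // Plays the role of DOT.join(...) in the original.
        String colName = String.join(".", fieldNames);
        switch (transformName) {
            case "identity":
                return colName;
            case "bucket":
                return "bucket[" + width + "](" + colName + ")";
            case "years":
                return "year(" + colName + ")";
            case "months":
                return "month(" + colName + ")";
            case "date":
            case "days":
                return "day(" + colName + ")";
            case "date_hour":
            case "hours":
                return "hour(" + colName + ")";
            case "truncate":
                return "truncate[" + width + "](" + colName + ")";
            default:
                throw new UnsupportedOperationException("Transform is not supported: " + transformName);
        }
    }

    public static void main(String[] args) {
        System.out.println(toTermString("bucket", List.of("data", "id"), 16));
        System.out.println(toTermString("hours", List.of("ts"), 0));
    }
}
```

As in the original, two Spark names can share one Iceberg transform ("date" and "days" both map to the day transform). Unknown names fail fast rather than falling back to an identity reference.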

Example 3 with NamedReference

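Use of org.apache.spark.sql.connector.expressions.NamedReference in project Apache Iceberg.

Class SparkBatchQueryScan, method filterAttributes: for runtime filtering (SupportsRuntimeFiltering), the scan reports the source columns of every partition field across all the partition specs it reads. Only columns that are present in the expected (projected) schema are reported.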

@Override
public NamedReference[] filterAttributes() {
    // Collect the source column IDs of partition fields across every spec the scan reads
    Set<Integer> partitionFieldSourceIds = Sets.newHashSet();
    for (Integer specId : specIds()) {
        PartitionSpec spec = table().specs().get(specId);
        for (PartitionField field : spec.fields()) {
            partitionFieldSourceIds.add(field.sourceId());
        }
    }
    // Index the projected schema: field ID -> quoted column name
    Map<Integer, String> quotedNameById = SparkSchemaUtil.indexQuotedNameById(expectedSchema());
    // Keep partition source columns present in the projected schema and convert them to Spark references
    return partitionFieldSourceIds.stream()
        .filter(fieldId -> expectedSchema().findField(fieldId) != null)
        .map(fieldId -> Spark3Util.toNamedReference(quotedNameById.get(fieldId)))
        .toArray(NamedReference[]::new);
}
Also used: Statistics(org.apache.spark.sql.connector.read.Statistics) LoggerFactory(org.slf4j.LoggerFactory) SparkFilters(org.apache.iceberg.spark.SparkFilters) Spark3Util(org.apache.iceberg.spark.Spark3Util) TableScanUtil(org.apache.iceberg.util.TableScanUtil) PartitionField(org.apache.iceberg.PartitionField) Lists(org.apache.iceberg.relocated.com.google.common.collect.Lists) Expression(org.apache.iceberg.expressions.Expression) Map(java.util.Map) FileScanTask(org.apache.iceberg.FileScanTask) SparkSession(org.apache.spark.sql.SparkSession) NamedReference(org.apache.spark.sql.connector.expressions.NamedReference) Logger(org.slf4j.Logger) Binder(org.apache.iceberg.expressions.Binder) CloseableIterable(org.apache.iceberg.io.CloseableIterable) Table(org.apache.iceberg.Table) SnapshotUtil(org.apache.iceberg.util.SnapshotUtil) Maps(org.apache.iceberg.relocated.com.google.common.collect.Maps) Set(java.util.Set) IOException(java.io.IOException) TableScan(org.apache.iceberg.TableScan) SupportsRuntimeFiltering(org.apache.spark.sql.connector.read.SupportsRuntimeFiltering) Schema(org.apache.iceberg.Schema) SparkSchemaUtil(org.apache.iceberg.spark.SparkSchemaUtil) Collectors(java.util.stream.Collectors) CombinedScanTask(org.apache.iceberg.CombinedScanTask) UncheckedIOException(java.io.UncheckedIOException) Objects(java.util.Objects) ValidationException(org.apache.iceberg.exceptions.ValidationException) Evaluator(org.apache.iceberg.expressions.Evaluator) Sets(org.apache.iceberg.relocated.com.google.common.collect.Sets) List(java.util.List) PartitionSpec(org.apache.iceberg.PartitionSpec) Projections(org.apache.iceberg.expressions.Projections) Filter(org.apache.spark.sql.sources.Filter) Expressions(org.apache.iceberg.expressions.Expressions) SparkReadConf(org.apache.iceberg.spark.SparkReadConf) Collections(java.util.Collections) Snapshot(org.apache.iceberg.Snapshot)
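The selection logic in filterAttributes can be sketched with plain maps. This sketch works under stated assumptions: FilterAttributesSketch and filterAttributeNames are hypothetical. Each spec is modeled as its list of partition source field IDs, the projected schema is modeled as a map from field ID to column name, and a containsKey check stands in for expectedSchema().findField(fieldId) != null. A TreeSet replaces the original HashSet so the output order is deterministic.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

final class FilterAttributesSketch {
    // sourceIdsBySpecId: spec ID -> source field IDs of that spec's partition fields.
    // nameById: field ID -> column name, for the projected schema only.
    static List<String> filterAttributeNames(
            Map<Integer, List<Integer>> sourceIdsBySpecId,
            Map<Integer, String> nameById) {
        // Union of partition source IDs across all specs (sorted for determinism).
        Set<Integer> partitionFieldSourceIds = new TreeSet<>();
        for (List<Integer> sourceIds : sourceIdsBySpecId.values()) {
            partitionFieldSourceIds.addAll(sourceIds);
        }
        // Keep only IDs present in the projected schema, then map them to names.
        return partitionFieldSourceIds.stream()
                .filter(nameById::containsKey)
                .map(nameById::get)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Spec 0 partitions by field 1; after partition evolution, spec 1 uses fields 1 and 3.
        Map<Integer, List<Integer>> specs = Map.of(0, List.of(1), 1, List.of(1, 3));
        // Field 3 is not projected, so it is not offered as a filter attribute.
        Map<Integer, String> projected = Map.of(1, "event_date", 2, "user_id");
        System.out.println(filterAttributeNames(specs, projected));
    }
}
```

Taking the union across specs matters for tables with evolved partitioning. A column that partitions only older files can still prune those files at runtime.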

Example 4 with NamedReference

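Use of org.apache.spark.sql.connector.expressions.NamedReference in project Apache Iceberg.

Class SparkPositionDeltaOperation, method rowId: a row is identified by the path of the data file that contains it together with its position within that file.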

@Override
public NamedReference[] rowId() {
    // Row identity: data file path (_file) plus position within that file (_pos)
    NamedReference file = Expressions.column(MetadataColumns.FILE_PATH.name());
    NamedReference pos = Expressions.column(MetadataColumns.ROW_POSITION.name());
    return new NamedReference[] { file, pos };
}
Also used: NamedReference(org.apache.spark.sql.connector.expressions.NamedReference)
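Because a (file, position) pair pins down exactly one row, deleted rows can be organized per data file by position alone. This is a minimal sketch of that idea, not Iceberg's delete-writer code. RowIdSketch, RowId and positionsByFile are hypothetical names, and the file paths in main are made up.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

final class RowIdSketch {
    // Hypothetical row identifier: the values of the _file and _pos metadata columns.
    static final class RowId {
        final String file;
        final long pos;

        RowId(String file, long pos) {
            this.file = file;
            this.pos = pos;
        }
    }

    // Groups deleted rows by data file, with positions sorted within each file.
    static Map<String, TreeSet<Long>> positionsByFile(List<RowId> deletedRows) {
        Map<String, TreeSet<Long>> result = new TreeMap<>();
        for (RowId row : deletedRows) {
            result.computeIfAbsent(row.file, f -> new TreeSet<>()).add(row.pos);
        }
        return result;
    }

    public static void main(String[] args) {
        List<RowId> deleted = List.of(
                new RowId("data/a.parquet", 7),
                new RowId("data/b.parquet", 0),
                new RowId("data/a.parquet", 2));
        System.out.println(positionsByFile(deleted));
    }
}
```

Sorting positions per file reflects the general shape of position deletes. Each one says "row N of file F is gone", so no column values are needed to locate the row.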

Example 5 with NamedReference

Use of org.apache.spark.sql.connector.expressions.NamedReference in project Apache Iceberg.

Class SparkPositionDeltaOperation, method requiredMetadataAttributes: the position-delta operation also needs each row's partition spec ID and partition metadata columns.

@Override
public NamedReference[] requiredMetadataAttributes() {
    NamedReference specId = Expressions.column(MetadataColumns.SPEC_ID.name());
    // PARTITION_COLUMN_NAME is already a String constant, so no .name() call is needed
    NamedReference partition = Expressions.column(MetadataColumns.PARTITION_COLUMN_NAME);
    return new NamedReference[] { specId, partition };
}
Also used: NamedReference(org.apache.spark.sql.connector.expressions.NamedReference)
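One reading of why the spec ID and partition are required, stated as an assumption: Iceberg tracks each delete file under a partition spec and a partition, so a writer needs both values to route a delete to the right place. The sketch below shows only that grouping step. SpecPartitionSketch, DeletedRow and countBySpecAndPartition are hypothetical names, and the partition is modeled as a plain string even though Iceberg's _partition column is a struct.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class SpecPartitionSketch {
    // Hypothetical projected row: values of the _spec_id and _partition metadata columns.
    static final class DeletedRow {
        final int specId;
        final String partition;

        DeletedRow(int specId, String partition) {
            this.specId = specId;
            this.partition = partition;
        }
    }

    // Counts deleted rows per (spec ID, partition) pair.
    static Map<String, Integer> countBySpecAndPartition(List<DeletedRow> rows) {
        Map<String, Integer> counts = new TreeMap<>();
        for (DeletedRow row : rows) {
            String key = row.specId + "/" + row.partition;
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<DeletedRow> rows = List.of(
                new DeletedRow(0, "date=2024-01-01"),
                new DeletedRow(1, "date=2024-01-01"),
                new DeletedRow(0, "date=2024-01-01"));
        System.out.println(countBySpecAndPartition(rows));
    }
}
```

The key includes the spec ID, not just the partition values. After partition evolution, identical-looking partition values under different specs describe different layouts and are kept apart.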
