Example 1 with StrictMetricsEvaluator

Use of org.apache.iceberg.expressions.StrictMetricsEvaluator in the Apache Iceberg project.

Class SparkTable, method canDeleteUsingMetadata.

// a metadata delete is possible iff matching files can be deleted entirely
private boolean canDeleteUsingMetadata(Expression deleteExpr) {
    boolean caseSensitive = Boolean.parseBoolean(sparkSession().conf().get("spark.sql.caseSensitive"));
    TableScan scan = table().newScan().filter(deleteExpr).caseSensitive(caseSensitive).includeColumnStats().ignoreResiduals();
    try (CloseableIterable<FileScanTask> tasks = scan.planFiles()) {
        Map<Integer, Evaluator> evaluators = Maps.newHashMap();
        StrictMetricsEvaluator metricsEvaluator = new StrictMetricsEvaluator(table().schema(), deleteExpr);
        return Iterables.all(tasks, task -> {
            DataFile file = task.file();
            PartitionSpec spec = task.spec();
            Evaluator evaluator = evaluators.computeIfAbsent(spec.specId(), specId -> new Evaluator(spec.partitionType(), Projections.strict(spec).project(deleteExpr)));
            return evaluator.eval(file.partition()) || metricsEvaluator.eval(file);
        });
    } catch (IOException ioe) {
        LOG.warn("Failed to close task iterable", ioe);
        return false;
    }
}
Also used: DataFile (org.apache.iceberg.DataFile), TableScan (org.apache.iceberg.TableScan), IOException (java.io.IOException), FileScanTask (org.apache.iceberg.FileScanTask), Evaluator (org.apache.iceberg.expressions.Evaluator), StrictMetricsEvaluator (org.apache.iceberg.expressions.StrictMetricsEvaluator), PartitionSpec (org.apache.iceberg.PartitionSpec)
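
The lambda in Example 1 accepts a file when either the strict partition projection or the strict metrics check proves that every row matches the delete expression. As a rough illustration of what "strict" means for column bounds, here is a stdlib-only sketch for a single predicate `col < limit` on a numeric column. The class `StrictBoundsSketch`, the `FileStats` holder, and all method names are hypothetical and are not part of the Iceberg API; the real evaluator also has to handle nulls, missing statistics, and many more predicate types.

```java
import java.util.List;

// Hypothetical, stdlib-only sketch; none of these names are Iceberg APIs.
final class StrictBoundsSketch {

    /** Per-file statistics for one numeric column: smallest and largest stored value. */
    static final class FileStats {
        final long lowerBound;
        final long upperBound;

        FileStats(long lowerBound, long upperBound) {
            this.lowerBound = lowerBound;
            this.upperBound = upperBound;
        }
    }

    /** Strict test for "col < limit": true only when every row in the file must match. */
    static boolean allRowsMatchLessThan(FileStats stats, long limit) {
        return stats.upperBound < limit;
    }

    /** Inclusive test for "col < limit": true when at least one row might match. */
    static boolean someRowsMayMatchLessThan(FileStats stats, long limit) {
        return stats.lowerBound < limit;
    }

    /** Same shape as Iterables.all(...) above: every candidate file must pass the strict test. */
    static boolean canDropWholeFiles(List<FileStats> files, long limit) {
        return files.stream().allMatch(f -> allRowsMatchLessThan(f, limit));
    }
}
```

With bounds [1, 5] and a limit of 10 the strict test passes, so the file could be dropped as a whole. With bounds [5, 15] only the inclusive test passes, so in this sketch a metadata-only delete would not be possible for that file.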

Example 2 with StrictMetricsEvaluator

Use of org.apache.iceberg.expressions.StrictMetricsEvaluator in the Apache Iceberg project.

Class BaseOverwriteFiles, method validate.

@Override
protected void validate(TableMetadata base) {
    if (validateAddedFilesMatchOverwriteFilter) {
        PartitionSpec spec = dataSpec();
        Expression rowFilter = rowFilter();
        Expression inclusiveExpr = Projections.inclusive(spec).project(rowFilter);
        Evaluator inclusive = new Evaluator(spec.partitionType(), inclusiveExpr);
        Expression strictExpr = Projections.strict(spec).project(rowFilter);
        Evaluator strict = new Evaluator(spec.partitionType(), strictExpr);
        StrictMetricsEvaluator metrics = new StrictMetricsEvaluator(base.schema(), rowFilter, isCaseSensitive());
        for (DataFile file : addedFiles()) {
            // the real test is that the strict or metrics test matches the file, indicating that all
            // records in the file match the filter. inclusive is used to avoid testing the metrics,
            // which is more complicated
            ValidationException.check(inclusive.eval(file.partition()) && (strict.eval(file.partition()) || metrics.eval(file)), "Cannot append file with rows that do not match filter: %s: %s", rowFilter, file.path());
        }
    }
    if (validateNewDataFiles) {
        validateAddedDataFiles(base, startingSnapshotId, dataConflictDetectionFilter());
    }
    if (validateNewDeletes) {
        if (rowFilter() != Expressions.alwaysFalse()) {
            Expression filter = conflictDetectionFilter != null ? conflictDetectionFilter : rowFilter();
            validateNoNewDeleteFiles(base, startingSnapshotId, filter);
            validateDeletedDataFiles(base, startingSnapshotId, filter);
        }
        if (deletedDataFiles.size() > 0) {
            validateNoNewDeletesForDataFiles(base, startingSnapshotId, conflictDetectionFilter, deletedDataFiles);
        }
    }
}
Also used: Expression (org.apache.iceberg.expressions.Expression), Evaluator (org.apache.iceberg.expressions.Evaluator), StrictMetricsEvaluator (org.apache.iceberg.expressions.StrictMetricsEvaluator)
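
The comment inside the loop of Example 2 says that the inclusive partition test is there to avoid the more complicated metrics test. A minimal sketch of that short-circuit ordering follows, using only the standard library. `OverwriteCheckSketch` and `fileMatchesFilter` are hypothetical names, and the three inputs stand in for the `inclusive`, `strict`, and `metrics` evaluations in the excerpt; the metrics check is passed as a `BooleanSupplier` so that it runs only when it is actually needed.

```java
import java.util.function.BooleanSupplier;

// Hypothetical, stdlib-only sketch of the check's evaluation order; not Iceberg code.
final class OverwriteCheckSketch {

    /**
     * Returns true when the file is accepted.
     * The metrics supplier is invoked only if the inclusive partition test passed
     * and the strict partition test did not already prove a full match.
     */
    static boolean fileMatchesFilter(
            boolean inclusivePartitionMatch,
            boolean strictPartitionMatch,
            BooleanSupplier strictMetricsMatch) {
        return inclusivePartitionMatch
                && (strictPartitionMatch || strictMetricsMatch.getAsBoolean());
    }
}
```

Because `&&` and `||` short-circuit in Java, a file whose partition cannot match at all is rejected without touching the metrics, and a file whose partition strictly matches is accepted without touching them either.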

Example 3 with StrictMetricsEvaluator

Use of org.apache.iceberg.expressions.StrictMetricsEvaluator in the Apache Iceberg project.

Class TestNotStartsWith, method testStrictMetricsEvaluatorForNotStartsWith.

@Test
public void testStrictMetricsEvaluatorForNotStartsWith() {
    boolean shouldRead = new StrictMetricsEvaluator(SCHEMA, notStartsWith(COLUMN, "bbb")).eval(FILE_1);
    Assert.assertFalse("Should not match: strict metrics eval is always false for notStartsWith", shouldRead);
}
Also used: StrictMetricsEvaluator (org.apache.iceberg.expressions.StrictMetricsEvaluator), Test (org.junit.Test)

Aggregations

StrictMetricsEvaluator (org.apache.iceberg.expressions.StrictMetricsEvaluator): 3
Evaluator (org.apache.iceberg.expressions.Evaluator): 2
IOException (java.io.IOException): 1
DataFile (org.apache.iceberg.DataFile): 1
FileScanTask (org.apache.iceberg.FileScanTask): 1
PartitionSpec (org.apache.iceberg.PartitionSpec): 1
TableScan (org.apache.iceberg.TableScan): 1
Expression (org.apache.iceberg.expressions.Expression): 1
Test (org.junit.Test): 1