
Example 1 with ExpireSnapshots

Use of org.apache.iceberg.ExpireSnapshots in the project incubator-gobblin by apache.

From the class IcebergMetadataWriter, method dropFiles.

/**
 * Handles both regular file deletions reported by a GMCE (aggregated here but not committed)
 * and the expiration of older snapshots (committed).
 */
protected void dropFiles(GobblinMetadataChangeEvent gmce, Map<String, Collection<HiveSpec>> oldSpecsMap, Table table, TableMetadata tableMetadata, TableIdentifier tid) throws IOException {
    PartitionSpec partitionSpec = table.spec();
    // Update DeleteFiles in tableMetadata: This is regular file deletion
    DeleteFiles deleteFiles = tableMetadata.getOrInitDeleteFiles();
    Set<DataFile> oldDataFiles = getIcebergDataFilesToBeDeleted(gmce, table, new HashMap<>(), oldSpecsMap, partitionSpec);
    oldDataFiles.forEach(deleteFiles::deleteFile);
    // Update ExpireSnapshots and commit the updates at once: This is for expiring snapshots that are
    // beyond look-back allowance for time-travel.
    parallelRunner.submitCallable(new Callable<Void>() {

        @Override
        public Void call() throws Exception {
            try {
                long olderThan = getExpireSnapshotTime();
                long start = System.currentTimeMillis();
                ExpireSnapshots expireSnapshots = table.expireSnapshots();
                final Table tmpTable = table;
                expireSnapshots.deleteWith(new Consumer<String>() {

                    @Override
                    public void accept(String file) {
                        if (file.startsWith(tmpTable.location())) {
                            tmpTable.io().deleteFile(file);
                        }
                    }
                }).expireOlderThan(olderThan).commit();
                // TODO: emit these metrics to Ingraphs, in addition to metrics for publishing new snapshots and other Iceberg metadata operations.
                log.info("Spent {} ms to expire snapshots older than {} ({}) in table {}", System.currentTimeMillis() - start, new DateTime(olderThan).toString(), olderThan, tid.toString());
            } catch (Exception e) {
                log.error(String.format("Failed to expire snapshots for table %s due to exception", tid.toString()), e);
            }
            return null;
        }
    }, tid.toString());
}
Also used: Table (org.apache.iceberg.Table), DeleteFiles (org.apache.iceberg.DeleteFiles), PartitionSpec (org.apache.iceberg.PartitionSpec), AlreadyExistsException (org.apache.iceberg.exceptions.AlreadyExistsException), SchemaRegistryException (org.apache.gobblin.metrics.kafka.SchemaRegistryException), NoSuchTableException (org.apache.iceberg.exceptions.NoSuchTableException), IOException (java.io.IOException), ZonedDateTime (java.time.ZonedDateTime), DateTime (org.joda.time.DateTime), DataFile (org.apache.iceberg.DataFile), Consumer (java.util.function.Consumer), ExpireSnapshots (org.apache.iceberg.ExpireSnapshots)
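
The delete callback passed to deleteWith in the example only removes files whose path starts with the table location. The sketch below isolates that guard so it can be run without Iceberg on the classpath. The class name GuardedDeleteSketch, the helper guardedDelete, and the example paths are hypothetical, and a plain list stands in for the real table.io().deleteFile(file) call; this is an illustration of the pattern, not Gobblin's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class GuardedDeleteSketch {

    /**
     * Hypothetical helper: wraps a delete action so that it only runs for paths
     * that start with tableLocation. All other paths are skipped.
     */
    public static Consumer<String> guardedDelete(String tableLocation, Consumer<String> delete) {
        return file -> {
            if (file.startsWith(tableLocation)) {
                delete.accept(file);
            }
        };
    }

    public static void main(String[] args) {
        // Stand-in for the real deletion: record the paths that would be deleted.
        List<String> deleted = new ArrayList<>();
        Consumer<String> callback = guardedDelete("hdfs://nn/warehouse/db/tbl", deleted::add);

        callback.accept("hdfs://nn/warehouse/db/tbl/metadata/snap-1.avro"); // under the table location
        callback.accept("hdfs://nn/data/external/part-0.parquet");          // outside, skipped

        System.out.println(deleted);
        // prints [hdfs://nn/warehouse/db/tbl/metadata/snap-1.avro]
    }
}
```

Because the guard is a plain string prefix check, a location without a trailing separator also matches sibling paths such as a table directory named tbl2; appending the separator to the prefix before comparing is one way to tighten it.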
