Example 1 with StagedSparkTable

Use of org.apache.iceberg.spark.source.StagedSparkTable in the Apache Iceberg project.

From class SparkCatalog, method stageCreate.

@Override
public StagedTable stageCreate(Identifier ident, StructType schema, Transform[] transforms, Map<String, String> properties) throws TableAlreadyExistsException {
    Schema icebergSchema = SparkSchemaUtil.convert(schema, useTimestampsWithoutZone);
    try {
        Catalog.TableBuilder builder = newBuilder(ident, icebergSchema);
        Transaction transaction = builder
            .withPartitionSpec(Spark3Util.toPartitionSpec(icebergSchema, transforms))
            .withLocation(properties.get("location"))
            .withProperties(Spark3Util.rebuildCreateProperties(properties))
            .createTransaction();
        return new StagedSparkTable(transaction);
    } catch (AlreadyExistsException e) {
        throw new TableAlreadyExistsException(ident);
    }
}
Also used: TableAlreadyExistsException (org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException), Transaction (org.apache.iceberg.Transaction), AlreadyExistsException (org.apache.iceberg.exceptions.AlreadyExistsException), NamespaceAlreadyExistsException (org.apache.spark.sql.catalyst.analysis.NamespaceAlreadyExistsException), Schema (org.apache.iceberg.Schema), StagedSparkTable (org.apache.iceberg.spark.source.StagedSparkTable), TableCatalog (org.apache.spark.sql.connector.catalog.TableCatalog), CachingCatalog (org.apache.iceberg.CachingCatalog), HadoopCatalog (org.apache.iceberg.hadoop.HadoopCatalog), Catalog (org.apache.iceberg.catalog.Catalog)
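The staged-create contract can be illustrated without Spark or Iceberg. The following is a minimal sketch under stated assumptions: `InMemoryStagingCatalog`, `StagedChange` and `TableExistsCheckedException` are hypothetical names, not part of either library. It models the behaviour seen in `stageCreate`: an existing name is rejected up front, and the new table only becomes visible once `commitStagedChanges` runs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical checked exception, standing in for Spark's TableAlreadyExistsException.
class TableExistsCheckedException extends Exception {
  TableExistsCheckedException(String name) {
    super("Table already exists: " + name);
  }
}

// Hypothetical staged-change handle, modelled loosely on Spark's StagedTable.
interface StagedChange {
  void commitStagedChanges();

  void abortStagedChanges();
}

// Hypothetical in-memory catalog; not the Spark or Iceberg API.
class InMemoryStagingCatalog {
  // table name -> schema description
  final Map<String, String> tables = new HashMap<>();

  // Fails fast if the table exists, but defers publishing until commit.
  StagedChange stageCreate(String name, String schema) throws TableExistsCheckedException {
    if (tables.containsKey(name)) {
      throw new TableExistsCheckedException(name);
    }
    return new StagedChange() {
      @Override
      public void commitStagedChanges() {
        // Re-check at commit time: another writer may have created the table meanwhile.
        if (tables.putIfAbsent(name, schema) != null) {
          throw new IllegalStateException("Table was created concurrently: " + name);
        }
      }

      @Override
      public void abortStagedChanges() {
        // Nothing was published, so there is nothing to undo.
      }
    };
  }
}
```

Until the commit, the catalog's map is untouched, which is what lets a CTAS-style caller abort cleanly.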

Example 2 with StagedSparkTable

From class SparkCatalog, method stageCreateOrReplace.

@Override
public StagedTable stageCreateOrReplace(Identifier ident, StructType schema, Transform[] transforms, Map<String, String> properties) {
    Schema icebergSchema = SparkSchemaUtil.convert(schema, useTimestampsWithoutZone);
    Catalog.TableBuilder builder = newBuilder(ident, icebergSchema);
    Transaction transaction = builder
        .withPartitionSpec(Spark3Util.toPartitionSpec(icebergSchema, transforms))
        .withLocation(properties.get("location"))
        .withProperties(Spark3Util.rebuildCreateProperties(properties))
        .createOrReplaceTransaction();
    return new StagedSparkTable(transaction);
}
Also used: Transaction (org.apache.iceberg.Transaction), Schema (org.apache.iceberg.Schema), StagedSparkTable (org.apache.iceberg.spark.source.StagedSparkTable), TableCatalog (org.apache.spark.sql.connector.catalog.TableCatalog), CachingCatalog (org.apache.iceberg.CachingCatalog), HadoopCatalog (org.apache.iceberg.hadoop.HadoopCatalog), Catalog (org.apache.iceberg.catalog.Catalog)
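Compared with `stageCreate`, `stageCreateOrReplace` has no existence check and declares no checked exception; it ends in `createOrReplaceTransaction()` instead of `createTransaction()`. A rough in-memory sketch of that difference follows; `ReplaceableCatalog` and `PendingTable` are hypothetical names, not Spark or Iceberg types.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory catalog; not the Spark or Iceberg API.
class ReplaceableCatalog {
  // table name -> schema description
  final Map<String, String> tables = new HashMap<>();

  // Hypothetical staged handle with commit/abort semantics.
  interface PendingTable {
    void commit();

    void abort();
  }

  // No existence check: the new definition creates or replaces the table,
  // but only when commit() is called.
  PendingTable stageCreateOrReplace(String name, String schema) {
    return new PendingTable() {
      @Override
      public void commit() {
        tables.put(name, schema);
      }

      @Override
      public void abort() {
        // The previous definition, if any, is left untouched.
      }
    };
  }
}
```

Because replacement happens only at commit, an aborted replace leaves readers seeing the old table throughout.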

Example 3 with StagedSparkTable

From class BaseTableCreationSparkAction, method stageDestTable.

protected StagedSparkTable stageDestTable() {
    try {
        Map<String, String> props = destTableProps();
        StructType schema = sourceTable.schema();
        Transform[] partitioning = sourceTable.partitioning();
        return (StagedSparkTable) destCatalog().stageCreate(destTableIdent(), schema, partitioning, props);
    } catch (org.apache.spark.sql.catalyst.analysis.NoSuchNamespaceException e) {
        throw new NoSuchNamespaceException("Cannot create table %s as the namespace does not exist", destTableIdent());
    } catch (org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException e) {
        throw new AlreadyExistsException("Cannot create table %s as it already exists", destTableIdent());
    }
}
Also used: StructType (org.apache.spark.sql.types.StructType), AlreadyExistsException (org.apache.iceberg.exceptions.AlreadyExistsException), NoSuchNamespaceException (org.apache.iceberg.exceptions.NoSuchNamespaceException), StagedSparkTable (org.apache.iceberg.spark.source.StagedSparkTable), Transform (org.apache.spark.sql.connector.expressions.Transform)
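`stageDestTable` translates Spark's checked catalog exceptions into Iceberg's unchecked ones, passing a format string plus arguments. The sketch below shows only that translation pattern, using hypothetical exception types (`CheckedTableExists`, `UncheckedAlreadyExists`) and a hypothetical `StageHelper`. Unlike the excerpt, it also keeps the original exception as the cause; that is a choice of this sketch, not something the source does.

```java
// Hypothetical checked exception, standing in for Spark's TableAlreadyExistsException.
class CheckedTableExists extends Exception {
  CheckedTableExists(String msg) {
    super(msg);
  }
}

// Hypothetical unchecked exception whose message is built with String.format,
// similar in spirit to the (message, args...) constructors used above.
class UncheckedAlreadyExists extends RuntimeException {
  UncheckedAlreadyExists(Throwable cause, String message, Object... args) {
    super(String.format(message, args), cause);
  }
}

class StageHelper {
  // A staging call that may fail with a checked exception.
  interface Stager {
    String stage(String ident) throws CheckedTableExists;
  }

  // Rethrows the checked failure as an unchecked one, keeping it as the cause.
  static String stageOrThrow(Stager stager, String ident) {
    try {
      return stager.stage(ident);
    } catch (CheckedTableExists e) {
      throw new UncheckedAlreadyExists(e, "Cannot create table %s as it already exists", ident);
    }
  }
}
```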

Example 4 with StagedSparkTable

From class BaseSnapshotTableSparkAction, method doExecute.

private SnapshotTable.Result doExecute() {
    Preconditions.checkArgument(
        destCatalog() != null && destTableIdent() != null,
        "The destination catalog and identifier cannot be null. "
            + "Make sure to configure the action with a valid destination table identifier via the `as` method.");
    LOG.info("Staging a new Iceberg table {} as a snapshot of {}", destTableIdent(), sourceTableIdent());
    StagedSparkTable stagedTable = stageDestTable();
    Table icebergTable = stagedTable.table();
    // TODO: Check the dest table location does not overlap with the source table location
    boolean threw = true;
    try {
        LOG.info("Ensuring {} has a valid name mapping", destTableIdent());
        ensureNameMappingPresent(icebergTable);
        TableIdentifier v1TableIdent = v1SourceTable().identifier();
        String stagingLocation = getMetadataLocation(icebergTable);
        LOG.info("Generating Iceberg metadata for {} in {}", destTableIdent(), stagingLocation);
        SparkTableUtil.importSparkTable(spark(), v1TableIdent, icebergTable, stagingLocation);
        LOG.info("Committing staged changes to {}", destTableIdent());
        stagedTable.commitStagedChanges();
        threw = false;
    } finally {
        if (threw) {
            LOG.error("Error when populating the staged table with metadata, aborting changes");
            try {
                stagedTable.abortStagedChanges();
            } catch (Exception abortException) {
                LOG.error("Cannot abort staged changes", abortException);
            }
        }
    }
    Snapshot snapshot = icebergTable.currentSnapshot();
    long importedDataFilesCount = Long.parseLong(snapshot.summary().get(SnapshotSummary.TOTAL_DATA_FILES_PROP));
    LOG.info("Successfully loaded Iceberg metadata for {} files to {}", importedDataFilesCount, destTableIdent());
    return new BaseSnapshotTableActionResult(importedDataFilesCount);
}
Also used: TableIdentifier (org.apache.spark.sql.catalyst.TableIdentifier), Snapshot (org.apache.iceberg.Snapshot), Table (org.apache.iceberg.Table), StagedSparkTable (org.apache.iceberg.spark.source.StagedSparkTable), SnapshotTable (org.apache.iceberg.actions.SnapshotTable), BaseSnapshotTableActionResult (org.apache.iceberg.actions.BaseSnapshotTableActionResult)
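The `threw` flag in `doExecute` ensures that `abortStagedChanges` runs only when population or commit fails, and that a failing abort is logged rather than hiding the original error. A minimal sketch of the same control flow follows, using a hypothetical `RecordingStagedTable` that records which lifecycle calls happened. The summary key `"total-data-files"` is assumed to be the value behind `SnapshotSummary.TOTAL_DATA_FILES_PROP`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical staged-table stand-in that records lifecycle calls.
class RecordingStagedTable {
  final List<String> calls = new ArrayList<>();
  boolean failOnAbort = false;

  void commitStagedChanges() {
    calls.add("commit");
  }

  void abortStagedChanges() {
    calls.add("abort");
    if (failOnAbort) {
      throw new IllegalStateException("abort failed");
    }
  }
}

class SnapshotFlow {
  // Commits only if populate succeeds; otherwise aborts in finally and swallows
  // abort errors, so the original failure is what propagates to the caller.
  static void populateAndCommit(RecordingStagedTable staged, Runnable populate) {
    boolean threw = true;
    try {
      populate.run();
      staged.commitStagedChanges();
      threw = false;
    } finally {
      if (threw) {
        try {
          staged.abortStagedChanges();
        } catch (Exception abortException) {
          System.err.println("Cannot abort staged changes: " + abortException.getMessage());
        }
      }
    }
  }

  // Reads the file count from a snapshot-summary-style map
  // ("total-data-files" is an assumed key name).
  static long totalDataFiles(Map<String, String> summary) {
    return Long.parseLong(summary.get("total-data-files"));
  }
}
```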

Example 5 with StagedSparkTable

From class BaseMigrateTableSparkAction, method doExecute.

private MigrateTable.Result doExecute() {
    LOG.info("Starting the migration of {} to Iceberg", sourceTableIdent());
    // move the source table to a new name, halting all modifications and allowing us to stage
    // the creation of a new Iceberg table in its place
    renameAndBackupSourceTable();
    StagedSparkTable stagedTable = null;
    Table icebergTable;
    boolean threw = true;
    try {
        LOG.info("Staging a new Iceberg table {}", destTableIdent());
        stagedTable = stageDestTable();
        icebergTable = stagedTable.table();
        LOG.info("Ensuring {} has a valid name mapping", destTableIdent());
        ensureNameMappingPresent(icebergTable);
        Some<String> backupNamespace = Some.apply(backupIdent.namespace()[0]);
        TableIdentifier v1BackupIdent = new TableIdentifier(backupIdent.name(), backupNamespace);
        String stagingLocation = getMetadataLocation(icebergTable);
        LOG.info("Generating Iceberg metadata for {} in {}", destTableIdent(), stagingLocation);
        SparkTableUtil.importSparkTable(spark(), v1BackupIdent, icebergTable, stagingLocation);
        LOG.info("Committing staged changes to {}", destTableIdent());
        stagedTable.commitStagedChanges();
        threw = false;
    } finally {
        if (threw) {
            LOG.error("Failed to perform the migration, aborting table creation and restoring the original table");
            restoreSourceTable();
            if (stagedTable != null) {
                try {
                    stagedTable.abortStagedChanges();
                } catch (Exception abortException) {
                    LOG.error("Cannot abort staged changes", abortException);
                }
            }
        }
    }
    Snapshot snapshot = icebergTable.currentSnapshot();
    long migratedDataFilesCount = Long.parseLong(snapshot.summary().get(SnapshotSummary.TOTAL_DATA_FILES_PROP));
    LOG.info("Successfully loaded Iceberg metadata for {} files to {}", migratedDataFilesCount, destTableIdent());
    return new BaseMigrateTableActionResult(migratedDataFilesCount);
}
Also used: TableIdentifier (org.apache.spark.sql.catalyst.TableIdentifier), Snapshot (org.apache.iceberg.Snapshot), Table (org.apache.iceberg.Table), MigrateTable (org.apache.iceberg.actions.MigrateTable), StagedSparkTable (org.apache.iceberg.spark.source.StagedSparkTable), BaseMigrateTableActionResult (org.apache.iceberg.actions.BaseMigrateTableActionResult), AlreadyExistsException (org.apache.iceberg.exceptions.AlreadyExistsException), NoSuchTableException (org.apache.iceberg.exceptions.NoSuchTableException)
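Migration differs from snapshotting in that the source table is first renamed to a backup, and on failure it is moved back into place. The following is a rough in-memory model of that rename, build, restore sequence. `MigrationSketch` and the `_backup` naming scheme are hypothetical, and the sketch does not model `abortStagedChanges`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of the migrate flow; not the Iceberg action API.
class MigrationSketch {
  // table name -> description of the table's format/contents
  final Map<String, String> tables = new HashMap<>();

  void migrate(String ident, String newFormat, boolean failDuringImport) {
    if (!tables.containsKey(ident)) {
      throw new IllegalArgumentException("No such table: " + ident);
    }
    String backupIdent = ident + "_backup"; // hypothetical backup naming scheme
    // Move the source aside, like renameAndBackupSourceTable().
    tables.put(backupIdent, tables.remove(ident));
    boolean threw = true;
    try {
      // Build the replacement from the backup; it is only published below.
      String staged = newFormat + " over " + tables.get(backupIdent);
      if (failDuringImport) {
        throw new IllegalStateException("import failed"); // simulated import error
      }
      tables.put(ident, staged); // stands in for commitStagedChanges()
      threw = false;
    } finally {
      if (threw) {
        // Failure path, like restoreSourceTable(): move the original back.
        tables.put(ident, tables.remove(backupIdent));
      }
    }
  }
}
```

On success the backup entry is kept alongside the new table, which matches the excerpt in that it does not drop the backup.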
