Example 1 with TableCatalog

Use of org.apache.spark.sql.connector.catalog.TableCatalog in project iceberg by apache.

From class SparkSessionCatalog, method stageCreateOrReplace:

@Override
public StagedTable stageCreateOrReplace(Identifier ident, StructType schema, Transform[] partitions, Map<String, String> properties) throws NoSuchNamespaceException {
    String provider = properties.get("provider");
    TableCatalog catalog;
    if (useIceberg(provider)) {
        if (asStagingCatalog != null) {
            return asStagingCatalog.stageCreateOrReplace(ident, schema, partitions, properties);
        }
        catalog = icebergCatalog;
    } else {
        catalog = getSessionCatalog();
    }
    // drop the table if it exists
    catalog.dropTable(ident);
    try {
        // create the table with the chosen catalog, then wrap it in a staged table that drops it to roll back
        Table table = catalog.createTable(ident, schema, partitions, properties);
        return new RollbackStagedTable(catalog, ident, table);
    } catch (TableAlreadyExistsException e) {
        // the table was deleted, but now already exists again. retry the replace.
        return stageCreateOrReplace(ident, schema, partitions, properties);
    }
}
Also used : TableAlreadyExistsException(org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException) StagedTable(org.apache.spark.sql.connector.catalog.StagedTable) Table(org.apache.spark.sql.connector.catalog.Table) TableCatalog(org.apache.spark.sql.connector.catalog.TableCatalog) StagingTableCatalog(org.apache.spark.sql.connector.catalog.StagingTableCatalog)
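
The staged table returned here is only half the story: the rollback fires when Spark aborts the staged changes. Below is a minimal sketch of the driving side, assuming a hypothetical driveReplace helper; Spark's replace-table physical plan follows the same commit/abort pattern.

import java.util.Collections;
import org.apache.spark.sql.connector.catalog.Identifier;
import org.apache.spark.sql.connector.catalog.StagedTable;
import org.apache.spark.sql.connector.catalog.StagingTableCatalog;
import org.apache.spark.sql.connector.expressions.Transform;
import org.apache.spark.sql.types.StructType;

public class StagedReplaceSketch {
    // hypothetical helper, not part of Spark or Iceberg
    static void driveReplace(StagingTableCatalog catalog, Identifier ident, StructType schema) throws Exception {
        StagedTable staged = catalog.stageCreateOrReplace(ident, schema, new Transform[0], Collections.emptyMap());
        try {
            // ... write data through the staged table ...
            // make the replacement visible
            staged.commitStagedChanges();
        } catch (Exception e) {
            // on failure, RollbackStagedTable drops the freshly created table
            staged.abortStagedChanges();
            throw e;
        }
    }
}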

Example 2 with TableCatalog

Use of org.apache.spark.sql.connector.catalog.TableCatalog in project iceberg by apache.

From class SparkSessionCatalog, method stageCreate:

@Override
public StagedTable stageCreate(Identifier ident, StructType schema, Transform[] partitions, Map<String, String> properties) throws TableAlreadyExistsException, NoSuchNamespaceException {
    String provider = properties.get("provider");
    TableCatalog catalog;
    if (useIceberg(provider)) {
        if (asStagingCatalog != null) {
            return asStagingCatalog.stageCreate(ident, schema, partitions, properties);
        }
        catalog = icebergCatalog;
    } else {
        catalog = getSessionCatalog();
    }
    // create the table with the chosen catalog, then wrap it in a staged table that drops it to roll back
    Table table = catalog.createTable(ident, schema, partitions, properties);
    return new RollbackStagedTable(catalog, ident, table);
}
Also used : StagedTable(org.apache.spark.sql.connector.catalog.StagedTable) Table(org.apache.spark.sql.connector.catalog.Table) TableCatalog(org.apache.spark.sql.connector.catalog.TableCatalog) StagingTableCatalog(org.apache.spark.sql.connector.catalog.StagingTableCatalog)
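
RollbackStagedTable itself is an Iceberg helper. A rough sketch of its core idea follows, with illustrative names rather than the actual implementation; the real class also forwards capabilities and the read/write builders to the delegate.

import java.util.Set;
import org.apache.spark.sql.connector.catalog.Identifier;
import org.apache.spark.sql.connector.catalog.StagedTable;
import org.apache.spark.sql.connector.catalog.Table;
import org.apache.spark.sql.connector.catalog.TableCapability;
import org.apache.spark.sql.connector.catalog.TableCatalog;
import org.apache.spark.sql.types.StructType;

// illustrative wrapper: the table is created eagerly, so commit is a no-op and
// abort undoes the create by dropping the table
class RollbackOnAbortTable implements StagedTable {
    private final TableCatalog catalog;
    private final Identifier ident;
    private final Table delegate;

    RollbackOnAbortTable(TableCatalog catalog, Identifier ident, Table delegate) {
        this.catalog = catalog;
        this.ident = ident;
        this.delegate = delegate;
    }

    @Override
    public void commitStagedChanges() {
        // nothing to do: the table already exists
    }

    @Override
    public void abortStagedChanges() {
        // roll back the eager create
        catalog.dropTable(ident);
    }

    @Override
    public String name() {
        return delegate.name();
    }

    @Override
    public StructType schema() {
        return delegate.schema();
    }

    @Override
    public Set<TableCapability> capabilities() {
        return delegate.capabilities();
    }
}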

Example 3 with TableCatalog

Use of org.apache.spark.sql.connector.catalog.TableCatalog in project iceberg by apache.

From class IcebergSource, method getTable:

@Override
public Table getTable(StructType schema, Transform[] partitioning, Map<String, String> options) {
    Spark3Util.CatalogAndIdentifier catalogIdentifier = catalogAndIdentifier(new CaseInsensitiveStringMap(options));
    CatalogPlugin catalog = catalogIdentifier.catalog();
    Identifier ident = catalogIdentifier.identifier();
    try {
        if (catalog instanceof TableCatalog) {
            return ((TableCatalog) catalog).loadTable(ident);
        }
    } catch (NoSuchTableException e) {
        // throwing an iceberg NoSuchTableException because the Spark one is typed and can't be thrown from this interface
        throw new org.apache.iceberg.exceptions.NoSuchTableException(e, "Cannot find table for %s.", ident);
    }
    // throwing an iceberg NoSuchTableException because the Spark one is typed and can't be thrown from this interface
    throw new org.apache.iceberg.exceptions.NoSuchTableException("Cannot find table for %s.", ident);
}
Also used : CatalogPlugin(org.apache.spark.sql.connector.catalog.CatalogPlugin) PathIdentifier(org.apache.iceberg.spark.PathIdentifier) Identifier(org.apache.spark.sql.connector.catalog.Identifier) TableCatalog(org.apache.spark.sql.connector.catalog.TableCatalog) NoSuchTableException(org.apache.spark.sql.catalyst.analysis.NoSuchTableException) CaseInsensitiveStringMap(org.apache.spark.sql.util.CaseInsensitiveStringMap) Spark3Util(org.apache.iceberg.spark.Spark3Util)
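
In practice, getTable() runs underneath a DataFrameReader load. A minimal sketch of the call site, assuming an existing Iceberg table named db.tbl:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ReadExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("iceberg-read").getOrCreate();
        // Spark hands "db.tbl" to IcebergSource through the options map, and
        // catalogAndIdentifier() resolves it to a CatalogPlugin plus Identifier
        Dataset<Row> df = spark.read().format("iceberg").load("db.tbl");
        df.show();
    }
}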

Example 4 with TableCatalog

Use of org.apache.spark.sql.connector.catalog.TableCatalog in project iceberg by apache.

From class TestSparkTable, method testTableEquality:

@Test
public void testTableEquality() throws NoSuchTableException {
    CatalogManager catalogManager = spark.sessionState().catalogManager();
    TableCatalog catalog = (TableCatalog) catalogManager.catalog(catalogName);
    Identifier identifier = Identifier.of(tableIdent.namespace().levels(), tableIdent.name());
    SparkTable table1 = (SparkTable) catalog.loadTable(identifier);
    SparkTable table2 = (SparkTable) catalog.loadTable(identifier);
    // different instances pointing to the same table must be equivalent
    Assert.assertNotSame("References must be different", table1, table2);
    Assert.assertEquals("Tables must be equivalent", table1, table2);
}
Also used : Identifier(org.apache.spark.sql.connector.catalog.Identifier) TableCatalog(org.apache.spark.sql.connector.catalog.TableCatalog) CatalogManager(org.apache.spark.sql.connector.catalog.CatalogManager) Test(org.junit.Test)
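
A natural companion check, assuming SparkTable pairs its equals override with a consistent hashCode (as the Object contract requires). This hypothetical test reuses the fixture names from the test above:

import java.util.HashSet;
import java.util.Set;
import org.apache.spark.sql.connector.catalog.Table;

// inside the same test class: two loads of the same table collapse to a single
// entry in a hash-based collection, which plan comparison and caching rely on
@Test
public void testTableHashCode() throws NoSuchTableException {
    TableCatalog catalog = (TableCatalog) spark.sessionState().catalogManager().catalog(catalogName);
    Identifier identifier = Identifier.of(tableIdent.namespace().levels(), tableIdent.name());
    Set<Table> tables = new HashSet<>();
    tables.add(catalog.loadTable(identifier));
    tables.add(catalog.loadTable(identifier));
    Assert.assertEquals("Equal tables must collapse to one entry", 1, tables.size());
}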

Example 5 with TableCatalog

Use of org.apache.spark.sql.connector.catalog.TableCatalog in project OpenLineage by OpenLineage.

From class IcebergHandler, method getDatasetIdentifier:

@Override
public DatasetIdentifier getDatasetIdentifier(SparkSession session, TableCatalog tableCatalog, Identifier identifier, Map<String, String> properties) {
    SparkCatalog sparkCatalog = (SparkCatalog) tableCatalog;
    String catalogName = sparkCatalog.name();
    String prefix = String.format("spark.sql.catalog.%s", catalogName);
    Map<String, String> conf = ScalaConversionUtils.<String, String>fromMap(session.conf().getAll());
    log.info(conf.toString());
    Map<String, String> catalogConf = conf.entrySet().stream()
        .filter(x -> x.getKey().startsWith(prefix))
        .filter(x -> x.getKey().length() > prefix.length())
        .collect(Collectors.toMap(
            // strip the prefix and the dot that follows it
            x -> x.getKey().substring(prefix.length() + 1),
            Map.Entry::getValue));
    log.info(catalogConf.toString());
    if (catalogConf.isEmpty() || !catalogConf.containsKey("type")) {
        throw new UnsupportedCatalogException(catalogName);
    }
    log.info(catalogConf.get("type"));
    switch(catalogConf.get("type")) {
        case "hadoop":
            return getHadoopIdentifier(catalogConf, identifier.toString());
        case "hive":
            return getHiveIdentifier(session, catalogConf.get(CatalogProperties.URI), identifier.toString());
        default:
            throw new UnsupportedCatalogException(catalogConf.get("type"));
    }
}
Also used : SparkCatalog(org.apache.iceberg.spark.SparkCatalog) SneakyThrows(lombok.SneakyThrows) DatasetIdentifier(io.openlineage.spark.agent.util.DatasetIdentifier) PathUtils(io.openlineage.spark.agent.util.PathUtils) ScalaConversionUtils(io.openlineage.spark.agent.util.ScalaConversionUtils) Collectors(java.util.stream.Collectors) CatalogProperties(org.apache.iceberg.CatalogProperties) Slf4j(lombok.extern.slf4j.Slf4j) TableCatalog(org.apache.spark.sql.connector.catalog.TableCatalog) NoSuchTableException(org.apache.spark.sql.catalyst.analysis.NoSuchTableException) TableProviderFacet(io.openlineage.spark.agent.facets.TableProviderFacet) Map(java.util.Map) Optional(java.util.Optional) Path(org.apache.hadoop.fs.Path) URI(java.net.URI) Identifier(org.apache.spark.sql.connector.catalog.Identifier) SparkTable(org.apache.iceberg.spark.source.SparkTable) Nullable(javax.annotation.Nullable) SparkConfUtils(io.openlineage.spark.agent.util.SparkConfUtils) SparkSession(org.apache.spark.sql.SparkSession)
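
For the hadoop branch above to be taken, the session needs the corresponding Iceberg catalog properties. A minimal sketch, with my_catalog and the warehouse path as assumed values:

import org.apache.spark.sql.SparkSession;

public class CatalogConfExample {
    public static void main(String[] args) {
        // after the prefix "spark.sql.catalog.my_catalog" is stripped, the handler
        // sees catalogConf = {type=hadoop, warehouse=hdfs://namenode/warehouse}
        SparkSession spark = SparkSession.builder()
            .config("spark.sql.catalog.my_catalog", "org.apache.iceberg.spark.SparkCatalog")
            .config("spark.sql.catalog.my_catalog.type", "hadoop")
            .config("spark.sql.catalog.my_catalog.warehouse", "hdfs://namenode/warehouse")
            .getOrCreate();
    }
}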

Aggregations

TableCatalog (org.apache.spark.sql.connector.catalog.TableCatalog): 19 usages
Identifier (org.apache.spark.sql.connector.catalog.Identifier): 11 usages
DatasetIdentifier (io.openlineage.spark.agent.util.DatasetIdentifier): 7 usages
Table (org.apache.spark.sql.connector.catalog.Table): 6 usages
OpenLineage (io.openlineage.client.OpenLineage): 3 usages
TableProviderFacet (io.openlineage.spark.agent.facets.TableProviderFacet): 3 usages
PathUtils (io.openlineage.spark.agent.util.PathUtils): 3 usages
Map (java.util.Map): 3 usages
Optional (java.util.Optional): 3 usages
Slf4j (lombok.extern.slf4j.Slf4j): 3 usages
Path (org.apache.hadoop.fs.Path): 3 usages
SparkTable (org.apache.iceberg.spark.source.SparkTable): 3 usages
SparkSession (org.apache.spark.sql.SparkSession): 3 usages
NoSuchTableException (org.apache.spark.sql.catalyst.analysis.NoSuchTableException): 3 usages
StagedTable (org.apache.spark.sql.connector.catalog.StagedTable): 3 usages
StagingTableCatalog (org.apache.spark.sql.connector.catalog.StagingTableCatalog): 3 usages
Arrays (java.util.Arrays): 2 usages
HashMap (java.util.HashMap): 2 usages
SneakyThrows (lombok.SneakyThrows): 2 usages
TableIdentifier (org.apache.spark.sql.catalyst.TableIdentifier): 2 usages