
Example 1 with SparkSessionCatalog

Use of org.apache.iceberg.spark.SparkSessionCatalog in the Apache Iceberg project.

From the class IcebergSource, method catalogAndIdentifier:

private Spark3Util.CatalogAndIdentifier catalogAndIdentifier(CaseInsensitiveStringMap options) {
    Preconditions.checkArgument(options.containsKey("path"), "Cannot open table: path is not set");
    SparkSession spark = SparkSession.active();
    setupDefaultSparkCatalog(spark);
    String path = options.get("path");
    Long snapshotId = propertyAsLong(options, SparkReadOptions.SNAPSHOT_ID);
    Long asOfTimestamp = propertyAsLong(options, SparkReadOptions.AS_OF_TIMESTAMP);
    Preconditions.checkArgument(asOfTimestamp == null || snapshotId == null, "Cannot specify both snapshot-id (%s) and as-of-timestamp (%s)", snapshotId, asOfTimestamp);
    String selector = null;
    if (snapshotId != null) {
        selector = SNAPSHOT_ID + snapshotId;
    }
    if (asOfTimestamp != null) {
        selector = AT_TIMESTAMP + asOfTimestamp;
    }
    CatalogManager catalogManager = spark.sessionState().catalogManager();
    if (path.contains("/")) {
        // contains a path. Return iceberg default catalog and a PathIdentifier
        String newPath = (selector == null) ? path : path + "#" + selector;
        return new Spark3Util.CatalogAndIdentifier(catalogManager.catalog(DEFAULT_CATALOG_NAME), new PathIdentifier(newPath));
    }
    final Spark3Util.CatalogAndIdentifier catalogAndIdentifier = Spark3Util.catalogAndIdentifier("path or identifier", spark, path);
    Identifier ident = identifierWithSelector(catalogAndIdentifier.identifier(), selector);
    if (catalogAndIdentifier.catalog().name().equals("spark_catalog") && !(catalogAndIdentifier.catalog() instanceof SparkSessionCatalog)) {
        // catalog is a session catalog but does not support Iceberg. Use Iceberg instead.
        return new Spark3Util.CatalogAndIdentifier(catalogManager.catalog(DEFAULT_CATALOG_NAME), ident);
    } else {
        return new Spark3Util.CatalogAndIdentifier(catalogAndIdentifier.catalog(), ident);
    }
}
Also used:

import org.apache.iceberg.spark.PathIdentifier;
import org.apache.iceberg.spark.Spark3Util;
import org.apache.iceberg.spark.SparkSessionCatalog;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.connector.catalog.CatalogManager;
import org.apache.spark.sql.connector.catalog.Identifier;
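The core of the method above is the time-travel selector string: exactly one of snapshot-id or as-of-timestamp may be set, and for path-based tables the selector is appended after a `#`. Here is a minimal, stdlib-only sketch of that logic; the class name and the values of the `SNAPSHOT_ID` and `AT_TIMESTAMP` prefixes are illustrative assumptions, not the constants Iceberg actually defines.

```java
// Self-contained sketch of the selector-building logic in catalogAndIdentifier.
// SNAPSHOT_ID and AT_TIMESTAMP values here are assumptions for illustration only.
public class SelectorSketch {
    static final String SNAPSHOT_ID = "snapshot_id_";
    static final String AT_TIMESTAMP = "at_timestamp_";

    // Build the optional time-travel selector, then append it to the path with '#'.
    static String pathWithSelector(String path, Long snapshotId, Long asOfTimestamp) {
        if (snapshotId != null && asOfTimestamp != null) {
            throw new IllegalArgumentException(
                "Cannot specify both snapshot-id and as-of-timestamp");
        }
        String selector = null;
        if (snapshotId != null) {
            selector = SNAPSHOT_ID + snapshotId;
        }
        if (asOfTimestamp != null) {
            selector = AT_TIMESTAMP + asOfTimestamp;
        }
        // No selector: the path is returned unchanged.
        return (selector == null) ? path : path + "#" + selector;
    }

    public static void main(String[] args) {
        // With a snapshot id, the selector is appended after '#'.
        System.out.println(pathWithSelector("/warehouse/db/tbl", 42L, null));
        // With neither option set, the path passes through unchanged.
        System.out.println(pathWithSelector("/warehouse/db/tbl", null, null));
    }
}
```

Note that the mutual-exclusion check mirrors the `Preconditions.checkArgument` call in the real method.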

Example 2 with SparkSessionCatalog

Use of org.apache.iceberg.spark.SparkSessionCatalog in the Apache Iceberg project.

From the class TestRemoveOrphanFilesAction3, method testSparkSessionCatalogHiveTable:

@Test
public void testSparkSessionCatalogHiveTable() throws Exception {
    spark.conf().set("spark.sql.catalog.spark_catalog", "org.apache.iceberg.spark.SparkSessionCatalog");
    spark.conf().set("spark.sql.catalog.spark_catalog.type", "hive");
    SparkSessionCatalog cat = (SparkSessionCatalog) spark.sessionState().catalogManager().v2SessionCatalog();
    String[] database = { "default" };
    Identifier id = Identifier.of(database, "sessioncattest");
    Map<String, String> options = Maps.newHashMap();
    Transform[] transforms = {};
    cat.dropTable(id);
    cat.createTable(id, SparkSchemaUtil.convert(SCHEMA), transforms, options);
    SparkTable table = (SparkTable) cat.loadTable(id);
    spark.sql("INSERT INTO default.sessioncattest VALUES (1,1,1)");
    String location = table.table().location().replaceFirst("file:", "");
    new File(location + "/data/trashfile").createNewFile();
    DeleteOrphanFiles.Result results = SparkActions.get().deleteOrphanFiles(table.table()).olderThan(System.currentTimeMillis() + 1000).execute();
    Assert.assertTrue("trash file should be removed", StreamSupport.stream(results.orphanFileLocations().spliterator(), false).anyMatch(file -> file.contains("file:" + location + "/data/trashfile")));
}
Also used:

import java.io.File;
import java.util.Map;
import java.util.stream.StreamSupport;
import org.apache.iceberg.actions.DeleteOrphanFiles;
import org.apache.iceberg.relocated.com.google.common.collect.Maps;
import org.apache.iceberg.spark.SparkCatalog;
import org.apache.iceberg.spark.SparkSchemaUtil;
import org.apache.iceberg.spark.SparkSessionCatalog;
import org.apache.iceberg.spark.source.SparkTable;
import org.apache.spark.sql.connector.catalog.Identifier;
import org.apache.spark.sql.connector.expressions.Transform;
import org.junit.After;
import org.junit.Assert;
import org.junit.Test;
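The two `spark.conf().set(...)` calls at the top of the test can equivalently be set once in `spark-defaults.conf` (or passed via `--conf`). The property names below come straight from the test code; the warehouse path is a placeholder:

```properties
# Replace the built-in session catalog with Iceberg's SparkSessionCatalog,
# delegating non-Iceberg tables to the underlying Hive catalog.
spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog
spark.sql.catalog.spark_catalog.type=hive

# For the Hadoop-catalog variant shown in Example 3, a warehouse location
# is also required (path below is a placeholder):
# spark.sql.catalog.spark_catalog.type=hadoop
# spark.sql.catalog.spark_catalog.warehouse=/path/to/warehouse
```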

Example 3 with SparkSessionCatalog

Use of org.apache.iceberg.spark.SparkSessionCatalog in the Apache Iceberg project.

From the class TestRemoveOrphanFilesAction3, method testSparkSessionCatalogHadoopTable:

@Test
public void testSparkSessionCatalogHadoopTable() throws Exception {
    spark.conf().set("spark.sql.catalog.spark_catalog", "org.apache.iceberg.spark.SparkSessionCatalog");
    spark.conf().set("spark.sql.catalog.spark_catalog.type", "hadoop");
    spark.conf().set("spark.sql.catalog.spark_catalog.warehouse", tableLocation);
    SparkSessionCatalog cat = (SparkSessionCatalog) spark.sessionState().catalogManager().v2SessionCatalog();
    String[] database = { "default" };
    Identifier id = Identifier.of(database, "table");
    Map<String, String> options = Maps.newHashMap();
    Transform[] transforms = {};
    cat.createTable(id, SparkSchemaUtil.convert(SCHEMA), transforms, options);
    SparkTable table = (SparkTable) cat.loadTable(id);
    spark.sql("INSERT INTO default.table VALUES (1,1,1)");
    String location = table.table().location().replaceFirst("file:", "");
    new File(location + "/data/trashfile").createNewFile();
    DeleteOrphanFiles.Result results = SparkActions.get().deleteOrphanFiles(table.table()).olderThan(System.currentTimeMillis() + 1000).execute();
    Assert.assertTrue("trash file should be removed", StreamSupport.stream(results.orphanFileLocations().spliterator(), false).anyMatch(file -> file.contains("file:" + location + "/data/trashfile")));
}
Also used:

import java.io.File;
import java.util.Map;
import java.util.stream.StreamSupport;
import org.apache.iceberg.actions.DeleteOrphanFiles;
import org.apache.iceberg.relocated.com.google.common.collect.Maps;
import org.apache.iceberg.spark.SparkCatalog;
import org.apache.iceberg.spark.SparkSchemaUtil;
import org.apache.iceberg.spark.SparkSessionCatalog;
import org.apache.iceberg.spark.source.SparkTable;
import org.apache.spark.sql.connector.catalog.Identifier;
import org.apache.spark.sql.connector.expressions.Transform;
import org.junit.After;
import org.junit.Assert;
import org.junit.Test;
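Both tests follow the same pattern: plant an untracked file next to the table's data files, run `deleteOrphanFiles` with a cutoff in the future (`System.currentTimeMillis() + 1000`) so every untracked file qualifies, and assert the planted file was reported. The sketch below models that idea with only the JDK, using a plain `Set` in place of the file references Iceberg actually reads from table metadata; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Stdlib-only model of what deleteOrphanFiles does in the tests above.
// Real Iceberg compares a file-system listing against the files referenced by
// table metadata; here the "tracked" set is just a plain Set of paths.
public class OrphanSweepSketch {
    // Return regular files under dir that are not tracked and were last
    // modified before the cutoff timestamp.
    static List<Path> findOrphans(Path dir, Set<Path> tracked, long olderThanMillis)
            throws IOException {
        try (Stream<Path> files = Files.walk(dir)) {
            return files.filter(Files::isRegularFile)
                .filter(f -> !tracked.contains(f))
                .filter(f -> {
                    try {
                        return Files.getLastModifiedTime(f).toMillis() < olderThanMillis;
                    } catch (IOException e) {
                        return false; // unreadable files are left alone
                    }
                })
                .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("data");
        Path tracked = Files.createFile(dir.resolve("datafile"));
        Path trash = Files.createFile(dir.resolve("trashfile"));
        // Like the tests above: a cutoff in the future makes every untracked file qualify.
        List<Path> orphans = findOrphans(dir, Set.of(tracked), System.currentTimeMillis() + 1000);
        System.out.println(orphans.size());          // only the untracked file
        System.out.println(orphans.contains(trash));
    }
}
```

The future cutoff is a test-only trick; in production the same action is normally run with a cutoff safely in the past to avoid deleting files from in-flight writes.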

Aggregations

SparkSessionCatalog (org.apache.iceberg.spark.SparkSessionCatalog): 3 uses
Identifier (org.apache.spark.sql.connector.catalog.Identifier): 3 uses
File (java.io.File): 2 uses
Map (java.util.Map): 2 uses
StreamSupport (java.util.stream.StreamSupport): 2 uses
DeleteOrphanFiles (org.apache.iceberg.actions.DeleteOrphanFiles): 2 uses
Maps (org.apache.iceberg.relocated.com.google.common.collect.Maps): 2 uses
SparkCatalog (org.apache.iceberg.spark.SparkCatalog): 2 uses
SparkSchemaUtil (org.apache.iceberg.spark.SparkSchemaUtil): 2 uses
SparkTable (org.apache.iceberg.spark.source.SparkTable): 2 uses
Transform (org.apache.spark.sql.connector.expressions.Transform): 2 uses
After (org.junit.After): 2 uses
Assert (org.junit.Assert): 2 uses
Test (org.junit.Test): 2 uses
PathIdentifier (org.apache.iceberg.spark.PathIdentifier): 1 use
Spark3Util (org.apache.iceberg.spark.Spark3Util): 1 use
SparkSession (org.apache.spark.sql.SparkSession): 1 use
CatalogManager (org.apache.spark.sql.connector.catalog.CatalogManager): 1 use