Example 11 with SparkTable

Use of org.apache.iceberg.spark.source.SparkTable in the apache/iceberg project.

From the class TestRemoveOrphanFilesAction3, method testSparkCatalogNamedHiveTable:

@Test
public void testSparkCatalogNamedHiveTable() throws Exception {
    spark.conf().set("spark.sql.catalog.hive", "org.apache.iceberg.spark.SparkCatalog");
    spark.conf().set("spark.sql.catalog.hive.type", "hadoop");
    spark.conf().set("spark.sql.catalog.hive.warehouse", tableLocation);
    SparkCatalog cat = (SparkCatalog) spark.sessionState().catalogManager().catalog("hive");
    String[] database = { "default" };
    Identifier id = Identifier.of(database, "table");
    Map<String, String> options = Maps.newHashMap();
    Transform[] transforms = {};
    cat.createTable(id, SparkSchemaUtil.convert(SCHEMA), transforms, options);
    SparkTable table = cat.loadTable(id);
    spark.sql("INSERT INTO hive.default.table VALUES (1,1,1)");
    String location = table.table().location().replaceFirst("file:", "");
    new File(location + "/data/trashfile").createNewFile();
    DeleteOrphanFiles.Result results = SparkActions.get().deleteOrphanFiles(table.table()).olderThan(System.currentTimeMillis() + 1000).execute();
    Assert.assertTrue("trash file should be removed", StreamSupport.stream(results.orphanFileLocations().spliterator(), false).anyMatch(file -> file.contains("file:" + location + "/data/trashfile")));
}
Also used : SparkCatalog(org.apache.iceberg.spark.SparkCatalog) Maps(org.apache.iceberg.relocated.com.google.common.collect.Maps) Test(org.junit.Test) DeleteOrphanFiles(org.apache.iceberg.actions.DeleteOrphanFiles) SparkSchemaUtil(org.apache.iceberg.spark.SparkSchemaUtil) File(java.io.File) SparkSessionCatalog(org.apache.iceberg.spark.SparkSessionCatalog) Map(java.util.Map) After(org.junit.After) Transform(org.apache.spark.sql.connector.expressions.Transform) StreamSupport(java.util.stream.StreamSupport) Identifier(org.apache.spark.sql.connector.catalog.Identifier) Assert(org.junit.Assert) SparkTable(org.apache.iceberg.spark.source.SparkTable)
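The assertion in the test above bridges the Iterable returned by DeleteOrphanFiles.Result into a Stream via StreamSupport. A minimal stdlib-only sketch of that idiom, with a hypothetical OrphanCheck helper (not part of Iceberg):

```java
import java.util.List;
import java.util.stream.StreamSupport;

// Hypothetical helper mirroring the test's assertion: scan an Iterable of
// orphan-file locations (the shape DeleteOrphanFiles.Result returns) for a path.
public class OrphanCheck {
    public static boolean containsOrphan(Iterable<String> orphanLocations, String path) {
        // Iterable has no stream() method, so bridge it through its spliterator.
        return StreamSupport.stream(orphanLocations.spliterator(), false)
                .anyMatch(file -> file.contains(path));
    }

    public static void main(String[] args) {
        Iterable<String> results = List.of("file:/tmp/warehouse/data/trashfile");
        System.out.println(containsOrphan(results, "/data/trashfile")); // prints "true"
    }
}
```

The same pattern works for any Iterable-only API where you want Stream operations without collecting into a List first.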

Example 12 with SparkTable

Use of org.apache.iceberg.spark.source.SparkTable in the apache/iceberg project.

From the class TestRemoveOrphanFilesAction3, method testSparkSessionCatalogHiveTable:

@Test
public void testSparkSessionCatalogHiveTable() throws Exception {
    spark.conf().set("spark.sql.catalog.spark_catalog", "org.apache.iceberg.spark.SparkSessionCatalog");
    spark.conf().set("spark.sql.catalog.spark_catalog.type", "hive");
    SparkSessionCatalog cat = (SparkSessionCatalog) spark.sessionState().catalogManager().v2SessionCatalog();
    String[] database = { "default" };
    Identifier id = Identifier.of(database, "sessioncattest");
    Map<String, String> options = Maps.newHashMap();
    Transform[] transforms = {};
    cat.dropTable(id);
    cat.createTable(id, SparkSchemaUtil.convert(SCHEMA), transforms, options);
    SparkTable table = (SparkTable) cat.loadTable(id);
    spark.sql("INSERT INTO default.sessioncattest VALUES (1,1,1)");
    String location = table.table().location().replaceFirst("file:", "");
    new File(location + "/data/trashfile").createNewFile();
    DeleteOrphanFiles.Result results = SparkActions.get().deleteOrphanFiles(table.table()).olderThan(System.currentTimeMillis() + 1000).execute();
    Assert.assertTrue("trash file should be removed", StreamSupport.stream(results.orphanFileLocations().spliterator(), false).anyMatch(file -> file.contains("file:" + location + "/data/trashfile")));
}
Also used : SparkSessionCatalog(org.apache.iceberg.spark.SparkSessionCatalog) SparkCatalog(org.apache.iceberg.spark.SparkCatalog) Maps(org.apache.iceberg.relocated.com.google.common.collect.Maps) Test(org.junit.Test) DeleteOrphanFiles(org.apache.iceberg.actions.DeleteOrphanFiles) SparkSchemaUtil(org.apache.iceberg.spark.SparkSchemaUtil) File(java.io.File) Map(java.util.Map) After(org.junit.After) Transform(org.apache.spark.sql.connector.expressions.Transform) StreamSupport(java.util.stream.StreamSupport) Identifier(org.apache.spark.sql.connector.catalog.Identifier) Assert(org.junit.Assert) SparkTable(org.apache.iceberg.spark.source.SparkTable)
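Both tests pass olderThan(System.currentTimeMillis() + 1000) so that the trashfile created a moment earlier already falls inside the cutoff. A stdlib-only sketch of that cutoff filter, under the assumption that candidates are compared by file modification time (hypothetical class and method names):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch of an olderThan(...) cutoff like the one the tests use:
// only files whose modification time is strictly before the cutoff qualify.
public class OlderThanFilter {
    public static List<String> candidates(Map<String, Long> fileModTimes, long cutoffMillis) {
        return fileModTimes.entrySet().stream()
                .filter(e -> e.getValue() < cutoffMillis)
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        Map<String, Long> files = Map.of(
                "/data/trashfile", now,           // just created
                "/data/old-orphan", now - 10_000);
        // A cutoff of now + 1000, as in the tests, sweeps up even fresh files.
        System.out.println(candidates(files, now + 1000)); // prints "[/data/old-orphan, /data/trashfile]"
    }
}
```

In production you would normally use a cutoff well in the past (the action defaults to several days) so that files still being written are not collected; the future-dated cutoff here is purely a test convenience.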

Example 13 with SparkTable

Use of org.apache.iceberg.spark.source.SparkTable in the OpenLineage/OpenLineage project.

From the class IcebergHandler, method getDatasetVersion:

@SneakyThrows
public Optional<String> getDatasetVersion(TableCatalog tableCatalog, Identifier identifier, Map<String, String> properties) {
    SparkCatalog sparkCatalog = (SparkCatalog) tableCatalog;
    SparkTable table;
    try {
        table = sparkCatalog.loadTable(identifier);
    } catch (NoSuchTableException ex) {
        return Optional.empty();
    }
    if (table.table() != null && table.table().currentSnapshot() != null) {
        return Optional.of(Long.toString(table.table().currentSnapshot().snapshotId()));
    }
    return Optional.empty();
}
Also used : SparkCatalog(org.apache.iceberg.spark.SparkCatalog) NoSuchTableException(org.apache.spark.sql.catalyst.analysis.NoSuchTableException) SparkTable(org.apache.iceberg.spark.source.SparkTable) SneakyThrows(lombok.SneakyThrows)
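getDatasetVersion above returns Optional.empty() in two distinct cases: the table does not exist (NoSuchTableException) or it exists but has no current snapshot. A stdlib-only sketch of that null-safe lookup pattern, with a plain Map standing in for the catalog (hypothetical names, no Spark or Iceberg types):

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for snapshot lookup: maps table name -> snapshot id,
// where a null value means "table exists but has no current snapshot".
public class VersionLookup {
    public static Optional<String> datasetVersion(Map<String, Long> snapshots, String table) {
        if (!snapshots.containsKey(table)) {
            return Optional.empty();          // mirrors the NoSuchTableException branch
        }
        Long snapshotId = snapshots.get(table);
        if (snapshotId == null) {
            return Optional.empty();          // mirrors currentSnapshot() == null
        }
        return Optional.of(Long.toString(snapshotId));
    }

    public static void main(String[] args) {
        java.util.HashMap<String, Long> snapshots = new java.util.HashMap<>();
        snapshots.put("db.tbl", 4587L);
        snapshots.put("db.empty", null);      // freshly created, never written to
        System.out.println(datasetVersion(snapshots, "db.tbl"));   // prints "Optional[4587]"
        System.out.println(datasetVersion(snapshots, "db.gone"));  // prints "Optional.empty"
    }
}
```

Collapsing both failure modes into Optional.empty() lets the caller treat "no version available" uniformly, which is exactly how OpenLineage consumes this value.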

Example 14 with SparkTable

Use of org.apache.iceberg.spark.source.SparkTable in the apache/iceberg project.

From the class TestAlterTablePartitionFields, method sparkTable:

private SparkTable sparkTable() throws Exception {
    validationCatalog.loadTable(tableIdent).refresh();
    CatalogManager catalogManager = spark.sessionState().catalogManager();
    TableCatalog catalog = (TableCatalog) catalogManager.catalog(catalogName);
    Identifier identifier = Identifier.of(tableIdent.namespace().levels(), tableIdent.name());
    return (SparkTable) catalog.loadTable(identifier);
}
Also used : Identifier(org.apache.spark.sql.connector.catalog.Identifier) TableCatalog(org.apache.spark.sql.connector.catalog.TableCatalog) SparkTable(org.apache.iceberg.spark.source.SparkTable) CatalogManager(org.apache.spark.sql.connector.catalog.CatalogManager)

Example 15 with SparkTable

Use of org.apache.iceberg.spark.source.SparkTable in the apache/iceberg project.

From the class Spark3Util, method loadIcebergTable:

/**
 * Returns an Iceberg Table by its name from a Spark V2 Catalog. If cache is enabled in {@link SparkCatalog},
 * the {@link TableOperations} of the table may be stale; refresh the table to get the latest state.
 *
 * @param spark SparkSession used for looking up catalog references and tables
 * @param name  The multipart identifier of the Iceberg table
 * @return an Iceberg table
 */
public static org.apache.iceberg.Table loadIcebergTable(SparkSession spark, String name) throws ParseException, NoSuchTableException {
    CatalogAndIdentifier catalogAndIdentifier = catalogAndIdentifier(spark, name);
    TableCatalog catalog = asTableCatalog(catalogAndIdentifier.catalog);
    Table sparkTable = catalog.loadTable(catalogAndIdentifier.identifier);
    return toIcebergTable(sparkTable);
}
Also used : SparkTable(org.apache.iceberg.spark.source.SparkTable) Table(org.apache.spark.sql.connector.catalog.Table) TableCatalog(org.apache.spark.sql.connector.catalog.TableCatalog)
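loadIcebergTable first resolves the multipart name into a catalog plus identifier via catalogAndIdentifier. A rough stdlib-only sketch of that split for the simple dot-separated case (hypothetical class; the real Spark parser additionally handles backtick quoting and falls back to the session's current catalog):

```java
// Hypothetical sketch of multipart-name resolution: "cat.db.tbl" ->
// catalog "cat", namespace ["db"], table "tbl". Assumes no quoted parts.
public class NameSplit {
    public static String[] parts(String multipartName) {
        return multipartName.split("\\.");
    }

    // With three or more parts the first is taken as the catalog name;
    // shorter names resolve against the default (current) catalog.
    public static String catalogOf(String multipartName, String defaultCatalog) {
        String[] p = parts(multipartName);
        return p.length >= 3 ? p[0] : defaultCatalog;
    }

    public static String tableOf(String multipartName) {
        String[] p = parts(multipartName);
        return p[p.length - 1];
    }

    public static void main(String[] args) {
        System.out.println(catalogOf("hive.default.table", "spark_catalog")); // prints "hive"
        System.out.println(tableOf("hive.default.table"));                    // prints "table"
    }
}
```

This is why the tests above could address the same table either as hive.default.table (explicit catalog) or as default.sessioncattest (resolved against spark_catalog).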

Aggregations

SparkTable (org.apache.iceberg.spark.source.SparkTable): 23
Test (org.junit.Test): 12
Identifier (org.apache.spark.sql.connector.catalog.Identifier): 8
File (java.io.File): 7
SparkCatalog (org.apache.iceberg.spark.SparkCatalog): 7
Map (java.util.Map): 6
Table (org.apache.iceberg.Table): 6
StreamSupport (java.util.stream.StreamSupport): 5
DeleteOrphanFiles (org.apache.iceberg.actions.DeleteOrphanFiles): 5
Maps (org.apache.iceberg.relocated.com.google.common.collect.Maps): 5
SparkSchemaUtil (org.apache.iceberg.spark.SparkSchemaUtil): 5
SparkSessionCatalog (org.apache.iceberg.spark.SparkSessionCatalog): 5
Transform (org.apache.spark.sql.connector.expressions.Transform): 5
After (org.junit.After): 5
Assert (org.junit.Assert): 5
Schema (org.apache.iceberg.Schema): 4
MigrateTable (org.apache.iceberg.actions.MigrateTable): 3
SnapshotTable (org.apache.iceberg.actions.SnapshotTable): 3
NoSuchTableException (org.apache.spark.sql.catalyst.analysis.NoSuchTableException): 3
CatalogTable (org.apache.spark.sql.catalyst.catalog.CatalogTable): 3