Example 16 with SparkTable

Use of org.apache.iceberg.spark.source.SparkTable in project iceberg by apache.

From the class AncestorsOfProcedure, the call method:

@Override
public InternalRow[] call(InternalRow args) {
    Identifier tableIdent = toIdentifier(args.getString(0), PARAMETERS[0].name());
    Long toSnapshotId = args.isNullAt(1) ? null : args.getLong(1);
    SparkTable sparkTable = loadSparkTable(tableIdent);
    Table icebergTable = sparkTable.table();
    // Default to the current snapshot when no snapshot id was supplied;
    // -1 marks an empty table that has no current snapshot.
    if (toSnapshotId == null) {
        toSnapshotId = icebergTable.currentSnapshot() != null ? icebergTable.currentSnapshot().snapshotId() : -1;
    }
    // Walk the ancestry chain from toSnapshotId back to the oldest ancestor.
    List<Long> snapshotIds = Lists.newArrayList(SnapshotUtil.ancestorIdsBetween(toSnapshotId, null, icebergTable::snapshot));
    return toOutputRow(icebergTable, snapshotIds);
}
Also used: Identifier(org.apache.spark.sql.connector.catalog.Identifier) Table(org.apache.iceberg.Table) SparkTable(org.apache.iceberg.spark.source.SparkTable)
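For orientation, this procedure is what backs the SQL call form. A minimal invocation sketch (not part of the excerpt above), assuming the procedure is registered as ancestors_of under the catalog's system namespace as Iceberg's Spark procedures are; the catalog name my_catalog and table db.tbl are placeholders:

// Sketch only: "my_catalog" and "db.tbl" are placeholders.
// Ancestors of the current snapshot:
spark.sql("CALL my_catalog.system.ancestors_of('db.tbl')").show();
// Ancestors starting from a specific snapshot id (the nullable second
// argument handled in call() above); the id is a placeholder:
spark.sql("CALL my_catalog.system.ancestors_of('db.tbl', 1234567890123L)").show();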

Example 17 with SparkTable

Use of org.apache.iceberg.spark.source.SparkTable in project iceberg by apache.

From the class BaseProcedure, the execute method:

private <T> T execute(Identifier ident, boolean refreshSparkCache, Function<org.apache.iceberg.Table, T> func) {
    SparkTable sparkTable = loadSparkTable(ident);
    org.apache.iceberg.Table icebergTable = sparkTable.table();
    T result = func.apply(icebergTable);
    // Invalidate Spark's cached relations after mutating procedures so that
    // later reads see the new table state.
    if (refreshSparkCache) {
        refreshSparkCache(ident, sparkTable);
    }
    return result;
}
Also used: SparkTable(org.apache.iceberg.spark.source.SparkTable)
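In the surrounding class this private helper is typically reached through thin wrappers, one cache-refreshing and one read-only; the names below mirror Iceberg's modifyIcebergTable/withIcebergTable wrappers around execute(...), but the body is an illustration, not a verbatim excerpt:

// Illustrative caller: a mutating procedure body that wants the Spark cache
// refreshed afterwards. tableIdent and olderThanMillis are assumed in scope.
Long currentId = modifyIcebergTable(tableIdent, icebergTable -> {
    icebergTable.expireSnapshots().expireOlderThan(olderThanMillis).commit();
    Snapshot current = icebergTable.currentSnapshot();
    return current != null ? current.snapshotId() : null;
});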

Example 18 with SparkTable

Use of org.apache.iceberg.spark.source.SparkTable in project iceberg by apache.

From the class TestRemoveOrphanFilesAction3, the testSparkCatalogNamedHadoopTable method:

@Test
public void testSparkCatalogNamedHadoopTable() throws Exception {
    // Register a Hadoop-type Iceberg catalog named "hadoop" whose warehouse
    // is the test's table location.
    spark.conf().set("spark.sql.catalog.hadoop", "org.apache.iceberg.spark.SparkCatalog");
    spark.conf().set("spark.sql.catalog.hadoop.type", "hadoop");
    spark.conf().set("spark.sql.catalog.hadoop.warehouse", tableLocation);
    SparkCatalog cat = (SparkCatalog) spark.sessionState().catalogManager().catalog("hadoop");
    String[] database = { "default" };
    Identifier id = Identifier.of(database, "table");
    Map<String, String> options = Maps.newHashMap();
    Transform[] transforms = {};
    cat.createTable(id, SparkSchemaUtil.convert(SCHEMA), transforms, options);
    SparkTable table = cat.loadTable(id);
    spark.sql("INSERT INTO hadoop.default.table VALUES (1,1,1)");
    // Plant an untracked file inside the table's data directory.
    String location = table.table().location().replaceFirst("file:", "");
    new File(location + "/data/trashfile").createNewFile();
    // The cutoff is pushed 1s into the future so the freshly planted file
    // qualifies as orphaned.
    DeleteOrphanFiles.Result results = SparkActions.get().deleteOrphanFiles(table.table()).olderThan(System.currentTimeMillis() + 1000).execute();
    Assert.assertTrue("trash file should be removed", StreamSupport.stream(results.orphanFileLocations().spliterator(), false).anyMatch(file -> file.contains("file:" + location + "/data/trashfile")));
}
Also used: SparkCatalog(org.apache.iceberg.spark.SparkCatalog) Maps(org.apache.iceberg.relocated.com.google.common.collect.Maps) Test(org.junit.Test) DeleteOrphanFiles(org.apache.iceberg.actions.DeleteOrphanFiles) SparkSchemaUtil(org.apache.iceberg.spark.SparkSchemaUtil) File(java.io.File) SparkSessionCatalog(org.apache.iceberg.spark.SparkSessionCatalog) Map(java.util.Map) After(org.junit.After) Transform(org.apache.spark.sql.connector.expressions.Transform) StreamSupport(java.util.stream.StreamSupport) Identifier(org.apache.spark.sql.connector.catalog.Identifier) Assert(org.junit.Assert) SparkTable(org.apache.iceberg.spark.source.SparkTable)
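The same cleanup can also be driven through SQL via Iceberg's remove_orphan_files procedure rather than the action API; a sketch against the catalog registered above, with the older_than cutoff being a placeholder:

// Sketch only: the cutoff timestamp is a placeholder; files newer than it
// are not considered orphans.
spark.sql("CALL hadoop.system.remove_orphan_files(" +
    "table => 'default.table', " +
    "older_than => TIMESTAMP '2022-01-01 00:00:00')").show();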

Example 19 with SparkTable

Use of org.apache.iceberg.spark.source.SparkTable in project iceberg by apache.

From the class TestRemoveOrphanFilesAction3, the testSparkCatalogTable method (same flow as Example 18, but against a custom catalog named mycat):

@Test
public void testSparkCatalogTable() throws Exception {
    spark.conf().set("spark.sql.catalog.mycat", "org.apache.iceberg.spark.SparkCatalog");
    spark.conf().set("spark.sql.catalog.mycat.type", "hadoop");
    spark.conf().set("spark.sql.catalog.mycat.warehouse", tableLocation);
    SparkCatalog cat = (SparkCatalog) spark.sessionState().catalogManager().catalog("mycat");
    String[] database = { "default" };
    Identifier id = Identifier.of(database, "table");
    Map<String, String> options = Maps.newHashMap();
    Transform[] transforms = {};
    cat.createTable(id, SparkSchemaUtil.convert(SCHEMA), transforms, options);
    SparkTable table = cat.loadTable(id);
    spark.sql("INSERT INTO mycat.default.table VALUES (1,1,1)");
    String location = table.table().location().replaceFirst("file:", "");
    new File(location + "/data/trashfile").createNewFile();
    DeleteOrphanFiles.Result results = SparkActions.get().deleteOrphanFiles(table.table()).olderThan(System.currentTimeMillis() + 1000).execute();
    Assert.assertTrue("trash file should be removed", StreamSupport.stream(results.orphanFileLocations().spliterator(), false).anyMatch(file -> file.contains("file:" + location + "/data/trashfile")));
}
Also used: SparkCatalog(org.apache.iceberg.spark.SparkCatalog) Maps(org.apache.iceberg.relocated.com.google.common.collect.Maps) Test(org.junit.Test) DeleteOrphanFiles(org.apache.iceberg.actions.DeleteOrphanFiles) SparkSchemaUtil(org.apache.iceberg.spark.SparkSchemaUtil) File(java.io.File) SparkSessionCatalog(org.apache.iceberg.spark.SparkSessionCatalog) Map(java.util.Map) After(org.junit.After) Transform(org.apache.spark.sql.connector.expressions.Transform) StreamSupport(java.util.stream.StreamSupport) Identifier(org.apache.spark.sql.connector.catalog.Identifier) Assert(org.junit.Assert) SparkTable(org.apache.iceberg.spark.source.SparkTable)
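A small follow-up assertion one could add to either of these tests (hypothetical, not in the originals) to confirm the planted file was physically deleted rather than merely reported:

// Hypothetical extra check: the orphan file should be gone from disk too.
Assert.assertFalse("trash file should no longer exist on disk",
    new File(location + "/data/trashfile").exists());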

Example 20 with SparkTable

Use of org.apache.iceberg.spark.source.SparkTable in project iceberg by apache.

From the class TestRemoveOrphanFilesAction3, the testSparkSessionCatalogHadoopTable method (same flow again, but routed through the Spark session catalog, spark_catalog, backed by SparkSessionCatalog):

@Test
public void testSparkSessionCatalogHadoopTable() throws Exception {
    spark.conf().set("spark.sql.catalog.spark_catalog", "org.apache.iceberg.spark.SparkSessionCatalog");
    spark.conf().set("spark.sql.catalog.spark_catalog.type", "hadoop");
    spark.conf().set("spark.sql.catalog.spark_catalog.warehouse", tableLocation);
    SparkSessionCatalog cat = (SparkSessionCatalog) spark.sessionState().catalogManager().v2SessionCatalog();
    String[] database = { "default" };
    Identifier id = Identifier.of(database, "table");
    Map<String, String> options = Maps.newHashMap();
    Transform[] transforms = {};
    cat.createTable(id, SparkSchemaUtil.convert(SCHEMA), transforms, options);
    SparkTable table = (SparkTable) cat.loadTable(id);
    spark.sql("INSERT INTO default.table VALUES (1,1,1)");
    String location = table.table().location().replaceFirst("file:", "");
    new File(location + "/data/trashfile").createNewFile();
    DeleteOrphanFiles.Result results = SparkActions.get().deleteOrphanFiles(table.table()).olderThan(System.currentTimeMillis() + 1000).execute();
    Assert.assertTrue("trash file should be removed", StreamSupport.stream(results.orphanFileLocations().spliterator(), false).anyMatch(file -> file.contains("file:" + location + "/data/trashfile")));
}
Also used: SparkSessionCatalog(org.apache.iceberg.spark.SparkSessionCatalog) SparkCatalog(org.apache.iceberg.spark.SparkCatalog) Maps(org.apache.iceberg.relocated.com.google.common.collect.Maps) Test(org.junit.Test) DeleteOrphanFiles(org.apache.iceberg.actions.DeleteOrphanFiles) SparkSchemaUtil(org.apache.iceberg.spark.SparkSchemaUtil) File(java.io.File) Map(java.util.Map) After(org.junit.After) Transform(org.apache.spark.sql.connector.expressions.Transform) StreamSupport(java.util.stream.StreamSupport) Identifier(org.apache.spark.sql.connector.catalog.Identifier) Assert(org.junit.Assert) SparkTable(org.apache.iceberg.spark.source.SparkTable)
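When a table's files span several prefixes, the scan can be narrowed with the action's location(...) setter; a sketch under the same test setup, with the /data subdirectory being the only assumption:

// Sketch: restrict the orphan scan to the table's data directory.
DeleteOrphanFiles.Result scoped = SparkActions.get()
    .deleteOrphanFiles(table.table())
    .location(table.table().location() + "/data")
    .olderThan(System.currentTimeMillis() + 1000)
    .execute();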

Aggregations

SparkTable (org.apache.iceberg.spark.source.SparkTable): 23 uses
Test (org.junit.Test): 12 uses
Identifier (org.apache.spark.sql.connector.catalog.Identifier): 8 uses
File (java.io.File): 7 uses
SparkCatalog (org.apache.iceberg.spark.SparkCatalog): 7 uses
Map (java.util.Map): 6 uses
Table (org.apache.iceberg.Table): 6 uses
StreamSupport (java.util.stream.StreamSupport): 5 uses
DeleteOrphanFiles (org.apache.iceberg.actions.DeleteOrphanFiles): 5 uses
Maps (org.apache.iceberg.relocated.com.google.common.collect.Maps): 5 uses
SparkSchemaUtil (org.apache.iceberg.spark.SparkSchemaUtil): 5 uses
SparkSessionCatalog (org.apache.iceberg.spark.SparkSessionCatalog): 5 uses
Transform (org.apache.spark.sql.connector.expressions.Transform): 5 uses
After (org.junit.After): 5 uses
Assert (org.junit.Assert): 5 uses
Schema (org.apache.iceberg.Schema): 4 uses
MigrateTable (org.apache.iceberg.actions.MigrateTable): 3 uses
SnapshotTable (org.apache.iceberg.actions.SnapshotTable): 3 uses
NoSuchTableException (org.apache.spark.sql.catalyst.analysis.NoSuchTableException): 3 uses
CatalogTable (org.apache.spark.sql.catalyst.catalog.CatalogTable): 3 uses