
Example 1 with DeltaCatalog

Use of org.apache.spark.sql.delta.catalog.DeltaCatalog in the OpenLineage project by OpenLineage.

From class DeltaHandler, method getDatasetVersion.

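@SneakyThrows
public Optional<String> getDatasetVersion(TableCatalog tableCatalog, Identifier identifier, Map<String, String> properties) {
    // Assumes tableCatalog is a DeltaCatalog; the cast throws ClassCastException otherwise.
    DeltaCatalog deltaCatalog = (DeltaCatalog) tableCatalog;
    Table table = deltaCatalog.loadTable(identifier);
    // Only Delta tables carry a snapshot; its version is reported as the dataset version.
    if (table instanceof DeltaTableV2) {
        DeltaTableV2 deltaTable = (DeltaTableV2) table;
        return Optional.of(Long.toString(deltaTable.snapshot().version()));
    }
    // Any other table type has no version to report.
    return Optional.empty();
}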
Also used : DeltaTableV2(org.apache.spark.sql.delta.catalog.DeltaTableV2) DeltaCatalog(org.apache.spark.sql.delta.catalog.DeltaCatalog) Table(org.apache.spark.sql.connector.catalog.Table) SneakyThrows(lombok.SneakyThrows)
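Outside a Spark session, the version lookup in getDatasetVersion can be sketched in plain Java. The interfaces CatalogTable and VersionedTable below are hypothetical stand-ins for Spark's Table and Delta's DeltaTableV2, and snapshotVersion() stands in for snapshot().version(); this is an illustration of the dispatch pattern under those assumptions, not OpenLineage code.

```java
import java.util.Optional;

public class DatasetVersionSketch {

    // Hypothetical stand-in for org.apache.spark.sql.connector.catalog.Table.
    interface CatalogTable {
        String name();
    }

    // Hypothetical stand-in for DeltaTableV2: a table that knows its snapshot version.
    interface VersionedTable extends CatalogTable {
        long snapshotVersion();
    }

    // Mirrors getDatasetVersion: only versioned (Delta-like) tables yield a version string.
    static Optional<String> getDatasetVersion(CatalogTable table) {
        if (table instanceof VersionedTable) {
            VersionedTable versioned = (VersionedTable) table;
            return Optional.of(Long.toString(versioned.snapshotVersion()));
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        VersionedTable delta = new VersionedTable() {
            public String name() { return "events"; }
            public long snapshotVersion() { return 2L; }
        };
        CatalogTable plain = () -> "lookup";

        System.out.println(getDatasetVersion(delta)); // prints Optional[2]
        System.out.println(getDatasetVersion(plain)); // prints Optional.empty
    }
}
```

Returning Optional.empty() instead of null lets a caller treat "no version available" explicitly, without null checks.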

Example 2 with DeltaCatalog

Use of org.apache.spark.sql.delta.catalog.DeltaCatalog in the OpenLineage project by OpenLineage.

From class DeltaHandler, method getDatasetIdentifier.

@Override
public DatasetIdentifier getDatasetIdentifier(SparkSession session, TableCatalog tableCatalog, Identifier identifier, Map<String, String> properties) {
    DeltaCatalog catalog = (DeltaCatalog) tableCatalog;
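    Optional<String> location;
    if (catalog.isPathIdentifier(identifier)) {
        // Path-based identifier (e.g. delta.`/some/path`): the name itself is the table location.
        location = Optional.of(identifier.name());
    } else {
        // Catalog table: use the explicit "location" property, if one was given.
        location = Optional.ofNullable(properties.get("location"));
    }
    // Without an explicit location, Delta places the table under the default path of
    // Spark's session catalog. The last namespace element is used as the database name.
    String database = Arrays.stream(identifier.namespace()).reduce((x, y) -> y).orElse(null);
    TableIdentifier tableIdentifier = TableIdentifier.apply(identifier.name(), Option.apply(database));
    // orElseGet defers the catalog lookup until no location is known.
    Path path = new Path(location.orElseGet(
        () -> session.sessionState().catalog().defaultTablePath(tableIdentifier).toString()));
    log.info(path.toString());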
    return PathUtils.fromPath(path, "file");
}
Also used : Path(org.apache.hadoop.fs.Path) Arrays(java.util.Arrays) SneakyThrows(lombok.SneakyThrows) DatasetIdentifier(io.openlineage.spark.agent.util.DatasetIdentifier) PathUtils(io.openlineage.spark.agent.util.PathUtils) TableIdentifier(org.apache.spark.sql.catalyst.TableIdentifier) Option(scala.Option) Slf4j(lombok.extern.slf4j.Slf4j) TableCatalog(org.apache.spark.sql.connector.catalog.TableCatalog) DeltaCatalog(org.apache.spark.sql.delta.catalog.DeltaCatalog) TableProviderFacet(io.openlineage.spark.agent.facets.TableProviderFacet) DeltaTableV2(org.apache.spark.sql.delta.catalog.DeltaTableV2) Map(java.util.Map) Optional(java.util.Optional) Identifier(org.apache.spark.sql.connector.catalog.Identifier) SparkSession(org.apache.spark.sql.SparkSession) Table(org.apache.spark.sql.connector.catalog.Table)
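The fallback order in getDatasetIdentifier can be sketched without Spark: a path-based identifier wins, then an explicit "location" property, then a default path derived from the catalog. The helper defaultTablePath below is a hypothetical stand-in for Spark's SessionCatalog.defaultTablePath; it simply joins an assumed warehouse directory, database and table name, whereas the real layout is decided by Spark's catalog. As in the original, the database is the last element of the namespace.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.Optional;

public class DeltaLocationSketch {

    // Hypothetical stand-in for SessionCatalog.defaultTablePath: <warehouse>/<database>/<table>.
    // A missing database falls back to an assumed current database named "default".
    static String defaultTablePath(String warehouse, String database, String table) {
        String db = database == null ? "default" : database;
        return warehouse + "/" + db + "/" + table;
    }

    // Same trick as the original: reduce to the last namespace element.
    static Optional<String> lastNamespaceElement(String[] namespace) {
        return Arrays.stream(namespace).reduce((first, second) -> second);
    }

    static String resolveLocation(
            boolean isPathIdentifier,
            String[] namespace,
            String name,
            Map<String, String> properties,
            String warehouse) {
        Optional<String> location = isPathIdentifier
                ? Optional.of(name)                               // path identifier: the name is the path
                : Optional.ofNullable(properties.get("location")); // explicit table location, if any
        // The default path is only computed when no location is known.
        return location.orElseGet(() ->
                defaultTablePath(warehouse, lastNamespaceElement(namespace).orElse(null), name));
    }

    public static void main(String[] args) {
        String[] ns = {"database", "schema"};
        // prints /data/events
        System.out.println(resolveLocation(true, new String[0], "/data/events", Map.of(), "/warehouse"));
        // prints s3://bucket/t
        System.out.println(resolveLocation(false, ns, "table", Map.of("location", "s3://bucket/t"), "/warehouse"));
        // prints /warehouse/schema/table
        System.out.println(resolveLocation(false, ns, "table", Map.of(), "/warehouse"));
    }
}
```

Note that with the namespace { "database", "schema" } the sketch, like the original reduce, treats "schema" as the database name.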

Example 3 with DeltaCatalog

Use of org.apache.spark.sql.delta.catalog.DeltaCatalog in the OpenLineage project by OpenLineage.

From class DeltaHandlerTest, method testGetVersionString.

@Test
public void testGetVersionString() {
    DeltaCatalog deltaCatalog = mock(DeltaCatalog.class);
    // Deep stubs allow the chained call deltaTable.snapshot().version() to be stubbed directly.
    DeltaTableV2 deltaTable = mock(DeltaTableV2.class, RETURNS_DEEP_STUBS);
    Identifier identifier = Identifier.of(new String[] { "database", "schema" }, "table");
    DeltaHandler deltaHandler = new DeltaHandler();
    when(deltaCatalog.loadTable(identifier)).thenReturn(deltaTable);
    when(deltaTable.snapshot().version()).thenReturn(2L);
    Optional<String> version = deltaHandler.getDatasetVersion(deltaCatalog, identifier, Collections.emptyMap());
    assertTrue(version.isPresent());
    // JUnit's assertEquals takes (expected, actual).
    assertEquals("2", version.get());
}
Also used : DeltaTableV2(org.apache.spark.sql.delta.catalog.DeltaTableV2) DeltaCatalog(org.apache.spark.sql.delta.catalog.DeltaCatalog) Identifier(org.apache.spark.sql.connector.catalog.Identifier) Test(org.junit.jupiter.api.Test)

Aggregations

DeltaCatalog (org.apache.spark.sql.delta.catalog.DeltaCatalog): 3
DeltaTableV2 (org.apache.spark.sql.delta.catalog.DeltaTableV2): 3
SneakyThrows (lombok.SneakyThrows): 2
Identifier (org.apache.spark.sql.connector.catalog.Identifier): 2
Table (org.apache.spark.sql.connector.catalog.Table): 2
TableProviderFacet (io.openlineage.spark.agent.facets.TableProviderFacet): 1
DatasetIdentifier (io.openlineage.spark.agent.util.DatasetIdentifier): 1
PathUtils (io.openlineage.spark.agent.util.PathUtils): 1
Arrays (java.util.Arrays): 1
Map (java.util.Map): 1
Optional (java.util.Optional): 1
Slf4j (lombok.extern.slf4j.Slf4j): 1
Path (org.apache.hadoop.fs.Path): 1
SparkSession (org.apache.spark.sql.SparkSession): 1
TableIdentifier (org.apache.spark.sql.catalyst.TableIdentifier): 1
TableCatalog (org.apache.spark.sql.connector.catalog.TableCatalog): 1
Test (org.junit.jupiter.api.Test): 1
Option (scala.Option): 1