Example 11 with OpenLineage

use of io.openlineage.client.OpenLineage in project OpenLineage by OpenLineage.

the class SparkVersionFacetBuilderTest method testBuild.

@Test
public void testBuild() {
    SparkVersionFacetBuilder builder = new SparkVersionFacetBuilder(
        OpenLineageContext.builder().sparkContext(sparkContext)
            .openLineage(new OpenLineage(OpenLineageClient.OPEN_LINEAGE_CLIENT_URI))
            .build());
    Map<String, RunFacet> runFacetMap = new HashMap<>();
    builder.build(new SparkListenerSQLExecutionEnd(1, 1L), runFacetMap::put);
    assertThat(runFacetMap).hasEntrySatisfying("spark_version",
        facet -> assertThat(facet)
            .isInstanceOf(SparkVersionFacet.class)
            .hasFieldOrPropertyWithValue("sparkVersion", sparkContext.version()));
}
Also used : SparkListenerSQLExecutionEnd(org.apache.spark.sql.execution.ui.SparkListenerSQLExecutionEnd) HashMap(java.util.HashMap) OpenLineage(io.openlineage.client.OpenLineage) RunFacet(io.openlineage.client.OpenLineage.RunFacet) Test(org.junit.jupiter.api.Test)

Example 12 with OpenLineage

use of io.openlineage.client.OpenLineage in project OpenLineage by OpenLineage.

the class PlanUtils3 method fromDataSourceV2Relation.

public static <D extends OpenLineage.Dataset> List<D> fromDataSourceV2Relation(DatasetFactory<D> datasetFactory, OpenLineageContext context, DataSourceV2Relation relation, OpenLineage.DatasetFacetsBuilder datasetFacetsBuilder) {
    if (relation.identifier().isEmpty()) {
        throw new IllegalArgumentException("Couldn't find identifier for dataset in plan " + relation);
    }
    Identifier identifier = relation.identifier().get();
    if (relation.catalog().isEmpty() || !(relation.catalog().get() instanceof TableCatalog)) {
        throw new IllegalArgumentException("Couldn't find catalog for dataset in plan " + relation);
    }
    TableCatalog tableCatalog = (TableCatalog) relation.catalog().get();
    Map<String, String> tableProperties = relation.table().properties();
    Optional<DatasetIdentifier> di = PlanUtils3.getDatasetIdentifier(context, tableCatalog, identifier, tableProperties);
    if (!di.isPresent()) {
        return Collections.emptyList();
    }
    OpenLineage openLineage = context.getOpenLineage();
    datasetFacetsBuilder
        .schema(PlanUtils.schemaFacet(openLineage, relation.schema()))
        .dataSource(PlanUtils.datasourceFacet(openLineage, di.get().getNamespace()));
    // Attach the table provider facet only when the catalog exposes one
    CatalogUtils3.getTableProviderFacet(tableCatalog, tableProperties).ifPresent(provider -> datasetFacetsBuilder.put("tableProvider", provider));
    return Collections.singletonList(datasetFactory.getDataset(di.get().getName(), di.get().getNamespace(), datasetFacetsBuilder.build()));
}
Also used : DatasetIdentifier(io.openlineage.spark.agent.util.DatasetIdentifier) Identifier(org.apache.spark.sql.connector.catalog.Identifier) TableCatalog(org.apache.spark.sql.connector.catalog.TableCatalog) OpenLineage(io.openlineage.client.OpenLineage)

Example 13 with OpenLineage

use of io.openlineage.client.OpenLineage in project OpenLineage by OpenLineage.

the class AbstractQueryPlanDatasetBuilderTest method testApplyOnBuilderWithGenericArg.

@Test
public void testApplyOnBuilderWithGenericArg() {
    SparkSession session = SparkSession.builder().config("spark.sql.warehouse.dir", "/tmp/warehouse").master("local").getOrCreate();
    OpenLineage openLineage = new OpenLineage(OpenLineageClient.OPEN_LINEAGE_CLIENT_URI);
    InputDataset expected = openLineage.newInputDataset("namespace", "the_name", null, null);
    OpenLineageContext context = createContext(session, openLineage);
    MyGenericArgInputDatasetBuilder<SparkListenerJobEnd> builder = new MyGenericArgInputDatasetBuilder<>(context, true, expected);
    SparkListenerJobEnd jobEnd = new SparkListenerJobEnd(1, 2, null);
    // Even though our instance of builder is parameterized with SparkListenerJobEnd, it's not
    // *compiled* with that argument, so the isDefinedAt method fails to resolve the type arg
    Assertions.assertFalse(((PartialFunction) builder).isDefinedAt(jobEnd));
}
Also used : SparkSession(org.apache.spark.sql.SparkSession) InputDataset(io.openlineage.client.OpenLineage.InputDataset) SparkListenerJobEnd(org.apache.spark.scheduler.SparkListenerJobEnd) OpenLineage(io.openlineage.client.OpenLineage) Test(org.junit.jupiter.api.Test)
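
The comment in the test above refers to Java type erasure: a type argument supplied only at instantiation time is not recorded in the class metadata, whereas one fixed in a subclass declaration is. The following is a minimal, self-contained sketch of that difference using plain reflection. The class names (`TypeArgDemo`, `GenericBuilder`, `StringEventBuilder`) are hypothetical, and this is not necessarily how OpenLineage itself resolves the type argument in `isDefinedAt`.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.Optional;

public class TypeArgDemo {

    /** Hypothetical stand-in for a builder parameterized by an event type. */
    static class GenericBuilder<T> {

        /** Returns the concrete type argument recorded for this object's class, if any. */
        Optional<Type> resolvedTypeArg() {
            Type superType = getClass().getGenericSuperclass();
            if (superType instanceof ParameterizedType) {
                Type arg = ((ParameterizedType) superType).getActualTypeArguments()[0];
                // Only a concrete class counts; a bare type variable such as T does not
                if (arg instanceof Class) {
                    return Optional.of(arg);
                }
            }
            // The superclass is a raw class (e.g. Object), so nothing was recorded
            return Optional.empty();
        }
    }

    /** Subclass that fixes the type argument at compile time. */
    static class StringEventBuilder extends GenericBuilder<String> {}

    public static void main(String[] args) {
        // Type argument given only at the call site: erased at runtime
        GenericBuilder<String> erased = new GenericBuilder<>();
        // Type argument baked into the subclass declaration: recoverable
        GenericBuilder<String> compiled = new StringEventBuilder();

        System.out.println(erased.resolvedTypeArg().isPresent());   // prints "false"
        System.out.println(compiled.resolvedTypeArg().isPresent()); // prints "true"
    }
}
```

Under this assumption, a builder created as `new MyGenericArgInputDatasetBuilder<>(...)` carries no runtime record of `SparkListenerJobEnd`, which is consistent with the test expecting `isDefinedAt` to return false.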

Example 14 with OpenLineage

use of io.openlineage.client.OpenLineage in project OpenLineage by OpenLineage.

the class AlterTableDatasetBuilder method apply.

@Override
public List<OpenLineage.OutputDataset> apply(AlterTable alterTable) {
    TableCatalog tableCatalog = alterTable.catalog();
    Table table;
    try {
        table = tableCatalog.loadTable(alterTable.ident());
    } catch (Exception e) {
        // The table could not be loaded, so no output dataset is reported
        return Collections.emptyList();
    }
    Optional<DatasetIdentifier> di = PlanUtils3.getDatasetIdentifier(context, tableCatalog, alterTable.ident(), table.properties());
    if (di.isPresent()) {
        OpenLineage openLineage = context.getOpenLineage();
        OpenLineage.DatasetFacetsBuilder builder = openLineage.newDatasetFacetsBuilder()
            .schema(PlanUtils.schemaFacet(openLineage, table.schema()))
            .dataSource(PlanUtils.datasourceFacet(openLineage, di.get().getNamespace()));
        Optional<String> datasetVersion = CatalogUtils3.getDatasetVersion(tableCatalog, alterTable.ident(), table.properties());
        datasetVersion.ifPresent(version -> builder.version(openLineage.newDatasetVersionDatasetFacet(version)));
        return Collections.singletonList(outputDataset().getDataset(di.get().getName(), di.get().getNamespace(), builder.build()));
    } else {
        return Collections.emptyList();
    }
}
Also used : AlterTable(org.apache.spark.sql.catalyst.plans.logical.AlterTable) Table(org.apache.spark.sql.connector.catalog.Table) DatasetIdentifier(io.openlineage.spark.agent.util.DatasetIdentifier) TableCatalog(org.apache.spark.sql.connector.catalog.TableCatalog) OpenLineage(io.openlineage.client.OpenLineage)

Example 15 with OpenLineage

use of io.openlineage.client.OpenLineage in project OpenLineage by OpenLineage.

the class CreateReplaceDatasetBuilder method apply.

@Override
public List<OpenLineage.OutputDataset> apply(LogicalPlan x) {
    TableCatalog tableCatalog;
    Map<String, String> tableProperties;
    Identifier identifier;
    StructType schema;
    OpenLineage.LifecycleStateChangeDatasetFacet.LifecycleStateChange lifecycleStateChange;
    if (x instanceof CreateTableAsSelect) {
        CreateTableAsSelect command = (CreateTableAsSelect) x;
        tableCatalog = command.catalog();
        tableProperties = ScalaConversionUtils.<String, String>fromMap(command.properties());
        identifier = command.tableName();
        schema = command.tableSchema();
        lifecycleStateChange = OpenLineage.LifecycleStateChangeDatasetFacet.LifecycleStateChange.CREATE;
    } else if (x instanceof CreateV2Table) {
        CreateV2Table command = (CreateV2Table) x;
        tableCatalog = command.catalog();
        tableProperties = ScalaConversionUtils.<String, String>fromMap(command.properties());
        identifier = command.tableName();
        schema = command.tableSchema();
        lifecycleStateChange = OpenLineage.LifecycleStateChangeDatasetFacet.LifecycleStateChange.CREATE;
    } else if (x instanceof ReplaceTable) {
        ReplaceTable command = (ReplaceTable) x;
        tableCatalog = command.catalog();
        tableProperties = ScalaConversionUtils.<String, String>fromMap(command.properties());
        identifier = command.tableName();
        schema = command.tableSchema();
        lifecycleStateChange = OpenLineage.LifecycleStateChangeDatasetFacet.LifecycleStateChange.OVERWRITE;
    } else {
        ReplaceTableAsSelect command = (ReplaceTableAsSelect) x;
        tableCatalog = command.catalog();
        tableProperties = ScalaConversionUtils.<String, String>fromMap(command.properties());
        identifier = command.tableName();
        schema = command.tableSchema();
        lifecycleStateChange = OpenLineage.LifecycleStateChangeDatasetFacet.LifecycleStateChange.OVERWRITE;
    }
    Optional<DatasetIdentifier> di = PlanUtils3.getDatasetIdentifier(context, tableCatalog, identifier, tableProperties);
    if (!di.isPresent()) {
        return Collections.emptyList();
    }
    OpenLineage openLineage = context.getOpenLineage();
    OpenLineage.DatasetFacetsBuilder builder = openLineage.newDatasetFacetsBuilder()
        .schema(PlanUtils.schemaFacet(openLineage, schema))
        .lifecycleStateChange(openLineage.newLifecycleStateChangeDatasetFacet(lifecycleStateChange, null))
        .dataSource(PlanUtils.datasourceFacet(openLineage, di.get().getNamespace()));
    Optional<String> datasetVersion = CatalogUtils3.getDatasetVersion(tableCatalog, identifier, tableProperties);
    datasetVersion.ifPresent(version -> builder.version(openLineage.newDatasetVersionDatasetFacet(version)));
    // Attach the table provider facet only when the catalog exposes one
    CatalogUtils3.getTableProviderFacet(tableCatalog, tableProperties).ifPresent(provider -> builder.put("tableProvider", provider));
    return Collections.singletonList(outputDataset().getDataset(di.get().getName(), di.get().getNamespace(), builder.build()));
}
Also used : StructType(org.apache.spark.sql.types.StructType) DatasetIdentifier(io.openlineage.spark.agent.util.DatasetIdentifier) Identifier(org.apache.spark.sql.connector.catalog.Identifier) CreateV2Table(org.apache.spark.sql.catalyst.plans.logical.CreateV2Table) ReplaceTableAsSelect(org.apache.spark.sql.catalyst.plans.logical.ReplaceTableAsSelect) TableCatalog(org.apache.spark.sql.connector.catalog.TableCatalog) CreateTableAsSelect(org.apache.spark.sql.catalyst.plans.logical.CreateTableAsSelect) OpenLineage(io.openlineage.client.OpenLineage) ReplaceTable(org.apache.spark.sql.catalyst.plans.logical.ReplaceTable)

Aggregations

OpenLineage (io.openlineage.client.OpenLineage) 38
Test (org.junit.jupiter.api.Test) 23
SparkListenerJobEnd (org.apache.spark.scheduler.SparkListenerJobEnd) 12
SparkListenerJobStart (org.apache.spark.scheduler.SparkListenerJobStart) 9
SparkListenerSQLExecutionEnd (org.apache.spark.sql.execution.ui.SparkListenerSQLExecutionEnd) 9
InputDataset (io.openlineage.client.OpenLineage.InputDataset) 7
OpenLineageContext (io.openlineage.spark.api.OpenLineageContext) 7
LogicalRelation (org.apache.spark.sql.execution.datasources.LogicalRelation) 7
SparkListenerSQLExecutionStart (org.apache.spark.sql.execution.ui.SparkListenerSQLExecutionStart) 7
OutputDataset (io.openlineage.client.OpenLineage.OutputDataset) 6
HashMap (java.util.HashMap) 6
SparkSession (org.apache.spark.sql.SparkSession) 6
AttributeReference (org.apache.spark.sql.catalyst.expressions.AttributeReference) 6
RunFacet (io.openlineage.client.OpenLineage.RunFacet) 5
ObjectMapper (com.fasterxml.jackson.databind.ObjectMapper) 4
RunEvent (io.openlineage.client.OpenLineage.RunEvent) 4
SparkListenerStageCompleted (org.apache.spark.scheduler.SparkListenerStageCompleted) 4
JsonAnyGetter (com.fasterxml.jackson.annotation.JsonAnyGetter) 3
JsonAnySetter (com.fasterxml.jackson.annotation.JsonAnySetter) 3
JsonParser (com.fasterxml.jackson.core.JsonParser) 3