Example 1 with OpenLineageContext

Use of io.openlineage.spark.api.OpenLineageContext in project OpenLineage by OpenLineage, in the class InternalEventHandlerFactory, method createInputDatasetBuilder.

@Override
public Collection<PartialFunction<Object, List<InputDataset>>> createInputDatasetBuilder(OpenLineageContext context) {
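    // Combine the builders contributed by every registered factory with the built-in input builders.
    ImmutableList builders = ImmutableList.<PartialFunction<Object, List<InputDataset>>>builder()
        .addAll(generate(eventHandlerFactories, factory -> factory.createInputDatasetBuilder(context)))
        .addAll(DatasetBuilderFactoryProvider.getInstance().getInputBuilders(context))
        .build();
    // Register the combined builders on the context before returning them.
    context.getInputDatasetBuilders().addAll(builders);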
    return builders;
}
Also used : Spliterators(java.util.Spliterators) InputDataset(io.openlineage.client.OpenLineage.InputDataset) OutputDatasetFacet(io.openlineage.client.OpenLineage.OutputDatasetFacet) Function(java.util.function.Function) SparkVersionFacetBuilder(io.openlineage.spark.agent.facets.builder.SparkVersionFacetBuilder) ImmutableList(com.google.common.collect.ImmutableList) OutputDataset(io.openlineage.client.OpenLineage.OutputDataset) ErrorFacetBuilder(io.openlineage.spark.agent.facets.builder.ErrorFacetBuilder) OutputStatisticsOutputDatasetFacetBuilder(io.openlineage.spark.agent.facets.builder.OutputStatisticsOutputDatasetFacetBuilder) JobFacet(io.openlineage.client.OpenLineage.JobFacet) DatabricksEnvironmentFacetBuilder(io.openlineage.spark.agent.facets.builder.DatabricksEnvironmentFacetBuilder) StreamSupport(java.util.stream.StreamSupport) LogicalPlanRunFacetBuilder(io.openlineage.spark.agent.facets.builder.LogicalPlanRunFacetBuilder) LogicalPlan(org.apache.spark.sql.catalyst.plans.logical.LogicalPlan) PartialFunction(scala.PartialFunction) OpenLineageContext(io.openlineage.spark.api.OpenLineageContext) Collection(java.util.Collection) InputDatasetFacet(io.openlineage.client.OpenLineage.InputDatasetFacet) ServiceLoader(java.util.ServiceLoader) DatasetFacet(io.openlineage.client.OpenLineage.DatasetFacet) Collectors(java.util.stream.Collectors) List(java.util.List) OpenLineageEventHandlerFactory(io.openlineage.spark.api.OpenLineageEventHandlerFactory) CustomFacetBuilder(io.openlineage.spark.api.CustomFacetBuilder) Builder(com.google.common.collect.ImmutableList.Builder) Spliterator(java.util.Spliterator) RunFacet(io.openlineage.client.OpenLineage.RunFacet)
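
The generate(...) helper called in Example 1 is not part of the excerpt. As a rough illustration only, the sketch below assumes it applies the given function to each factory and flattens the resulting collections into one list. The class GenerateSketch, the interface HandlerFactory, and the string "builders" are hypothetical stand-ins, not OpenLineage types, and the real helper may differ.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class GenerateSketch {

    // Hypothetical stand-in for OpenLineageEventHandlerFactory: each factory
    // contributes a collection of builders (plain strings in this sketch).
    interface HandlerFactory {
        Collection<String> createBuilders(String context);
    }

    // Assumed behaviour: apply the generator to every factory and flatten the
    // per-factory collections into a single list, preserving factory order.
    static <F, T> List<T> generate(List<F> factories, Function<F, Collection<T>> generator) {
        if (factories == null) {
            return Collections.emptyList();
        }
        return factories.stream()
            .map(generator)
            .flatMap(Collection::stream)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<HandlerFactory> factories = List.of(
            ctx -> List.of(ctx + "-a", ctx + "-b"),
            ctx -> List.of(ctx + "-c"));
        List<String> builders = generate(factories, f -> f.createBuilders("ctx"));
        System.out.println(builders); // prints [ctx-a, ctx-b, ctx-c]
    }
}
```

Under that assumption, the result can be passed straight to ImmutableList.Builder.addAll, which is how Example 1 and Example 5 use it.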

Example 2 with OpenLineageContext

Use of io.openlineage.spark.api.OpenLineageContext in project OpenLineage by OpenLineage, in the class DataSourceV2RelationDatasetBuilderTest, method provideBuildersWithSparkListeners.

private static Stream<Arguments> provideBuildersWithSparkListeners() {
    OpenLineageContext context = mock(OpenLineageContext.class);
    DatasetFactory factory = mock(DatasetFactory.class);
    return Stream.of(
        // Input builders: the expected flag is true for start events and false for end events.
        Arguments.of(new DataSourceV2RelationInputDatasetBuilder(context, factory), mock(SparkListenerJobStart.class), true),
        Arguments.of(new DataSourceV2RelationInputDatasetBuilder(context, factory), mock(SparkListenerSQLExecutionStart.class), true),
        Arguments.of(new DataSourceV2RelationInputDatasetBuilder(context, factory), mock(SparkListenerJobEnd.class), false),
        Arguments.of(new DataSourceV2RelationInputDatasetBuilder(context, factory), mock(SparkListenerSQLExecutionEnd.class), false),
        // Output builders: the expected flag is false for start events and true for end events.
        Arguments.of(new DataSourceV2RelationOutputDatasetBuilder(context, factory), mock(SparkListenerJobStart.class), false),
        Arguments.of(new DataSourceV2RelationOutputDatasetBuilder(context, factory), mock(SparkListenerSQLExecutionStart.class), false),
        Arguments.of(new DataSourceV2RelationOutputDatasetBuilder(context, factory), mock(SparkListenerJobEnd.class), true),
        Arguments.of(new DataSourceV2RelationOutputDatasetBuilder(context, factory), mock(SparkListenerSQLExecutionEnd.class), true));
}
Also used : OpenLineageContext(io.openlineage.spark.api.OpenLineageContext) DatasetFactory(io.openlineage.spark.api.DatasetFactory)

Example 3 with OpenLineageContext

Use of io.openlineage.spark.api.OpenLineageContext in project OpenLineage by OpenLineage, in the class DataSourceV2RelationDatasetBuilderTest, method provideBuilders.

private static Stream<Arguments> provideBuilders() {
    OpenLineageContext context = mock(OpenLineageContext.class);
    DatasetFactory factory = mock(DatasetFactory.class);
    OpenLineage openLineage = mock(OpenLineage.class);
    return Stream.of(
        Arguments.of(new DataSourceV2RelationInputDatasetBuilder(context, factory), mock(DataSourceV2Relation.class), context, factory, openLineage),
        Arguments.of(new DataSourceV2RelationOutputDatasetBuilder(context, factory), mock(DataSourceV2Relation.class), context, factory, openLineage));
}
Also used : OpenLineage(io.openlineage.client.OpenLineage) OpenLineageContext(io.openlineage.spark.api.OpenLineageContext) DatasetFactory(io.openlineage.spark.api.DatasetFactory)

Example 4 with OpenLineageContext

Use of io.openlineage.spark.api.OpenLineageContext in project OpenLineage by OpenLineage, in the class OpenLineageSparkListenerTest, method testSqlEventWithJobEventEmitsOnce.

@Test
public void testSqlEventWithJobEventEmitsOnce() {
    SparkSession sparkSession = mock(SparkSession.class);
    SparkContext sparkContext = mock(SparkContext.class);
    EventEmitter emitter = mock(EventEmitter.class);
    QueryExecution qe = mock(QueryExecution.class);
    LogicalPlan query = UnresolvedRelation$.MODULE$.apply(TableIdentifier.apply("tableName"));
    SparkPlan plan = mock(SparkPlan.class);
    when(sparkSession.sparkContext()).thenReturn(sparkContext);
    when(sparkContext.appName()).thenReturn("appName");
    when(qe.optimizedPlan()).thenReturn(
        new InsertIntoHadoopFsRelationCommand(
            new Path("file:///tmp/dir"), null, false, Seq$.MODULE$.empty(), Option.empty(), null,
            Map$.MODULE$.empty(), query, SaveMode.Overwrite, Option.empty(), Option.empty(),
            Seq$.MODULE$.<String>empty()));
    when(qe.executedPlan()).thenReturn(plan);
    when(plan.sparkContext()).thenReturn(sparkContext);
    when(plan.nodeName()).thenReturn("execute");
    OpenLineageContext olContext = OpenLineageContext.builder()
        .sparkSession(Optional.of(sparkSession))
        .sparkContext(sparkSession.sparkContext())
        .openLineage(new OpenLineage(OpenLineageClient.OPEN_LINEAGE_CLIENT_URI))
        .queryExecution(qe)
        .build();
    olContext.getOutputDatasetQueryPlanVisitors().add(new InsertIntoHadoopFsRelationVisitor(olContext));
    ExecutionContext executionContext = new StaticExecutionContextFactory(emitter)
        .createSparkSQLExecutionContext(1L, emitter, qe, olContext);
    executionContext.start(
        new SparkListenerSQLExecutionStart(
            1L, "", "", "",
            new SparkPlanInfo("name", "string", Seq$.MODULE$.empty(), Map$.MODULE$.empty(), Seq$.MODULE$.empty()),
            1L));
    executionContext.start(new SparkListenerJobStart(0, 2L, Seq$.MODULE$.<StageInfo>empty(), new Properties()));
    ArgumentCaptor<OpenLineage.RunEvent> lineageEvent = ArgumentCaptor.forClass(OpenLineage.RunEvent.class);
    verify(emitter, times(2)).emit(lineageEvent.capture());
}
Also used : Path(org.apache.hadoop.fs.Path) SparkSession(org.apache.spark.sql.SparkSession) SparkPlan(org.apache.spark.sql.execution.SparkPlan) SparkListenerJobStart(org.apache.spark.scheduler.SparkListenerJobStart) StageInfo(org.apache.spark.scheduler.StageInfo) StaticExecutionContextFactory(io.openlineage.spark.agent.lifecycle.StaticExecutionContextFactory) InsertIntoHadoopFsRelationCommand(org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand) Properties(java.util.Properties) QueryExecution(org.apache.spark.sql.execution.QueryExecution) SparkListenerSQLExecutionStart(org.apache.spark.sql.execution.ui.SparkListenerSQLExecutionStart) SparkContext(org.apache.spark.SparkContext) ExecutionContext(io.openlineage.spark.agent.lifecycle.ExecutionContext) SparkPlanInfo(org.apache.spark.sql.execution.SparkPlanInfo) OpenLineage(io.openlineage.client.OpenLineage) LogicalPlan(org.apache.spark.sql.catalyst.plans.logical.LogicalPlan) OpenLineageContext(io.openlineage.spark.api.OpenLineageContext) InsertIntoHadoopFsRelationVisitor(io.openlineage.spark.agent.lifecycle.plan.InsertIntoHadoopFsRelationVisitor) Test(org.junit.jupiter.api.Test)

Example 5 with OpenLineageContext

Use of io.openlineage.spark.api.OpenLineageContext in project OpenLineage by OpenLineage, in the class InternalEventHandlerFactory, method createRunFacetBuilders.

@Override
public Collection<CustomFacetBuilder<?, ? extends RunFacet>> createRunFacetBuilders(OpenLineageContext context) {
    Builder<CustomFacetBuilder<?, ? extends RunFacet>> listBuilder;
    listBuilder = ImmutableList.<CustomFacetBuilder<?, ? extends RunFacet>>builder()
        .addAll(generate(eventHandlerFactories, factory -> factory.createRunFacetBuilders(context)))
        .add(new ErrorFacetBuilder(), new LogicalPlanRunFacetBuilder(context), new SparkVersionFacetBuilder(context));
    if (DatabricksEnvironmentFacetBuilder.isDatabricksRuntime()) {
        listBuilder.add(new DatabricksEnvironmentFacetBuilder(context));
    }
    return listBuilder.build();
}
Also used : Spliterators(java.util.Spliterators) InputDataset(io.openlineage.client.OpenLineage.InputDataset) OutputDatasetFacet(io.openlineage.client.OpenLineage.OutputDatasetFacet) Function(java.util.function.Function) SparkVersionFacetBuilder(io.openlineage.spark.agent.facets.builder.SparkVersionFacetBuilder) ImmutableList(com.google.common.collect.ImmutableList) OutputDataset(io.openlineage.client.OpenLineage.OutputDataset) ErrorFacetBuilder(io.openlineage.spark.agent.facets.builder.ErrorFacetBuilder) OutputStatisticsOutputDatasetFacetBuilder(io.openlineage.spark.agent.facets.builder.OutputStatisticsOutputDatasetFacetBuilder) JobFacet(io.openlineage.client.OpenLineage.JobFacet) DatabricksEnvironmentFacetBuilder(io.openlineage.spark.agent.facets.builder.DatabricksEnvironmentFacetBuilder) StreamSupport(java.util.stream.StreamSupport) LogicalPlanRunFacetBuilder(io.openlineage.spark.agent.facets.builder.LogicalPlanRunFacetBuilder) LogicalPlan(org.apache.spark.sql.catalyst.plans.logical.LogicalPlan) PartialFunction(scala.PartialFunction) OpenLineageContext(io.openlineage.spark.api.OpenLineageContext) Collection(java.util.Collection) InputDatasetFacet(io.openlineage.client.OpenLineage.InputDatasetFacet) ServiceLoader(java.util.ServiceLoader) DatasetFacet(io.openlineage.client.OpenLineage.DatasetFacet) Collectors(java.util.stream.Collectors) List(java.util.List) OpenLineageEventHandlerFactory(io.openlineage.spark.api.OpenLineageEventHandlerFactory) CustomFacetBuilder(io.openlineage.spark.api.CustomFacetBuilder) Builder(com.google.common.collect.ImmutableList.Builder) Spliterator(java.util.Spliterator) RunFacet(io.openlineage.client.OpenLineage.RunFacet)
