Example 1 with CreateDataSourceTableCommandVisitor

Use of io.openlineage.spark.agent.lifecycle.plan.CreateDataSourceTableCommandVisitor in project OpenLineage by OpenLineage.

In the class BaseVisitorFactory, method getOutputVisitors:

@Override
public List<PartialFunction<LogicalPlan, List<OpenLineage.OutputDataset>>> getOutputVisitors(OpenLineageContext context) {
    DatasetFactory<OpenLineage.OutputDataset> factory = DatasetFactory.output(context.getOpenLineage());
    List<PartialFunction<LogicalPlan, List<OpenLineage.OutputDataset>>> outputCommonVisitors = getCommonVisitors(context, factory);
    List<PartialFunction<LogicalPlan, List<OpenLineage.OutputDataset>>> list = new ArrayList<>(outputCommonVisitors);
    list.add(new InsertIntoDataSourceDirVisitor(context));
    list.add(new InsertIntoDataSourceVisitor(context));
    list.add(new InsertIntoHadoopFsRelationVisitor(context));
    list.add(new CreateDataSourceTableAsSelectCommandVisitor(context));
    list.add(new InsertIntoDirVisitor(context));
    if (InsertIntoHiveTableVisitor.hasHiveClasses()) {
        list.add(new InsertIntoHiveTableVisitor(context));
        list.add(new InsertIntoHiveDirVisitor(context));
        list.add(new CreateHiveTableAsSelectCommandVisitor(context));
    }
    if (OptimizedCreateHiveTableAsSelectCommandVisitor.hasClasses()) {
        list.add(new OptimizedCreateHiveTableAsSelectCommandVisitor(context));
    }
    list.add(new CreateDataSourceTableCommandVisitor(context));
    list.add(new LoadDataCommandVisitor(context));
    list.add(new AlterTableRenameCommandVisitor(context));
    list.add(new AlterTableAddColumnsCommandVisitor(context));
    list.add(new CreateTableCommandVisitor(context));
    list.add(new DropTableCommandVisitor(context));
    list.add(new TruncateTableCommandVisitor(context));
    return list;
}
Also used: AlterTableRenameCommandVisitor (io.openlineage.spark.agent.lifecycle.plan.AlterTableRenameCommandVisitor), PartialFunction (scala.PartialFunction), OptimizedCreateHiveTableAsSelectCommandVisitor (io.openlineage.spark.agent.lifecycle.plan.OptimizedCreateHiveTableAsSelectCommandVisitor), TruncateTableCommandVisitor (io.openlineage.spark.agent.lifecycle.plan.TruncateTableCommandVisitor), DropTableCommandVisitor (io.openlineage.spark.agent.lifecycle.plan.DropTableCommandVisitor), ArrayList (java.util.ArrayList), CreateDataSourceTableCommandVisitor (io.openlineage.spark.agent.lifecycle.plan.CreateDataSourceTableCommandVisitor), LoadDataCommandVisitor (io.openlineage.spark.agent.lifecycle.plan.LoadDataCommandVisitor), CreateDataSourceTableAsSelectCommandVisitor (io.openlineage.spark.agent.lifecycle.plan.CreateDataSourceTableAsSelectCommandVisitor), InsertIntoHiveDirVisitor (io.openlineage.spark.agent.lifecycle.plan.InsertIntoHiveDirVisitor), InsertIntoHiveTableVisitor (io.openlineage.spark.agent.lifecycle.plan.InsertIntoHiveTableVisitor), CreateTableCommandVisitor (io.openlineage.spark.agent.lifecycle.plan.CreateTableCommandVisitor), CreateHiveTableAsSelectCommandVisitor (io.openlineage.spark.agent.lifecycle.plan.CreateHiveTableAsSelectCommandVisitor), InsertIntoDirVisitor (io.openlineage.spark.agent.lifecycle.plan.InsertIntoDirVisitor), InsertIntoDataSourceVisitor (io.openlineage.spark.agent.lifecycle.plan.InsertIntoDataSourceVisitor), AlterTableAddColumnsCommandVisitor (io.openlineage.spark.agent.lifecycle.plan.AlterTableAddColumnsCommandVisitor), InsertIntoHadoopFsRelationVisitor (io.openlineage.spark.agent.lifecycle.plan.InsertIntoHadoopFsRelationVisitor), InsertIntoDataSourceDirVisitor (io.openlineage.spark.agent.lifecycle.plan.InsertIntoDataSourceDirVisitor)
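
getOutputVisitors only builds the list of partial functions; the source does not show how the list is consumed. The sketch below illustrates one plausible way a caller could use such a list: ask each visitor whether it is defined for a plan and, if so, collect the datasets it returns. PlanVisitor, VisitorDispatchSketch, collectDatasets and prefixVisitor are hypothetical names introduced here for illustration, and plain strings stand in for LogicalPlan and OpenLineage.OutputDataset so that the example runs without Spark or Scala on the classpath. It is not the OpenLineage dispatch code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for scala.PartialFunction<LogicalPlan, List<OutputDataset>>.
interface PlanVisitor<P, D> {
    boolean isDefinedAt(P plan);

    List<D> apply(P plan);
}

final class VisitorDispatchSketch {
    // Collects the datasets from every visitor that reports itself defined for the plan.
    static <P, D> List<D> collectDatasets(List<PlanVisitor<P, D>> visitors, P plan) {
        List<D> result = new ArrayList<>();
        for (PlanVisitor<P, D> visitor : visitors) {
            if (visitor.isDefinedAt(plan)) {
                result.addAll(visitor.apply(plan));
            }
        }
        return result;
    }

    // Toy visitor (assumption): handles "plans" that are strings starting with the
    // given command prefix and returns the remainder of the string as the dataset name.
    static PlanVisitor<String, String> prefixVisitor(final String prefix) {
        return new PlanVisitor<String, String>() {
            @Override
            public boolean isDefinedAt(String plan) {
                return plan.startsWith(prefix);
            }

            @Override
            public List<String> apply(String plan) {
                return Collections.singletonList(plan.substring(prefix.length()).trim());
            }
        };
    }

    public static void main(String[] args) {
        List<PlanVisitor<String, String>> visitors = new ArrayList<>();
        visitors.add(prefixVisitor("CREATE TABLE"));
        visitors.add(prefixVisitor("DROP TABLE"));
        // Only the first visitor is defined for this plan.
        System.out.println(collectDatasets(visitors, "CREATE TABLE db.tablename"));
    }
}
```

A plan that no visitor is defined for simply yields an empty list, which is why each visitor in the sketch is guarded by isDefinedAt before apply is called.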

Example 2 with CreateDataSourceTableCommandVisitor

Use of io.openlineage.spark.agent.lifecycle.plan.CreateDataSourceTableCommandVisitor in project OpenLineage by OpenLineage.

In the class CreateDataSourceTableCommandVisitorTest, method testCreateDataSourceTableCommand:

@Test
void testCreateDataSourceTableCommand() {
    CreateDataSourceTableCommandVisitor visitor = new CreateDataSourceTableCommandVisitor(SparkAgentTestExtension.newContext(session));
    CreateDataSourceTableCommand command = new CreateDataSourceTableCommand(
        SparkUtils.catalogTable(
            TableIdentifier$.MODULE$.apply("tablename", Option.apply("db")),
            CatalogTableType.EXTERNAL(),
            CatalogStorageFormat$.MODULE$.apply(
                Option.apply(URI.create("s3://bucket/directory")), null, null, null, false, Map$.MODULE$.empty()),
            new StructType(new StructField[] {
                new StructField("key", IntegerType$.MODULE$, false, new Metadata(new HashMap<>())),
                new StructField("value", StringType$.MODULE$, false, new Metadata(new HashMap<>()))
            })),
        false);
    assertThat(visitor.isDefinedAt(command)).isTrue();
    List<OpenLineage.OutputDataset> datasets = visitor.apply(command);
    assertEquals(1, datasets.size());
    OpenLineage.OutputDataset outputDataset = datasets.get(0);
    assertEquals(OpenLineage.LifecycleStateChangeDatasetFacet.LifecycleStateChange.CREATE, outputDataset.getFacets().getLifecycleStateChange().getLifecycleStateChange());
    assertEquals("directory", outputDataset.getName());
    assertEquals("s3://bucket", outputDataset.getNamespace());
}
Also used: StructField (org.apache.spark.sql.types.StructField), StructType (org.apache.spark.sql.types.StructType), CreateDataSourceTableCommand (org.apache.spark.sql.execution.command.CreateDataSourceTableCommand), Metadata (org.apache.spark.sql.types.Metadata), OpenLineage (io.openlineage.client.OpenLineage), CreateDataSourceTableCommandVisitor (io.openlineage.spark.agent.lifecycle.plan.CreateDataSourceTableCommandVisitor), Test (org.junit.jupiter.api.Test)
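
The assertions in the test expect the table location s3://bucket/directory to produce the namespace "s3://bucket" and the dataset name "directory". A minimal sketch of a split that yields those values with java.net.URI follows. DatasetIdentifierSketch and fromLocation are hypothetical names used only for this illustration; the way OpenLineage itself derives these values is not shown in the source and may differ, especially for locations without a scheme or authority.

```java
import java.net.URI;

// Hypothetical helper: splits a storage location into a namespace and a name.
final class DatasetIdentifierSketch {
    final String namespace;
    final String name;

    private DatasetIdentifierSketch(String namespace, String name) {
        this.namespace = namespace;
        this.name = name;
    }

    // Assumption: the namespace is "<scheme>://<authority>" and the name is the
    // path without its leading slash. Locations lacking either part are rejected
    // because this sketch does not try to guess a rule for them.
    static DatasetIdentifierSketch fromLocation(URI location) {
        if (location.getScheme() == null || location.getAuthority() == null) {
            throw new IllegalArgumentException("expected scheme://authority/path, got: " + location);
        }
        String namespace = location.getScheme() + "://" + location.getAuthority();
        String path = location.getPath() == null ? "" : location.getPath();
        String name = path.startsWith("/") ? path.substring(1) : path;
        return new DatasetIdentifierSketch(namespace, name);
    }

    public static void main(String[] args) {
        DatasetIdentifierSketch id = fromLocation(URI.create("s3://bucket/directory"));
        System.out.println(id.namespace + " " + id.name);
    }
}
```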
