Example 1 with OptimizedCreateHiveTableAsSelectCommand

Use of org.apache.spark.sql.hive.execution.OptimizedCreateHiveTableAsSelectCommand in the OpenLineage project.

From the class OptimizedCreateHiveTableAsSelectCommandVisitorTest, method testOptimizedCreateHiveTableAsSelectCommand:

@Test
void testOptimizedCreateHiveTableAsSelectCommand() {
    OptimizedCreateHiveTableAsSelectCommandVisitor visitor = new OptimizedCreateHiveTableAsSelectCommandVisitor(SparkAgentTestExtension.newContext(session));
    OptimizedCreateHiveTableAsSelectCommand command =
        new OptimizedCreateHiveTableAsSelectCommand(
            // table descriptor: external Hive table "db.tablename" stored at s3://bucket/directory
            SparkUtils.catalogTable(
                TableIdentifier$.MODULE$.apply("tablename", Option.apply("db")),
                CatalogTableType.EXTERNAL(),
                CatalogStorageFormat$.MODULE$.apply(
                    Option.apply(URI.create("s3://bucket/directory")), null, null, null, false, Map$.MODULE$.empty()),
                new StructType(
                    new StructField[] {
                      new StructField("key", IntegerType$.MODULE$, false, new Metadata(new HashMap<>())),
                      new StructField("value", StringType$.MODULE$, false, new Metadata(new HashMap<>()))
                    })),
            // query plan: a JDBC-backed relation with (key, value) output attributes
            new LogicalRelation(
                new JDBCRelation(
                    new StructType(
                        new StructField[] {
                          new StructField("key", IntegerType$.MODULE$, false, null),
                          new StructField("value", StringType$.MODULE$, false, null)
                        }),
                    new Partition[] {},
                    new JDBCOptions(
                        "",
                        "temp",
                        scala.collection.immutable.Map$.MODULE$
                            .newBuilder()
                            .$plus$eq(Tuple2.apply("driver", Driver.class.getName()))
                            .result()),
                    session),
                Seq$.MODULE$
                    .<AttributeReference>newBuilder()
                    .$plus$eq(new AttributeReference("key", IntegerType$.MODULE$, false, null, ExprId.apply(1L), Seq$.MODULE$.<String>empty()))
                    .$plus$eq(new AttributeReference("value", StringType$.MODULE$, false, null, ExprId.apply(2L), Seq$.MODULE$.<String>empty()))
                    .result(),
                Option.empty(),
                false),
            ScalaConversionUtils.fromList(Arrays.asList("key", "value")),
            SaveMode.Overwrite);
    assertThat(visitor.isDefinedAt(command)).isTrue();
    List<OpenLineage.OutputDataset> datasets = visitor.apply(command);
    assertEquals(1, datasets.size());
    OpenLineage.OutputDataset outputDataset = datasets.get(0);
    assertEquals(OpenLineage.LifecycleStateChangeDatasetFacet.LifecycleStateChange.OVERWRITE, outputDataset.getFacets().getLifecycleStateChange().getLifecycleStateChange());
    assertEquals("directory", outputDataset.getName());
    assertEquals("s3://bucket", outputDataset.getNamespace());
}
Also used: StructType (org.apache.spark.sql.types.StructType), OptimizedCreateHiveTableAsSelectCommandVisitor (io.openlineage.spark.agent.lifecycle.plan.OptimizedCreateHiveTableAsSelectCommandVisitor), AttributeReference (org.apache.spark.sql.catalyst.expressions.AttributeReference), Metadata (org.apache.spark.sql.types.Metadata), JDBCRelation (org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation), Driver (org.postgresql.Driver), LogicalRelation (org.apache.spark.sql.execution.datasources.LogicalRelation), StructField (org.apache.spark.sql.types.StructField), JDBCOptions (org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions), OptimizedCreateHiveTableAsSelectCommand (org.apache.spark.sql.hive.execution.OptimizedCreateHiveTableAsSelectCommand), OpenLineage (io.openlineage.client.OpenLineage), Test (org.junit.jupiter.api.Test)
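
The test exercises the visitor's PartialFunction-style contract: isDefinedAt guards apply. A minimal sketch of how a caller might use that contract, assuming an already-built visitor and plan (the helper name collectOutputs is illustrative, not OpenLineage's actual dispatch code):

import io.openlineage.client.OpenLineage;
import io.openlineage.spark.agent.lifecycle.plan.OptimizedCreateHiveTableAsSelectCommandVisitor;
import java.util.Collections;
import java.util.List;
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan;

// Illustrative helper: only apply the visitor to plan nodes it declares itself defined at.
static List<OpenLineage.OutputDataset> collectOutputs(
    OptimizedCreateHiveTableAsSelectCommandVisitor visitor, LogicalPlan plan) {
  if (visitor.isDefinedAt(plan)) {
    // For an OptimizedCreateHiveTableAsSelectCommand this yields a single OutputDataset
    // whose name/namespace are derived from the table location ("directory" / "s3://bucket").
    return visitor.apply(plan);
  }
  return Collections.emptyList();
}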

Example 2 with OptimizedCreateHiveTableAsSelectCommand

Use of org.apache.spark.sql.hive.execution.OptimizedCreateHiveTableAsSelectCommand in the OpenLineage project.

From the class OptimizedCreateHiveTableAsSelectCommandVisitor, method apply:

@Override
public List<OpenLineage.OutputDataset> apply(LogicalPlan x) {
    OptimizedCreateHiveTableAsSelectCommand command = (OptimizedCreateHiveTableAsSelectCommand) x;
    CatalogTable table = command.tableDesc();
    DatasetIdentifier datasetIdentifier = PathUtils.fromCatalogTable(table);
    StructType schema = outputSchema(ScalaConversionUtils.fromSeq(command.outputColumns()));
    OpenLineage.OutputDataset outputDataset;
    if (SaveMode.Overwrite == command.mode()) {
        outputDataset = outputDataset().getDataset(datasetIdentifier, schema, OpenLineage.LifecycleStateChangeDatasetFacet.LifecycleStateChange.OVERWRITE);
    } else {
        outputDataset = outputDataset().getDataset(datasetIdentifier, schema);
    }
    return Collections.singletonList(outputDataset);
}
Also used: StructType (org.apache.spark.sql.types.StructType), OptimizedCreateHiveTableAsSelectCommand (org.apache.spark.sql.hive.execution.OptimizedCreateHiveTableAsSelectCommand), DatasetIdentifier (io.openlineage.spark.agent.util.DatasetIdentifier), OpenLineage (io.openlineage.client.OpenLineage), CatalogTable (org.apache.spark.sql.catalyst.catalog.CatalogTable)
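
The outputSchema helper referenced above is not shown; a minimal sketch of what such a conversion could look like, assuming it simply maps the command's output attributes onto a StructType (an assumption about the helper, not the actual OpenLineage implementation):

import java.util.List;
import org.apache.spark.sql.catalyst.expressions.Attribute;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

// Assumed shape of outputSchema: build a StructType from the command's output attributes.
private StructType outputSchema(List<Attribute> attributes) {
  return new StructType(
      attributes.stream()
          .map(attr -> new StructField(attr.name(), attr.dataType(), attr.nullable(), attr.metadata()))
          .toArray(StructField[]::new));
}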

Aggregations

OpenLineage (io.openlineage.client.OpenLineage): 2
OptimizedCreateHiveTableAsSelectCommand (org.apache.spark.sql.hive.execution.OptimizedCreateHiveTableAsSelectCommand): 2
StructType (org.apache.spark.sql.types.StructType): 2
OptimizedCreateHiveTableAsSelectCommandVisitor (io.openlineage.spark.agent.lifecycle.plan.OptimizedCreateHiveTableAsSelectCommandVisitor): 1
DatasetIdentifier (io.openlineage.spark.agent.util.DatasetIdentifier): 1
CatalogTable (org.apache.spark.sql.catalyst.catalog.CatalogTable): 1
AttributeReference (org.apache.spark.sql.catalyst.expressions.AttributeReference): 1
LogicalRelation (org.apache.spark.sql.execution.datasources.LogicalRelation): 1
JDBCOptions (org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions): 1
JDBCRelation (org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation): 1
Metadata (org.apache.spark.sql.types.Metadata): 1
StructField (org.apache.spark.sql.types.StructField): 1
Test (org.junit.jupiter.api.Test): 1
Driver (org.postgresql.Driver): 1