Use of org.apache.spark.sql.execution.datasources.text.TextFileFormat in the OpenLineage project by OpenLineage.
From the LogicalPlanSerializerTest class, method testSerializeInsertIntoHadoopPlan.
@Test
public void testSerializeInsertIntoHadoopPlan()
    throws IOException, InvocationTargetException, IllegalAccessException {
  SparkSession session = SparkSession.builder().master("local").getOrCreate();

  // HadoopFsRelation over a catalog-backed file index, with TextFileFormat as the file format
  // and a single "name" string column in both the partition and data schemas.
  HadoopFsRelation hadoopFsRelation =
      new HadoopFsRelation(
          new CatalogFileIndex(
              session,
              CatalogTableTestUtils.getCatalogTable(
                  new TableIdentifier("test", Option.apply("db"))),
              100L),
          new StructType(
              new StructField[] {
                new StructField("name", StringType$.MODULE$, false, Metadata.empty())
              }),
          new StructType(
              new StructField[] {
                new StructField("name", StringType$.MODULE$, false, Metadata.empty())
              }),
          Option.empty(),
          new TextFileFormat(),
          new HashMap<>(),
          session);
  LogicalRelation logicalRelation =
      new LogicalRelation(
          hadoopFsRelation,
          Seq$.MODULE$
              .<AttributeReference>newBuilder()
              .$plus$eq(
                  new AttributeReference(
                      "name",
                      StringType$.MODULE$,
                      false,
                      Metadata.empty(),
                      ExprId.apply(1L),
                      Seq$.MODULE$.<String>empty()))
              .result(),
          Option.empty(),
          false);

  // Command that writes the relation to /tmp in Overwrite mode, outputting the "name" column.
  InsertIntoHadoopFsRelationCommand command =
      new InsertIntoHadoopFsRelationCommand(
          new org.apache.hadoop.fs.Path("/tmp"),
          new HashMap<>(),
          false,
          Seq$.MODULE$
              .<Attribute>newBuilder()
              .$plus$eq(
                  new AttributeReference(
                      "name",
                      StringType$.MODULE$,
                      false,
                      Metadata.empty(),
                      ExprId.apply(1L),
                      Seq$.MODULE$.<String>empty()))
              .result(),
          Option.empty(),
          new TextFileFormat(),
          new HashMap<>(),
          logicalRelation,
          SaveMode.Overwrite,
          Option.empty(),
          Option.empty(),
          Seq$.MODULE$.<String>newBuilder().$plus$eq("name").result());

  // Serialize both plan nodes and compare them against expected JSON fixtures,
  // excluding exprId, which is not stable across runs.
  Map<String, Object> commandActualNode =
      objectMapper.readValue(logicalPlanSerializer.serialize(command), mapTypeReference);
  Map<String, Object> hadoopFSActualNode =
      objectMapper.readValue(logicalPlanSerializer.serialize(logicalRelation), mapTypeReference);

  Path expectedCommandNodePath =
      Paths.get("src", "test", "resources", "test_data", "serde", "insertintofs-node.json");
  Path expectedHadoopFSNodePath =
      Paths.get("src", "test", "resources", "test_data", "serde", "hadoopfsrelation-node.json");
  Map<String, Object> expectedCommandNode =
      objectMapper.readValue(expectedCommandNodePath.toFile(), mapTypeReference);
  Map<String, Object> expectedHadoopFSNode =
      objectMapper.readValue(expectedHadoopFSNodePath.toFile(), mapTypeReference);

  assertThat(commandActualNode)
      .satisfies(new MatchesMapRecursively(expectedCommandNode, Collections.singleton("exprId")));
  assertThat(hadoopFSActualNode)
      .satisfies(new MatchesMapRecursively(expectedHadoopFSNode, Collections.singleton("exprId")));
}
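The snippet relies on three fields (objectMapper, logicalPlanSerializer, and mapTypeReference) that are declared elsewhere in the test class. A minimal sketch of how they could be wired up, assuming a plain Jackson ObjectMapper and OpenLineage's LogicalPlanSerializer; the exact declarations in the project may differ:

// Assumed test-class fields; the real declarations in LogicalPlanSerializerTest may differ.
private final ObjectMapper objectMapper = new ObjectMapper();
private final LogicalPlanSerializer logicalPlanSerializer = new LogicalPlanSerializer();
private final TypeReference<Map<String, Object>> mapTypeReference =
    new TypeReference<Map<String, Object>>() {};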