Example 1 with ChainedTransformer

Use of org.apache.hudi.utilities.transform.ChainedTransformer in project hudi by apache.

In the class TestChainedTransformer, method testChainedTransformation.

@Test
public void testChainedTransformation() {
    // Build a two-row input with a single non-nullable string column "foo".
    StructType schema = DataTypes.createStructType(new StructField[] { createStructField("foo", StringType, false) });
    Row r1 = RowFactory.create("100");
    Row r2 = RowFactory.create("200");
    Dataset<Row> original = spark().sqlContext().createDataFrame(Arrays.asList(r1, r2), schema);
    // t1 renames "foo" to "bar"; t2 casts "bar" from string to integer.
    Transformer t1 = (jsc, sparkSession, dataset, properties) -> dataset.withColumnRenamed("foo", "bar");
    Transformer t2 = (jsc, sparkSession, dataset, properties) -> dataset.withColumn("bar", dataset.col("bar").cast(IntegerType));
    // The chained transformer applies t1 and then t2, so t2 sees the renamed column.
    ChainedTransformer transformer = new ChainedTransformer(Arrays.asList(t1, t2));
    // Neither lambda reads the properties argument, so null is passed here.
    Dataset<Row> transformed = transformer.apply(jsc(), spark(), original, null);
    // Row count is unchanged and only the renamed column remains.
    assertEquals(2, transformed.count());
    assertArrayEquals(new String[] { "bar" }, transformed.columns());
    // getInt succeeds only because the cast in t2 was applied.
    List<Row> rows = transformed.collectAsList();
    assertEquals(100, rows.get(0).getInt(0));
    assertEquals(200, rows.get(1).getInt(0));
}
Also used: DataTypes (org.apache.spark.sql.types.DataTypes), StructField (org.apache.spark.sql.types.StructField), StructType (org.apache.spark.sql.types.StructType), Arrays (java.util.Arrays), Dataset (org.apache.spark.sql.Dataset), RowFactory (org.apache.spark.sql.RowFactory), StringType (org.apache.spark.sql.types.DataTypes.StringType), Row (org.apache.spark.sql.Row), Test (org.junit.jupiter.api.Test), Assertions.assertArrayEquals (org.junit.jupiter.api.Assertions.assertArrayEquals), List (java.util.List), SparkClientFunctionalTestHarness (org.apache.hudi.testutils.SparkClientFunctionalTestHarness), IntegerType (org.apache.spark.sql.types.DataTypes.IntegerType), ChainedTransformer (org.apache.hudi.utilities.transform.ChainedTransformer), Transformer (org.apache.hudi.utilities.transform.Transformer), Tag (org.junit.jupiter.api.Tag), Assertions.assertEquals (org.junit.jupiter.api.Assertions.assertEquals), DataTypes.createStructField (org.apache.spark.sql.types.DataTypes.createStructField)
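
The test above relies on each transformer receiving the output of the previous one (the cast only works after the rename). The following is a minimal, dependency-free sketch of that chaining idea, not Hudi's actual implementation: the class ChainSketch, the helpers chain, row, renameFooToBar, castBarToInt and run, and the use of a list of maps in place of a Spark Dataset are all hypothetical names and simplifications chosen for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

public class ChainSketch {

    // Hypothetical helper: composes steps so each one receives the previous step's output.
    static <T> UnaryOperator<T> chain(List<UnaryOperator<T>> steps) {
        return input -> {
            T current = input;
            for (UnaryOperator<T> step : steps) {
                current = step.apply(current);
            }
            return current;
        };
    }

    // Hypothetical stand-in for a one-column row.
    static Map<String, Object> row(String column, Object value) {
        Map<String, Object> r = new HashMap<>();
        r.put(column, value);
        return r;
    }

    // Step 1: rename the key "foo" to "bar" in every row.
    static List<Map<String, Object>> renameFooToBar(List<Map<String, Object>> rows) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> r : rows) {
            Map<String, Object> copy = new HashMap<>(r);
            copy.put("bar", copy.remove("foo"));
            out.add(copy);
        }
        return out;
    }

    // Step 2: parse the string stored under "bar" into an Integer.
    static List<Map<String, Object>> castBarToInt(List<Map<String, Object>> rows) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> r : rows) {
            Map<String, Object> copy = new HashMap<>(r);
            copy.put("bar", Integer.parseInt((String) copy.get("bar")));
            out.add(copy);
        }
        return out;
    }

    // Mirrors the test: two rows, rename first, then cast.
    static List<Map<String, Object>> run() {
        List<Map<String, Object>> original = new ArrayList<>();
        original.add(row("foo", "100"));
        original.add(row("foo", "200"));

        List<UnaryOperator<List<Map<String, Object>>>> steps = new ArrayList<>();
        steps.add(ChainSketch::renameFooToBar);
        steps.add(ChainSketch::castBarToInt);

        return chain(steps).apply(original);
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [{bar=100}, {bar=200}]
    }
}
```

Order matters in this sketch just as in the test: swapping the two steps would make castBarToInt look up "bar" before it exists.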

Aggregations

Arrays (java.util.Arrays): 1
List (java.util.List): 1
SparkClientFunctionalTestHarness (org.apache.hudi.testutils.SparkClientFunctionalTestHarness): 1
ChainedTransformer (org.apache.hudi.utilities.transform.ChainedTransformer): 1
Transformer (org.apache.hudi.utilities.transform.Transformer): 1
Dataset (org.apache.spark.sql.Dataset): 1
Row (org.apache.spark.sql.Row): 1
RowFactory (org.apache.spark.sql.RowFactory): 1
DataTypes (org.apache.spark.sql.types.DataTypes): 1
IntegerType (org.apache.spark.sql.types.DataTypes.IntegerType): 1
StringType (org.apache.spark.sql.types.DataTypes.StringType): 1
DataTypes.createStructField (org.apache.spark.sql.types.DataTypes.createStructField): 1
StructField (org.apache.spark.sql.types.StructField): 1
StructType (org.apache.spark.sql.types.StructType): 1
Assertions.assertArrayEquals (org.junit.jupiter.api.Assertions.assertArrayEquals): 1
Assertions.assertEquals (org.junit.jupiter.api.Assertions.assertEquals): 1
Tag (org.junit.jupiter.api.Tag): 1
Test (org.junit.jupiter.api.Test): 1