Example 1 with ScalaUDF

Use of org.apache.spark.sql.catalyst.expressions.ScalaUDF in project kylo by Teradata, from the class AbstractSparkDataSetProviderTest, method readDeleteSourceFile.

/**
 * Verify reading a data set and deleting the source file.
 */
@Test
@SuppressWarnings("unchecked")
public void readDeleteSourceFile() {
    isFileFormat = true;
    // Mock data set
    dataSet = Mockito.mock(DataFrame.class);
    Mockito.when(dataSet.col("value")).thenReturn(new Column("value"));
    final StructType schema = DataTypes.createStructType(Collections.singletonList(DataTypes.createStructField("value", DataTypes.StringType, true)));
    Mockito.when(dataSet.schema()).thenReturn(schema);
    final DataFrame mapDataSet = Mockito.mock(DataFrame.class);
    Mockito.when(dataSet.withColumn(Mockito.eq("value"), Mockito.any(Column.class))).thenReturn(mapDataSet);
    // Mock options
    final DataSetOptions options = new DataSetOptions();
    options.setFormat("text");
    options.setOption(KyloCatalogConstants.PATH_OPTION, "/mock/path/file.txt");
    options.setOption("keepSourceFile", "FALSE");
    // Test reading
    final MockSparkDataSetProvider provider = new MockSparkDataSetProvider();
    final DataFrame df = provider.read(Mockito.mock(KyloCatalogClient.class), options);
    Assert.assertEquals(mapDataSet, df);
    final ArgumentCaptor<Column> newColumn = ArgumentCaptor.forClass(Column.class);
    Mockito.verify(dataSet).withColumn(Mockito.eq("value"), newColumn.capture());
    Assert.assertTrue("Expected new column to be a UDF", newColumn.getValue().expr() instanceof ScalaUDF);
}
Also used:
- StructType (org.apache.spark.sql.types.StructType)
- Column (org.apache.spark.sql.Column)
- KyloCatalogClient (com.thinkbiganalytics.kylo.catalog.api.KyloCatalogClient)
- DataSetOptions (com.thinkbiganalytics.kylo.catalog.spi.DataSetOptions)
- DataFrame (org.apache.spark.sql.DataFrame)
- ScalaUDF (org.apache.spark.sql.catalyst.expressions.ScalaUDF)
- Test (org.junit.Test)
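The test above checks that, when keepSourceFile is false, the provider replaces the "value" column with a UDF whose side effect is cleaning up the source file while the data passes through unchanged. A minimal stdlib-only sketch of that pattern, without Spark (the class and method names here are hypothetical illustrations, not Kylo's actual implementation):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.UnaryOperator;

public class DeleteFileSketch {

    /**
     * Returns an identity function over row values that, as a side effect,
     * deletes the given source file when invoked. This mirrors the idea of
     * wrapping the "value" column in a UDF: each value is returned unchanged
     * while the source file gets cleaned up.
     */
    static UnaryOperator<String> deletingIdentity(Path sourceFile) {
        return value -> {
            try {
                Files.deleteIfExists(sourceFile);
            } catch (IOException e) {
                throw new UncheckedIOException("Failed to delete " + sourceFile, e);
            }
            return value;
        };
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("mock", ".txt");
        UnaryOperator<String> udf = deletingIdentity(file);

        // The value passes through unchanged...
        if (!"line1".equals(udf.apply("line1"))) {
            throw new AssertionError("value was modified");
        }
        // ...and the source file is gone afterwards.
        if (Files.exists(file)) {
            throw new AssertionError("source file was not deleted");
        }
        System.out.println("ok");
    }
}
```

In the real provider the wrapping function is registered with Spark so that Catalyst represents it as a ScalaUDF expression, which is exactly what the ArgumentCaptor assertion above verifies.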

Example 2 with ScalaUDF

Use of org.apache.spark.sql.catalyst.expressions.ScalaUDF in project kylo by Teradata, from the class AbstractJdbcDataSetProviderTest, method updateHighWaterMarkWithDate.

/**
 * Verify updating a high water mark for a date column.
 */
@Test
public void updateHighWaterMarkWithDate() {
    // Mock data set
    final DataFrame dataSet = Mockito.mock(DataFrame.class);
    Mockito.when(dataSet.col("mockField")).thenReturn(new Column("mockField"));
    final StructField field = DataTypes.createStructField("mockField", DataTypes.DateType, true);
    Mockito.when(dataSet.schema()).thenReturn(DataTypes.createStructType(Collections.singletonList(field)));
    final DataFrame mapDataSet = Mockito.mock(DataFrame.class);
    Mockito.when(dataSet.withColumn(Mockito.eq("mockField"), Mockito.any(Column.class))).thenReturn(mapDataSet);
    // Test updating high water mark
    final KyloCatalogClient client = Mockito.mock(KyloCatalogClient.class);
    final JdbcHighWaterMark highWaterMark = new JdbcHighWaterMark("mockWaterMark", client);
    final MockJdbcDataSetProvider provider = new MockJdbcDataSetProvider();
    final DataFrame newDataSet = provider.updateHighWaterMark(dataSet, "mockField", highWaterMark, client);
    Assert.assertEquals(mapDataSet, newDataSet);
    // Test replaced column
    final ArgumentCaptor<Column> newColumn = ArgumentCaptor.forClass(Column.class);
    Mockito.verify(dataSet).withColumn(Mockito.eq("mockField"), newColumn.capture());
    Assert.assertTrue("Expected new column to be a UDF", newColumn.getValue().expr() instanceof ScalaUDF);
}
Also used:
- StructField (org.apache.spark.sql.types.StructField)
- JdbcHighWaterMark (com.thinkbiganalytics.kylo.catalog.spark.sources.jdbc.JdbcHighWaterMark)
- Column (org.apache.spark.sql.Column)
- KyloCatalogClient (com.thinkbiganalytics.kylo.catalog.api.KyloCatalogClient)
- DataFrame (org.apache.spark.sql.DataFrame)
- ScalaUDF (org.apache.spark.sql.catalyst.expressions.ScalaUDF)
- Test (org.junit.Test)

Example 3 with ScalaUDF

Use of org.apache.spark.sql.catalyst.expressions.ScalaUDF in project kylo by Teradata, from the class AbstractJdbcDataSetProviderTest, method updateHighWaterMarkWithTimestamp.

/**
 * Verify updating a high water mark for a timestamp column.
 */
@Test
public void updateHighWaterMarkWithTimestamp() {
    // Mock data set
    final DataFrame dataSet = Mockito.mock(DataFrame.class);
    Mockito.when(dataSet.col("mockField")).thenReturn(new Column("mockField"));
    final StructField field = DataTypes.createStructField("mockField", DataTypes.TimestampType, true);
    Mockito.when(dataSet.schema()).thenReturn(DataTypes.createStructType(Collections.singletonList(field)));
    final DataFrame mapDataSet = Mockito.mock(DataFrame.class);
    Mockito.when(dataSet.withColumn(Mockito.eq("mockField"), Mockito.any(Column.class))).thenReturn(mapDataSet);
    // Test updating high water mark
    final KyloCatalogClient client = Mockito.mock(KyloCatalogClient.class);
    final JdbcHighWaterMark highWaterMark = new JdbcHighWaterMark("mockWaterMark", client);
    final MockJdbcDataSetProvider provider = new MockJdbcDataSetProvider();
    final DataFrame newDataSet = provider.updateHighWaterMark(dataSet, "mockField", highWaterMark, client);
    Assert.assertEquals(mapDataSet, newDataSet);
    // Test replaced column
    final ArgumentCaptor<Column> newColumn = ArgumentCaptor.forClass(Column.class);
    Mockito.verify(dataSet).withColumn(Mockito.eq("mockField"), newColumn.capture());
    Assert.assertTrue("Expected new column to be a UDF", newColumn.getValue().expr() instanceof ScalaUDF);
}
Also used:
- StructField (org.apache.spark.sql.types.StructField)
- JdbcHighWaterMark (com.thinkbiganalytics.kylo.catalog.spark.sources.jdbc.JdbcHighWaterMark)
- Column (org.apache.spark.sql.Column)
- KyloCatalogClient (com.thinkbiganalytics.kylo.catalog.api.KyloCatalogClient)
- DataFrame (org.apache.spark.sql.DataFrame)
- ScalaUDF (org.apache.spark.sql.catalyst.expressions.ScalaUDF)
- Test (org.junit.Test)
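Examples 2 and 3 both test that updateHighWaterMark wraps the date or timestamp column in a UDF that records the largest value seen while passing each value through unchanged. A stdlib-only sketch of that accumulation pattern, without Spark or JdbcHighWaterMark (class and method names here are our own illustrative stand-ins, not Kylo's API):

```java
import java.time.Instant;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

public class HighWaterMarkSketch {

    /** Tracks the largest timestamp observed across invocations, thread-safely. */
    static final class HighWaterMark {
        private final AtomicReference<Instant> max = new AtomicReference<>();

        void accumulate(Instant value) {
            // Keep whichever of the current maximum and the new value is later.
            max.accumulateAndGet(value,
                    (current, v) -> current == null || v.isAfter(current) ? v : current);
        }

        Instant value() {
            return max.get();
        }
    }

    /** Identity over the column value that records each timestamp as a side effect. */
    static UnaryOperator<Instant> watermarkUdf(HighWaterMark mark) {
        return value -> {
            if (value != null) {
                mark.accumulate(value);
            }
            return value;
        };
    }

    public static void main(String[] args) {
        HighWaterMark mark = new HighWaterMark();
        UnaryOperator<Instant> udf = watermarkUdf(mark);

        // Values arrive out of order, as they would across partitions.
        udf.apply(Instant.parse("2020-01-01T00:00:00Z"));
        udf.apply(Instant.parse("2020-03-01T00:00:00Z"));
        udf.apply(Instant.parse("2020-02-01T00:00:00Z"));

        if (!Instant.parse("2020-03-01T00:00:00Z").equals(mark.value())) {
            throw new AssertionError("unexpected high water mark: " + mark.value());
        }
        System.out.println("high water mark = " + mark.value());
    }
}
```

As in Example 1, the real provider registers such a function as a Spark UDF, so the replacement column's expression tree is a ScalaUDF; the tests assert exactly that via the captured Column.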

Aggregations

- KyloCatalogClient (com.thinkbiganalytics.kylo.catalog.api.KyloCatalogClient): 3 uses
- Column (org.apache.spark.sql.Column): 3 uses
- DataFrame (org.apache.spark.sql.DataFrame): 3 uses
- ScalaUDF (org.apache.spark.sql.catalyst.expressions.ScalaUDF): 3 uses
- Test (org.junit.Test): 3 uses
- JdbcHighWaterMark (com.thinkbiganalytics.kylo.catalog.spark.sources.jdbc.JdbcHighWaterMark): 2 uses
- StructField (org.apache.spark.sql.types.StructField): 2 uses
- DataSetOptions (com.thinkbiganalytics.kylo.catalog.spi.DataSetOptions): 1 use
- StructType (org.apache.spark.sql.types.StructType): 1 use