
Example 6 with JdbcHighWaterMark

Uses com.thinkbiganalytics.kylo.catalog.spark.sources.jdbc.JdbcHighWaterMark in project kylo by Teradata.

From the class AbstractJdbcDataSetProviderTest, method updateHighWaterMarkWithTimestamp:

/**
 * Verify updating a high water mark for a timestamp column.
 */
@Test
public void updateHighWaterMarkWithTimestamp() {
    // Mock data set
    final DataFrame dataSet = Mockito.mock(DataFrame.class);
    Mockito.when(dataSet.col("mockField")).thenReturn(new Column("mockField"));
    final StructField field = DataTypes.createStructField("mockField", DataTypes.TimestampType, true);
    Mockito.when(dataSet.schema()).thenReturn(DataTypes.createStructType(Collections.singletonList(field)));
    final DataFrame mapDataSet = Mockito.mock(DataFrame.class);
    Mockito.when(dataSet.withColumn(Mockito.eq("mockField"), Mockito.any(Column.class))).thenReturn(mapDataSet);
    // Test updating high water mark
    final KyloCatalogClient client = Mockito.mock(KyloCatalogClient.class);
    final JdbcHighWaterMark highWaterMark = new JdbcHighWaterMark("mockWaterMark", client);
    final MockJdbcDataSetProvider provider = new MockJdbcDataSetProvider();
    final DataFrame newDataSet = provider.updateHighWaterMark(dataSet, "mockField", highWaterMark, client);
    Assert.assertEquals(mapDataSet, newDataSet);
    // Test replaced column
    final ArgumentCaptor<Column> newColumn = ArgumentCaptor.forClass(Column.class);
    Mockito.verify(dataSet).withColumn(Mockito.eq("mockField"), newColumn.capture());
    Assert.assertTrue("Expected new column to be a UDF", newColumn.getValue().expr() instanceof ScalaUDF);
}
Also used: StructField (org.apache.spark.sql.types.StructField), JdbcHighWaterMark (com.thinkbiganalytics.kylo.catalog.spark.sources.jdbc.JdbcHighWaterMark), Column (org.apache.spark.sql.Column), KyloCatalogClient (com.thinkbiganalytics.kylo.catalog.api.KyloCatalogClient), DataFrame (org.apache.spark.sql.DataFrame), ScalaUDF (org.apache.spark.sql.catalyst.expressions.ScalaUDF), Test (org.junit.Test)
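The test above only verifies that the field is replaced by a UDF column; the behavior that UDF ultimately drives is high-water-mark accumulation, i.e. retaining the largest value observed. A minimal stdlib sketch of that accumulation logic, with hypothetical names standing in for the Kylo API (JdbcHighWaterMark additionally reports its value through a KyloCatalogClient), might look like this:

```java
/**
 * Minimal sketch of high-water-mark accumulation: keep the largest
 * Long value observed. SimpleHighWaterMark is a hypothetical stand-in
 * for Kylo's JdbcHighWaterMark, not the real class.
 */
public class SimpleHighWaterMark {

    private Long value;

    /** Records a candidate value, keeping it only if it exceeds the current mark. */
    public synchronized void accumulate(final Long candidate) {
        if (candidate != null && (value == null || candidate > value)) {
            value = candidate;
        }
    }

    /** Returns the largest value seen so far, or null if none was recorded. */
    public synchronized Long getValue() {
        return value;
    }

    public static void main(String[] args) {
        SimpleHighWaterMark mark = new SimpleHighWaterMark();
        mark.accumulate(100L);
        mark.accumulate(50L);   // ignored: below the current mark
        mark.accumulate(200L);
        System.out.println(mark.getValue()); // prints 200
    }
}
```

In the real provider this accumulation runs inside a Spark Accumulable so that values observed on executors are merged back on the driver.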

Example 7 with JdbcHighWaterMark

Uses com.thinkbiganalytics.kylo.catalog.spark.sources.jdbc.JdbcHighWaterMark in project kylo by Teradata.

From the class AbstractJdbcDataSetProvider, method updateHighWaterMark:

/**
 * Scans the specified field and updates the specified high water mark.
 */
@Nonnull
@VisibleForTesting
T updateHighWaterMark(@Nonnull final T dataSet, @Nonnull final String fieldName, @Nonnull final JdbcHighWaterMark highWaterMark, @Nonnull final KyloCatalogClient<T> client) {
    // Determine function to convert column to Long
    final DataType fieldType = schema(dataSet).apply(fieldName).dataType();
    final Function1<?, Long> toLong;
    if (fieldType == DataTypes.DateType) {
        toLong = new DateToLong();
    } else if (fieldType == DataTypes.TimestampType) {
        toLong = new TimestampToLong();
    } else {
        throw new KyloCatalogException("Unsupported column type for high water mark: " + fieldType);
    }
    // Create UDF and apply to field
    final String accumulableId = (highWaterMark.getName() != null) ? highWaterMark.getName() : UUID.randomUUID().toString();
    final Accumulable<JdbcHighWaterMark, Long> accumulable = accumulable(highWaterMark, accumulableId, new JdbcHighWaterMarkAccumulableParam(), client);
    final JdbcHighWaterMarkVisitor<?> visitor = new JdbcHighWaterMarkVisitor<>(accumulable, toLong);
    return map(dataSet, fieldName, visitor, fieldType);
}
Also used: JdbcHighWaterMarkVisitor (com.thinkbiganalytics.kylo.catalog.spark.sources.jdbc.JdbcHighWaterMarkVisitor), JdbcHighWaterMark (com.thinkbiganalytics.kylo.catalog.spark.sources.jdbc.JdbcHighWaterMark), KyloCatalogException (com.thinkbiganalytics.kylo.catalog.api.KyloCatalogException), DataType (org.apache.spark.sql.types.DataType), JdbcHighWaterMarkAccumulableParam (com.thinkbiganalytics.kylo.catalog.spark.sources.jdbc.JdbcHighWaterMarkAccumulableParam), VisibleForTesting (com.google.common.annotations.VisibleForTesting), Nonnull (javax.annotation.Nonnull)
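The DateToLong and TimestampToLong functions selected above are not shown in this excerpt. A plausible sketch of the conversion they perform, reading epoch milliseconds from the java.sql types that Spark uses for DateType and TimestampType columns, is below. The class and constant names are hypothetical; the real Kylo classes implement Scala's Function1, while plain java.util.function.Function is used here for brevity:

```java
import java.sql.Date;
import java.sql.Timestamp;
import java.util.function.Function;

/**
 * Hypothetical sketch of the column-to-Long conversions chosen by
 * updateHighWaterMark. Both map a temporal value to epoch milliseconds
 * so high water marks of either type can be compared as Longs.
 */
public class HighWaterMarkConversions {

    /** Converts a java.sql.Date (DateType column value) to epoch milliseconds. */
    public static final Function<Date, Long> DATE_TO_LONG = Date::getTime;

    /** Converts a java.sql.Timestamp (TimestampType column value) to epoch milliseconds. */
    public static final Function<Timestamp, Long> TIMESTAMP_TO_LONG = Timestamp::getTime;

    public static void main(String[] args) {
        // Values are interpreted in the JVM's default time zone,
        // so the printed millisecond counts vary by environment.
        Date date = Date.valueOf("2020-01-01");
        Timestamp ts = Timestamp.valueOf("2020-01-01 00:00:00");
        System.out.println(DATE_TO_LONG.apply(date));
        System.out.println(TIMESTAMP_TO_LONG.apply(ts));
    }
}
```

Collapsing both column types to Long is what lets a single JdbcHighWaterMarkVisitor compare values uniformly; any other column type is rejected with a KyloCatalogException, as the method shows.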
