Example 1 with KyloCatalogClient

Use of com.thinkbiganalytics.kylo.catalog.api.KyloCatalogClient in project kylo by Teradata.

Class JdbcHighWaterMarkTest, method testClient.

/**
 * Verify interactions with a {@code KyloCatalogClient}.
 */
@Test
public void testClient() {
    // Test setting high water mark
    final KyloCatalogClient client = Mockito.mock(KyloCatalogClient.class);
    final JdbcHighWaterMark highWaterMark = new JdbcHighWaterMark("mockWaterMark", client);
    highWaterMark.accumulate(6L);
    Mockito.verify(client).setHighWaterMarks(Collections.singletonMap("mockWaterMark", "6"));
    // Test with formatter
    highWaterMark.setFormatter(new AbstractFunction1<Long, String>() {

        @Override
        public String apply(final Long value) {
            return ISODateTimeFormat.dateTimeNoMillis().withZoneUTC().print(value);
        }
    });
    highWaterMark.accumulate(1524960000000L);
    Mockito.verify(client).setHighWaterMarks(Collections.singletonMap("mockWaterMark", "2018-04-29T00:00:00Z"));
}
Also used: KyloCatalogClient (com.thinkbiganalytics.kylo.catalog.api.KyloCatalogClient), Test (org.junit.Test)
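
The formatter in the test above relies on Joda-Time to turn epoch milliseconds into an ISO-8601 UTC string without milliseconds. As a hedged aside, a roughly equivalent conversion can be sketched with java.time alone. The class and method names (IsoFormatterSketch, formatUtcNoMillis) are hypothetical and are not part of Kylo.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

public class IsoFormatterSketch {

    // Hypothetical helper (not a Kylo API): formats epoch milliseconds as an
    // ISO-8601 UTC timestamp and drops the sub-second part.
    static String formatUtcNoMillis(final long epochMillis) {
        return Instant.ofEpochMilli(epochMillis)
            .truncatedTo(ChronoUnit.SECONDS)
            .toString();
    }

    public static void main(final String[] args) {
        // Same input value as the test above.
        System.out.println(formatUtcNoMillis(1524960000000L)); // prints 2018-04-29T00:00:00Z
    }
}
```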

Example 2 with KyloCatalogClient

Use of com.thinkbiganalytics.kylo.catalog.api.KyloCatalogClient in project kylo by Teradata.

Class AbstractJdbcDataSetProviderTest, method updateHighWaterMarkWithDate.

/**
 * Verify updating a high water mark for a date column.
 */
@Test
public void updateHighWaterMarkWithDate() {
    // Mock data set
    final DataFrame dataSet = Mockito.mock(DataFrame.class);
    Mockito.when(dataSet.col("mockField")).thenReturn(new Column("mockField"));
    final StructField field = DataTypes.createStructField("mockField", DataTypes.DateType, true);
    Mockito.when(dataSet.schema()).thenReturn(DataTypes.createStructType(Collections.singletonList(field)));
    final DataFrame mapDataSet = Mockito.mock(DataFrame.class);
    Mockito.when(dataSet.withColumn(Mockito.eq("mockField"), Mockito.any(Column.class))).thenReturn(mapDataSet);
    // Test updating high water mark
    final KyloCatalogClient client = Mockito.mock(KyloCatalogClient.class);
    final JdbcHighWaterMark highWaterMark = new JdbcHighWaterMark("mockWaterMark", client);
    final MockJdbcDataSetProvider provider = new MockJdbcDataSetProvider();
    final DataFrame newDataSet = provider.updateHighWaterMark(dataSet, "mockField", highWaterMark, client);
    Assert.assertEquals(mapDataSet, newDataSet);
    // Test replaced column
    final ArgumentCaptor<Column> newColumn = ArgumentCaptor.forClass(Column.class);
    Mockito.verify(dataSet).withColumn(Mockito.eq("mockField"), newColumn.capture());
    Assert.assertTrue("Expected new column to be a UDF", newColumn.getValue().expr() instanceof ScalaUDF);
}
Also used: StructField (org.apache.spark.sql.types.StructField), JdbcHighWaterMark (com.thinkbiganalytics.kylo.catalog.spark.sources.jdbc.JdbcHighWaterMark), Column (org.apache.spark.sql.Column), KyloCatalogClient (com.thinkbiganalytics.kylo.catalog.api.KyloCatalogClient), DataFrame (org.apache.spark.sql.DataFrame), ScalaUDF (org.apache.spark.sql.catalyst.expressions.ScalaUDF), Test (org.junit.Test)

Example 3 with KyloCatalogClient

Use of com.thinkbiganalytics.kylo.catalog.api.KyloCatalogClient in project kylo by Teradata.

Class SparkDataSetContextTest, method getPathsHighWaterMarkEmpty.

/**
 * Verify retrieving High Water Mark paths when no files match.
 */
@Test
@SuppressWarnings("unchecked")
public void getPathsHighWaterMarkEmpty() throws IOException {
    final long currentTime = System.currentTimeMillis();
    // Mock client
    final KyloCatalogClient client = Mockito.mock(KyloCatalogClient.class);
    Mockito.when(client.getHighWaterMarks()).thenReturn(Collections.singletonMap("water.mark", Long.toString(currentTime)));
    // Mock file
    final File file = tempFolder.newFile("file.txt");
    Assert.assertTrue(file.setLastModified(currentTime - 1000));
    // Mock options
    final DataSetOptions options = new DataSetOptions();
    options.setFormat("mock");
    options.setOption(HighWaterMarkInputFormat.HIGH_WATER_MARK, "water.mark");
    options.setPaths(Collections.singletonList(file.getAbsolutePath()));
    // Mock delegate
    final SparkDataSetDelegate<DataFrame> delegate = Mockito.mock(SparkDataSetDelegate.class);
    Mockito.when(delegate.getHadoopConfiguration(Mockito.any(KyloCatalogClient.class))).thenReturn(new Configuration(false));
    Mockito.when(delegate.isFileFormat(Mockito.any(Class.class))).thenReturn(true);
    // Test resolving paths
    final SparkDataSetContext<DataFrame> context = new SparkDataSetContext<>(options, client, delegate);
    final List<String> paths = context.getPaths();
    Assert.assertNotNull("Expected paths to be non-null", paths);
    Assert.assertEquals("file:/dev/null", paths.get(0));
    Assert.assertEquals(1, paths.size());
    Mockito.verify(client).setHighWaterMarks(Collections.singletonMap("water.mark", Long.toString(currentTime)));
}
Also used: Configuration (org.apache.hadoop.conf.Configuration), KyloCatalogClient (com.thinkbiganalytics.kylo.catalog.api.KyloCatalogClient), DataSetOptions (com.thinkbiganalytics.kylo.catalog.spi.DataSetOptions), DataFrame (org.apache.spark.sql.DataFrame), File (java.io.File), Test (org.junit.Test)

Example 4 with KyloCatalogClient

Use of com.thinkbiganalytics.kylo.catalog.api.KyloCatalogClient in project kylo by Teradata.

Class SparkDataSetContextTest, method getPathsHighWaterMark.

/**
 * Verify retrieving High Water Mark paths.
 */
@Test
@SuppressWarnings("unchecked")
public void getPathsHighWaterMark() throws IOException {
    final long currentTime = System.currentTimeMillis();
    // Mock client
    final KyloCatalogClient client = Mockito.mock(KyloCatalogClient.class);
    Mockito.when(client.getHighWaterMarks()).thenReturn(Collections.singletonMap("water.mark", Long.toString(currentTime - 60000)));
    // Mock files
    final List<String> inputPaths = new ArrayList<>();
    final File file1 = tempFolder.newFile("file1");
    Assert.assertTrue(file1.setLastModified(currentTime - 60000));
    inputPaths.add(file1.getAbsolutePath());
    final File file2 = tempFolder.newFile("file2");
    Assert.assertTrue(file2.setLastModified(currentTime - 30000));
    inputPaths.add(file2.getAbsolutePath());
    final File file3 = tempFolder.newFile("file3");
    Assert.assertTrue(file3.setLastModified(currentTime));
    inputPaths.add(file3.getAbsolutePath());
    // Mock options
    final DataSetOptions options = new DataSetOptions();
    options.setFormat("mock");
    options.setOption(HighWaterMarkInputFormat.HIGH_WATER_MARK, "water.mark");
    options.setOption(HighWaterMarkInputFormat.MAX_FILE_AGE, "300000");
    options.setOption(HighWaterMarkInputFormat.MIN_FILE_AGE, "15000");
    options.setPaths(inputPaths);
    // Mock delegate
    final SparkDataSetDelegate<DataFrame> delegate = Mockito.mock(SparkDataSetDelegate.class);
    Mockito.when(delegate.getHadoopConfiguration(Mockito.any(KyloCatalogClient.class))).thenReturn(new Configuration(false));
    Mockito.when(delegate.isFileFormat(Mockito.any(Class.class))).thenReturn(true);
    // Test resolving paths
    final SparkDataSetContext<DataFrame> context = new SparkDataSetContext<>(options, client, delegate);
    final List<String> paths = context.getPaths();
    Assert.assertNotNull("Expected paths to be non-null", paths);
    Assert.assertEquals(file2.toURI().toString(), paths.get(0));
    Assert.assertEquals(1, paths.size());
    Mockito.verify(client).setHighWaterMarks(Collections.singletonMap("water.mark", Long.toString(file2.lastModified())));
}
Also used: Configuration (org.apache.hadoop.conf.Configuration), KyloCatalogClient (com.thinkbiganalytics.kylo.catalog.api.KyloCatalogClient), ArrayList (java.util.ArrayList), DataSetOptions (com.thinkbiganalytics.kylo.catalog.spi.DataSetOptions), DataFrame (org.apache.spark.sql.DataFrame), File (java.io.File), Test (org.junit.Test)
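
The assertions in the test above imply a selection rule: a file is picked up only if it was modified after the stored high water mark and its age lies between the minimum and maximum file age. The sketch below illustrates that rule on plain timestamps. It is inferred from the test values only and is not Kylo's implementation; the class FileAgeFilterSketch, its select method, and the choice of inclusive age bounds are all assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FileAgeFilterSketch {

    // Hypothetical helper (not a Kylo API): returns the paths whose modification
    // time is strictly newer than the high water mark and whose age falls within
    // [minAgeMs, maxAgeMs]. Inclusive bounds are an assumption.
    static List<String> select(final Map<String, Long> lastModifiedByPath, final long highWaterMark,
                               final long minAgeMs, final long maxAgeMs, final long now) {
        final List<String> selected = new ArrayList<>();
        for (final Map.Entry<String, Long> entry : lastModifiedByPath.entrySet()) {
            final long modified = entry.getValue();
            final long age = now - modified;
            if (modified > highWaterMark && age >= minAgeMs && age <= maxAgeMs) {
                selected.add(entry.getKey());
            }
        }
        return selected;
    }

    public static void main(final String[] args) {
        final long now = 1_000_000L;
        // Same relative timestamps as the test above.
        final Map<String, Long> files = new LinkedHashMap<>();
        files.put("file1", now - 60000); // equal to the high water mark: skipped
        files.put("file2", now - 30000); // newer than the mark and old enough: selected
        files.put("file3", now);         // younger than the minimum age: skipped
        System.out.println(select(files, now - 60000, 15000, 300000, now)); // prints [file2]
    }
}
```

In the test, the new high water mark written back to the client is the modification time of the selected file (file2).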

Example 5 with KyloCatalogClient

Use of com.thinkbiganalytics.kylo.catalog.api.KyloCatalogClient in project kylo by Teradata.

Class AbstractJdbcDataSetProviderTest, method createHighWaterMarkWithExisting.

/**
 * Verify creating a high water mark with an existing value.
 */
@Test
public void createHighWaterMarkWithExisting() {
    // Mock Kylo Catalog client
    final KyloCatalogClient client = Mockito.mock(KyloCatalogClient.class);
    Mockito.when(client.getHighWaterMarks()).thenReturn(Collections.singletonMap("mockWaterMark", "2018-04-29T15:05:03"));
    // Test creating high water mark
    final MockJdbcDataSetProvider provider = new MockJdbcDataSetProvider();
    final JdbcHighWaterMark highWaterMark = provider.createHighWaterMark("mockWaterMark", client);
    Assert.assertEquals("mockWaterMark", highWaterMark.getName());
    Assert.assertEquals(Long.valueOf(1525014303000L), highWaterMark.getValue());
    Mockito.reset(client);
    // Test adding a value
    highWaterMark.accumulate(86400000L);
    Mockito.verifyZeroInteractions(client);
    highWaterMark.accumulate(1532528828000L);
    Mockito.verify(client).setHighWaterMarks(Collections.singletonMap("mockWaterMark", "2018-07-25T14:27:08"));
}
Also used: JdbcHighWaterMark (com.thinkbiganalytics.kylo.catalog.spark.sources.jdbc.JdbcHighWaterMark), KyloCatalogClient (com.thinkbiganalytics.kylo.catalog.api.KyloCatalogClient), Test (org.junit.Test)
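
The test above suggests that accumulate only reports a value to the client when it exceeds the current high water mark: the smaller value causes no client interaction, while the larger one is written back. A minimal sketch of that behavior follows, using an in-memory list in place of the client. The class HighWaterMarkSketch and its members are hypothetical and do not reflect how JdbcHighWaterMark is actually implemented.

```java
import java.util.ArrayList;
import java.util.List;

public class HighWaterMarkSketch {

    // Stand-in for the catalog client: records every value that would be published.
    private final List<Long> published = new ArrayList<>();

    // Current high water mark, or null when none exists yet.
    private Long value;

    HighWaterMarkSketch(final Long initialValue) {
        this.value = initialValue;
    }

    // Keeps the larger of the current and candidate values and publishes only on increase.
    void accumulate(final long candidate) {
        if (value == null || candidate > value) {
            value = candidate;
            published.add(candidate);
        }
    }

    Long getValue() {
        return value;
    }

    List<Long> getPublished() {
        return published;
    }

    public static void main(final String[] args) {
        // Same values as the test above.
        final HighWaterMarkSketch mark = new HighWaterMarkSketch(1525014303000L);
        mark.accumulate(86400000L);      // smaller than the existing value: nothing published
        mark.accumulate(1532528828000L); // larger: published
        System.out.println(mark.getPublished()); // prints [1532528828000]
    }
}
```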

Aggregations

KyloCatalogClient (com.thinkbiganalytics.kylo.catalog.api.KyloCatalogClient): 7
Test (org.junit.Test): 7
JdbcHighWaterMark (com.thinkbiganalytics.kylo.catalog.spark.sources.jdbc.JdbcHighWaterMark): 4
DataFrame (org.apache.spark.sql.DataFrame): 4
DataSetOptions (com.thinkbiganalytics.kylo.catalog.spi.DataSetOptions): 2
File (java.io.File): 2
Configuration (org.apache.hadoop.conf.Configuration): 2
Column (org.apache.spark.sql.Column): 2
ScalaUDF (org.apache.spark.sql.catalyst.expressions.ScalaUDF): 2
StructField (org.apache.spark.sql.types.StructField): 2
ArrayList (java.util.ArrayList): 1