Example 6 with HoodieReadClient

Use of org.apache.hudi.client.HoodieReadClient in project hudi by apache.

From the class TestInlineCompaction, method testSuccessfulCompactionBasedOnNumCommits.

@Test
public void testSuccessfulCompactionBasedOnNumCommits() throws Exception {
    // Given: make three commits
    HoodieWriteConfig cfg = getConfigForInlineCompaction(3, 60, CompactionTriggerStrategy.NUM_COMMITS);
    List<String> instants = IntStream.range(0, 2).mapToObj(i -> HoodieActiveTimeline.createNewInstantTime()).collect(Collectors.toList());
    try (SparkRDDWriteClient<?> writeClient = getHoodieWriteClient(cfg)) {
        List<HoodieRecord> records = dataGen.generateInserts(instants.get(0), 100);
        HoodieReadClient readClient = getHoodieReadClient(cfg.getBasePath());
        runNextDeltaCommits(writeClient, readClient, instants, records, cfg, true, new ArrayList<>());
        // Third commit; this one triggers the compaction
        HoodieTableMetaClient metaClient = HoodieTableMetaClient.builder().setConf(hadoopConf).setBasePath(cfg.getBasePath()).build();
        String finalInstant = HoodieActiveTimeline.createNewInstantTime();
        createNextDeltaCommit(finalInstant, dataGen.generateUpdates(finalInstant, 100), writeClient, metaClient, cfg, false);
        // Then: ensure the file slices are compacted as per policy
        metaClient = HoodieTableMetaClient.builder().setConf(hadoopConf).setBasePath(cfg.getBasePath()).build();
        assertEquals(4, metaClient.getActiveTimeline().getWriteTimeline().countInstants());
        assertEquals(HoodieTimeline.COMMIT_ACTION, metaClient.getActiveTimeline().lastInstant().get().getAction());
        String compactionTime = metaClient.getActiveTimeline().lastInstant().get().getTimestamp();
        assertFalse(WriteMarkersFactory.get(cfg.getMarkersType(), HoodieSparkTable.create(cfg, context), compactionTime).doesMarkerDirExist());
    }
}
Also used : IntStream(java.util.stream.IntStream) HoodieRecord(org.apache.hudi.common.model.HoodieRecord) Arrays(java.util.Arrays) HoodieWriteConfig(org.apache.hudi.config.HoodieWriteConfig) WriteMarkersFactory(org.apache.hudi.table.marker.WriteMarkersFactory) Collectors(java.util.stream.Collectors) ArrayList(java.util.ArrayList) HoodieCompactionConfig(org.apache.hudi.config.HoodieCompactionConfig) Test(org.junit.jupiter.api.Test) HoodieSparkTable(org.apache.hudi.table.HoodieSparkTable) List(java.util.List) SparkRDDWriteClient(org.apache.hudi.client.SparkRDDWriteClient) HoodieReadClient(org.apache.hudi.client.HoodieReadClient) Assertions.assertFalse(org.junit.jupiter.api.Assertions.assertFalse) HoodieTableMetaClient(org.apache.hudi.common.table.HoodieTableMetaClient) Assertions.assertEquals(org.junit.jupiter.api.Assertions.assertEquals) HoodieActiveTimeline(org.apache.hudi.common.table.timeline.HoodieActiveTimeline) HoodieTimeline(org.apache.hudi.common.table.timeline.HoodieTimeline)
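
Every example on this page calls a getConfigForInlineCompaction(maxDeltaCommits, maxDeltaSeconds, strategy) helper from TestInlineCompaction that is not shown here. Below is a minimal sketch of what it plausibly builds, inferred from the explicit HoodieCompactionConfig builder chain that Example 9 spells out inline; the real helper may set additional defaults, and getConfigBuilder(false) is the test-harness builder that Example 9 also uses.

private HoodieWriteConfig getConfigForInlineCompaction(int maxDeltaCommits, int maxDeltaSeconds, CompactionTriggerStrategy strategy) {
    // Sketch only: reconstructed from the builder calls visible elsewhere on this page.
    return getConfigBuilder(false)
        .withCompactionConfig(HoodieCompactionConfig.newBuilder()
            // Run compaction inline as part of the write client's commit path.
            .withInlineCompaction(true)
            // Thresholds evaluated according to the chosen trigger strategy.
            .withMaxNumDeltaCommitsBeforeCompaction(maxDeltaCommits)
            .withMaxDeltaSecondsBeforeCompaction(maxDeltaSeconds)
            .withInlineCompactionTriggerStrategy(strategy)
            .build())
        .build();
}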

Example 7 with HoodieReadClient

Use of org.apache.hudi.client.HoodieReadClient in project hudi by apache.

From the class TestInlineCompaction, method testSuccessfulCompactionBasedOnTime.

@Test
public void testSuccessfulCompactionBasedOnTime() throws Exception {
    // Given: make one commit
    HoodieWriteConfig cfg = getConfigForInlineCompaction(5, 10, CompactionTriggerStrategy.TIME_ELAPSED);
    try (SparkRDDWriteClient<?> writeClient = getHoodieWriteClient(cfg)) {
        String instantTime = HoodieActiveTimeline.createNewInstantTime();
        List<HoodieRecord> records = dataGen.generateInserts(instantTime, 10);
        HoodieReadClient readClient = getHoodieReadClient(cfg.getBasePath());
        runNextDeltaCommits(writeClient, readClient, Arrays.asList(instantTime), records, cfg, true, new ArrayList<>());
        // After 10s, the next commit triggers the compaction
        String finalInstant = HoodieActiveTimeline.createNewInstantTime(10000);
        HoodieTableMetaClient metaClient = HoodieTableMetaClient.builder().setConf(hadoopConf).setBasePath(cfg.getBasePath()).build();
        createNextDeltaCommit(finalInstant, dataGen.generateUpdates(finalInstant, 100), writeClient, metaClient, cfg, false);
        // Then: ensure the file slices are compacted as per policy
        metaClient = HoodieTableMetaClient.builder().setConf(hadoopConf).setBasePath(cfg.getBasePath()).build();
        assertEquals(3, metaClient.getActiveTimeline().getWriteTimeline().countInstants());
        assertEquals(HoodieTimeline.COMMIT_ACTION, metaClient.getActiveTimeline().lastInstant().get().getAction());
    }
}
Also used : HoodieTableMetaClient(org.apache.hudi.common.table.HoodieTableMetaClient) HoodieReadClient(org.apache.hudi.client.HoodieReadClient) HoodieRecord(org.apache.hudi.common.model.HoodieRecord) HoodieWriteConfig(org.apache.hudi.config.HoodieWriteConfig) Test(org.junit.jupiter.api.Test)
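
getHoodieReadClient(basePath) in these tests is a harness helper from the Hudi test base classes. Outside the harness, a HoodieReadClient can be built against an existing table roughly as below. This is a sketch, not the harness's implementation: it assumes the HoodieReadClient(HoodieSparkEngineContext, HoodieWriteConfig) constructor overload, which varies across Hudi versions, and jsc is an existing JavaSparkContext.

// basePath points at the Hudi table to read from.
HoodieSparkEngineContext engineContext = new HoodieSparkEngineContext(jsc);
HoodieWriteConfig readConfig = HoodieWriteConfig.newBuilder()
    .withPath(basePath)
    .build();
HoodieReadClient readClient = new HoodieReadClient(engineContext, readConfig);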

Example 8 with HoodieReadClient

Use of org.apache.hudi.client.HoodieReadClient in project hudi by apache.

From the class TestInlineCompaction, method testSuccessfulCompactionBasedOnNumOrTime.

@Test
public void testSuccessfulCompactionBasedOnNumOrTime() throws Exception {
    // Given: make three commits
    HoodieWriteConfig cfg = getConfigForInlineCompaction(3, 20, CompactionTriggerStrategy.NUM_OR_TIME);
    try (SparkRDDWriteClient<?> writeClient = getHoodieWriteClient(cfg)) {
        List<HoodieRecord> records = dataGen.generateInserts(HoodieActiveTimeline.createNewInstantTime(), 10);
        HoodieReadClient readClient = getHoodieReadClient(cfg.getBasePath());
        List<String> instants = IntStream.range(0, 2).mapToObj(i -> HoodieActiveTimeline.createNewInstantTime()).collect(Collectors.toList());
        runNextDeltaCommits(writeClient, readClient, instants, records, cfg, true, new ArrayList<>());
        // Then: compaction is triggered because the commit count reaches 3
        String finalInstant = HoodieActiveTimeline.createNewInstantTime();
        HoodieTableMetaClient metaClient = HoodieTableMetaClient.builder().setConf(hadoopConf).setBasePath(cfg.getBasePath()).build();
        createNextDeltaCommit(finalInstant, dataGen.generateUpdates(finalInstant, 10), writeClient, metaClient, cfg, false);
        metaClient = HoodieTableMetaClient.builder().setConf(hadoopConf).setBasePath(cfg.getBasePath()).build();
        assertEquals(4, metaClient.getActiveTimeline().getWriteTimeline().countInstants());
        // Fourth commit; this triggers compaction because the elapsed-time threshold is reached
        metaClient = HoodieTableMetaClient.builder().setConf(hadoopConf).setBasePath(cfg.getBasePath()).build();
        finalInstant = HoodieActiveTimeline.createNewInstantTime(20000);
        createNextDeltaCommit(finalInstant, dataGen.generateUpdates(finalInstant, 10), writeClient, metaClient, cfg, false);
        metaClient = HoodieTableMetaClient.builder().setConf(hadoopConf).setBasePath(cfg.getBasePath()).build();
        assertEquals(6, metaClient.getActiveTimeline().getWriteTimeline().countInstants());
    }
}
Also used : IntStream(java.util.stream.IntStream) HoodieRecord(org.apache.hudi.common.model.HoodieRecord) Arrays(java.util.Arrays) HoodieWriteConfig(org.apache.hudi.config.HoodieWriteConfig) WriteMarkersFactory(org.apache.hudi.table.marker.WriteMarkersFactory) Collectors(java.util.stream.Collectors) ArrayList(java.util.ArrayList) HoodieCompactionConfig(org.apache.hudi.config.HoodieCompactionConfig) Test(org.junit.jupiter.api.Test) HoodieSparkTable(org.apache.hudi.table.HoodieSparkTable) List(java.util.List) SparkRDDWriteClient(org.apache.hudi.client.SparkRDDWriteClient) HoodieReadClient(org.apache.hudi.client.HoodieReadClient) Assertions.assertFalse(org.junit.jupiter.api.Assertions.assertFalse) HoodieTableMetaClient(org.apache.hudi.common.table.HoodieTableMetaClient) Assertions.assertEquals(org.junit.jupiter.api.Assertions.assertEquals) HoodieActiveTimeline(org.apache.hudi.common.table.timeline.HoodieActiveTimeline) HoodieTimeline(org.apache.hudi.common.table.timeline.HoodieTimeline)
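
The NUM_OR_TIME behaviour exercised above can be summarised as a simple disjunction of the two thresholds. The snippet below is purely illustrative of the trigger condition, not Hudi's actual scheduling code, and the variable names are hypothetical.

// Illustrative only: hypothetical counters standing in for Hudi's internal bookkeeping.
int deltaCommitsSinceLastCompaction = 3;   // e.g. the three delta commits in the first phase
long secondsSinceLastCompaction = 21;      // e.g. past the 20s threshold in the second phase
boolean shouldScheduleCompaction =
    deltaCommitsSinceLastCompaction >= 3        // NUM_COMMITS threshold (3 in this test)
        || secondsSinceLastCompaction >= 20;    // TIME_ELAPSED threshold (20s in this test)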

Example 9 with HoodieReadClient

Use of org.apache.hudi.client.HoodieReadClient in project hudi by apache.

From the class TestInlineCompaction, method testCompactionRetryOnFailureBasedOnTime.

@Test
public void testCompactionRetryOnFailureBasedOnTime() throws Exception {
    // Given: two commits, then a scheduled compaction that is left failed/in-flight
    HoodieWriteConfig cfg = getConfigBuilder(false).withCompactionConfig(HoodieCompactionConfig.newBuilder().withInlineCompaction(false).withMaxDeltaSecondsBeforeCompaction(5).withInlineCompactionTriggerStrategy(CompactionTriggerStrategy.TIME_ELAPSED).build()).build();
    String instantTime;
    List<String> instants = IntStream.range(0, 2).mapToObj(i -> HoodieActiveTimeline.createNewInstantTime()).collect(Collectors.toList());
    try (SparkRDDWriteClient<?> writeClient = getHoodieWriteClient(cfg)) {
        List<HoodieRecord> records = dataGen.generateInserts(instants.get(0), 100);
        HoodieReadClient readClient = getHoodieReadClient(cfg.getBasePath());
        runNextDeltaCommits(writeClient, readClient, instants, records, cfg, true, new ArrayList<>());
        // Schedule compaction instantTime, make it in-flight (simulates inline compaction failing)
        instantTime = HoodieActiveTimeline.createNewInstantTime(10000);
        scheduleCompaction(instantTime, writeClient, cfg);
        moveCompactionFromRequestedToInflight(instantTime, cfg);
    }
    // When: commit happens after 10s
    HoodieWriteConfig inlineCfg = getConfigForInlineCompaction(5, 10, CompactionTriggerStrategy.TIME_ELAPSED);
    String instantTime2;
    try (SparkRDDWriteClient<?> writeClient = getHoodieWriteClient(inlineCfg)) {
        HoodieTableMetaClient metaClient = HoodieTableMetaClient.builder().setConf(hadoopConf).setBasePath(cfg.getBasePath()).build();
        instantTime2 = HoodieActiveTimeline.createNewInstantTime();
        createNextDeltaCommit(instantTime2, dataGen.generateUpdates(instantTime2, 10), writeClient, metaClient, inlineCfg, false);
    }
    // Then: 1 delta commit is done, the failed compaction is retried
    HoodieTableMetaClient metaClient = HoodieTableMetaClient.builder().setConf(hadoopConf).setBasePath(cfg.getBasePath()).build();
    assertEquals(4, metaClient.getActiveTimeline().getWriteTimeline().countInstants());
    assertEquals(instantTime, metaClient.getActiveTimeline().getCommitTimeline().filterCompletedInstants().firstInstant().get().getTimestamp());
}
Also used : IntStream(java.util.stream.IntStream) HoodieRecord(org.apache.hudi.common.model.HoodieRecord) Arrays(java.util.Arrays) HoodieWriteConfig(org.apache.hudi.config.HoodieWriteConfig) WriteMarkersFactory(org.apache.hudi.table.marker.WriteMarkersFactory) Collectors(java.util.stream.Collectors) ArrayList(java.util.ArrayList) HoodieCompactionConfig(org.apache.hudi.config.HoodieCompactionConfig) Test(org.junit.jupiter.api.Test) HoodieSparkTable(org.apache.hudi.table.HoodieSparkTable) List(java.util.List) SparkRDDWriteClient(org.apache.hudi.client.SparkRDDWriteClient) HoodieReadClient(org.apache.hudi.client.HoodieReadClient) Assertions.assertFalse(org.junit.jupiter.api.Assertions.assertFalse) HoodieTableMetaClient(org.apache.hudi.common.table.HoodieTableMetaClient) Assertions.assertEquals(org.junit.jupiter.api.Assertions.assertEquals) HoodieActiveTimeline(org.apache.hudi.common.table.timeline.HoodieActiveTimeline) HoodieTimeline(org.apache.hudi.common.table.timeline.HoodieTimeline)
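
scheduleCompaction(instantTime, writeClient, cfg) and moveCompactionFromRequestedToInflight(instantTime, cfg) above are test-harness helpers. In application code, explicitly scheduling a compaction at a chosen instant goes through the write client, roughly as sketched below; this assumes the scheduleCompactionAtInstant overload taking an Option of extra metadata (org.apache.hudi.common.util.Option), which may differ across Hudi versions.

// Ask the write client to schedule a compaction plan at a specific instant time.
boolean scheduled = writeClient.scheduleCompactionAtInstant(instantTime, Option.empty());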

Example 10 with HoodieReadClient

Use of org.apache.hudi.client.HoodieReadClient in project hudi by apache.

From the class TestInlineCompaction, method testCompactionIsNotScheduledEarly.

@Test
public void testCompactionIsNotScheduledEarly() throws Exception {
    // Given: make two commits
    HoodieWriteConfig cfg = getConfigForInlineCompaction(3, 60, CompactionTriggerStrategy.NUM_COMMITS);
    try (SparkRDDWriteClient<?> writeClient = getHoodieWriteClient(cfg)) {
        List<HoodieRecord> records = dataGen.generateInserts(HoodieActiveTimeline.createNewInstantTime(), 100);
        HoodieReadClient readClient = getHoodieReadClient(cfg.getBasePath());
        List<String> instants = IntStream.range(0, 2).mapToObj(i -> HoodieActiveTimeline.createNewInstantTime()).collect(Collectors.toList());
        runNextDeltaCommits(writeClient, readClient, instants, records, cfg, true, new ArrayList<>());
        HoodieTableMetaClient metaClient = HoodieTableMetaClient.builder().setConf(hadoopConf).setBasePath(cfg.getBasePath()).build();
        // Then: ensure no compaction is executed since there are only 2 delta commits
        assertEquals(2, metaClient.getActiveTimeline().getWriteTimeline().countInstants());
    }
}
Also used : IntStream(java.util.stream.IntStream) HoodieRecord(org.apache.hudi.common.model.HoodieRecord) Arrays(java.util.Arrays) HoodieWriteConfig(org.apache.hudi.config.HoodieWriteConfig) WriteMarkersFactory(org.apache.hudi.table.marker.WriteMarkersFactory) Collectors(java.util.stream.Collectors) ArrayList(java.util.ArrayList) HoodieCompactionConfig(org.apache.hudi.config.HoodieCompactionConfig) Test(org.junit.jupiter.api.Test) HoodieSparkTable(org.apache.hudi.table.HoodieSparkTable) List(java.util.List) SparkRDDWriteClient(org.apache.hudi.client.SparkRDDWriteClient) HoodieReadClient(org.apache.hudi.client.HoodieReadClient) Assertions.assertFalse(org.junit.jupiter.api.Assertions.assertFalse) HoodieTableMetaClient(org.apache.hudi.common.table.HoodieTableMetaClient) Assertions.assertEquals(org.junit.jupiter.api.Assertions.assertEquals) HoodieActiveTimeline(org.apache.hudi.common.table.timeline.HoodieActiveTimeline) HoodieTimeline(org.apache.hudi.common.table.timeline.HoodieTimeline)

Aggregations

HoodieReadClient (org.apache.hudi.client.HoodieReadClient) 18
HoodieRecord (org.apache.hudi.common.model.HoodieRecord) 18
HoodieWriteConfig (org.apache.hudi.config.HoodieWriteConfig) 18
Test (org.junit.jupiter.api.Test) 18
SparkRDDWriteClient (org.apache.hudi.client.SparkRDDWriteClient) 17
HoodieTableMetaClient (org.apache.hudi.common.table.HoodieTableMetaClient) 17
ArrayList (java.util.ArrayList) 11
Arrays (java.util.Arrays) 8
List (java.util.List) 8
Collectors (java.util.stream.Collectors) 8
HoodieActiveTimeline (org.apache.hudi.common.table.timeline.HoodieActiveTimeline) 8
HoodieTimeline (org.apache.hudi.common.table.timeline.HoodieTimeline) 8
Assertions.assertEquals (org.junit.jupiter.api.Assertions.assertEquals) 8
Assertions.assertFalse (org.junit.jupiter.api.Assertions.assertFalse) 8
IntStream (java.util.stream.IntStream) 7
HoodieCompactionConfig (org.apache.hudi.config.HoodieCompactionConfig) 7
HoodieSparkTable (org.apache.hudi.table.HoodieSparkTable) 7
WriteMarkersFactory (org.apache.hudi.table.marker.WriteMarkersFactory) 7
HoodieInstant (org.apache.hudi.common.table.timeline.HoodieInstant) 6
HoodieTable (org.apache.hudi.table.HoodieTable) 5