
Example 1 with HoodieBloomIndex

A usage of org.apache.hudi.index.bloom.HoodieBloomIndex in the Apache Hudi project.

From the class TestHoodieCompactor, method testWriteStatusContentsAfterCompaction:

@Test
public void testWriteStatusContentsAfterCompaction() throws Exception {
    // insert 100 records
    HoodieWriteConfig config = getConfigBuilder().withCompactionConfig(HoodieCompactionConfig.newBuilder().withMaxNumDeltaCommitsBeforeCompaction(1).build()).build();
    try (SparkRDDWriteClient writeClient = getHoodieWriteClient(config)) {
        String newCommitTime = "100";
        writeClient.startCommitWithTime(newCommitTime);
        List<HoodieRecord> records = dataGen.generateInserts(newCommitTime, 100);
        JavaRDD<HoodieRecord> recordsRDD = jsc.parallelize(records, 1);
        writeClient.insert(recordsRDD, newCommitTime).collect();
        // Update all the 100 records
        HoodieTable table = HoodieSparkTable.create(config, context);
        newCommitTime = "101";
        List<HoodieRecord> updatedRecords = dataGen.generateUpdates(newCommitTime, records);
        JavaRDD<HoodieRecord> updatedRecordsRDD = jsc.parallelize(updatedRecords, 1);
        HoodieIndex index = new HoodieBloomIndex(config, SparkHoodieBloomIndexHelper.getInstance());
        JavaRDD<HoodieRecord> updatedTaggedRecordsRDD = tagLocation(index, updatedRecordsRDD, table);
        writeClient.startCommitWithTime(newCommitTime);
        writeClient.upsertPreppedRecords(updatedTaggedRecordsRDD, newCommitTime).collect();
        metaClient.reloadActiveTimeline();
        // Verify that each data file has exactly one log file
        table = HoodieSparkTable.create(config, context);
        for (String partitionPath : dataGen.getPartitionPaths()) {
            List<FileSlice> groupedLogFiles = table.getSliceView().getLatestFileSlices(partitionPath).collect(Collectors.toList());
            for (FileSlice fileSlice : groupedLogFiles) {
                assertEquals(1, fileSlice.getLogFiles().count(), "There should be 1 log file written for every data file");
            }
        }
        // Do a compaction
        table = HoodieSparkTable.create(config, context);
        String compactionInstantTime = "102";
        table.scheduleCompaction(context, compactionInstantTime, Option.empty());
        table.getMetaClient().reloadActiveTimeline();
        HoodieData<WriteStatus> result = (HoodieData<WriteStatus>) table.compact(context, compactionInstantTime).getWriteStatuses();
        // Verify that every partition path is present in the WriteStatus result.
        // Collect once, outside the loop, to avoid triggering a Spark job per partition.
        List<WriteStatus> writeStatuses = result.collectAsList();
        for (String partitionPath : dataGen.getPartitionPaths()) {
            assertTrue(writeStatuses.stream().anyMatch(writeStatus -> writeStatus.getStat().getPartitionPath().contentEquals(partitionPath)));
        }
    }
}
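
The tagLocation call above is a helper from the test harness (HoodieClientTestHarness), not a method on the index itself. A minimal sketch of what such a helper might look like, assuming Hudi's HoodieJavaRDD adapter (org.apache.hudi.data.HoodieJavaRDD) between JavaRDD and the engine-agnostic HoodieData:

// Hypothetical harness helper: wrap the JavaRDD as HoodieData, let the index tag each
// record with the location of the file slice it already belongs to, then unwrap the result.
private JavaRDD<HoodieRecord> tagLocation(HoodieIndex index, JavaRDD<HoodieRecord> records, HoodieTable table) {
    return HoodieJavaRDD.getJavaRDD(index.tagLocation(HoodieJavaRDD.of(records), context, table));
}

Tagging before upsertPreppedRecords is what makes the "prepped" path work: the writer trusts the locations already stamped on the records instead of running an index lookup again.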
Also used : HoodieData(org.apache.hudi.common.data.HoodieData) HoodieTable(org.apache.hudi.table.HoodieTable) Assertions.assertThrows(org.junit.jupiter.api.Assertions.assertThrows) BeforeEach(org.junit.jupiter.api.BeforeEach) HoodieInstant(org.apache.hudi.common.table.timeline.HoodieInstant) FileSlice(org.apache.hudi.common.model.FileSlice) HoodieTestDataGenerator(org.apache.hudi.common.testutils.HoodieTestDataGenerator) Option(org.apache.hudi.common.util.Option) SparkHoodieBloomIndexHelper(org.apache.hudi.index.bloom.SparkHoodieBloomIndexHelper) State(org.apache.hudi.common.table.timeline.HoodieInstant.State) HoodieBloomIndex(org.apache.hudi.index.bloom.HoodieBloomIndex) HoodieClientTestHarness(org.apache.hudi.testutils.HoodieClientTestHarness) HoodieTableType(org.apache.hudi.common.model.HoodieTableType) HoodieSparkTable(org.apache.hudi.table.HoodieSparkTable) Assertions.assertFalse(org.junit.jupiter.api.Assertions.assertFalse) HoodieTableMetaClient(org.apache.hudi.common.table.HoodieTableMetaClient) Configuration(org.apache.hadoop.conf.Configuration) HoodieMemoryConfig(org.apache.hudi.config.HoodieMemoryConfig) HoodieStorageConfig(org.apache.hudi.config.HoodieStorageConfig) Assertions.assertEquals(org.junit.jupiter.api.Assertions.assertEquals) HoodieActiveTimeline(org.apache.hudi.common.table.timeline.HoodieActiveTimeline) HoodieTimeline(org.apache.hudi.common.table.timeline.HoodieTimeline) JavaRDD(org.apache.spark.api.java.JavaRDD) HoodieNotSupportedException(org.apache.hudi.exception.HoodieNotSupportedException) HoodieRecord(org.apache.hudi.common.model.HoodieRecord) HoodieWriteConfig(org.apache.hudi.config.HoodieWriteConfig) Collectors(java.util.stream.Collectors) HoodieIndex(org.apache.hudi.index.HoodieIndex) HoodieCompactionConfig(org.apache.hudi.config.HoodieCompactionConfig) Test(org.junit.jupiter.api.Test) WriteStatus(org.apache.hudi.client.WriteStatus) AfterEach(org.junit.jupiter.api.AfterEach) List(java.util.List) SparkRDDWriteClient(org.apache.hudi.client.SparkRDDWriteClient) Assertions.assertTrue(org.junit.jupiter.api.Assertions.assertTrue) HoodieIndexConfig(org.apache.hudi.config.HoodieIndexConfig) HoodieCompactionPlan(org.apache.hudi.avro.model.HoodieCompactionPlan) HoodieTestUtils(org.apache.hudi.common.testutils.HoodieTestUtils) FSUtils(org.apache.hudi.common.fs.FSUtils)

Example 2 with HoodieBloomIndex

A usage of org.apache.hudi.index.bloom.HoodieBloomIndex in the Apache Hudi project.

From the class TestHoodieIndexConfigs, method testCreateIndex:

@ParameterizedTest
@EnumSource(value = IndexType.class, names = { "BLOOM", "GLOBAL_BLOOM", "SIMPLE", "GLOBAL_SIMPLE", "HBASE", "BUCKET" })
public void testCreateIndex(IndexType indexType) {
    HoodieWriteConfig config;
    HoodieWriteConfig.Builder clientConfigBuilder = HoodieWriteConfig.newBuilder();
    HoodieIndexConfig.Builder indexConfigBuilder = HoodieIndexConfig.newBuilder();
    switch(indexType) {
        case INMEMORY:
            // INMEMORY is not among the @EnumSource names above, so this branch is not exercised by this parameterization
            config = clientConfigBuilder.withPath(basePath).withIndexConfig(indexConfigBuilder.withIndexType(HoodieIndex.IndexType.INMEMORY).build()).build();
            assertTrue(SparkHoodieIndexFactory.createIndex(config) instanceof HoodieInMemoryHashIndex);
            break;
        case BLOOM:
            config = clientConfigBuilder.withPath(basePath).withIndexConfig(indexConfigBuilder.withIndexType(HoodieIndex.IndexType.BLOOM).build()).build();
            assertTrue(SparkHoodieIndexFactory.createIndex(config) instanceof HoodieBloomIndex);
            break;
        case GLOBAL_BLOOM:
            config = clientConfigBuilder.withPath(basePath).withIndexConfig(indexConfigBuilder.withIndexType(IndexType.GLOBAL_BLOOM).build()).build();
            assertTrue(SparkHoodieIndexFactory.createIndex(config) instanceof HoodieGlobalBloomIndex);
            break;
        case SIMPLE:
            config = clientConfigBuilder.withPath(basePath).withIndexConfig(indexConfigBuilder.withIndexType(IndexType.SIMPLE).build()).build();
            assertTrue(SparkHoodieIndexFactory.createIndex(config) instanceof HoodieSimpleIndex);
            break;
        case GLOBAL_SIMPLE:
            // Covers the GLOBAL_SIMPLE value declared in the @EnumSource above, which would otherwise fall through to the empty default
            config = clientConfigBuilder.withPath(basePath).withIndexConfig(indexConfigBuilder.withIndexType(IndexType.GLOBAL_SIMPLE).build()).build();
            assertTrue(SparkHoodieIndexFactory.createIndex(config) instanceof HoodieGlobalSimpleIndex);
            break;
        case HBASE:
            config = clientConfigBuilder.withPath(basePath).withIndexConfig(indexConfigBuilder.withIndexType(HoodieIndex.IndexType.HBASE).withHBaseIndexConfig(new HoodieHBaseIndexConfig.Builder().build()).build()).build();
            assertTrue(SparkHoodieIndexFactory.createIndex(config) instanceof SparkHoodieHBaseIndex);
            break;
        case BUCKET:
            config = clientConfigBuilder.withPath(basePath).withIndexConfig(indexConfigBuilder.withIndexType(IndexType.BUCKET).build()).build();
            assertTrue(SparkHoodieIndexFactory.createIndex(config) instanceof HoodieBucketIndex);
            break;
        default:
    }
}
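
In application code, the same factory dispatch is usually driven by configuration properties rather than the builder's enum setter. A minimal sketch, assuming the standard hoodie.index.type key and a hypothetical base path:

// Select the bloom index via properties; SparkHoodieIndexFactory.createIndex(config)
// should then return a HoodieBloomIndex, mirroring the BLOOM branch asserted above.
Properties props = new Properties();  // java.util.Properties
props.setProperty("hoodie.index.type", "BLOOM");
HoodieWriteConfig config = HoodieWriteConfig.newBuilder()
    .withPath("/tmp/hoodie_table")  // hypothetical base path
    .withIndexConfig(HoodieIndexConfig.newBuilder().fromProperties(props).build())
    .build();
assertTrue(SparkHoodieIndexFactory.createIndex(config) instanceof HoodieBloomIndex);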
Also used : HoodieBloomIndex(org.apache.hudi.index.bloom.HoodieBloomIndex) HoodieInMemoryHashIndex(org.apache.hudi.index.inmemory.HoodieInMemoryHashIndex) HoodieSimpleIndex(org.apache.hudi.index.simple.HoodieSimpleIndex) HoodieGlobalSimpleIndex(org.apache.hudi.index.simple.HoodieGlobalSimpleIndex) HoodieWriteConfig(org.apache.hudi.config.HoodieWriteConfig) HoodieGlobalBloomIndex(org.apache.hudi.index.bloom.HoodieGlobalBloomIndex) HoodieBucketIndex(org.apache.hudi.index.bucket.HoodieBucketIndex) HoodieIndexConfig(org.apache.hudi.config.HoodieIndexConfig) SparkHoodieHBaseIndex(org.apache.hudi.index.hbase.SparkHoodieHBaseIndex) EnumSource(org.junit.jupiter.params.provider.EnumSource) ParameterizedTest(org.junit.jupiter.params.ParameterizedTest)

Aggregations

HoodieIndexConfig (org.apache.hudi.config.HoodieIndexConfig): 2 usages
HoodieWriteConfig (org.apache.hudi.config.HoodieWriteConfig): 2 usages
HoodieBloomIndex (org.apache.hudi.index.bloom.HoodieBloomIndex): 2 usages
List (java.util.List): 1 usage
Collectors (java.util.stream.Collectors): 1 usage
Configuration (org.apache.hadoop.conf.Configuration): 1 usage
HoodieCompactionPlan (org.apache.hudi.avro.model.HoodieCompactionPlan): 1 usage
SparkRDDWriteClient (org.apache.hudi.client.SparkRDDWriteClient): 1 usage
WriteStatus (org.apache.hudi.client.WriteStatus): 1 usage
HoodieData (org.apache.hudi.common.data.HoodieData): 1 usage
FSUtils (org.apache.hudi.common.fs.FSUtils): 1 usage
FileSlice (org.apache.hudi.common.model.FileSlice): 1 usage
HoodieRecord (org.apache.hudi.common.model.HoodieRecord): 1 usage
HoodieTableType (org.apache.hudi.common.model.HoodieTableType): 1 usage
HoodieTableMetaClient (org.apache.hudi.common.table.HoodieTableMetaClient): 1 usage
HoodieActiveTimeline (org.apache.hudi.common.table.timeline.HoodieActiveTimeline): 1 usage
HoodieInstant (org.apache.hudi.common.table.timeline.HoodieInstant): 1 usage
State (org.apache.hudi.common.table.timeline.HoodieInstant.State): 1 usage
HoodieTimeline (org.apache.hudi.common.table.timeline.HoodieTimeline): 1 usage
HoodieTestDataGenerator (org.apache.hudi.common.testutils.HoodieTestDataGenerator): 1 usage