
Example 1 with HoodieBackedTableMetadata

Use of org.apache.hudi.metadata.HoodieBackedTableMetadata in project hudi by apache.

From class MetadataCommand, method listPartitions().

@CliCommand(value = "metadata list-partitions", help = "List all partitions from metadata")
public String listPartitions(@CliOption(key = "sparkMaster", unspecifiedDefaultValue = SparkUtil.DEFAULT_SPARK_MASTER, help = "Spark master") final String master) throws IOException {
    HoodieCLI.getTableMetaClient();
    initJavaSparkContext(Option.of(master));
    // Build a read-only metadata config; enable(true) is required to read the metadata table.
    HoodieMetadataConfig config = HoodieMetadataConfig.newBuilder().enable(true).build();
    // "/tmp" serves as the spillable map directory (the spillableMapBasePath parameter in the tests below).
    HoodieBackedTableMetadata metadata = new HoodieBackedTableMetadata(new HoodieSparkEngineContext(jsc), config, HoodieCLI.basePath, "/tmp");
    if (!metadata.enabled()) {
        return "[ERROR] Metadata Table not enabled/initialized\n\n";
    }
    HoodieTimer timer = new HoodieTimer().startTimer();
    List<String> partitions = metadata.getAllPartitionPaths();
    LOG.debug("Took " + timer.endTimer() + " ms");
    final List<Comparable[]> rows = new ArrayList<>();
    partitions.stream().sorted(Comparator.reverseOrder()).forEach(p -> {
        Comparable[] row = new Comparable[1];
        row[0] = p;
        rows.add(row);
    });
    TableHeader header = new TableHeader().addTableHeaderField("partition");
    return HoodiePrintHelper.print(header, new HashMap<>(), "", false, Integer.MAX_VALUE, false, rows);
}
Also used : HoodieSparkEngineContext(org.apache.hudi.client.common.HoodieSparkEngineContext) HoodieMetadataConfig(org.apache.hudi.common.config.HoodieMetadataConfig) TableHeader(org.apache.hudi.cli.TableHeader) ArrayList(java.util.ArrayList) HoodieTimer(org.apache.hudi.common.util.HoodieTimer) HoodieBackedTableMetadata(org.apache.hudi.metadata.HoodieBackedTableMetadata) CliCommand(org.springframework.shell.core.annotation.CliCommand)
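
The same lookup also works outside the CLI shell. Below is a minimal standalone sketch, assuming a Hudi table at the placeholder path /tmp/hudi_table; it swaps in the HoodieLocalEngineContext used by Example 5 (so no Spark context is needed) and uses only the constructor and calls shown in these examples.

import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hudi.common.config.HoodieMetadataConfig;
import org.apache.hudi.common.engine.HoodieLocalEngineContext;
import org.apache.hudi.metadata.HoodieBackedTableMetadata;

public class ListPartitionsSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder paths: point these at a real table and a scratch directory.
        String basePath = "/tmp/hudi_table";
        String spillableDir = "/tmp/hudi_spill";
        HoodieMetadataConfig config = HoodieMetadataConfig.newBuilder().enable(true).build();
        // A local engine context needs only a Hadoop Configuration, not a Spark context.
        HoodieBackedTableMetadata metadata = new HoodieBackedTableMetadata(
                new HoodieLocalEngineContext(new Configuration()), config, basePath, spillableDir);
        if (!metadata.enabled()) {
            System.err.println("Metadata table not enabled/initialized");
            return;
        }
        // Same call the CLI command times and prints.
        List<String> partitions = metadata.getAllPartitionPaths();
        partitions.forEach(System.out::println);
    }
}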

Example 2 with HoodieBackedTableMetadata

Use of org.apache.hudi.metadata.HoodieBackedTableMetadata in project hudi by apache.

From class TestHoodieBackedTableMetadata, method testMetadataTableKeyGenerator().

/**
 * Verify that the metadata table is constructed with table properties that include
 * the right key generator class name.
 */
@ParameterizedTest
@EnumSource(HoodieTableType.class)
public void testMetadataTableKeyGenerator(final HoodieTableType tableType) throws Exception {
    init(tableType);
    HoodieBackedTableMetadata tableMetadata = new HoodieBackedTableMetadata(context, writeConfig.getMetadataConfig(), writeConfig.getBasePath(), writeConfig.getSpillableMapBasePath(), false);
    assertEquals(HoodieTableMetadataKeyGenerator.class.getCanonicalName(), tableMetadata.getMetadataMetaClient().getTableConfig().getKeyGeneratorClassName());
}
Also used : HoodieTableMetadataKeyGenerator(org.apache.hudi.metadata.HoodieTableMetadataKeyGenerator) HoodieBackedTableMetadata(org.apache.hudi.metadata.HoodieBackedTableMetadata) EnumSource(org.junit.jupiter.params.provider.EnumSource) ParameterizedTest(org.junit.jupiter.params.ParameterizedTest)
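
Unlike the CLI examples, the tests use a five-argument constructor whose trailing boolean is false here. In the Hudi source this parameter is named reuse and controls whether the metadata reader caches open readers across lookups; passing false keeps each lookup independent. Treat the name as version-dependent and check your HoodieBackedTableMetadata constructor if it differs.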

Example 3 with HoodieBackedTableMetadata

Use of org.apache.hudi.metadata.HoodieBackedTableMetadata in project hudi by apache.

From class TestHoodieBackedTableMetadata, method testNotExistPartition().

/**
 * [HUDI-2852] Table metadata returns an empty result for a non-existent partition.
 */
@ParameterizedTest
@EnumSource(HoodieTableType.class)
public void testNotExistPartition(final HoodieTableType tableType) throws Exception {
    init(tableType);
    HoodieBackedTableMetadata tableMetadata = new HoodieBackedTableMetadata(context, writeConfig.getMetadataConfig(), writeConfig.getBasePath(), writeConfig.getSpillableMapBasePath(), false);
    // The path deliberately points at a location that is not a partition of the table.
    FileStatus[] allFilesInPartition = tableMetadata.getAllFilesInPartition(new Path(writeConfig.getBasePath() + "dummy"));
    assertEquals(0, allFilesInPartition.length);
}
Also used : Path(org.apache.hadoop.fs.Path) FileStatus(org.apache.hadoop.fs.FileStatus) HoodieBackedTableMetadata(org.apache.hudi.metadata.HoodieBackedTableMetadata) EnumSource(org.junit.jupiter.params.provider.EnumSource) ParameterizedTest(org.junit.jupiter.params.ParameterizedTest)
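
The contract verified here means callers can iterate the lookup result directly, with no null check. A minimal sketch of that pattern, meant as a fragment inside the same test setup (the partition name is a placeholder):

    // Safe even when the partition does not exist: the lookup yields an empty
    // array (per HUDI-2852), never null and never an exception.
    FileStatus[] files = tableMetadata.getAllFilesInPartition(new Path(writeConfig.getBasePath(), "no_such_partition"));
    for (FileStatus f : files) {
        System.out.println(f.getPath());
    }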

Example 4 with HoodieBackedTableMetadata

Use of org.apache.hudi.metadata.HoodieBackedTableMetadata in project hudi by apache.

From class TestHoodieBackedTableMetadata, method verifyBaseMetadataTable().

private void verifyBaseMetadataTable() throws IOException {
    HoodieBackedTableMetadata tableMetadata = new HoodieBackedTableMetadata(context, writeConfig.getMetadataConfig(), writeConfig.getBasePath(), writeConfig.getSpillableMapBasePath(), false);
    assertTrue(tableMetadata.enabled());
    List<java.nio.file.Path> fsPartitionPaths = testTable.getAllPartitionPaths();
    List<String> fsPartitions = new ArrayList<>();
    fsPartitionPaths.forEach(entry -> fsPartitions.add(entry.getFileName().toString()));
    List<String> metadataPartitions = tableMetadata.getAllPartitionPaths();
    Collections.sort(fsPartitions);
    Collections.sort(metadataPartitions);
    assertEquals(fsPartitions.size(), metadataPartitions.size(), "Partition counts should match");
    assertEquals(fsPartitions, metadataPartitions, "Partitions should match");
    // Files within each partition should match
    HoodieTable table = HoodieSparkTable.create(writeConfig, context, true);
    TableFileSystemView tableView = table.getHoodieView();
    List<String> fullPartitionPaths = fsPartitions.stream().map(partition -> basePath + "/" + partition).collect(Collectors.toList());
    Map<String, FileStatus[]> partitionToFilesMap = tableMetadata.getAllFilesInPartitions(fullPartitionPaths);
    assertEquals(fsPartitions.size(), partitionToFilesMap.size());
    fsPartitions.forEach(partition -> {
        try {
            validateFilesPerPartition(testTable, tableMetadata, tableView, partitionToFilesMap, partition);
        } catch (IOException e) {
            fail("Exception should not be raised: " + e);
        }
    });
}
Also used : Path(org.apache.hadoop.fs.Path) HoodieTable(org.apache.hudi.table.HoodieTable) Arrays(java.util.Arrays) ClosableIterator(org.apache.hudi.common.util.ClosableIterator) FileStatus(org.apache.hadoop.fs.FileStatus) Logger(org.apache.log4j.Logger) HoodieTableType(org.apache.hudi.common.model.HoodieTableType) Assertions.assertFalse(org.junit.jupiter.api.Assertions.assertFalse) HoodieDataBlock(org.apache.hudi.common.table.log.block.HoodieDataBlock) Map(java.util.Map) HoodieTableMetadataKeyGenerator(org.apache.hudi.metadata.HoodieTableMetadataKeyGenerator) HoodieLogFormat(org.apache.hudi.common.table.log.HoodieLogFormat) Pair(org.apache.hadoop.hbase.util.Pair) Schema(org.apache.avro.Schema) HoodieMetadataPayload(org.apache.hudi.metadata.HoodieMetadataPayload) Collectors(java.util.stream.Collectors) Test(org.junit.jupiter.api.Test) CacheConfig(org.apache.hadoop.hbase.io.hfile.CacheConfig) MessageType(org.apache.parquet.schema.MessageType) HoodieBaseFile(org.apache.hudi.common.model.HoodieBaseFile) List(java.util.List) HoodieMetadataMergedLogRecordReader(org.apache.hudi.metadata.HoodieMetadataMergedLogRecordReader) Assertions.assertTrue(org.junit.jupiter.api.Assertions.assertTrue) TableFileSystemView(org.apache.hudi.common.table.view.TableFileSystemView) HoodieLogBlock(org.apache.hudi.common.table.log.block.HoodieLogBlock) Assertions.assertDoesNotThrow(org.junit.jupiter.api.Assertions.assertDoesNotThrow) Assertions.assertThrows(org.junit.jupiter.api.Assertions.assertThrows) Assertions.fail(org.junit.jupiter.api.Assertions.fail) AvroSchemaConverter(org.apache.parquet.avro.AvroSchemaConverter) HoodieBackedTableMetadata(org.apache.hudi.metadata.HoodieBackedTableMetadata) HoodieAvroUtils(org.apache.hudi.avro.HoodieAvroUtils) FileSlice(org.apache.hudi.common.model.FileSlice) Assertions.assertNull(org.junit.jupiter.api.Assertions.assertNull) EnumSource(org.junit.jupiter.params.provider.EnumSource) ArrayList(java.util.ArrayList) HoodieSparkTable(org.apache.hudi.table.HoodieSparkTable) MetadataPartitionType(org.apache.hudi.metadata.MetadataPartitionType) HoodieTableMetaClient(org.apache.hudi.common.table.HoodieTableMetaClient) HoodieLogFile(org.apache.hudi.common.model.HoodieLogFile) ExternalSpillableMap(org.apache.hudi.common.util.collection.ExternalSpillableMap) Assertions.assertEquals(org.junit.jupiter.api.Assertions.assertEquals) IndexedRecord(org.apache.avro.generic.IndexedRecord) HoodieMetadataConfig(org.apache.hudi.common.config.HoodieMetadataConfig) HoodieRecord(org.apache.hudi.common.model.HoodieRecord) GenericRecord(org.apache.avro.generic.GenericRecord) TableSchemaResolver(org.apache.hudi.common.table.TableSchemaResolver) HoodieWriteConfig(org.apache.hudi.config.HoodieWriteConfig) HoodieTestTable(org.apache.hudi.common.testutils.HoodieTestTable) IOException(java.io.IOException) INSERT(org.apache.hudi.common.model.WriteOperationType.INSERT) HoodieRecordPayload(org.apache.hudi.common.model.HoodieRecordPayload) ParameterizedTest(org.junit.jupiter.params.ParameterizedTest) HoodieMetadataRecord(org.apache.hudi.avro.model.HoodieMetadataRecord) HoodieHFileReader(org.apache.hudi.io.storage.HoodieHFileReader) LogManager(org.apache.log4j.LogManager) Collections(java.util.Collections) UPSERT(org.apache.hudi.common.model.WriteOperationType.UPSERT)
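
Note the argument shape for getAllFilesInPartitions: the test passes fully qualified partition paths (base path plus "/" plus the relative partition name), not bare partition names, and the size assertion relies on the returned map holding one entry per requested partition. When adapting this, build the keys the same way the test builds fullPartitionPaths.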

Example 5 with HoodieBackedTableMetadata

Use of org.apache.hudi.metadata.HoodieBackedTableMetadata in project hudi by apache.

From class MetadataCommand, method listFiles().

@CliCommand(value = "metadata list-files", help = "Print a list of all files in a partition from the metadata")
public String listFiles(@CliOption(key = { "partition" }, help = "Name of the partition to list files", mandatory = true) final String partition) throws IOException {
    HoodieCLI.getTableMetaClient();
    HoodieMetadataConfig config = HoodieMetadataConfig.newBuilder().enable(true).build();
    // A local engine context suffices here; unlike listPartitions, no Spark context is created.
    HoodieBackedTableMetadata metaReader = new HoodieBackedTableMetadata(new HoodieLocalEngineContext(HoodieCLI.conf), config, HoodieCLI.basePath, "/tmp");
    if (!metaReader.enabled()) {
        return "[ERROR] Metadata Table not enabled/initialized\n\n";
    }
    HoodieTimer timer = new HoodieTimer().startTimer();
    FileStatus[] statuses = metaReader.getAllFilesInPartition(new Path(HoodieCLI.basePath, partition));
    LOG.debug("Took " + timer.endTimer() + " ms");
    final List<Comparable[]> rows = new ArrayList<>();
    Arrays.stream(statuses).sorted((p1, p2) -> p2.getPath().getName().compareTo(p1.getPath().getName())).forEach(f -> {
        Comparable[] row = new Comparable[1];
        row[0] = f;
        rows.add(row);
    });
    TableHeader header = new TableHeader().addTableHeaderField("file path");
    return HoodiePrintHelper.print(header, new HashMap<>(), "", false, Integer.MAX_VALUE, false, rows);
}
Also used : Path(org.apache.hadoop.fs.Path) Arrays(java.util.Arrays) HoodieBackedTableMetadata(org.apache.hudi.metadata.HoodieBackedTableMetadata) JavaSparkContext(org.apache.spark.api.java.JavaSparkContext) Option(org.apache.hudi.common.util.Option) HashMap(java.util.HashMap) HoodieTimer(org.apache.hudi.common.util.HoodieTimer) FileStatus(org.apache.hadoop.fs.FileStatus) CliOption(org.springframework.shell.core.annotation.CliOption) ArrayList(java.util.ArrayList) HashSet(java.util.HashSet) Logger(org.apache.log4j.Logger) Map(java.util.Map) SparkHoodieBackedTableMetadataWriter(org.apache.hudi.metadata.SparkHoodieBackedTableMetadataWriter) HoodieSparkEngineContext(org.apache.hudi.client.common.HoodieSparkEngineContext) HoodieLocalEngineContext(org.apache.hudi.common.engine.HoodieLocalEngineContext) HoodieMetadataConfig(org.apache.hudi.common.config.HoodieMetadataConfig) CommandMarker(org.springframework.shell.core.CommandMarker) ValidationUtils(org.apache.hudi.common.util.ValidationUtils) CliCommand(org.springframework.shell.core.annotation.CliCommand) HoodieWriteConfig(org.apache.hudi.config.HoodieWriteConfig) HoodieTableMetadata(org.apache.hudi.metadata.HoodieTableMetadata) TableHeader(org.apache.hudi.cli.TableHeader) Set(java.util.Set) IOException(java.io.IOException) SparkUtil(org.apache.hudi.cli.utils.SparkUtil) FileNotFoundException(java.io.FileNotFoundException) HoodieCLI(org.apache.hudi.cli.HoodieCLI) Component(org.springframework.stereotype.Component) List(java.util.List) HoodiePrintHelper(org.apache.hudi.cli.HoodiePrintHelper) LogManager(org.apache.log4j.LogManager) Comparator(java.util.Comparator) Collections(java.util.Collections)
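
Contrast with Example 1: listPartitions constructs a HoodieSparkEngineContext and therefore initializes a Spark context first, while listFiles above gets by with a HoodieLocalEngineContext over the CLI's Hadoop configuration. Both commands follow the same pattern otherwise: guard with enabled(), time the metadata lookup with a HoodieTimer, and render the sorted rows through HoodiePrintHelper.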

Aggregations

HoodieBackedTableMetadata (org.apache.hudi.metadata.HoodieBackedTableMetadata): 7
ArrayList (java.util.ArrayList): 5
HoodieMetadataConfig (org.apache.hudi.common.config.HoodieMetadataConfig): 5
Map (java.util.Map): 4
FileStatus (org.apache.hadoop.fs.FileStatus): 4
Path (org.apache.hadoop.fs.Path): 4
TableHeader (org.apache.hudi.cli.TableHeader): 4
CliCommand (org.springframework.shell.core.annotation.CliCommand): 4
HashMap (java.util.HashMap): 3
HoodieLocalEngineContext (org.apache.hudi.common.engine.HoodieLocalEngineContext): 3
HoodieTimer (org.apache.hudi.common.util.HoodieTimer): 3
ParameterizedTest (org.junit.jupiter.params.ParameterizedTest): 3
EnumSource (org.junit.jupiter.params.provider.EnumSource): 3
IOException (java.io.IOException): 2
Arrays (java.util.Arrays): 2
Collections (java.util.Collections): 2
HashSet (java.util.HashSet): 2
List (java.util.List): 2
HoodieSparkEngineContext (org.apache.hudi.client.common.HoodieSparkEngineContext): 2
HoodieWriteConfig (org.apache.hudi.config.HoodieWriteConfig): 2