
Example 26 with HoodieCommitMetadata

Use of org.apache.hudi.common.model.HoodieCommitMetadata in project hudi by apache.

The class HiveTestUtil, method addCOWPartitions.

public static void addCOWPartitions(int numberOfPartitions, boolean isParquetSchemaSimple, boolean useSchemaFromCommitMetadata, ZonedDateTime startFrom, String instantTime) throws IOException, URISyntaxException {
    HoodieCommitMetadata commitMetadata = createPartitions(numberOfPartitions, isParquetSchemaSimple, useSchemaFromCommitMetadata, startFrom, instantTime, hiveSyncConfig.basePath);
    createdTablesSet.add(hiveSyncConfig.databaseName + "." + hiveSyncConfig.tableName);
    createCommitFile(commitMetadata, instantTime, hiveSyncConfig.basePath);
}
Also used : HoodieCommitMetadata(org.apache.hudi.common.model.HoodieCommitMetadata)

Example 27 with HoodieCommitMetadata

Use of org.apache.hudi.common.model.HoodieCommitMetadata in project hudi by apache.

The class HiveTestUtil, method createCOWTable.

public static void createCOWTable(String instantTime, int numberOfPartitions, boolean useSchemaFromCommitMetadata, String basePath, String databaseName, String tableName) throws IOException, URISyntaxException {
    Path path = new Path(basePath);
    FileIOUtils.deleteDirectory(new File(basePath));
    HoodieTableMetaClient.withPropertyBuilder()
        .setTableType(HoodieTableType.COPY_ON_WRITE)
        .setTableName(tableName)
        .setPayloadClass(HoodieAvroPayload.class)
        .initTable(configuration, basePath);
    boolean result = fileSystem.mkdirs(path);
    checkResult(result);
    ZonedDateTime dateTime = ZonedDateTime.now();
    HoodieCommitMetadata commitMetadata = createPartitions(numberOfPartitions, true, useSchemaFromCommitMetadata, dateTime, instantTime, basePath);
    createdTablesSet.add(databaseName + "." + tableName);
    createCommitFile(commitMetadata, instantTime, basePath);
}
Also used : Path(org.apache.hadoop.fs.Path) HoodieCommitMetadata(org.apache.hudi.common.model.HoodieCommitMetadata) ZonedDateTime(java.time.ZonedDateTime) HoodieBaseFile(org.apache.hudi.common.model.HoodieBaseFile) HoodieLogFile(org.apache.hudi.common.model.HoodieLogFile) File(java.io.File) HoodieAvroPayload(org.apache.hudi.common.model.HoodieAvroPayload)
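The createCOWTable helpers above all begin by wiping and recreating the base path before initializing the table. That reset step can be sketched in isolation with plain java.nio.file (the class name and directory layout here are illustrative, not part of Hudi's test utilities, which use FileIOUtils.deleteDirectory and fileSystem.mkdirs):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

public class TableDirReset {

    // Delete the directory tree if present, then recreate it empty,
    // mirroring FileIOUtils.deleteDirectory(...) followed by mkdirs(path).
    static Path reset(Path basePath) throws IOException {
        if (Files.exists(basePath)) {
            try (Stream<Path> walk = Files.walk(basePath)) {
                // Delete children before parents (deepest paths first).
                walk.sorted(Comparator.reverseOrder())
                    .forEach(p -> p.toFile().delete());
            }
        }
        return Files.createDirectories(basePath);
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("hudi-test").resolve("table");
        Files.createDirectories(base.resolve("old-partition"));
        reset(base);
        System.out.println(Files.exists(base) + " " + Files.list(base).count());
    }
}
```

Starting each test from an empty base path keeps leftover partitions or commit files from a previous run from leaking into the table being created.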

Example 28 with HoodieCommitMetadata

Use of org.apache.hudi.common.model.HoodieCommitMetadata in project hudi by apache.

The class TestCluster, method createCOWTable.

public void createCOWTable(String commitTime, int numberOfPartitions, String dbName, String tableName) throws Exception {
    String tablePathStr = tablePath(dbName, tableName);
    Path path = new Path(tablePathStr);
    FileIOUtils.deleteDirectory(new File(path.toString()));
    HoodieTableMetaClient.withPropertyBuilder()
        .setTableType(HoodieTableType.COPY_ON_WRITE)
        .setTableName(tableName)
        .setPayloadClass(HoodieAvroPayload.class)
        .initTable(conf, path.toString());
    boolean result = dfsCluster.getFileSystem().mkdirs(path);
    if (!result) {
        throw new InitializationError("cannot initialize table");
    }
    ZonedDateTime dateTime = ZonedDateTime.now();
    HoodieCommitMetadata commitMetadata = createPartitions(numberOfPartitions, true, dateTime, commitTime, path.toString());
    createCommitFile(commitMetadata, commitTime, path.toString());
}
Also used : Path(org.apache.hadoop.fs.Path) HoodieCommitMetadata(org.apache.hudi.common.model.HoodieCommitMetadata) ZonedDateTime(java.time.ZonedDateTime) InitializationError(org.junit.runners.model.InitializationError) File(java.io.File) HoodieAvroPayload(org.apache.hudi.common.model.HoodieAvroPayload)

Example 29 with HoodieCommitMetadata

Use of org.apache.hudi.common.model.HoodieCommitMetadata in project hudi by apache.

The class TestCluster, method createPartitions.

private HoodieCommitMetadata createPartitions(int numberOfPartitions, boolean isParquetSchemaSimple, ZonedDateTime startFrom, String commitTime, String basePath) throws IOException, URISyntaxException {
    startFrom = startFrom.truncatedTo(ChronoUnit.DAYS);
    HoodieCommitMetadata commitMetadata = new HoodieCommitMetadata();
    for (int i = 0; i < numberOfPartitions; i++) {
        String partitionPath = startFrom.format(dtfOut);
        Path partPath = new Path(basePath + "/" + partitionPath);
        dfsCluster.getFileSystem().makeQualified(partPath);
        dfsCluster.getFileSystem().mkdirs(partPath);
        List<HoodieWriteStat> writeStats = createTestData(partPath, isParquetSchemaSimple, commitTime);
        startFrom = startFrom.minusDays(1);
        writeStats.forEach(s -> commitMetadata.addWriteStat(partitionPath, s));
    }
    return commitMetadata;
}
Also used : HoodieCommitMetadata(org.apache.hudi.common.model.HoodieCommitMetadata) Path(org.apache.hadoop.fs.Path) HoodieWriteStat(org.apache.hudi.common.model.HoodieWriteStat)
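createPartitions walks backwards one day at a time from a start timestamp truncated to day granularity, formatting each day into a partition path with dtfOut. The date arithmetic can be sketched on its own; the yyyy/MM/dd pattern below is an assumption for illustration, since the actual dtfOut formatter is defined elsewhere in TestCluster:

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;

public class PartitionPaths {

    // Assumed slash-separated day format; the real pattern lives in the test utility.
    static final DateTimeFormatter DTF_OUT = DateTimeFormatter.ofPattern("yyyy/MM/dd");

    // Produce daily partition path segments, newest first, mirroring
    // the truncate-then-minusDays loop in createPartitions.
    static List<String> dailyPartitions(ZonedDateTime startFrom, int numberOfPartitions) {
        ZonedDateTime cursor = startFrom.truncatedTo(ChronoUnit.DAYS);
        List<String> paths = new ArrayList<>();
        for (int i = 0; i < numberOfPartitions; i++) {
            paths.add(cursor.format(DTF_OUT));
            cursor = cursor.minusDays(1);
        }
        return paths;
    }

    public static void main(String[] args) {
        ZonedDateTime start = ZonedDateTime.of(2022, 3, 10, 15, 30, 0, 0, ZoneOffset.UTC);
        // Three consecutive daily partitions ending at the start date.
        System.out.println(dailyPartitions(start, 3));
    }
}
```

Truncating to ChronoUnit.DAYS before formatting is what makes repeated runs within the same day map to the same partition path.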

Example 30 with HoodieCommitMetadata

Use of org.apache.hudi.common.model.HoodieCommitMetadata in project hudi by apache.

The class TestHiveSyncTool, method testNotPickingOlderParquetFileWhenLatestCommitReadFailsForExistingTable.

@ParameterizedTest
@MethodSource("syncMode")
public void testNotPickingOlderParquetFileWhenLatestCommitReadFailsForExistingTable(String syncMode) throws Exception {
    hiveSyncConfig.syncMode = syncMode;
    HiveTestUtil.hiveSyncConfig.batchSyncNum = 2;
    final String commitTime = "100";
    HiveTestUtil.createCOWTable(commitTime, 1, true);
    HoodieCommitMetadata commitMetadata = new HoodieCommitMetadata();
    // create empty commit
    final String emptyCommitTime = "200";
    HiveTestUtil.createCommitFileWithSchema(commitMetadata, emptyCommitTime, true);
    // HiveTestUtil.createCommitFile(commitMetadata, emptyCommitTime);
    HoodieHiveClient hiveClient = new HoodieHiveClient(HiveTestUtil.hiveSyncConfig, HiveTestUtil.getHiveConf(), HiveTestUtil.fileSystem);
    assertFalse(hiveClient.doesTableExist(HiveTestUtil.hiveSyncConfig.tableName), "Table " + HiveTestUtil.hiveSyncConfig.tableName + " should not exist initially");
    HiveSyncTool tool = new HiveSyncTool(HiveTestUtil.hiveSyncConfig, HiveTestUtil.getHiveConf(), HiveTestUtil.fileSystem);
    tool.syncHoodieTable();
    verifyOldParquetFileTest(hiveClient, emptyCommitTime);
    // evolve the schema
    ZonedDateTime dateTime = ZonedDateTime.now().plusDays(6);
    String commitTime2 = "301";
    HiveTestUtil.addCOWPartitions(1, false, true, dateTime, commitTime2);
    // HiveTestUtil.createCommitFileWithSchema(commitMetadata, "400", false); // create another empty commit
    // HiveTestUtil.createCommitFile(commitMetadata, "400"); // create another empty commit
    tool = new HiveSyncTool(HiveTestUtil.hiveSyncConfig, HiveTestUtil.getHiveConf(), HiveTestUtil.fileSystem);
    HoodieHiveClient hiveClientLatest = new HoodieHiveClient(HiveTestUtil.hiveSyncConfig, HiveTestUtil.getHiveConf(), HiveTestUtil.fileSystem);
    // now delete the evolved commit instant
    Path fullPath = new Path(HiveTestUtil.hiveSyncConfig.basePath + "/" + HoodieTableMetaClient.METAFOLDER_NAME + "/"
        + hiveClientLatest.getActiveTimeline().getInstants()
            .filter(inst -> inst.getTimestamp().equals(commitTime2))
            .findFirst().get().getFileName());
    assertTrue(HiveTestUtil.fileSystem.delete(fullPath, false));
    try {
        tool.syncHoodieTable();
    } catch (RuntimeException e) {
        // we expect the table sync to fail
    }
    // old sync values should be left intact
    verifyOldParquetFileTest(hiveClient, emptyCommitTime);
}
Also used : HoodieCommitMetadata(org.apache.hudi.common.model.HoodieCommitMetadata) Path(org.apache.hadoop.fs.Path) ImmutablePair(org.apache.hudi.common.util.collection.ImmutablePair) Assertions.assertThrows(org.junit.jupiter.api.Assertions.assertThrows) BeforeEach(org.junit.jupiter.api.BeforeEach) Arrays(java.util.Arrays) MetaException(org.apache.hadoop.hive.metastore.api.MetaException) URISyntaxException(java.net.URISyntaxException) ZonedDateTime(java.time.ZonedDateTime) Option(org.apache.hudi.common.util.Option) HashMap(java.util.HashMap) HiveTestUtil.ddlExecutor(org.apache.hudi.hive.testutils.HiveTestUtil.ddlExecutor) Partition(org.apache.hadoop.hive.metastore.api.Partition) ArrayList(java.util.ArrayList) AfterAll(org.junit.jupiter.api.AfterAll) StringUtils(org.apache.hudi.common.util.StringUtils) Assertions.assertFalse(org.junit.jupiter.api.Assertions.assertFalse) HoodieTableMetaClient(org.apache.hudi.common.table.HoodieTableMetaClient) Locale(java.util.Locale) Map(java.util.Map) HiveTestUtil.fileSystem(org.apache.hudi.hive.testutils.HiveTestUtil.fileSystem) SchemaTestUtil(org.apache.hudi.common.testutils.SchemaTestUtil) Assertions.assertEquals(org.junit.jupiter.api.Assertions.assertEquals) MethodSource(org.junit.jupiter.params.provider.MethodSource) PartitionEventType(org.apache.hudi.sync.common.AbstractSyncHoodieClient.PartitionEvent.PartitionEventType) HoodieRecord(org.apache.hudi.common.model.HoodieRecord) Schema(org.apache.avro.Schema) Field(org.apache.avro.Schema.Field) Driver(org.apache.hadoop.hive.ql.Driver) IOException(java.io.IOException) SessionState(org.apache.hadoop.hive.ql.session.SessionState) Collectors(java.util.stream.Collectors) ConfigUtils(org.apache.hudi.hive.util.ConfigUtils) Test(org.junit.jupiter.api.Test) FieldSchema(org.apache.hadoop.hive.metastore.api.FieldSchema) AfterEach(org.junit.jupiter.api.AfterEach) ParameterizedTest(org.junit.jupiter.params.ParameterizedTest) List(java.util.List) HiveTestUtil(org.apache.hudi.hive.testutils.HiveTestUtil) NetworkTestUtils(org.apache.hudi.common.testutils.NetworkTestUtils) Assertions.assertTrue(org.junit.jupiter.api.Assertions.assertTrue) PartitionEvent(org.apache.hudi.sync.common.AbstractSyncHoodieClient.PartitionEvent) WriteOperationType(org.apache.hudi.common.model.WriteOperationType) HiveTestUtil.hiveSyncConfig(org.apache.hudi.hive.testutils.HiveTestUtil.hiveSyncConfig) Assertions.assertDoesNotThrow(org.junit.jupiter.api.Assertions.assertDoesNotThrow) HiveException(org.apache.hadoop.hive.ql.metadata.HiveException)
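The test above deletes the evolved commit's instant file by concatenating the base path, Hudi's metadata folder (HoodieTableMetaClient.METAFOLDER_NAME resolves to ".hoodie"), and the instant file name it pulled from the active timeline. That path assembly can be sketched on its own; note the "<timestamp>.commit" suffix below assumes the completed copy-on-write commit naming, while inflight and requested instants carry different suffixes:

```java
public class InstantPath {

    // Hudi's metadata folder name; HoodieTableMetaClient.METAFOLDER_NAME is ".hoodie".
    static final String METAFOLDER_NAME = ".hoodie";

    // Assemble the path of a completed commit instant file, mirroring the
    // basePath + "/" + METAFOLDER_NAME + "/" + fileName concatenation in the test.
    static String commitFilePath(String basePath, String instantTime) {
        // "<timestamp>.commit" assumes the completed COW commit state;
        // other states use different file suffixes.
        return basePath + "/" + METAFOLDER_NAME + "/" + instantTime + ".commit";
    }

    public static void main(String[] args) {
        System.out.println(commitFilePath("/tmp/hudi_table", "301"));
    }
}
```

Deleting exactly this file is what makes the subsequent syncHoodieTable() call fail while leaving the earlier commits, and therefore the previously synced Hive metadata, intact.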

Aggregations

HoodieCommitMetadata (org.apache.hudi.common.model.HoodieCommitMetadata): 139 usages
HoodieInstant (org.apache.hudi.common.table.timeline.HoodieInstant): 64
ArrayList (java.util.ArrayList): 54
HashMap (java.util.HashMap): 49
List (java.util.List): 48
HoodieWriteStat (org.apache.hudi.common.model.HoodieWriteStat): 44
IOException (java.io.IOException): 42
Test (org.junit.jupiter.api.Test): 41
HoodieTimeline (org.apache.hudi.common.table.timeline.HoodieTimeline): 40
Map (java.util.Map): 38
Path (org.apache.hadoop.fs.Path): 36
HoodieActiveTimeline (org.apache.hudi.common.table.timeline.HoodieActiveTimeline): 34
ParameterizedTest (org.junit.jupiter.params.ParameterizedTest): 34
File (java.io.File): 26
HoodieTableMetaClient (org.apache.hudi.common.table.HoodieTableMetaClient): 26
Option (org.apache.hudi.common.util.Option): 25
Schema (org.apache.avro.Schema): 22
HoodieWriteConfig (org.apache.hudi.config.HoodieWriteConfig): 21
Collectors (java.util.stream.Collectors): 20
HoodieLogFile (org.apache.hudi.common.model.HoodieLogFile): 20