
Example 1 with HoodieFlinkWriteClient

use of org.apache.hudi.client.HoodieFlinkWriteClient in project hudi by apache.

the class FlinkHoodieBackedTableMetadataWriter method commit.

@Override
protected void commit(String instantTime, Map<MetadataPartitionType, HoodieData<HoodieRecord>> partitionRecordsMap, boolean canTriggerTableService) {
    ValidationUtils.checkState(enabled, "Metadata table cannot be committed to as it is not enabled");
    ValidationUtils.checkState(metadataMetaClient != null, "Metadata table is not fully initialized yet.");
    HoodieData<HoodieRecord> preppedRecords = prepRecords(partitionRecordsMap);
    List<HoodieRecord> preppedRecordList = HoodieList.getList(preppedRecords);
    try (HoodieFlinkWriteClient writeClient = new HoodieFlinkWriteClient(engineContext, metadataWriteConfig)) {
        if (!metadataMetaClient.getActiveTimeline().filterCompletedInstants().containsInstant(instantTime)) {
            // if this is a new commit being applied to metadata for the first time
            writeClient.startCommitWithTime(instantTime);
            metadataMetaClient.getActiveTimeline().transitionRequestedToInflight(HoodieActiveTimeline.DELTA_COMMIT_ACTION, instantTime);
        } else {
            // This code path handles a re-attempted commit that succeeded in the metadata table but failed in the data table.
            // For example, say compaction c1 succeeded in the metadata table on its first attempt but failed before committing
            // to the data table. On retry, the data table first rolls back the pending compaction; that rollback is applied to
            // the metadata table too, but since all changes to the metadata table are upserts, only a new delta commit is created.
            // Once the rollback completes, the compaction is retried and eventually reaches this block, where the respective
            // commit is already part of the completed timeline. So we have to manually remove the completed instant and proceed.
            // This is also the reason withAllowMultiWriteOnSameInstant is enabled for the metadata table.
            HoodieInstant alreadyCompletedInstant = metadataMetaClient.getActiveTimeline().filterCompletedInstants().filter(entry -> entry.getTimestamp().equals(instantTime)).lastInstant().get();
            HoodieActiveTimeline.deleteInstantFile(metadataMetaClient.getFs(), metadataMetaClient.getMetaPath(), alreadyCompletedInstant);
            metadataMetaClient.reloadActiveTimeline();
        }
        List<WriteStatus> statuses = !preppedRecordList.isEmpty() ? writeClient.upsertPreppedRecords(preppedRecordList, instantTime) : Collections.emptyList();
        statuses.forEach(writeStatus -> {
            if (writeStatus.hasErrors()) {
                throw new HoodieMetadataException("Failed to commit metadata table records at instant " + instantTime);
            }
        });
        // Flink does not support auto-commit yet, and the auto-commit logic here is not yet as complete as in BaseHoodieWriteClient.
        writeClient.commit(instantTime, statuses, Option.empty(), HoodieActiveTimeline.DELTA_COMMIT_ACTION, Collections.emptyMap());
        // reload timeline
        metadataMetaClient.reloadActiveTimeline();
        if (canTriggerTableService) {
            compactIfNecessary(writeClient, instantTime);
            cleanIfNecessary(writeClient, instantTime);
            writeClient.archive();
        }
    }
    // Update total size of the metadata and count of base/log files
    metrics.ifPresent(m -> m.updateSizeMetrics(metadataMetaClient, metadata));
}
Also used : HoodieInstant(org.apache.hudi.common.table.timeline.HoodieInstant) HoodieMetadataException(org.apache.hudi.exception.HoodieMetadataException) HoodieRecord(org.apache.hudi.common.model.HoodieRecord) HoodieFlinkWriteClient(org.apache.hudi.client.HoodieFlinkWriteClient) WriteStatus(org.apache.hudi.client.WriteStatus)
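
The shape of this commit applies to any write through the Flink client: start the instant, write, then call commit(...) explicitly, because auto-commit is not supported. Below is a minimal standalone sketch of that manual flow, using only calls that appear in the method above; the wrapper class is hypothetical, and the write config, prepped records (already tagged with file locations), and instant time are assumed to be supplied by the caller.

import java.util.Collections;
import java.util.List;
import org.apache.hudi.client.HoodieFlinkWriteClient;
import org.apache.hudi.client.WriteStatus;
import org.apache.hudi.client.common.HoodieFlinkEngineContext;
import org.apache.hudi.common.model.HoodieRecord;
import org.apache.hudi.common.table.timeline.HoodieActiveTimeline;
import org.apache.hudi.common.util.Option;
import org.apache.hudi.config.HoodieWriteConfig;

public class ManualDeltaCommitSketch {

    @SuppressWarnings({ "rawtypes", "unchecked" })
    public static void writeAndCommit(HoodieWriteConfig writeConfig, List<HoodieRecord> records, String instantTime) {
        // try-with-resources shuts the client and its embedded services down cleanly
        try (HoodieFlinkWriteClient writeClient = new HoodieFlinkWriteClient(HoodieFlinkEngineContext.DEFAULT, writeConfig)) {
            // create the requested instant on the timeline before writing
            writeClient.startCommitWithTime(instantTime);
            List<WriteStatus> statuses = writeClient.upsertPreppedRecords(records, instantTime);
            // the Flink client has no auto-commit, so the commit must be explicit
            writeClient.commit(instantTime, statuses, Option.empty(), HoodieActiveTimeline.DELTA_COMMIT_ACTION, Collections.emptyMap());
        }
    }
}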

Example 2 with HoodieFlinkWriteClient

use of org.apache.hudi.client.HoodieFlinkWriteClient in project hudi by apache.

the class StreamerUtil method createWriteClient.

/**
 * Creates the Flink write client.
 *
 * <p>This is expected to be used by the client. Set the flag {@code loadFsViewStorageConfig} to use
 * the remote filesystem view storage config; otherwise an in-memory filesystem view storage is used.
 */
@SuppressWarnings("rawtypes")
public static HoodieFlinkWriteClient createWriteClient(Configuration conf, RuntimeContext runtimeContext, boolean loadFsViewStorageConfig) {
    HoodieFlinkEngineContext context = new HoodieFlinkEngineContext(new SerializableConfiguration(getHadoopConf()), new FlinkTaskContextSupplier(runtimeContext));
    HoodieWriteConfig writeConfig = getHoodieClientConfig(conf, loadFsViewStorageConfig);
    return new HoodieFlinkWriteClient<>(context, writeConfig);
}
Also used : SerializableConfiguration(org.apache.hudi.common.config.SerializableConfiguration) HoodieWriteConfig(org.apache.hudi.config.HoodieWriteConfig) HoodieFlinkWriteClient(org.apache.hudi.client.HoodieFlinkWriteClient) HoodieFlinkEngineContext(org.apache.hudi.client.common.HoodieFlinkEngineContext) FlinkTaskContextSupplier(org.apache.hudi.client.FlinkTaskContextSupplier)
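
A sketch of how this task-side factory might be used: a Flink rich function builds the client in open(), where a RuntimeContext is available, and releases it in close(). The function class and its pass-through map logic are hypothetical; only the createWriteClient call comes from the method above, and the StreamerUtil import path is assumed to match the hudi-flink module.

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.hudi.client.HoodieFlinkWriteClient;
import org.apache.hudi.util.StreamerUtil;

@SuppressWarnings("rawtypes")
public class HoodieWriteFunctionSketch extends RichMapFunction<String, String> {

    private final Configuration conf;
    private transient HoodieFlinkWriteClient writeClient;

    public HoodieWriteFunctionSketch(Configuration conf) {
        this.conf = conf;
    }

    @Override
    public void open(Configuration parameters) {
        // true: load the remote filesystem view storage config persisted by the driver
        this.writeClient = StreamerUtil.createWriteClient(conf, getRuntimeContext(), true);
    }

    @Override
    public String map(String value) {
        return value; // write logic elided; this sketch only shows the client lifecycle
    }

    @Override
    public void close() {
        if (writeClient != null) {
            writeClient.close();
        }
    }
}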

Example 3 with HoodieFlinkWriteClient

use of org.apache.hudi.client.HoodieFlinkWriteClient in project hudi by apache.

the class StreamerUtil method createWriteClient.

/**
 * Creates the Flink write client.
 *
 * <p>This is expected to be used by the driver; the clients can then send requests for the filesystem view.
 *
 * <p>The task context supplier is a constant: the write token is always '0-1-0'.
 */
@SuppressWarnings("rawtypes")
public static HoodieFlinkWriteClient createWriteClient(Configuration conf) throws IOException {
    HoodieWriteConfig writeConfig = getHoodieClientConfig(conf, true, false);
    // build the write client to start the embedded timeline server
    final HoodieFlinkWriteClient writeClient = new HoodieFlinkWriteClient<>(HoodieFlinkEngineContext.DEFAULT, writeConfig);
    // create the filesystem view storage properties for client
    final FileSystemViewStorageConfig viewStorageConfig = writeConfig.getViewStorageConfig();
    // rebuild the view storage config with simplified options.
    FileSystemViewStorageConfig rebuilt = FileSystemViewStorageConfig.newBuilder()
        .withStorageType(viewStorageConfig.getStorageType())
        .withRemoteServerHost(viewStorageConfig.getRemoteViewServerHost())
        .withRemoteServerPort(viewStorageConfig.getRemoteViewServerPort())
        .withRemoteTimelineClientTimeoutSecs(viewStorageConfig.getRemoteTimelineClientTimeoutSecs())
        .build();
    ViewStorageProperties.createProperties(conf.getString(FlinkOptions.PATH), rebuilt);
    return writeClient;
}
Also used : FileSystemViewStorageConfig(org.apache.hudi.common.table.view.FileSystemViewStorageConfig) HoodieWriteConfig(org.apache.hudi.config.HoodieWriteConfig) HoodieFlinkWriteClient(org.apache.hudi.client.HoodieFlinkWriteClient)
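
A sketch of the driver-side flow this method enables: create the client once (which starts the embedded timeline server and persists the view storage properties), then hand tasks the same configuration so they can load it via the task-side overload from Example 2. The table path is a placeholder, and a real configuration needs more options (table name, schema, and so on).

import org.apache.flink.configuration.Configuration;
import org.apache.hudi.client.HoodieFlinkWriteClient;
import org.apache.hudi.configuration.FlinkOptions;
import org.apache.hudi.util.StreamerUtil;

public class DriverSideClientSketch {

    @SuppressWarnings("rawtypes")
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // placeholder path; a real job also needs table name, schema, etc.
        conf.setString(FlinkOptions.PATH, "file:///tmp/hoodie_table");
        try (HoodieFlinkWriteClient writeClient = StreamerUtil.createWriteClient(conf)) {
            // the embedded timeline server is now running; task-side clients built with
            // StreamerUtil.createWriteClient(conf, runtimeContext, true) can pick up
            // the persisted remote view storage config
        }
    }
}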

Example 4 with HoodieFlinkWriteClient

use of org.apache.hudi.client.HoodieFlinkWriteClient in project hudi by apache.

the class TestWriteCopyOnWrite method testReuseEmbeddedServer.

@Test
public void testReuseEmbeddedServer() throws IOException {
    conf.setInteger("hoodie.filesystem.view.remote.timeout.secs", 500);
    HoodieFlinkWriteClient writeClient = StreamerUtil.createWriteClient(conf);
    FileSystemViewStorageConfig viewStorageConfig = writeClient.getConfig().getViewStorageConfig();
    assertSame(FileSystemViewStorageType.REMOTE_FIRST, viewStorageConfig.getStorageType());
    // get another write client
    writeClient = StreamerUtil.createWriteClient(conf);
    assertSame(FileSystemViewStorageType.REMOTE_FIRST, writeClient.getConfig().getViewStorageConfig().getStorageType());
    assertEquals(viewStorageConfig.getRemoteViewServerPort(), writeClient.getConfig().getViewStorageConfig().getRemoteViewServerPort());
    assertEquals(500, viewStorageConfig.getRemoteTimelineClientTimeoutSecs());
}
Also used : FileSystemViewStorageConfig(org.apache.hudi.common.table.view.FileSystemViewStorageConfig) HoodieFlinkWriteClient(org.apache.hudi.client.HoodieFlinkWriteClient) Test(org.junit.jupiter.api.Test)
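
For reference, a sketch of building the remote-first view storage config that this test asserts, using only builder methods that appear in Example 3; host and port are placeholders for the embedded timeline server's address.

import org.apache.hudi.common.table.view.FileSystemViewStorageConfig;
import org.apache.hudi.common.table.view.FileSystemViewStorageType;

public class ViewStorageConfigSketch {

    public static FileSystemViewStorageConfig remoteFirst(String host, Integer port) {
        return FileSystemViewStorageConfig.newBuilder()
            .withStorageType(FileSystemViewStorageType.REMOTE_FIRST)
            // placeholders: the embedded timeline server's host and port
            .withRemoteServerHost(host)
            .withRemoteServerPort(port)
            .build();
    }
}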

Example 5 with HoodieFlinkWriteClient

use of org.apache.hudi.client.HoodieFlinkWriteClient in project hudi by apache.

the class ITTestHoodieFlinkCompactor method testHoodieFlinkCompactor.

@ParameterizedTest
@ValueSource(booleans = { true, false })
public void testHoodieFlinkCompactor(boolean enableChangelog) throws Exception {
    // Create the hoodie table and insert data.
    EnvironmentSettings settings = EnvironmentSettings.newInstance().inBatchMode().build();
    TableEnvironment tableEnv = TableEnvironmentImpl.create(settings);
    tableEnv.getConfig().getConfiguration().setInteger(ExecutionConfigOptions.TABLE_EXEC_RESOURCE_DEFAULT_PARALLELISM, 1);
    Map<String, String> options = new HashMap<>();
    options.put(FlinkOptions.COMPACTION_ASYNC_ENABLED.key(), "false");
    options.put(FlinkOptions.PATH.key(), tempFile.getAbsolutePath());
    options.put(FlinkOptions.TABLE_TYPE.key(), "MERGE_ON_READ");
    options.put(FlinkOptions.CHANGELOG_ENABLED.key(), enableChangelog + "");
    String hoodieTableDDL = TestConfigurations.getCreateHoodieTableDDL("t1", options);
    tableEnv.executeSql(hoodieTableDDL);
    tableEnv.executeSql(TestSQL.INSERT_T1).await();
    // wait for the asynchronous commit to finish
    TimeUnit.SECONDS.sleep(3);
    // Build the configuration and set the Avro schema.
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    FlinkCompactionConfig cfg = new FlinkCompactionConfig();
    cfg.path = tempFile.getAbsolutePath();
    Configuration conf = FlinkCompactionConfig.toFlinkConfig(cfg);
    conf.setString(FlinkOptions.TABLE_TYPE.key(), "MERGE_ON_READ");
    // create metaClient
    HoodieTableMetaClient metaClient = StreamerUtil.createMetaClient(conf);
    // set the table name
    conf.setString(FlinkOptions.TABLE_NAME, metaClient.getTableConfig().getTableName());
    // set table schema
    CompactionUtil.setAvroSchema(conf, metaClient);
    // infer changelog mode
    CompactionUtil.inferChangelogMode(conf, metaClient);
    HoodieFlinkWriteClient writeClient = StreamerUtil.createWriteClient(conf);
    boolean scheduled = false;
    // check whether there is a compaction operation to schedule:
    // compute the compaction instant time and, if one is present, schedule the compaction.
    Option<String> compactionInstantTimeOption = CompactionUtil.getCompactionInstantTime(metaClient);
    if (compactionInstantTimeOption.isPresent()) {
        scheduled = writeClient.scheduleCompactionAtInstant(compactionInstantTimeOption.get(), Option.empty());
    }
    assertTrue(scheduled, "The compaction plan should be scheduled");
    // read the instant only after asserting that scheduling succeeded
    String compactionInstantTime = compactionInstantTimeOption.get();
    HoodieFlinkTable<?> table = writeClient.getHoodieTable();
    // generate compaction plan
    // should support configurable commit metadata
    HoodieCompactionPlan compactionPlan = CompactionUtils.getCompactionPlan(table.getMetaClient(), compactionInstantTime);
    HoodieInstant instant = HoodieTimeline.getCompactionRequestedInstant(compactionInstantTime);
    // Mark instant as compaction inflight
    table.getActiveTimeline().transitionCompactionRequestedToInflight(instant);
    env.addSource(new CompactionPlanSourceFunction(compactionPlan, compactionInstantTime))
        .name("compaction_source")
        .uid("uid_compaction_source")
        .rebalance()
        .transform("compact_task", TypeInformation.of(CompactionCommitEvent.class), new ProcessOperator<>(new CompactFunction(conf)))
        .setParallelism(compactionPlan.getOperations().size())
        .addSink(new CompactionCommitSink(conf))
        .name("clean_commits")
        .uid("uid_clean_commits")
        .setParallelism(1);
    env.execute("flink_hudi_compaction");
    writeClient.close();
    TestData.checkWrittenFullData(tempFile, EXPECTED1);
}
Also used : HoodieInstant(org.apache.hudi.common.table.timeline.HoodieInstant) ProcessOperator(org.apache.flink.streaming.api.operators.ProcessOperator) EnvironmentSettings(org.apache.flink.table.api.EnvironmentSettings) Configuration(org.apache.flink.configuration.Configuration) HashMap(java.util.HashMap) TableEnvironment(org.apache.flink.table.api.TableEnvironment) HoodieFlinkWriteClient(org.apache.hudi.client.HoodieFlinkWriteClient) HoodieTableMetaClient(org.apache.hudi.common.table.HoodieTableMetaClient) HoodieCompactionPlan(org.apache.hudi.avro.model.HoodieCompactionPlan) StreamExecutionEnvironment(org.apache.flink.streaming.api.environment.StreamExecutionEnvironment) ValueSource(org.junit.jupiter.params.provider.ValueSource) ParameterizedTest(org.junit.jupiter.params.ParameterizedTest)
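
The scheduling step in the middle of this test is worth isolating: the compaction instant Option should only be read once a plan has actually been written. A guarded sketch of just that step, using only calls that appear in the test above (the wrapper class is hypothetical):

import org.apache.hudi.client.HoodieFlinkWriteClient;
import org.apache.hudi.common.table.HoodieTableMetaClient;
import org.apache.hudi.common.util.Option;
import org.apache.hudi.util.CompactionUtil;

public class ScheduleCompactionSketch {

    @SuppressWarnings({ "rawtypes", "unchecked" })
    public static Option<String> scheduleIfPossible(HoodieFlinkWriteClient writeClient, HoodieTableMetaClient metaClient) throws Exception {
        // compute a candidate compaction instant from the timeline
        Option<String> instant = CompactionUtil.getCompactionInstantTime(metaClient);
        if (instant.isPresent() && writeClient.scheduleCompactionAtInstant(instant.get(), Option.empty())) {
            return instant; // a compaction plan was written at this instant
        }
        return Option.empty(); // nothing to compact
    }
}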

Aggregations

HoodieFlinkWriteClient (org.apache.hudi.client.HoodieFlinkWriteClient)5 HoodieInstant (org.apache.hudi.common.table.timeline.HoodieInstant)2 FileSystemViewStorageConfig (org.apache.hudi.common.table.view.FileSystemViewStorageConfig)2 HoodieWriteConfig (org.apache.hudi.config.HoodieWriteConfig)2 HashMap (java.util.HashMap)1 Configuration (org.apache.flink.configuration.Configuration)1 StreamExecutionEnvironment (org.apache.flink.streaming.api.environment.StreamExecutionEnvironment)1 ProcessOperator (org.apache.flink.streaming.api.operators.ProcessOperator)1 EnvironmentSettings (org.apache.flink.table.api.EnvironmentSettings)1 TableEnvironment (org.apache.flink.table.api.TableEnvironment)1 HoodieCompactionPlan (org.apache.hudi.avro.model.HoodieCompactionPlan)1 FlinkTaskContextSupplier (org.apache.hudi.client.FlinkTaskContextSupplier)1 WriteStatus (org.apache.hudi.client.WriteStatus)1 HoodieFlinkEngineContext (org.apache.hudi.client.common.HoodieFlinkEngineContext)1 SerializableConfiguration (org.apache.hudi.common.config.SerializableConfiguration)1 HoodieRecord (org.apache.hudi.common.model.HoodieRecord)1 HoodieTableMetaClient (org.apache.hudi.common.table.HoodieTableMetaClient)1 HoodieMetadataException (org.apache.hudi.exception.HoodieMetadataException)1 Test (org.junit.jupiter.api.Test)1 ParameterizedTest (org.junit.jupiter.params.ParameterizedTest)1