
Example 1 with HoodieVirtualKeyInfo

Uses of org.apache.hudi.hadoop.realtime.HoodieVirtualKeyInfo in the apache/hudi project.

From class HoodieRealtimeInputFormatUtils, method addRequiredProjectionFields:

public static void addRequiredProjectionFields(Configuration configuration, Option<HoodieVirtualKeyInfo> hoodieVirtualKeyInfo) {
    // Need this to do merge records in HoodieRealtimeRecordReader
    if (!hoodieVirtualKeyInfo.isPresent()) {
        addProjectionField(configuration, HoodieRecord.RECORD_KEY_METADATA_FIELD, HoodieInputFormatUtils.HOODIE_RECORD_KEY_COL_POS);
        addProjectionField(configuration, HoodieRecord.COMMIT_TIME_METADATA_FIELD, HoodieInputFormatUtils.HOODIE_COMMIT_TIME_COL_POS);
        addProjectionField(configuration, HoodieRecord.PARTITION_PATH_METADATA_FIELD, HoodieInputFormatUtils.HOODIE_PARTITION_PATH_COL_POS);
    } else {
        HoodieVirtualKeyInfo hoodieVirtualKey = hoodieVirtualKeyInfo.get();
        addProjectionField(configuration, hoodieVirtualKey.getRecordKeyField(), hoodieVirtualKey.getRecordKeyFieldIndex());
        addProjectionField(configuration, hoodieVirtualKey.getPartitionPathField(), hoodieVirtualKey.getPartitionPathFieldIndex());
    }
}
Also used : HoodieVirtualKeyInfo(org.apache.hudi.hadoop.realtime.HoodieVirtualKeyInfo)
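The branching above can be sketched without the Hudi and Hadoop dependencies. The property keys below (`hive.io.file.readcolumn.names`, `hive.io.file.readcolumn.ids`) are the Hive read-column settings that Hudi's `addProjectionField` ultimately appends to; the meta-field names and positions mirror Hudi's `_hoodie_*` columns, but the `ProjectionSketch` class and its use of a plain `Map` in place of Hadoop's `Configuration` are illustrative simplifications, not Hudi's actual implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for the projection logic above: append a column name
// and its position to the Hive read-column properties, skipping duplicates.
public class ProjectionSketch {
    static final String READ_COLUMN_NAMES = "hive.io.file.readcolumn.names";
    static final String READ_COLUMN_IDS = "hive.io.file.readcolumn.ids";

    static void addProjectionField(Map<String, String> conf, String col, int pos) {
        String names = conf.getOrDefault(READ_COLUMN_NAMES, "");
        String ids = conf.getOrDefault(READ_COLUMN_IDS, "");
        // Only append if the column is not already projected
        if (!("," + names + ",").contains("," + col + ",")) {
            conf.put(READ_COLUMN_NAMES, names.isEmpty() ? col : names + "," + col);
            conf.put(READ_COLUMN_IDS, ids.isEmpty() ? String.valueOf(pos) : ids + "," + pos);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // No virtual keys configured: project the meta columns, as in Example 1
        // (positions shown are assumed to match Hudi's meta-column layout)
        addProjectionField(conf, "_hoodie_record_key", 2);
        addProjectionField(conf, "_hoodie_commit_time", 0);
        addProjectionField(conf, "_hoodie_partition_path", 3);
        System.out.println(conf.get(READ_COLUMN_NAMES));
    }
}
```

With virtual keys enabled, the same helper would be called with the table's own key and partition fields instead, which is why the `else` branch projects only two columns.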

Example 2 with HoodieVirtualKeyInfo


From class HoodieCopyOnWriteTableInputFormat, method getHoodieVirtualKeyInfo:

protected static Option<HoodieVirtualKeyInfo> getHoodieVirtualKeyInfo(HoodieTableMetaClient metaClient) {
    HoodieTableConfig tableConfig = metaClient.getTableConfig();
    if (tableConfig.populateMetaFields()) {
        return Option.empty();
    }
    TableSchemaResolver tableSchemaResolver = new TableSchemaResolver(metaClient);
    try {
        Schema schema = tableSchemaResolver.getTableAvroSchema();
        return Option.of(new HoodieVirtualKeyInfo(
                tableConfig.getRecordKeyFieldProp(),
                tableConfig.getPartitionFieldProp(),
                schema.getField(tableConfig.getRecordKeyFieldProp()).pos(),
                schema.getField(tableConfig.getPartitionFieldProp()).pos()));
    } catch (Exception exception) {
        throw new HoodieException("Fetching table schema failed with exception ", exception);
    }
}
Also used : HoodieTableConfig(org.apache.hudi.common.table.HoodieTableConfig) Schema(org.apache.avro.Schema) TableSchemaResolver(org.apache.hudi.common.table.TableSchemaResolver) HoodieVirtualKeyInfo(org.apache.hudi.hadoop.realtime.HoodieVirtualKeyInfo) HoodieException(org.apache.hudi.exception.HoodieException) IOException(java.io.IOException) HoodieIOException(org.apache.hudi.exception.HoodieIOException) UnsupportedEncodingException(java.io.UnsupportedEncodingException)
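The key idea in Example 2 is that when meta fields are not populated, the key columns' positions must be resolved from the table schema itself. A minimal sketch of that resolution, modeling the Avro schema as an ordered list of field names so the example stays dependency-free; `VirtualKeySketch` and `fromSchema` are hypothetical names, not Hudi's API:

```java
import java.util.List;
import java.util.Optional;

// Simplified stand-in for getHoodieVirtualKeyInfo: resolve the record-key
// and partition-path field positions from an ordered list of schema fields.
public class VirtualKeySketch {
    final String recordKeyField;
    final String partitionPathField;
    final int recordKeyIndex;
    final int partitionPathIndex;

    VirtualKeySketch(String rk, String pp, int rkIdx, int ppIdx) {
        this.recordKeyField = rk;
        this.partitionPathField = pp;
        this.recordKeyIndex = rkIdx;
        this.partitionPathIndex = ppIdx;
    }

    static Optional<VirtualKeySketch> fromSchema(boolean populateMetaFields,
                                                 String recordKeyField,
                                                 String partitionPathField,
                                                 List<String> schemaFields) {
        if (populateMetaFields) {
            // Meta columns carry the key at fixed positions; no virtual-key info needed
            return Optional.empty();
        }
        int rkIdx = schemaFields.indexOf(recordKeyField);
        int ppIdx = schemaFields.indexOf(partitionPathField);
        if (rkIdx < 0 || ppIdx < 0) {
            throw new IllegalStateException("Key field missing from table schema");
        }
        return Optional.of(new VirtualKeySketch(recordKeyField, partitionPathField, rkIdx, ppIdx));
    }

    public static void main(String[] args) {
        Optional<VirtualKeySketch> info =
                fromSchema(false, "uuid", "region", List.of("uuid", "ts", "region"));
        System.out.println(info.get().recordKeyIndex + " " + info.get().partitionPathIndex);
    }
}
```

This mirrors why the real method wraps schema resolution in a try/catch: a schema lookup can fail, and the caller gets a single wrapped exception rather than a partial result.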

Example 3 with HoodieVirtualKeyInfo


From class HoodieCopyOnWriteTableInputFormat, method listStatusForSnapshotMode:

@Nonnull
private List<FileStatus> listStatusForSnapshotMode(JobConf job, Map<String, HoodieTableMetaClient> tableMetaClientMap, List<Path> snapshotPaths) throws IOException {
    HoodieLocalEngineContext engineContext = new HoodieLocalEngineContext(job);
    List<FileStatus> targetFiles = new ArrayList<>();
    TypedProperties props = new TypedProperties(new Properties());
    Map<HoodieTableMetaClient, List<Path>> groupedPaths = HoodieInputFormatUtils.groupSnapshotPathsByMetaClient(tableMetaClientMap.values(), snapshotPaths);
    for (Map.Entry<HoodieTableMetaClient, List<Path>> entry : groupedPaths.entrySet()) {
        HoodieTableMetaClient tableMetaClient = entry.getKey();
        List<Path> partitionPaths = entry.getValue();
        // Hive job might specify a max commit instant up to which table's state
        // should be examined. We simply pass it as query's instant to the file-index
        Option<String> queryCommitInstant = HoodieHiveUtils.getMaxCommit(job, tableMetaClient.getTableConfig().getTableName());
        boolean shouldIncludePendingCommits = HoodieHiveUtils.shouldIncludePendingCommits(job, tableMetaClient.getTableConfig().getTableName());
        HiveHoodieTableFileIndex fileIndex = new HiveHoodieTableFileIndex(engineContext, tableMetaClient, props,
                HoodieTableQueryType.SNAPSHOT, partitionPaths, queryCommitInstant, shouldIncludePendingCommits);
        Map<String, List<FileSlice>> partitionedFileSlices = fileIndex.listFileSlices();
        Option<HoodieVirtualKeyInfo> virtualKeyInfoOpt = getHoodieVirtualKeyInfo(tableMetaClient);
        targetFiles.addAll(partitionedFileSlices.values().stream()
                .flatMap(Collection::stream)
                .map(fileSlice -> createFileStatusUnchecked(fileSlice, fileIndex, virtualKeyInfoOpt))
                .collect(Collectors.toList()));
    }
    return targetFiles;
}
Also used : Path(org.apache.hadoop.fs.Path) FileStatus(org.apache.hadoop.fs.FileStatus) ArrayList(java.util.ArrayList) HoodieLocalEngineContext(org.apache.hudi.common.engine.HoodieLocalEngineContext) Properties(java.util.Properties) TypedProperties(org.apache.hudi.common.config.TypedProperties) HoodieVirtualKeyInfo(org.apache.hudi.hadoop.realtime.HoodieVirtualKeyInfo) HoodieTableMetaClient(org.apache.hudi.common.table.HoodieTableMetaClient) List(java.util.List) Map(java.util.Map) Nonnull(javax.annotation.Nonnull)
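Example 3 first groups the requested snapshot paths by the table they belong to (`groupSnapshotPathsByMetaClient`), then builds one file index per table. The grouping step can be sketched as matching each partition path against its table's base path; using plain strings in place of `HoodieTableMetaClient` and `Path`, and a hypothetical `groupByBasePath` helper rather than Hudi's actual utility:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for the grouping step above: assign each partition
// path to the table whose base path is its prefix, so downstream listing
// can run once per table instead of once per path.
public class SnapshotGrouping {
    static Map<String, List<String>> groupByBasePath(List<String> basePaths,
                                                     List<String> partitionPaths) {
        Map<String, List<String>> grouped = new HashMap<>();
        for (String p : partitionPaths) {
            basePaths.stream()
                    .filter(p::startsWith)
                    .findFirst()
                    .ifPresent(base -> grouped.computeIfAbsent(base, k -> new ArrayList<>()).add(p));
        }
        return grouped;
    }

    public static void main(String[] args) {
        Map<String, List<String>> grouped = groupByBasePath(
                List.of("/warehouse/t1", "/warehouse/t2"),
                List.of("/warehouse/t1/2021", "/warehouse/t2/us", "/warehouse/t1/2022"));
        System.out.println(grouped);
    }
}
```

Per group, the real method then flattens all file slices from `fileIndex.listFileSlices()` into `FileStatus` entries, passing the table's virtual-key info (from Example 2) through to `createFileStatusUnchecked`.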

Aggregations

HoodieVirtualKeyInfo (org.apache.hudi.hadoop.realtime.HoodieVirtualKeyInfo) 3
IOException (java.io.IOException) 1
UnsupportedEncodingException (java.io.UnsupportedEncodingException) 1
ArrayList (java.util.ArrayList) 1
List (java.util.List) 1
Map (java.util.Map) 1
Properties (java.util.Properties) 1
Nonnull (javax.annotation.Nonnull) 1
Schema (org.apache.avro.Schema) 1
FileStatus (org.apache.hadoop.fs.FileStatus) 1
Path (org.apache.hadoop.fs.Path) 1
TypedProperties (org.apache.hudi.common.config.TypedProperties) 1
HoodieLocalEngineContext (org.apache.hudi.common.engine.HoodieLocalEngineContext) 1
HoodieTableConfig (org.apache.hudi.common.table.HoodieTableConfig) 1
HoodieTableMetaClient (org.apache.hudi.common.table.HoodieTableMetaClient) 1
TableSchemaResolver (org.apache.hudi.common.table.TableSchemaResolver) 1
HoodieException (org.apache.hudi.exception.HoodieException) 1
HoodieIOException (org.apache.hudi.exception.HoodieIOException) 1