
Example 1 with HiveTablePartition

Use of org.apache.flink.connectors.hive.HiveTablePartition in project flink by apache, from the class HivePartitionUtils, method toHiveTablePartition.

/** Converts a metastore Partition into a HiveTablePartition by pairing each partition key with its value. */
public static HiveTablePartition toHiveTablePartition(List<String> partitionKeys, Properties tableProps, Partition partition) {
    StorageDescriptor sd = partition.getSd();
    Map<String, String> partitionSpec = new HashMap<>();
    for (int i = 0; i < partitionKeys.size(); i++) {
        String partitionColName = partitionKeys.get(i);
        String partitionValue = partition.getValues().get(i);
        partitionSpec.put(partitionColName, partitionValue);
    }
    return new HiveTablePartition(sd, partitionSpec, tableProps);
}
Also used : HiveTablePartition(org.apache.flink.connectors.hive.HiveTablePartition) HashMap(java.util.HashMap) StorageDescriptor(org.apache.hadoop.hive.metastore.api.StorageDescriptor)

Example 2 with HiveTablePartition

Use of org.apache.flink.connectors.hive.HiveTablePartition in project flink by apache, from the class HivePartitionUtils, method getAllPartitions.

/**
 * Returns all HiveTablePartitions of a Hive table; returns a single HiveTablePartition if the
 * table is not partitioned.
 */
public static List<HiveTablePartition> getAllPartitions(
        JobConf jobConf,
        String hiveVersion,
        ObjectPath tablePath,
        List<String> partitionColNames,
        List<Map<String, String>> remainingPartitions) {
    List<HiveTablePartition> allHivePartitions = new ArrayList<>();
    try (HiveMetastoreClientWrapper client = HiveMetastoreClientFactory.create(HiveConfUtils.create(jobConf), hiveVersion)) {
        String dbName = tablePath.getDatabaseName();
        String tableName = tablePath.getObjectName();
        Table hiveTable = client.getTable(dbName, tableName);
        Properties tableProps = HiveReflectionUtils.getTableMetadata(HiveShimLoader.loadHiveShim(hiveVersion), hiveTable);
        if (partitionColNames != null && partitionColNames.size() > 0) {
            List<Partition> partitions = new ArrayList<>();
            if (remainingPartitions != null) {
                for (Map<String, String> spec : remainingPartitions) {
                    partitions.add(client.getPartition(dbName, tableName, partitionSpecToValues(spec, partitionColNames)));
                }
            } else {
                partitions.addAll(client.listPartitions(dbName, tableName, (short) -1));
            }
            for (Partition partition : partitions) {
                HiveTablePartition hiveTablePartition = toHiveTablePartition(partitionColNames, tableProps, partition);
                allHivePartitions.add(hiveTablePartition);
            }
        } else {
            allHivePartitions.add(new HiveTablePartition(hiveTable.getSd(), tableProps));
        }
    } catch (TException e) {
        throw new FlinkHiveException("Failed to collect all partitions from hive metaStore", e);
    }
    return allHivePartitions;
}
Also used : TException(org.apache.thrift.TException) Partition(org.apache.hadoop.hive.metastore.api.Partition) HiveTablePartition(org.apache.flink.connectors.hive.HiveTablePartition) Table(org.apache.hadoop.hive.metastore.api.Table) HiveMetastoreClientWrapper(org.apache.flink.table.catalog.hive.client.HiveMetastoreClientWrapper) FlinkHiveException(org.apache.flink.connectors.hive.FlinkHiveException) ArrayList(java.util.ArrayList) Properties(java.util.Properties)
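
A hedged sketch of a call to getAllPartitions (the table, partition columns, and partition spec below are illustrative, and jobConf/hiveCatalog are assumed to exist): passing a list of remaining partition specs restricts the read to those partitions, while passing null reads every partition of the table.

// Hypothetical pruning to a single partition; pass null as the last argument to read all.
ObjectPath tablePath = new ObjectPath("default", "orders");
List<String> partitionColNames = Arrays.asList("year", "month");
Map<String, String> spec = new HashMap<>();
spec.put("year", "2021");
spec.put("month", "01");
List<HiveTablePartition> hivePartitions =
        HivePartitionUtils.getAllPartitions(
                jobConf,
                hiveCatalog.getHiveVersion(),
                tablePath,
                partitionColNames,
                Collections.singletonList(spec));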

Example 3 with HiveTablePartition

Use of org.apache.flink.connectors.hive.HiveTablePartition in project flink by apache, from the class HiveTableInputFormat, method open.

@Override
public void open(HiveTableInputSplit split) throws IOException {
    HiveTablePartition partition = split.getHiveTablePartition();
    // Prefer the vectorized ORC/Parquet readers when possible; otherwise fall back to the MapRed-based reader.
    if (!useMapRedReader && useOrcVectorizedRead(partition)) {
        this.reader = new HiveVectorizedOrcSplitReader(hiveVersion, jobConf.conf(), fieldNames, fieldTypes, selectedFields, split);
    } else if (!useMapRedReader && useParquetVectorizedRead(partition)) {
        this.reader = new HiveVectorizedParquetSplitReader(hiveVersion, jobConf.conf(), fieldNames, fieldTypes, selectedFields, split);
    } else {
        JobConf clonedConf = new JobConf(jobConf.conf());
        addSchemaToConf(clonedConf);
        this.reader = new HiveMapredSplitReader(clonedConf, partitionKeys, fieldTypes, selectedFields, split, HiveShimLoader.loadHiveShim(hiveVersion));
    }
    currentReadCount = 0L;
}
Also used : HiveTablePartition(org.apache.flink.connectors.hive.HiveTablePartition) JobConf(org.apache.hadoop.mapred.JobConf)
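
For context, a hedged sketch of how such a format is typically driven through the generic Flink InputFormat life cycle (open a split, loop with reachedEnd/nextRecord, close). The format and splits variables and the projected field count are assumptions, not code from the Flink source.

// format is an assumed, fully configured HiveTableInputFormat; splits is an assumed
// array of HiveTableInputSplit produced elsewhere (e.g. by the planner).
for (HiveTableInputSplit split : splits) {
    format.open(split); // selects a vectorized ORC/Parquet reader or the MapRed fallback
    RowData reuse = new GenericRowData(projectedFieldCount); // projectedFieldCount is assumed
    while (!format.reachedEnd()) {
        RowData row = format.nextRecord(reuse);
        // process row ...
    }
    format.close();
}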

Example 4 with HiveTablePartition

Use of org.apache.flink.connectors.hive.HiveTablePartition in project flink by apache, from the class HiveInputFormatPartitionReaderITCase, method testReadFormat.

private void testReadFormat(TableEnvironment tableEnv, HiveCatalog hiveCatalog, String format) throws Exception {
    String tableName = prepareData(tableEnv, format);
    ObjectPath tablePath = new ObjectPath("default", tableName);
    TableSchema tableSchema = hiveCatalog.getTable(tablePath).getSchema();
    // create partition reader
    HiveInputFormatPartitionReader partitionReader =
            new HiveInputFormatPartitionReader(
                    new Configuration(),
                    new JobConf(hiveCatalog.getHiveConf()),
                    hiveCatalog.getHiveVersion(),
                    tablePath,
                    tableSchema.getFieldDataTypes(),
                    tableSchema.getFieldNames(),
                    Collections.emptyList(),
                    null,
                    false);
    Table hiveTable = hiveCatalog.getHiveTable(tablePath);
    // create HiveTablePartition to read from
    HiveTablePartition tablePartition =
            new HiveTablePartition(
                    hiveTable.getSd(),
                    HiveReflectionUtils.getTableMetadata(
                            HiveShimLoader.loadHiveShim(hiveCatalog.getHiveVersion()), hiveTable));
    partitionReader.open(Collections.singletonList(tablePartition));
    GenericRowData reuse = new GenericRowData(tableSchema.getFieldCount());
    int count = 0;
    // this follows the way the partition reader is used during lookup join
    while (partitionReader.read(reuse) != null) {
        count++;
    }
    assertEquals(CollectionUtil.iteratorToList(tableEnv.executeSql("select * from " + tableName).collect()).size(), count);
}
Also used : ObjectPath(org.apache.flink.table.catalog.ObjectPath) HiveTablePartition(org.apache.flink.connectors.hive.HiveTablePartition) Table(org.apache.hadoop.hive.metastore.api.Table) TableSchema(org.apache.flink.table.api.TableSchema) Configuration(org.apache.flink.configuration.Configuration) GenericRowData(org.apache.flink.table.data.GenericRowData) JobConf(org.apache.hadoop.mapred.JobConf)
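
Example 4 reads an unpartitioned table, so the HiveTablePartition is built from just the storage descriptor and table properties. For a partitioned table, a hedged variant would use the three-argument constructor shown in Example 1 and feed the result to the same reader; the partition spec, values, and client variable below are illustrative assumptions.

// Hypothetical: read a single partition (year=2021, month=01) of a partitioned table.
// client is an assumed HiveMetastoreClientWrapper; tableProps as built in Examples 1 and 4.
Partition metastorePartition =
        client.getPartition("default", tableName, Arrays.asList("2021", "01"));
Map<String, String> partitionSpec = new HashMap<>();
partitionSpec.put("year", "2021");
partitionSpec.put("month", "01");
HiveTablePartition partitionedRead =
        new HiveTablePartition(metastorePartition.getSd(), partitionSpec, tableProps);
partitionReader.open(Collections.singletonList(partitionedRead));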

Aggregations

HiveTablePartition (org.apache.flink.connectors.hive.HiveTablePartition): 4
Table (org.apache.hadoop.hive.metastore.api.Table): 2
JobConf (org.apache.hadoop.mapred.JobConf): 2
ArrayList (java.util.ArrayList): 1
HashMap (java.util.HashMap): 1
Properties (java.util.Properties): 1
Configuration (org.apache.flink.configuration.Configuration): 1
FlinkHiveException (org.apache.flink.connectors.hive.FlinkHiveException): 1
TableSchema (org.apache.flink.table.api.TableSchema): 1
ObjectPath (org.apache.flink.table.catalog.ObjectPath): 1
HiveMetastoreClientWrapper (org.apache.flink.table.catalog.hive.client.HiveMetastoreClientWrapper): 1
GenericRowData (org.apache.flink.table.data.GenericRowData): 1
Partition (org.apache.hadoop.hive.metastore.api.Partition): 1
StorageDescriptor (org.apache.hadoop.hive.metastore.api.StorageDescriptor): 1
TException (org.apache.thrift.TException): 1