Use of org.apache.drill.exec.store.hive.HiveReadEntry in project Drill by Apache.
The class TableEntryCacheLoader, method load.
@Override
@SuppressWarnings("NullableProblems")
public HiveReadEntry load(TableName key) throws Exception {
  Table table;
  List<Partition> partitions;
  synchronized (client) {
    table = getTable(key);
    partitions = getPartitions(key);
  }
  HiveTableWithColumnCache hiveTable = new HiveTableWithColumnCache(table, new ColumnListsCache(table));
  List<HiveTableWrapper.HivePartitionWrapper> partitionWrappers = getPartitionWrappers(partitions, hiveTable);
  return new HiveReadEntry(new HiveTableWrapper(hiveTable), partitionWrappers);
}
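A CacheLoader like the one above is typically installed in a Guava LoadingCache, so load runs once per missing key and later lookups are served from the cache. As a hedged sketch of that load-once-per-key pattern using only the JDK (the class, key, and value types here are hypothetical stand-ins, not Drill's real metastore types):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the caching pattern TableEntryCacheLoader participates in,
// using ConcurrentHashMap.computeIfAbsent in place of a Guava LoadingCache.
public class TableEntryCacheSketch {
  private static final Map<String, String> cache = new ConcurrentHashMap<>();
  public static int loadCount = 0;

  // Mimics CacheLoader.load: invoked only when the key is missing.
  private static String load(String tableName) {
    loadCount++;
    return "entry:" + tableName; // stand-in for building a HiveReadEntry
  }

  public static String get(String tableName) {
    // computeIfAbsent gives the same "load once, then serve from cache"
    // behavior a LoadingCache provides.
    return cache.computeIfAbsent(tableName, TableEntryCacheSketch::load);
  }

  public static void main(String[] args) {
    System.out.println(get("hive.orders"));
    System.out.println(get("hive.orders")); // second lookup hits the cache
    System.out.println(loadCount);
  }
}
```

The synchronized block in the real loader exists because the metastore client is not thread-safe; the cache itself handles concurrent lookups.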
Use of org.apache.drill.exec.store.hive.HiveReadEntry in project Drill by Apache.
The class ConvertHiveParquetScanToDrillParquetScan, method onMatch.
@Override
public void onMatch(RelOptRuleCall call) {
  try {
    final DrillScanRel hiveScanRel = call.rel(0);
    final HiveScan hiveScan = (HiveScan) hiveScanRel.getGroupScan();
    final PlannerSettings settings = PrelUtil.getPlannerSettings(call.getPlanner());
    final String partitionColumnLabel = settings.getFsPartitionColumnLabel();
    final Table hiveTable = hiveScan.getHiveReadEntry().getTable();
    final HiveReadEntry hiveReadEntry = hiveScan.getHiveReadEntry();
    final HiveMetadataProvider hiveMetadataProvider = new HiveMetadataProvider(hiveScan.getUserName(), hiveReadEntry, hiveScan.getHiveConf());
    final List<HiveMetadataProvider.LogicalInputSplit> logicalInputSplits = hiveMetadataProvider.getInputSplits(hiveReadEntry);
    if (logicalInputSplits.isEmpty()) {
      // Table is empty; keep the original scan.
      return;
    }
    final Map<String, String> partitionColMapping = getPartitionColMapping(hiveTable, partitionColumnLabel);
    final DrillScanRel nativeScanRel = createNativeScanRel(partitionColMapping, hiveScanRel, logicalInputSplits, settings.getOptions());
    if (hiveScanRel.getRowType().getFieldCount() == 0) {
      call.transformTo(nativeScanRel);
    } else {
      final DrillProjectRel projectRel = createProjectRel(hiveScanRel, partitionColMapping, nativeScanRel);
      call.transformTo(projectRel);
    }
    /*
      The Drill native scan should take precedence over the Hive scan since it is more efficient.
      Hive does not always produce correct costing (e.g. for external tables Hive does not know the
      number of rows, so it is estimated), whereas Drill computes the row count exactly; the Hive
      scan could therefore be chosen because its costing appears lower. To ensure the Drill native
      scan is chosen, reduce the Hive scan's importance to 0.
    */
    call.getPlanner().setImportance(hiveScanRel, 0.0);
  } catch (final Exception e) {
    logger.warn("Failed to convert HiveScan to HiveDrillNativeParquetScan", e);
  }
}
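The getPartitionColMapping helper used above maps each Hive partition column to one of Drill's filesystem partition labels (dir0, dir1, ...) so the native Parquet scan can expose partitions the way directory-based tables do. A hedged, self-contained sketch of that renaming (mapPartitionCols is a hypothetical stand-in for the real helper, which reads the columns from the Hive Table object):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PartitionColMappingSketch {
  // Assumed behavior of getPartitionColMapping: rename Hive partition
  // columns to "<label>0", "<label>1", ... in declaration order.
  public static Map<String, String> mapPartitionCols(List<String> partitionCols, String label) {
    Map<String, String> mapping = new LinkedHashMap<>(); // preserves column order
    for (int i = 0; i < partitionCols.size(); i++) {
      mapping.put(partitionCols.get(i), label + i);
    }
    return mapping;
  }

  public static void main(String[] args) {
    // A Hive table partitioned by (year, month), with Drill's default "dir" label.
    Map<String, String> m = mapPartitionCols(List.of("year", "month"), "dir");
    System.out.println(m); // {year=dir0, month=dir1}
  }
}
```

The createProjectRel call then uses this mapping to restore the original partition column names on top of the native scan's dirN columns.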
Use of org.apache.drill.exec.store.hive.HiveReadEntry in project Drill by Apache.
The class HivePartitionDescriptor, method createPartitionSublists.
@Override
protected void createPartitionSublists() {
  List<PartitionLocation> locations = new LinkedList<>();
  HiveReadEntry origEntry = ((HiveScan) scanRel.getGroupScan()).getHiveReadEntry();
  for (Partition partition : origEntry.getPartitions()) {
    locations.add(new HivePartitionLocation(partition.getValues(), new Path(partition.getSd().getLocation())));
  }
  locationSuperList = Lists.partition(locations, PartitionDescriptor.PARTITION_BATCH_SIZE);
  sublistsCreated = true;
}
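The Lists.partition call above is Guava's way of splitting the partition locations into fixed-size batches so partition pruning can process them in chunks of PARTITION_BATCH_SIZE. A plain-JDK sketch of the same batching behavior (the generic partition method here is an illustration, not Drill or Guava code):

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionBatchSketch {
  // Splits a list into consecutive sublists of at most batchSize elements,
  // mirroring what Guava's Lists.partition does for locationSuperList.
  public static <T> List<List<T>> partition(List<T> list, int batchSize) {
    List<List<T>> batches = new ArrayList<>();
    for (int i = 0; i < list.size(); i += batchSize) {
      // subList is a view over the original list, as in Guava's version.
      batches.add(list.subList(i, Math.min(i + batchSize, list.size())));
    }
    return batches;
  }

  public static void main(String[] args) {
    // Five partition locations, batch size 2: the last batch is smaller.
    List<Integer> locations = List.of(1, 2, 3, 4, 5);
    System.out.println(partition(locations, 2)); // [[1, 2], [3, 4], [5]]
  }
}
```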