
Example 1 with HiveMetadataProvider

Use of org.apache.drill.exec.store.hive.HiveMetadataProvider in project drill by apache.

From the class ConvertHiveMapRDBJsonScanToDrillMapRDBJsonScan, method onMatch:

@Override
public void onMatch(RelOptRuleCall call) {
    try {
        DrillScanRel hiveScanRel = call.rel(0);
        PlannerSettings settings = PrelUtil.getPlannerSettings(call.getPlanner());
        HiveScan hiveScan = (HiveScan) hiveScanRel.getGroupScan();
        HiveReadEntry hiveReadEntry = hiveScan.getHiveReadEntry();
        HiveMetadataProvider hiveMetadataProvider = new HiveMetadataProvider(hiveScan.getUserName(), hiveReadEntry, hiveScan.getHiveConf());
        if (hiveMetadataProvider.getInputSplits(hiveReadEntry).isEmpty()) {
            // table is empty, use original scan
            return;
        }
        if (hiveScan.getHiveReadEntry().getTable().isSetPartitionKeys()) {
            logger.warn("Hive MapR-DB JSON Handler doesn't support table partitioning. Consider recreating table without partitions");
        }
        DrillScanRel nativeScanRel = createNativeScanRel(hiveScanRel, settings);
        call.transformTo(nativeScanRel);
        /*
         Drill native scan should take precedence over Hive since it is more efficient and faster.
         Hive does not always give correct costing (e.g. for external tables Hive does not know the number of rows,
         so we estimate it approximately), whereas Drill calculates the number of rows exactly. The Hive scan
         could therefore be chosen over the Drill native scan because its costing allegedly comes out lower.
         To ensure the Drill MapR-DB JSON scan is chosen, reduce the Hive scan's importance to 0.
        */
        call.getPlanner().setImportance(hiveScanRel, 0.0);
    } catch (Exception e) {
        // TODO: Improve error handling after allowing to throw IOException from StoragePlugin.getFormatPlugin()
        logger.warn("Failed to convert HiveScan to JsonScanSpec. Fallback to HiveMapR-DB connector.", e);
    }
}
Also used:
DrillScanRel (org.apache.drill.exec.planner.logical.DrillScanRel)
HiveReadEntry (org.apache.drill.exec.store.hive.HiveReadEntry)
PlannerSettings (org.apache.drill.exec.planner.physical.PlannerSettings)
HiveScan (org.apache.drill.exec.store.hive.HiveScan)
HiveMetadataProvider (org.apache.drill.exec.store.hive.HiveMetadataProvider)
IOException (java.io.IOException)
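The guard at the top of onMatch above is a reusable pattern: probe the metadata provider first, and abandon the rewrite when the table yields no input splits, so the original scan survives. The following is a minimal self-contained sketch of that pattern; SplitProvider and the plan strings are hypothetical stand-ins, not Drill's actual API.

```java
import java.util.Collections;
import java.util.List;

public class EmptySplitGuardSketch {
    // Stand-in for HiveMetadataProvider.getInputSplits(hiveReadEntry)
    interface SplitProvider {
        List<String> getInputSplits();
    }

    // Mirrors the rule's decision: an empty table keeps the original scan,
    // a non-empty one is rewritten to the (cheaper) native scan.
    static String plan(SplitProvider provider, String originalScan, String nativeScan) {
        if (provider.getInputSplits().isEmpty()) {
            return originalScan; // table is empty, use original scan
        }
        return nativeScan;
    }

    public static void main(String[] args) {
        SplitProvider empty = Collections::emptyList;
        SplitProvider nonEmpty = () -> List.of("split-0");
        System.out.println(plan(empty, "HiveScan", "NativeScan"));    // HiveScan
        System.out.println(plan(nonEmpty, "HiveScan", "NativeScan")); // NativeScan
    }
}
```

Returning early (rather than transforming and letting costing sort it out) matters here because the rule later forces the Hive scan's importance to 0, which would leave an empty table with no usable plan.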

Example 2 with HiveMetadataProvider

Use of org.apache.drill.exec.store.hive.HiveMetadataProvider in project drill by apache.

From the class ConvertHiveParquetScanToDrillParquetScan, method onMatch:

@Override
public void onMatch(RelOptRuleCall call) {
    try {
        final DrillScanRel hiveScanRel = call.rel(0);
        final HiveScan hiveScan = (HiveScan) hiveScanRel.getGroupScan();
        final PlannerSettings settings = PrelUtil.getPlannerSettings(call.getPlanner());
        final String partitionColumnLabel = settings.getFsPartitionColumnLabel();
        final Table hiveTable = hiveScan.getHiveReadEntry().getTable();
        final HiveReadEntry hiveReadEntry = hiveScan.getHiveReadEntry();
        final HiveMetadataProvider hiveMetadataProvider = new HiveMetadataProvider(hiveScan.getUserName(), hiveReadEntry, hiveScan.getHiveConf());
        final List<HiveMetadataProvider.LogicalInputSplit> logicalInputSplits = hiveMetadataProvider.getInputSplits(hiveReadEntry);
        if (logicalInputSplits.isEmpty()) {
            // table is empty, use original scan
            return;
        }
        final Map<String, String> partitionColMapping = getPartitionColMapping(hiveTable, partitionColumnLabel);
        final DrillScanRel nativeScanRel = createNativeScanRel(partitionColMapping, hiveScanRel, logicalInputSplits, settings.getOptions());
        if (hiveScanRel.getRowType().getFieldCount() == 0) {
            call.transformTo(nativeScanRel);
        } else {
            final DrillProjectRel projectRel = createProjectRel(hiveScanRel, partitionColMapping, nativeScanRel);
            call.transformTo(projectRel);
        }
        /*
         Drill native scan should take precedence over Hive since it is more efficient and faster.
         Hive does not always give correct costing (e.g. for external tables Hive does not know the number of rows,
         so we estimate it approximately), whereas Drill calculates the number of rows exactly. The Hive scan
         could therefore be chosen over the Drill native scan because its costing allegedly comes out lower.
         To ensure the Drill native scan is chosen, reduce the Hive scan's importance to 0.
        */
        call.getPlanner().setImportance(hiveScanRel, 0.0);
    } catch (final Exception e) {
        logger.warn("Failed to convert HiveScan to HiveDrillNativeParquetScan", e);
    }
}
Also used:
DrillScanRel (org.apache.drill.exec.planner.logical.DrillScanRel)
HiveReadEntry (org.apache.drill.exec.store.hive.HiveReadEntry)
Table (org.apache.hadoop.hive.metastore.api.Table)
PlannerSettings (org.apache.drill.exec.planner.physical.PlannerSettings)
DrillProjectRel (org.apache.drill.exec.planner.logical.DrillProjectRel)
HiveScan (org.apache.drill.exec.store.hive.HiveScan)
HiveMetadataProvider (org.apache.drill.exec.store.hive.HiveMetadataProvider)
IOException (java.io.IOException)
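Example 2 branches on the scan's field count: an empty row type means there are no columns to rename, so the native scan replaces the Hive scan directly; otherwise a project is layered on top to map partition-column labels back to the original names. A minimal sketch of that branch, with illustrative names rather than Drill's actual API:

```java
public class ProjectWrapSketch {
    // Mirrors the decision in onMatch: wrap the native scan in a project only
    // when the original scan actually exposes fields that need renaming.
    static String rewrite(int scanFieldCount, String nativeScan) {
        return scanFieldCount == 0
                ? nativeScan                      // nothing to rename: substitute directly
                : "Project(" + nativeScan + ")";  // remap column names via a project
    }

    public static void main(String[] args) {
        System.out.println(rewrite(0, "NativeParquetScan")); // NativeParquetScan
        System.out.println(rewrite(3, "NativeParquetScan")); // Project(NativeParquetScan)
    }
}
```

Keeping the project out of the zero-field case avoids handing the planner a no-op operator that would only add cost to the otherwise preferred native plan.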

Aggregations

IOException (java.io.IOException): 2
DrillScanRel (org.apache.drill.exec.planner.logical.DrillScanRel): 2
PlannerSettings (org.apache.drill.exec.planner.physical.PlannerSettings): 2
HiveMetadataProvider (org.apache.drill.exec.store.hive.HiveMetadataProvider): 2
HiveReadEntry (org.apache.drill.exec.store.hive.HiveReadEntry): 2
HiveScan (org.apache.drill.exec.store.hive.HiveScan): 2
DrillProjectRel (org.apache.drill.exec.planner.logical.DrillProjectRel): 1
Table (org.apache.hadoop.hive.metastore.api.Table): 1