Example 1 with ScanStats

Use of org.apache.drill.exec.physical.base.ScanStats in project drill by axbaretto.

The class OpenTSDBGroupScan, method getScanStats:

@Override
public ScanStats getScanStats() {
    ServiceImpl client = storagePlugin.getClient();
    Map<String, String> params = fromRowData(openTSDBScanSpec.getTableName());
    Set<MetricDTO> allMetrics = client.getAllMetrics(params);
    long numMetrics = allMetrics.size();
    float approxDiskCost = 0;
    if (numMetrics != 0) {
        MetricDTO metricDTO = allMetrics.iterator().next();
        // SizeEstimator.estimate approximates the size of a Java object
        // (the number of bytes of memory it occupies). More detail on how
        // this estimation works can be found in this article:
        // http://www.javaworld.com/javaworld/javaqa/2003-12/02-qa-1226-sizeof.html
        approxDiskCost = SizeEstimator.estimate(metricDTO) * numMetrics;
    }
    return new ScanStats(ScanStats.GroupScanProperty.EXACT_ROW_COUNT, numMetrics, 1, approxDiskCost);
}
Also used: MetricDTO (org.apache.drill.exec.store.openTSDB.dto.MetricDTO), ServiceImpl (org.apache.drill.exec.store.openTSDB.client.services.ServiceImpl), ScanStats (org.apache.drill.exec.physical.base.ScanStats)
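The disk-cost heuristic above can be sketched in isolation: size one sample object and scale by the row count. The class and method names below are hypothetical, and the caller-supplied sample size stands in for what Drill obtains from SizeEstimator.estimate:

```java
public class DiskCostSketch {

    // Approximate the on-disk footprint of a scan by multiplying the
    // estimated size of one sample object by the number of objects,
    // mirroring OpenTSDBGroupScan.getScanStats. In Drill itself the
    // sample size would come from SizeEstimator.estimate(metricDTO).
    static float approxDiskCost(long sampleObjectBytes, long numMetrics) {
        if (numMetrics == 0) {
            return 0; // nothing to scan, no disk cost
        }
        return (float) sampleObjectBytes * numMetrics;
    }
}
```

Note that the estimate assumes every metric object is roughly the same size as the sampled one, which is why the original code only inspects the first element of the set.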

Example 2 with ScanStats

Use of org.apache.drill.exec.physical.base.ScanStats in project drill by axbaretto.

The class ScanPrel, method computeSelfCost:

@Override
public RelOptCost computeSelfCost(final RelOptPlanner planner, RelMetadataQuery mq) {
    final PlannerSettings settings = PrelUtil.getPlannerSettings(planner);
    final ScanStats stats = this.groupScan.getScanStats(settings);
    final int columnCount = this.getRowType().getFieldCount();
    if (PrelUtil.getSettings(getCluster()).useDefaultCosting()) {
        return planner.getCostFactory().makeCost(stats.getRecordCount() * columnCount, stats.getCpuCost(), stats.getDiskCost());
    }
    // double rowCount = RelMetadataQuery.getRowCount(this);
    double rowCount = stats.getRecordCount();
    // As DRILL-4083 points out, when columnCount == 0, cpuCost becomes zero,
    // which makes the costs of HiveScan and HiveDrillNativeParquetScan the same
    // For now, assume cpu cost is proportional to row count.
    double cpuCost = rowCount * Math.max(columnCount, 1);
    // If a positive value for CPU cost is given multiply the default CPU cost by given CPU cost.
    if (stats.getCpuCost() > 0) {
        cpuCost *= stats.getCpuCost();
    }
    // Even though scan is reading from disk, in the currently generated plans all plans will
    // need to read the same amount of data, so keeping the disk io cost 0 is ok for now.
    // In the future we might consider alternative scans that go against projections or
    // different compression schemes etc that affect the amount of data read. Such alternatives
    // would affect both cpu and io cost.
    double ioCost = 0;
    DrillCostFactory costFactory = (DrillCostFactory) planner.getCostFactory();
    return costFactory.makeCost(rowCount, cpuCost, ioCost, 0);
}
Also used: DrillCostFactory (org.apache.drill.exec.planner.cost.DrillCostBase.DrillCostFactory), ScanStats (org.apache.drill.exec.physical.base.ScanStats)
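The CPU-cost arithmetic in computeSelfCost can be isolated into a small sketch (class and method names hypothetical): the base cost is row count times column count, clamped to at least one column, and any positive plugin-supplied CPU cost acts as a multiplier:

```java
public class CpuCostSketch {

    // Mirrors the heuristic in ScanPrel.computeSelfCost: base cpu cost is
    // proportional to row count and column count. The column count is
    // clamped to at least 1 so the cost never collapses to zero (the
    // DRILL-4083 problem); a positive plugin-supplied cpu cost scales it.
    static double cpuCost(double rowCount, int columnCount, double statsCpuCost) {
        double cost = rowCount * Math.max(columnCount, 1);
        if (statsCpuCost > 0) {
            cost *= statsCpuCost;
        }
        return cost;
    }
}
```

With 1000 rows and 5 columns the base cost is 5000; with zero columns the clamp keeps it at 1000 rather than zero, so two scans with different real costs can still be distinguished by the multiplier.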

Example 3 with ScanStats

Use of org.apache.drill.exec.physical.base.ScanStats in project drill by apache.

The class IcebergGroupScan, method getScanStats:

@Override
public ScanStats getScanStats() {
    int expectedRecordsPerChunk = 1_000_000;
    if (maxRecords >= 0) {
        expectedRecordsPerChunk = Math.max(maxRecords, 1);
    }
    int estimatedRecords = chunks.size() * expectedRecordsPerChunk;
    return new ScanStats(ScanStats.GroupScanProperty.NO_EXACT_ROW_COUNT, estimatedRecords, 1, 0);
}
Also used: DrillbitEndpoint (org.apache.drill.exec.proto.CoordinationProtos.DrillbitEndpoint), ScanStats (org.apache.drill.exec.physical.base.ScanStats)
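The Iceberg estimate above is pure arithmetic and can be sketched standalone (names hypothetical): every planned chunk is assumed to hold one million records unless a non-negative maxRecords, such as a pushed-down LIMIT, caps the per-chunk expectation. One deviation: the original keeps the product as an int, which can overflow for large chunk counts; the sketch widens to long.

```java
public class IcebergEstimateSketch {

    // Mirrors IcebergGroupScan.getScanStats: each chunk contributes a
    // default of one million expected records; a non-negative maxRecords
    // replaces that default (clamped to at least 1 so the estimate is
    // never zero for a non-empty scan).
    static long estimatedRecords(int chunkCount, int maxRecords) {
        int expectedRecordsPerChunk = 1_000_000;
        if (maxRecords >= 0) {
            expectedRecordsPerChunk = Math.max(maxRecords, 1);
        }
        return (long) chunkCount * expectedRecordsPerChunk;
    }
}
```

Because the property is NO_EXACT_ROW_COUNT, the planner treats this only as a rough estimate, which is why such a coarse per-chunk default is acceptable.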

Example 4 with ScanStats

Use of org.apache.drill.exec.physical.base.ScanStats in project drill by apache.

The class OpenTSDBGroupScan, method getScanStats:

@Override
public ScanStats getScanStats() {
    ServiceImpl client = storagePlugin.getClient();
    Map<String, String> params = fromRowData(openTSDBScanSpec.getTableName());
    Set<MetricDTO> allMetrics = client.getAllMetrics(params);
    long numMetrics = allMetrics.size();
    float approxDiskCost = 0;
    if (numMetrics != 0) {
        MetricDTO metricDTO = allMetrics.iterator().next();
        // SizeEstimator.estimate approximates the size of a Java object
        // (the number of bytes of memory it occupies). More detail on how
        // this estimation works can be found in this article:
        // http://www.javaworld.com/javaworld/javaqa/2003-12/02-qa-1226-sizeof.html
        approxDiskCost = SizeEstimator.estimate(metricDTO) * numMetrics;
    }
    return new ScanStats(ScanStats.GroupScanProperty.EXACT_ROW_COUNT, numMetrics, 1, approxDiskCost);
}
Also used: MetricDTO (org.apache.drill.exec.store.openTSDB.dto.MetricDTO), ServiceImpl (org.apache.drill.exec.store.openTSDB.client.services.ServiceImpl), ScanStats (org.apache.drill.exec.physical.base.ScanStats)

Example 5 with ScanStats

Use of org.apache.drill.exec.physical.base.ScanStats in project drill by apache.

The class DrillScanRel, method computeSelfCost:

// TODO: this method is the same as the one for ScanPrel... eventually we should
// consolidate this and a few other methods into a common base class extended
// by both logical and physical rels.
// TODO: Further changes may have caused the versions to diverge.
// TODO: Does not compute IO cost by default, but should. Changing that may break
// existing plugins.
@Override
public RelOptCost computeSelfCost(final RelOptPlanner planner, RelMetadataQuery mq) {
    final ScanStats stats = getGroupScan().getScanStats(settings);
    int columnCount = Utilities.isStarQuery(columns) ? STAR_COLUMN_COST : getRowType().getFieldCount();
    // double rowCount = RelMetadataQuery.getRowCount(this);
    double rowCount = Math.max(1, stats.getRecordCount());
    double valueCount = rowCount * columnCount;
    if (PrelUtil.getSettings(getCluster()).useDefaultCosting()) {
        // TODO: legacy path; ideally the planner would control the cost model,
        // that is, this path would be removed.
        return planner.getCostFactory().makeCost(valueCount, stats.getCpuCost(), stats.getDiskCost());
    }
    double cpuCost;
    double ioCost;
    if (stats.getGroupScanProperty().hasFullCost()) {
        cpuCost = stats.getCpuCost();
        ioCost = stats.getDiskCost();
    } else {
        // for now, assume cpu cost is proportional to row count and number of columns
        cpuCost = valueCount;
        // Default io cost should be proportional to valueCount
        ioCost = 0;
    }
    return planner.getCostFactory().makeCost(rowCount, cpuCost, ioCost);
}
Also used: ScanStats (org.apache.drill.exec.physical.base.ScanStats)
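The branch on hasFullCost above decides whose numbers to trust, and can be sketched in isolation (names hypothetical): plugin-reported CPU and disk costs are used only when the group scan claims a full cost estimate; otherwise the value-count heuristic applies, with IO cost pinned to zero as the surrounding comments explain.

```java
public class FullCostSketch {

    // Mirrors the branch in DrillScanRel.computeSelfCost: trust the
    // plugin-reported cpu/disk costs only when the group scan reports a
    // full cost estimate; otherwise fall back to the value-count heuristic
    // (row count times column count) with io cost set to zero.
    static double[] cpuAndIoCost(boolean hasFullCost, double statsCpu,
                                 double statsDisk, double rowCount, int columnCount) {
        if (hasFullCost) {
            return new double[] { statsCpu, statsDisk };
        }
        return new double[] { rowCount * columnCount, 0.0 };
    }
}
```

This split is what lets plugins with real statistics (e.g. Hive with table stats) compete fairly against plugins that can only guess, since both paths produce comparable cost vectors for the planner.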

Aggregations

ScanStats (org.apache.drill.exec.physical.base.ScanStats): 23 usages
IOException (java.io.IOException): 4 usages
DrillRuntimeException (org.apache.drill.common.exceptions.DrillRuntimeException): 4 usages
DrillCostFactory (org.apache.drill.exec.planner.cost.DrillCostBase.DrillCostFactory): 4 usages
DynamicPojoRecordReader (org.apache.drill.exec.store.pojo.DynamicPojoRecordReader): 4 usages
CompleteFileWork (org.apache.drill.exec.store.schedule.CompleteFileWork): 4 usages
ArrayList (java.util.ArrayList): 3 usages
RelDataType (org.apache.calcite.rel.type.RelDataType): 3 usages
GroupScan (org.apache.drill.exec.physical.base.GroupScan): 3 usages
PluginCost (org.apache.drill.exec.planner.cost.PluginCost): 3 usages
MetadataDirectGroupScan (org.apache.drill.exec.store.direct.MetadataDirectGroupScan): 3 usages
MongoDatabase (com.mongodb.client.MongoDatabase): 2 usages
List (java.util.List): 2 usages
SchemaPath (org.apache.drill.common.expression.SchemaPath): 2 usages
DrillAggregateRel (org.apache.drill.exec.planner.logical.DrillAggregateRel): 2 usages
DrillProjectRel (org.apache.drill.exec.planner.logical.DrillProjectRel): 2 usages
DrillScanRel (org.apache.drill.exec.planner.logical.DrillScanRel): 2 usages
PlannerSettings (org.apache.drill.exec.planner.physical.PlannerSettings): 2 usages
FormatSelection (org.apache.drill.exec.store.dfs.FormatSelection): 2 usages
HiveStats (org.apache.drill.exec.store.hive.HiveMetadataProvider.HiveStats): 2 usages