Example 6 with FactScan

use of co.cask.cdap.data2.dataset2.lib.timeseries.FactScan in project cdap by caskdata.

the class DefaultCube method query.

@Override
public Collection<TimeSeries> query(CubeQuery query) {
    /*
      CubeQuery example: "dataset read ops for app per dataset", or in SQL-like form:

      SELECT count('read.ops')                                           << measure name and type
      FROM aggregation1.1min_resolution                                  << aggregation and resolution
      GROUP BY dataset,                                                  << groupByDimensions
      WHERE namespace='ns1' AND app='myApp' AND program='myFlow' AND     << dimensionValues
            ts>=1423370200 AND ts<1423398198                             << startTs and endTs
      LIMIT 100                                                          << limit

      Execution:

      1) (optional, if the aggregation to query is not provided) find an aggregation that can supply the results

      Here, we need an aggregation that has the following dimensions: 'namespace', 'app', 'program', 'dataset'.

      Ideally (to reduce the scan range), 'dataset' should be at the end, the other dimensions as close to the
      beginning as possible, and a minimal number of other "unspecified" dimensions.

      Let's say we found the aggregation: 'namespace', 'app', 'program', 'instance', 'dataset'

      2) build a scan in the aggregation

      For the scan we set "any" for the dimension values that the aggregation has but the query doesn't define a value for:

      'namespace'='ns1', 'app'='myApp', 'program'='myFlow', 'instance'=*, 'dataset'=*

      Plus the specified measure name and type:

      'measureName'='read.ops'
      'measureType'='COUNTER'

      3) While scanning, build a table: dimension values -> time -> value. Use measureType as the aggregate
         function for values if needed.
    */
    incrementMetric("cube.query.request.count", 1);
    if (!resolutionToFactTable.containsKey(query.getResolution())) {
        incrementMetric("cube.query.request.failure.count", 1);
        throw new IllegalArgumentException("There's no data aggregated for specified resolution to satisfy the query: " + query.toString());
    }
    // 1) find aggregation to query
    Aggregation agg;
    String aggName;
    if (query.getAggregation() != null) {
        aggName = query.getAggregation();
        agg = aggregations.get(query.getAggregation());
        if (agg == null) {
            incrementMetric("cube.query.request.failure.count", 1);
            throw new IllegalArgumentException(String.format("Specified aggregation %s is not found in cube aggregations: %s", query.getAggregation(), aggregations.keySet().toString()));
        }
    } else {
        ImmutablePair<String, Aggregation> aggregation = findAggregation(query);
        if (aggregation == null) {
            incrementMetric("cube.query.request.failure.count", 1);
            throw new IllegalArgumentException("There's no data aggregated for specified dimensions " + "to satisfy the query: " + query.toString());
        }
        agg = aggregation.getSecond();
        aggName = aggregation.getFirst();
    }
    // tell how many queries end up querying specific pre-aggregated views and resolutions
    incrementMetric("cube.query.agg." + aggName + ".count", 1);
    incrementMetric("cube.query.res." + query.getResolution() + ".count", 1);
    // 2) build a scan for a query
    List<DimensionValue> dimensionValues = Lists.newArrayList();
    for (String dimensionName : agg.getDimensionNames()) {
        // if not defined in query, will be set as null, which means "any"
        dimensionValues.add(new DimensionValue(dimensionName, query.getDimensionValues().get(dimensionName)));
    }
    FactScan scan = new FactScan(query.getStartTs(), query.getEndTs(), query.getMeasurements().keySet(), dimensionValues);
    // 3) execute scan query
    FactTable table = resolutionToFactTable.get(query.getResolution());
    FactScanner scanner = table.scan(scan);
    Table<Map<String, String>, String, Map<Long, Long>> resultMap = getTimeSeries(query, scanner);
    incrementMetric("cube.query.request.success.count", 1);
    incrementMetric("cube.query.result.size", resultMap.size());
    Collection<TimeSeries> timeSeries = convertToQueryResult(query, resultMap);
    incrementMetric("cube.query.result.timeseries.count", timeSeries.size());
    return timeSeries;
}
Also used : FactScan(co.cask.cdap.data2.dataset2.lib.timeseries.FactScan) TimeSeries(co.cask.cdap.api.dataset.lib.cube.TimeSeries) FactScanner(co.cask.cdap.data2.dataset2.lib.timeseries.FactScanner) FactTable(co.cask.cdap.data2.dataset2.lib.timeseries.FactTable) DimensionValue(co.cask.cdap.api.dataset.lib.cube.DimensionValue) LinkedHashMap(java.util.LinkedHashMap) Map(java.util.Map)

Example 7 with FactScan

use of io.cdap.cdap.data2.dataset2.lib.timeseries.FactScan in project cdap by caskdata.

the class DefaultCube method query.

@Override
public Collection<TimeSeries> query(CubeQuery query) {
    /*
      CubeQuery example: "dataset read ops for app per dataset", or in SQL-like form:

      SELECT count('read.ops')                                           << measure name and type
      FROM aggregation1.1min_resolution                                  << aggregation and resolution
      GROUP BY dataset,                                                  << groupByDimensions
      WHERE namespace='ns1' AND app='myApp' AND program='myFlow' AND     << dimensionValues
            ts>=1423370200 AND ts<1423398198                             << startTs and endTs
      LIMIT 100                                                          << limit

      Execution:

      1) (optional, if the aggregation to query is not provided) find an aggregation that can supply the results

      Here, we need an aggregation that has the following dimensions: 'namespace', 'app', 'program', 'dataset'.

      Ideally (to reduce the scan range), 'dataset' should be at the end, the other dimensions as close to the
      beginning as possible, and a minimal number of other "unspecified" dimensions.

      Let's say we found the aggregation: 'namespace', 'app', 'program', 'instance', 'dataset'

      2) build a scan in the aggregation

      For the scan we set "any" for the dimension values that the aggregation has but the query doesn't define a value for:

      'namespace'='ns1', 'app'='myApp', 'program'='myFlow', 'instance'=*, 'dataset'=*

      Plus the specified measure name and type:

      'measureName'='read.ops'
      'measureType'='COUNTER'

      3) While scanning, build a table: dimension values -> time -> value. Use measureType as the aggregate
         function for values if needed.
    */
    incrementMetric("cube.query.request.count", 1);
    if (!resolutionToFactTable.containsKey(query.getResolution())) {
        incrementMetric("cube.query.request.failure.count", 1);
        throw new IllegalArgumentException("There's no data aggregated for specified resolution to satisfy the query: " + query.toString());
    }
    // 1) find aggregation to query
    Aggregation agg;
    String aggName;
    if (query.getAggregation() != null) {
        aggName = query.getAggregation();
        agg = aggregations.get(query.getAggregation());
        if (agg == null) {
            incrementMetric("cube.query.request.failure.count", 1);
            throw new IllegalArgumentException(String.format("Specified aggregation %s is not found in cube aggregations: %s", query.getAggregation(), aggregations.keySet().toString()));
        }
    } else {
        ImmutablePair<String, Aggregation> aggregation = findAggregation(query);
        if (aggregation == null) {
            incrementMetric("cube.query.request.failure.count", 1);
            throw new IllegalArgumentException("There's no data aggregated for specified dimensions " + "to satisfy the query: " + query.toString());
        }
        agg = aggregation.getSecond();
        aggName = aggregation.getFirst();
    }
    // tell how many queries end up querying specific pre-aggregated views and resolutions
    incrementMetric("cube.query.agg." + aggName + ".count", 1);
    incrementMetric("cube.query.res." + query.getResolution() + ".count", 1);
    // 2) build a scan for a query
    List<DimensionValue> dimensionValues = Lists.newArrayList();
    for (String dimensionName : agg.getDimensionNames()) {
        // if not defined in query, will be set as null, which means "any"
        dimensionValues.add(new DimensionValue(dimensionName, query.getDimensionValues().get(dimensionName)));
    }
    FactScan scan = new FactScan(query.getStartTs(), query.getEndTs(), query.getMeasurements().keySet(), dimensionValues);
    // 3) execute scan query
    FactTable table = resolutionToFactTable.get(query.getResolution());
    FactScanner scanner = table.scan(scan);
    Table<Map<String, String>, String, Map<Long, Long>> resultMap = getTimeSeries(query, scanner);
    incrementMetric("cube.query.request.success.count", 1);
    incrementMetric("cube.query.result.size", resultMap.size());
    Collection<TimeSeries> timeSeries = convertToQueryResult(query, resultMap);
    incrementMetric("cube.query.result.timeseries.count", timeSeries.size());
    return timeSeries;
}
Also used : FactScan(io.cdap.cdap.data2.dataset2.lib.timeseries.FactScan) TimeSeries(io.cdap.cdap.api.dataset.lib.cube.TimeSeries) FactScanner(io.cdap.cdap.data2.dataset2.lib.timeseries.FactScanner) FactTable(io.cdap.cdap.data2.dataset2.lib.timeseries.FactTable) DimensionValue(io.cdap.cdap.api.dataset.lib.cube.DimensionValue) Map(java.util.Map) HashMap(java.util.HashMap) LinkedHashMap(java.util.LinkedHashMap)
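
The snippet below makes step 2 of the comment above concrete by assembling the same kind of scan by hand for the 'read.ops' example query. It is a minimal sketch, not code from the project: the class name, method name, aggregation dimension names and timestamps are assumptions for illustration; only the DimensionValue and FactScan constructor shapes follow the usage shown in DefaultCube.query above.

import io.cdap.cdap.api.dataset.lib.cube.DimensionValue;
import io.cdap.cdap.data2.dataset2.lib.timeseries.FactScan;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReadOpsScanSketch {

    // Builds a FactScan for the "read.ops" example query from the comment above,
    // assuming the chosen aggregation has dimensions: namespace, app, program, instance, dataset.
    static FactScan buildReadOpsScan(long startTs, long endTs) {
        // dimension values fixed by the query; 'instance' and 'dataset' are not set,
        // so they resolve to null ("any") below
        Map<String, String> queryDimensions = new HashMap<>();
        queryDimensions.put("namespace", "ns1");
        queryDimensions.put("app", "myApp");
        queryDimensions.put("program", "myFlow");

        List<DimensionValue> dimensionValues = new ArrayList<>();
        for (String dimensionName : Arrays.asList("namespace", "app", "program", "instance", "dataset")) {
            // a null value means "any", exactly as in DefaultCube.query above
            dimensionValues.add(new DimensionValue(dimensionName, queryDimensions.get(dimensionName)));
        }

        // same constructor shape as in DefaultCube.query: time range, measure names, dimension values
        return new FactScan(startTs, endTs, Collections.singleton("read.ops"), dimensionValues);
    }
}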

Example 8 with FactScan

use of io.cdap.cdap.data2.dataset2.lib.timeseries.FactScan in project cdap by caskdata.

the class FactTableTest method testMaxResolution.

@Test
public void testMaxResolution() throws Exception {
    // we use Integer.MAX_VALUE as resolution to compute all-time total values
    InMemoryTableService.create("TotalsEntityTable");
    InMemoryTableService.create("TotalsDataTable");
    int resolution = Integer.MAX_VALUE;
    // should not matter when resolution is max
    int rollTimebaseInterval = 3600;
    FactTable table = new FactTable(new InMemoryMetricsTable("TotalsDataTable"), new EntityTable(new InMemoryMetricsTable("TotalsEntityTable")), resolution, rollTimebaseInterval);
    // ts is expected in seconds
    long ts = System.currentTimeMillis() / 1000;
    int count = 1000;
    for (int i = 0; i < count; i++) {
        for (int k = 0; k < 10; k++) {
            // shift one day
            writeInc(table, "metric" + k, ts + i * 60 * 60 * 24, i * k, "dim" + k, "value" + k);
        }
    }
    for (int k = 0; k < 10; k++) {
        // 0, 0 should match timestamp of all data points
        FactScan scan = new FactScan(0, 0, "metric" + k, dimValues("dim" + k, "value" + k));
        Table<String, List<DimensionValue>, List<TimeValue>> expected = HashBasedTable.create();
        expected.put("metric" + k, dimValues("dim" + k, "value" + k), ImmutableList.of(new TimeValue(0, k * count * (count - 1) / 2)));
        assertScan(table, expected, scan);
    }
}
Also used : InMemoryMetricsTable(io.cdap.cdap.data2.dataset2.lib.table.inmemory.InMemoryMetricsTable) ArrayList(java.util.ArrayList) ImmutableList(com.google.common.collect.ImmutableList) List(java.util.List) TimeValue(io.cdap.cdap.api.dataset.lib.cube.TimeValue) Test(org.junit.Test)
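
The expected value in the scan above follows from simple arithmetic: "metric" + k is incremented by i * k for every i from 0 to count - 1, so its all-time total is k * (0 + 1 + ... + (count - 1)) = k * count * (count - 1) / 2, which is exactly what the test puts into the expected table. A standalone sanity check of that closed form (not part of the test) could look like this:

public class TotalsArithmeticCheck {
    public static void main(String[] args) {
        int count = 1000;
        for (int k = 0; k < 10; k++) {
            long sum = 0;
            for (int i = 0; i < count; i++) {
                // same increments as the writeInc(...) calls in the test above
                sum += (long) i * k;
            }
            // closed form used in the test's expectation
            long expected = (long) k * count * (count - 1) / 2;
            if (sum != expected) {
                throw new AssertionError("mismatch for k=" + k + ": " + sum + " != " + expected);
            }
        }
        System.out.println("all totals match k * count * (count - 1) / 2");
    }
}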

Example 9 with FactScan

use of io.cdap.cdap.data2.dataset2.lib.timeseries.FactScan in project cdap by caskdata.

the class FactTable method findMeasureNames.

/**
 * Finds all measure names of the facts that match given {@link DimensionValue}s and time range.
 * @param allDimensionNames list of all dimension names to be present in the fact record
 * @param dimensionSlice dimension values to filter by, {@code null} means any non-null value.
 * @param startTs start timestamp, in sec
 * @param endTs end timestamp, in sec
 * @return {@link Set} of measure names
 */
// todo: pass a limit on number of measures returned
public Set<String> findMeasureNames(List<String> allDimensionNames, Map<String, String> dimensionSlice, long startTs, long endTs) {
    List<DimensionValue> allDimensions = Lists.newArrayList();
    for (String dimensionName : allDimensionNames) {
        allDimensions.add(new DimensionValue(dimensionName, dimensionSlice.get(dimensionName)));
    }
    byte[] startRow = codec.createStartRowKey(allDimensions, null, startTs, false);
    byte[] endRow = codec.createEndRowKey(allDimensions, null, endTs, false);
    endRow = Bytes.stopKeyForPrefix(endRow);
    FuzzyRowFilter fuzzyRowFilter = createFuzzyRowFilter(new FactScan(startTs, endTs, Collections.emptyList(), allDimensions), startRow);
    Set<String> measureNames = Sets.newHashSet();
    int scannedRecords = 0;
    try (Scanner scanner = timeSeriesTable.scan(startRow, endRow, fuzzyRowFilter)) {
        Row rowResult;
        while ((rowResult = scanner.next()) != null) {
            scannedRecords++;
            if (scannedRecords > MAX_RECORDS_TO_SCAN_DURING_SEARCH) {
                break;
            }
            byte[] rowKey = rowResult.getRow();
            // filter out columns by time range (scan configuration only filters whole rows)
            if (codec.getTimestamp(rowKey, codec.createColumn(startTs)) < startTs) {
                continue;
            }
            if (codec.getTimestamp(rowKey, codec.createColumn(endTs)) > endTs) {
                // we're done with scanner
                break;
            }
            measureNames.add(codec.getMeasureName(rowResult.getRow()));
        }
    }
    LOG.trace("search for measures completed, scanned records: {}", scannedRecords);
    return measureNames;
}
Also used : Scanner(io.cdap.cdap.api.dataset.table.Scanner) DimensionValue(io.cdap.cdap.api.dataset.lib.cube.DimensionValue) Row(io.cdap.cdap.api.dataset.table.Row) FuzzyRowFilter(io.cdap.cdap.data2.dataset2.lib.table.FuzzyRowFilter)
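
A possible call site for the method above, shown as a hedged sketch: the FactTable instance, the dimension names, and the time range are placeholder assumptions chosen to match the DefaultCube example earlier on this page; only the findMeasureNames signature itself comes from the code shown.

import io.cdap.cdap.data2.dataset2.lib.timeseries.FactTable;

import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.TimeUnit;

public class FindMeasureNamesSketch {

    // Lists the measure names written during the last hour for app 'myApp' in namespace 'ns1'.
    // 'factTable' is assumed to be an already constructed FactTable (see the testMaxResolution
    // example above for one way to construct it).
    static Set<String> measuresForApp(FactTable factTable) {
        List<String> allDimensionNames = Arrays.asList("namespace", "app", "program", "instance", "dataset");

        // dimensions left out of the slice resolve to null via get(), which the javadoc above
        // documents as matching "any non-null value"
        Map<String, String> dimensionSlice = new HashMap<>();
        dimensionSlice.put("namespace", "ns1");
        dimensionSlice.put("app", "myApp");

        long endTs = System.currentTimeMillis() / 1000;           // timestamps are in seconds
        long startTs = endTs - TimeUnit.HOURS.toSeconds(1);

        return factTable.findMeasureNames(allDimensionNames, dimensionSlice, startTs, endTs);
    }
}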

Example 10 with FactScan

use of io.cdap.cdap.data2.dataset2.lib.timeseries.FactScan in project cdap by caskdata.

the class FactTable method getScanner.

private Scanner getScanner(FactScan scan) {
    // sort the measures by their entity ids, then use the first and last as the metric names for the start and end row keys
    List<String> measureNames = getSortedMeasures(scan.getMeasureNames());
    byte[] startRow = codec.createStartRowKey(scan.getDimensionValues(), measureNames.isEmpty() ? null : measureNames.get(0), scan.getStartTs(), false);
    byte[] endRow = codec.createEndRowKey(scan.getDimensionValues(), measureNames.isEmpty() ? null : measureNames.get(measureNames.size() - 1), scan.getEndTs(), false);
    byte[][] columns;
    if (Arrays.equals(startRow, endRow)) {
        // If on the same timebase, we only need a subset of columns
        long timeBase = scan.getStartTs() / rollTime * rollTime;
        int startCol = (int) (scan.getStartTs() - timeBase) / resolution;
        int endCol = (int) (scan.getEndTs() - timeBase) / resolution;
        columns = new byte[endCol - startCol + 1][];
        for (int i = 0; i < columns.length; i++) {
            columns[i] = Bytes.toBytes((short) (startCol + i));
        }
    }
    endRow = Bytes.stopKeyForPrefix(endRow);
    FuzzyRowFilter fuzzyRowFilter = measureNames.isEmpty() ? createFuzzyRowFilter(scan, startRow) : createFuzzyRowFilter(scan, measureNames);
    if (LOG.isTraceEnabled()) {
        LOG.trace("Scanning fact table {} with scan: {}; constructed startRow: {}, endRow: {}, fuzzyRowFilter: {}", timeSeriesTable, scan, Bytes.toHexString(startRow), endRow == null ? null : Bytes.toHexString(endRow), fuzzyRowFilter);
    }
    return timeSeriesTable.scan(startRow, endRow, fuzzyRowFilter);
}
Also used : FuzzyRowFilter(io.cdap.cdap.data2.dataset2.lib.table.FuzzyRowFilter)
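
To see what the same-timebase branch in getScanner saves, here is the column arithmetic worked through with assumed values (a resolution of 60 seconds, a rollTime of 3600 seconds, and timestamps taken from the CubeQuery comment earlier on this page); the concrete numbers are for illustration only.

public class ColumnWindowSketch {
    public static void main(String[] args) {
        int resolution = 60;      // assumed: 1-minute resolution
        long rollTime = 3600;     // assumed: seconds covered by one row ("timebase")

        long startTs = 1423370200L;
        long endTs = 1423370500L; // chosen to fall into the same timebase as startTs

        // same arithmetic as the startRow == endRow branch of getScanner above
        long timeBase = startTs / rollTime * rollTime;            // 1423368000
        int startCol = (int) (startTs - timeBase) / resolution;   // 2200 / 60 = 36
        int endCol = (int) (endTs - timeBase) / resolution;       // 2500 / 60 = 41

        int columnCount = endCol - startCol + 1;                  // only 6 columns are requested
        System.out.println("timeBase=" + timeBase
            + ", startCol=" + startCol + ", endCol=" + endCol + ", columns=" + columnCount);
    }
}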

Aggregations

ImmutableList (com.google.common.collect.ImmutableList): 6
List (java.util.List): 6
Test (org.junit.Test): 6
DimensionValue (co.cask.cdap.api.dataset.lib.cube.DimensionValue): 5
DimensionValue (io.cdap.cdap.api.dataset.lib.cube.DimensionValue): 5
FuzzyRowFilter (co.cask.cdap.data2.dataset2.lib.table.FuzzyRowFilter): 3
InMemoryMetricsTable (co.cask.cdap.data2.dataset2.lib.table.inmemory.InMemoryMetricsTable): 3
FuzzyRowFilter (io.cdap.cdap.data2.dataset2.lib.table.FuzzyRowFilter): 3
InMemoryMetricsTable (io.cdap.cdap.data2.dataset2.lib.table.inmemory.InMemoryMetricsTable): 3
ArrayList (java.util.ArrayList): 3
TimeValue (co.cask.cdap.api.dataset.lib.cube.TimeValue): 2
Row (co.cask.cdap.api.dataset.table.Row): 2
Scanner (co.cask.cdap.api.dataset.table.Scanner): 2
FactScan (co.cask.cdap.data2.dataset2.lib.timeseries.FactScan): 2
FactTable (co.cask.cdap.data2.dataset2.lib.timeseries.FactTable): 2
TimeValue (io.cdap.cdap.api.dataset.lib.cube.TimeValue): 2
Row (io.cdap.cdap.api.dataset.table.Row): 2
Scanner (io.cdap.cdap.api.dataset.table.Scanner): 2
FactScan (io.cdap.cdap.data2.dataset2.lib.timeseries.FactScan): 2
FactTable (io.cdap.cdap.data2.dataset2.lib.timeseries.FactTable): 2