Example 1 with HyperLogLog

Use of com.clearspring.analytics.stream.cardinality.HyperLogLog in project pinot by linkedin.

From the class DistinctCountHLLAggregationFunction, method aggregate.

@Override
public void aggregate(int length, @Nonnull AggregationResultHolder aggregationResultHolder, @Nonnull BlockValSet... blockValSets) {
    HyperLogLog hyperLogLog = aggregationResultHolder.getResult();
    if (hyperLogLog == null) {
        hyperLogLog = new HyperLogLog(HllConstants.DEFAULT_LOG2M);
        aggregationResultHolder.setValue(hyperLogLog);
    }
    FieldSpec.DataType valueType = blockValSets[0].getValueType();
    switch(valueType) {
        case INT:
            int[] intValues = blockValSets[0].getIntValuesSV();
            for (int i = 0; i < length; i++) {
                hyperLogLog.offer(intValues[i]);
            }
            break;
        case LONG:
            long[] longValues = blockValSets[0].getLongValuesSV();
            for (int i = 0; i < length; i++) {
                hyperLogLog.offer(Long.valueOf(longValues[i]).hashCode());
            }
            break;
        case FLOAT:
            float[] floatValues = blockValSets[0].getFloatValuesSV();
            for (int i = 0; i < length; i++) {
                hyperLogLog.offer(Float.valueOf(floatValues[i]).hashCode());
            }
            break;
        case DOUBLE:
            double[] doubleValues = blockValSets[0].getDoubleValuesSV();
            for (int i = 0; i < length; i++) {
                hyperLogLog.offer(Double.valueOf(doubleValues[i]).hashCode());
            }
            break;
        case STRING:
            String[] stringValues = blockValSets[0].getStringValuesSV();
            for (int i = 0; i < length; i++) {
                hyperLogLog.offer(stringValues[i]);
            }
            break;
        default:
            throw new IllegalArgumentException("Illegal data type for distinct count aggregation function: " + valueType);
    }
}
Also used: HyperLogLog (com.clearspring.analytics.stream.cardinality.HyperLogLog), FieldSpec (com.linkedin.pinot.common.data.FieldSpec)
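
The LONG, FLOAT, and DOUBLE branches above box each value and offer its hashCode(), so the sketch receives a 32-bit int rather than the original value. The following JDK-only sketch illustrates two consequences: the static Long.hashCode, Float.hashCode, and Double.hashCode methods (Java 8 and later) return the same int without boxing, and two distinct long values can fold to the same int, in which case both are offered as identical inputs. The class name HashFolding and its methods are hypothetical and are not part of Pinot.

```java
// Hypothetical helper, not part of Pinot: shows how the boxed hashCode() calls
// in the example reduce wider values to a 32-bit int.
final class HashFolding {
    private HashFolding() {}

    // Same result as Long.valueOf(v).hashCode(), without allocating a Long.
    // Long.hashCode is defined as (int) (v ^ (v >>> 32)).
    static int foldLong(long v) {
        return Long.hashCode(v);
    }

    // Same result as Float.valueOf(v).hashCode().
    static int foldFloat(float v) {
        return Float.hashCode(v);
    }

    // Same result as Double.valueOf(v).hashCode().
    static int foldDouble(double v) {
        return Double.hashCode(v);
    }

    public static void main(String[] args) {
        // 0L and -1L are different values, but both fold to the int 0.
        System.out.println(foldLong(0L) == foldLong(-1L)); // true
        // The static method agrees with the boxed call used in the example.
        System.out.println(foldLong(123456789012L) == Long.valueOf(123456789012L).hashCode()); // true
    }
}
```

Whether such collisions matter depends on the data: for long columns whose values differ mostly in the high 32 bits, folding before offering may undercount more than the sketch's own estimation error would suggest.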

Example 2 with HyperLogLog

Use of com.clearspring.analytics.stream.cardinality.HyperLogLog in project pinot by linkedin.

From the class DistinctCountHLLMVAggregationFunction, method getOrCreateHLLForKey.

/**
 * Returns the HLL for the given key. If one does not exist, creates a new one and returns that.
 *
 * @param groupByResultHolder Result holder
 * @param groupKey Group key for which to return the HLL
 * @return HLL for the group key
 */
private HyperLogLog getOrCreateHLLForKey(@Nonnull GroupByResultHolder groupByResultHolder, int groupKey) {
    HyperLogLog hyperLogLog = groupByResultHolder.getResult(groupKey);
    if (hyperLogLog == null) {
        hyperLogLog = new HyperLogLog(HllConstants.DEFAULT_LOG2M);
        groupByResultHolder.setValueForKey(groupKey, hyperLogLog);
    }
    return hyperLogLog;
}
Also used: HyperLogLog (com.clearspring.analytics.stream.cardinality.HyperLogLog)
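
The null check followed by setValueForKey above is a lazy get-or-create: the accumulator for a group key is allocated only the first time that key is seen. A minimal sketch of the same idea using only the JDK follows. Here a HashMap stands in for GroupByResultHolder and a HashSet stands in for the HyperLogLog; both substitutions, and the class name GetOrCreateSketch, are assumptions made for illustration and say nothing about how Pinot stores results internally.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in: the map plays the role of the result holder and the
// set plays the role of the per-group sketch.
final class GetOrCreateSketch {
    private final Map<Integer, Set<String>> resultsByGroupKey = new HashMap<>();

    // Returns the accumulator for groupKey, creating and storing it on first use.
    Set<String> getOrCreate(int groupKey) {
        return resultsByGroupKey.computeIfAbsent(groupKey, key -> new HashSet<>());
    }

    // Number of group keys that have an accumulator so far.
    int groupCount() {
        return resultsByGroupKey.size();
    }

    public static void main(String[] args) {
        GetOrCreateSketch holder = new GetOrCreateSketch();
        holder.getOrCreate(7).add("a");
        holder.getOrCreate(7).add("b"); // reuses the accumulator created above
        holder.getOrCreate(9).add("a");
        System.out.println(holder.getOrCreate(7).size()); // 2
        System.out.println(holder.groupCount());          // 2
    }
}
```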

Example 3 with HyperLogLog

Use of com.clearspring.analytics.stream.cardinality.HyperLogLog in project pinot by linkedin.

From the class FastHLLAggregationFunction, method aggregate.

@Override
public void aggregate(int length, @Nonnull AggregationResultHolder aggregationResultHolder, @Nonnull BlockValSet... blockValSets) {
    String[] valueArray = blockValSets[0].getStringValuesSV();
    HyperLogLog hyperLogLog = aggregationResultHolder.getResult();
    if (hyperLogLog == null) {
        hyperLogLog = new HyperLogLog(_log2m);
        aggregationResultHolder.setValue(hyperLogLog);
    }
    for (int i = 0; i < length; i++) {
        try {
            hyperLogLog.addAll(HllUtil.convertStringToHll(valueArray[i]));
        } catch (CardinalityMergeException e) {
            throw new RuntimeException("Caught exception while aggregating HyperLogLog.", e);
        }
    }
}
Also used: HyperLogLog (com.clearspring.analytics.stream.cardinality.HyperLogLog), CardinalityMergeException (com.clearspring.analytics.stream.cardinality.CardinalityMergeException)
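
The addAll call above merges a deserialized sketch into the running one. In the HyperLogLog algorithm a merge is a register-wise maximum, which only makes sense when both sketches have the same number of registers. The following simplified model illustrates that step. It is not the stream-lib implementation: the class RegisterSketch, its methods, and the multiplicative stand-in hash are all hypothetical, and the unchecked IllegalArgumentException is used here only as an assumed analogue of the checked CardinalityMergeException caught in the example.

```java
import java.util.Arrays;

// Hypothetical, simplified model of a HyperLogLog register array.
// It illustrates merging only and does not estimate cardinality.
final class RegisterSketch {
    private final int log2m;
    private final byte[] registers;

    RegisterSketch(int log2m) {
        if (log2m < 4 || log2m > 16) {
            throw new IllegalArgumentException("log2m must be in [4, 16]: " + log2m);
        }
        this.log2m = log2m;
        this.registers = new byte[1 << log2m];
    }

    // Records a 32-bit hash: the top log2m bits select a register, and the
    // register keeps the largest rank (position of the first 1-bit) seen in
    // the remaining bits.
    void offerHashed(int hash) {
        int index = hash >>> (Integer.SIZE - log2m);
        int maxRank = Integer.SIZE - log2m + 1;
        int rank = Math.min(Integer.numberOfLeadingZeros(hash << log2m) + 1, maxRank);
        registers[index] = (byte) Math.max(registers[index], rank);
    }

    // Merges other into this sketch by taking the maximum of each register pair.
    void addAll(RegisterSketch other) {
        if (other.log2m != log2m) {
            throw new IllegalArgumentException("Cannot merge sketches of different sizes");
        }
        for (int i = 0; i < registers.length; i++) {
            registers[i] = (byte) Math.max(registers[i], other.registers[i]);
        }
    }

    byte[] registersCopy() {
        return registers.clone();
    }

    public static void main(String[] args) {
        RegisterSketch left = new RegisterSketch(8);
        RegisterSketch right = new RegisterSketch(8);
        RegisterSketch all = new RegisterSketch(8);
        for (int i = 0; i < 1000; i++) {
            int hash = i * 0x9E3779B9; // cheap stand-in for a real hash function
            (i % 2 == 0 ? left : right).offerHashed(hash);
            all.offerHashed(hash);
        }
        left.addAll(right);
        // Merging the two halves yields the same registers as seeing every value.
        System.out.println(Arrays.equals(left.registersCopy(), all.registersCopy())); // true
    }
}
```

Because the merged registers equal those of a sketch that saw every value directly, partial sketches can be built independently and combined later, which is the property the aggregate method relies on when it folds pre-built sketches together.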

Example 4 with HyperLogLog

Use of com.clearspring.analytics.stream.cardinality.HyperLogLog in project pinot by linkedin.

From the class FastHLLAggregationFunction, method aggregateGroupByMV.

@Override
public void aggregateGroupByMV(int length, @Nonnull int[][] groupKeysArray, @Nonnull GroupByResultHolder groupByResultHolder, @Nonnull BlockValSet... blockValSets) {
    String[] valueArray = blockValSets[0].getStringValuesSV();
    for (int i = 0; i < length; i++) {
        String value = valueArray[i];
        for (int groupKey : groupKeysArray[i]) {
            HyperLogLog hyperLogLog = groupByResultHolder.getResult(groupKey);
            if (hyperLogLog == null) {
                hyperLogLog = new HyperLogLog(_log2m);
                groupByResultHolder.setValueForKey(groupKey, hyperLogLog);
            }
            try {
                hyperLogLog.addAll(HllUtil.convertStringToHll(value));
            } catch (CardinalityMergeException e) {
                throw new RuntimeException("Caught exception while aggregating HyperLogLog.", e);
            }
        }
    }
}
Also used: HyperLogLog (com.clearspring.analytics.stream.cardinality.HyperLogLog), CardinalityMergeException (com.clearspring.analytics.stream.cardinality.CardinalityMergeException)

Example 5 with HyperLogLog

Use of com.clearspring.analytics.stream.cardinality.HyperLogLog in project pinot by linkedin.

From the class FastHLLMVAggregationFunction, method aggregateGroupByMV.

@Override
public void aggregateGroupByMV(int length, @Nonnull int[][] groupKeysArray, @Nonnull GroupByResultHolder groupByResultHolder, @Nonnull BlockValSet... blockValSets) {
    String[][] valuesArray = blockValSets[0].getStringValuesMV();
    for (int i = 0; i < length; i++) {
        String[] values = valuesArray[i];
        for (int groupKey : groupKeysArray[i]) {
            HyperLogLog hyperLogLog = groupByResultHolder.getResult(groupKey);
            if (hyperLogLog == null) {
                hyperLogLog = new HyperLogLog(_log2m);
                groupByResultHolder.setValueForKey(groupKey, hyperLogLog);
            }
            try {
                for (String value : values) {
                    hyperLogLog.addAll(HllUtil.convertStringToHll(value));
                }
            } catch (CardinalityMergeException e) {
                throw new RuntimeException("Caught exception while aggregating HyperLogLog.", e);
            }
        }
    }
}
Also used: HyperLogLog (com.clearspring.analytics.stream.cardinality.HyperLogLog), CardinalityMergeException (com.clearspring.analytics.stream.cardinality.CardinalityMergeException)

Aggregations

HyperLogLog (com.clearspring.analytics.stream.cardinality.HyperLogLog): 18
CardinalityMergeException (com.clearspring.analytics.stream.cardinality.CardinalityMergeException): 8
ValueVector (org.apache.drill.exec.vector.ValueVector): 4
IOException (java.io.IOException): 3
NullableVarBinaryVector (org.apache.drill.exec.vector.NullableVarBinaryVector): 3
FieldSpec (com.linkedin.pinot.common.data.FieldSpec): 2
ByteArrayInputStream (java.io.ByteArrayInputStream): 2
DataInputStream (java.io.DataInputStream): 2
Dictionary (com.linkedin.pinot.core.segment.index.readers.Dictionary): 1
NullableBigIntVector (org.apache.drill.exec.vector.NullableBigIntVector): 1
Test (org.testng.annotations.Test): 1