Example 11 with CardinalityMergeException

Use of com.clearspring.analytics.stream.cardinality.CardinalityMergeException in project shifu by ShifuML.

Class AutoTypeDistinctCountReducer, method reduce.

@Override
protected void reduce(IntWritable key, Iterable<CountAndFrequentItemsWritable> values, Context context) throws IOException, InterruptedException {
    HyperLogLogPlus hyperLogLogPlus = null;
    Set<String> fis = new HashSet<String>();
    long count = 0, invalidCount = 0, validNumCount = 0;
    for (CountAndFrequentItemsWritable cfiw : values) {
        count += cfiw.getCount();
        invalidCount += cfiw.getInvalidCount();
        validNumCount += cfiw.getValidNumCount();
        fis.addAll(cfiw.getFrequetItems());
        if (hyperLogLogPlus == null) {
            hyperLogLogPlus = HyperLogLogPlus.Builder.build(cfiw.getHyperBytes());
        } else {
            try {
                hyperLogLogPlus = (HyperLogLogPlus) hyperLogLogPlus.merge(HyperLogLogPlus.Builder.build(cfiw.getHyperBytes()));
            } catch (CardinalityMergeException e) {
                throw new RuntimeException(e);
            }
        }
    }
    outputValue.set(count + ":" + invalidCount + ":" + validNumCount + ":" + hyperLogLogPlus.cardinality() + ":" + limitedFrequentItems(fis));
    context.write(key, outputValue);
}
Also used : HyperLogLogPlus(com.clearspring.analytics.stream.cardinality.HyperLogLogPlus) CardinalityMergeException(com.clearspring.analytics.stream.cardinality.CardinalityMergeException) HashSet(java.util.HashSet)

Example 12 with CardinalityMergeException

Use of com.clearspring.analytics.stream.cardinality.CardinalityMergeException in project drill by apache.

Class NDVMergedStatistic, method merge.

@Override
public void merge(MapVector input) {
    // Check the input is a Map Vector
    assert (input.getField().getType().getMinorType() == TypeProtos.MinorType.MAP);
    // Dependencies have been configured correctly
    assert (state == State.MERGE);
    for (ValueVector vv : input) {
        String colName = vv.getField().getName();
        // May be null when this column has not been seen yet
        HyperLogLog colHLLHolder = hllHolder.get(colName);
        NullableVarBinaryVector hllVector = (NullableVarBinaryVector) vv;
        NullableVarBinaryVector.Accessor accessor = hllVector.getAccessor();
        try {
            if (!accessor.isNull(0)) {
                ByteArrayInputStream bais = new ByteArrayInputStream(accessor.get(0), 0, vv.getBufferSize());
                HyperLogLog other = HyperLogLog.Builder.build(new DataInputStream(bais));
                if (colHLLHolder != null) {
                    colHLLHolder.addAll(other);
                    hllHolder.put(colName, colHLLHolder);
                } else {
                    hllHolder.put(colName, other);
                }
            }
        } catch (CardinalityMergeException ex) {
            // Keep the original exception as the cause so the failure stays diagnosable
            throw new IllegalStateException("Failed to merge the NDV statistics", ex);
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }
}
Also used : ValueVector(org.apache.drill.exec.vector.ValueVector) ByteArrayInputStream(java.io.ByteArrayInputStream) NullableVarBinaryVector(org.apache.drill.exec.vector.NullableVarBinaryVector) DataInputStream(java.io.DataInputStream) HyperLogLog(com.clearspring.analytics.stream.cardinality.HyperLogLog) CardinalityMergeException(com.clearspring.analytics.stream.cardinality.CardinalityMergeException) IOException(java.io.IOException)
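
Both examples above share one pattern: partial estimators are folded into a single one, and the checked merge exception is rethrown as an unchecked one. The sketch below isolates that pattern using only the JDK. RegisterSketch, SketchMergeException and mergeAll are hypothetical stand-ins written for this illustration and are not part of stream-lib. The size check is an assumption modelled on how stream-lib estimators, as far as I know, refuse to merge sketches of different sizes.

```java
import java.util.Arrays;
import java.util.List;

public class MergeFoldSketch {

    // Hypothetical checked exception standing in for CardinalityMergeException.
    static class SketchMergeException extends Exception {
        SketchMergeException(String message) {
            super(message);
        }
    }

    // Hypothetical register-based estimator; not the stream-lib HyperLogLog class.
    static final class RegisterSketch {
        final int[] registers;

        RegisterSketch(int[] registers) {
            this.registers = registers.clone();
        }

        // Merging keeps the per-register maximum; both sketches must have the same size.
        RegisterSketch merge(RegisterSketch other) throws SketchMergeException {
            if (registers.length != other.registers.length) {
                throw new SketchMergeException("Cannot merge sketches of different sizes: "
                        + registers.length + " vs " + other.registers.length);
            }
            int[] merged = new int[registers.length];
            for (int i = 0; i < merged.length; i++) {
                merged[i] = Math.max(registers[i], other.registers[i]);
            }
            return new RegisterSketch(merged);
        }
    }

    // Folds the partial sketches into one. A failed merge is rethrown as an
    // unchecked exception that keeps the original exception as its cause.
    // Returns null when the list is empty.
    static RegisterSketch mergeAll(List<RegisterSketch> parts) {
        RegisterSketch result = null;
        for (RegisterSketch part : parts) {
            if (result == null) {
                result = part;
            } else {
                try {
                    result = result.merge(part);
                } catch (SketchMergeException e) {
                    throw new IllegalStateException("Failed to merge sketches", e);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        RegisterSketch merged = mergeAll(Arrays.asList(
                new RegisterSketch(new int[] {1, 0, 3, 0}),
                new RegisterSketch(new int[] {0, 2, 1, 0})));
        System.out.println(Arrays.toString(merged.registers)); // prints [1, 2, 3, 0]
    }
}
```

Passing the caught exception as the cause keeps the reason for the failed merge visible in the stack trace, which a message-only rethrow would lose.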

Aggregations

CardinalityMergeException (com.clearspring.analytics.stream.cardinality.CardinalityMergeException) 12
HyperLogLog (com.clearspring.analytics.stream.cardinality.HyperLogLog) 8
HyperLogLogPlus (com.clearspring.analytics.stream.cardinality.HyperLogLogPlus) 2
ICardinality (com.clearspring.analytics.stream.cardinality.ICardinality) 2
IOException (java.io.IOException) 2
ByteArrayInputStream (java.io.ByteArrayInputStream) 1
DataInputStream (java.io.DataInputStream) 1
HashSet (java.util.HashSet) 1
ColumnConfig (ml.shifu.shifu.container.obj.ColumnConfig) 1
ColumnMetrics (ml.shifu.shifu.core.ColumnStatsCalculator.ColumnMetrics) 1
CountAndFrequentItemsWritable (ml.shifu.shifu.core.autotype.CountAndFrequentItemsWritable) 1
NullableVarBinaryVector (org.apache.drill.exec.vector.NullableVarBinaryVector) 1
ValueVector (org.apache.drill.exec.vector.ValueVector) 1