Example 1 with Union

use of com.yahoo.sketches.hll.Union in project sketches-pig by DataSketches.

the class AlgebraicIntermediate method exec.

@Override
public Tuple exec(final Tuple inputTuple) throws IOException {
    if (isFirstCall_) {
        Logger.getLogger(getClass()).info("Algebraic was used");
        isFirstCall_ = false;
    }
    if (inputTuple == null || inputTuple.size() == 0) {
        return getEmptySketchTuple();
    }
    final DataBag outerBag = (DataBag) inputTuple.get(0);
    if (outerBag == null) {
        return getEmptySketchTuple();
    }
    final Union union = new Union(lgK_);
    for (final Tuple dataTuple : outerBag) {
        // inputTuple.bag0.dataTupleN.f0
        final Object f0 = dataTuple.get(0);
        if (f0 == null) {
            continue;
        }
        if (f0 instanceof DataBag) {
            // inputTuple.bag0.dataTupleN.f0:bag
            final DataBag innerBag = (DataBag) f0;
            if (innerBag.size() == 0) {
                continue;
            }
            // If field 0 of a dataTuple is a Bag, all innerTuples of this inner bag
            // will be passed into the union.
            // This happens when the system bags the outputs of multiple mapper Initial
            // functions and the Intermediate stage was bypassed.
            updateUnion(innerBag, union);
        } else if (f0 instanceof DataByteArray) {
            // inputTuple.bag0.dataTupleN.f0:DBA
            // If field 0 of a dataTuple is a DataByteArray, we assume it is a sketch
            // produced when the system bags the outputs of multiple mapper Intermediate functions.
            // Each dataTuple.DBA:sketch will be merged into the union.
            final DataByteArray dba = (DataByteArray) f0;
            union.update(HllSketch.wrap(Memory.wrap(dba.get())));
        } else {
            // we should never get here
            throw new IllegalArgumentException("dataTuple.Field0 is not a DataBag or DataByteArray: " + f0.getClass().getName());
        }
    }
    return tupleFactory_.newTuple(new DataByteArray(union.getResult(tgtHllType_).toCompactByteArray()));
}
Also used : DataBag(org.apache.pig.data.DataBag) DataByteArray(org.apache.pig.data.DataByteArray) Union(com.yahoo.sketches.hll.Union) Tuple(org.apache.pig.data.Tuple)
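
The dispatch in Example 1 (a nested bag of raw values versus an already serialized partial result) can be illustrated without Pig or DataSketches on the classpath. The following is only a sketch under stated assumptions: a List<String> stands in for a DataBag, a byte[] stands in for a DataByteArray, a TreeSet<String> stands in for the HLL Union, and the newline-joined serialization is hypothetical. The class name IntermediateDispatchSketch and its helpers are invented for illustration and are not part of sketches-pig.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Stand-in types only; none of this is the DataSketches or Pig API.
class IntermediateDispatchSketch {

    // Hypothetical serialization: items joined by '\n' (items must not contain '\n').
    static byte[] serialize(final Set<String> items) {
        return String.join("\n", items).getBytes(StandardCharsets.UTF_8);
    }

    static Set<String> deserialize(final byte[] bytes) {
        final Set<String> items = new TreeSet<>();
        final String text = new String(bytes, StandardCharsets.UTF_8);
        if (!text.isEmpty()) {
            items.addAll(Arrays.asList(text.split("\n")));
        }
        return items;
    }

    // Mirrors the shape of the intermediate step: inspect field 0 of each entry
    // and either feed raw values or merge a serialized partial result.
    static byte[] intermediate(final List<Object> outerBag) {
        final Set<String> union = new TreeSet<>();
        if (outerBag != null) {
            for (final Object f0 : outerBag) {
                if (f0 == null) {
                    continue; // skip empty fields, as the original does
                }
                if (f0 instanceof List) {
                    // raw values: the previous stage was bypassed
                    for (final Object raw : (List<?>) f0) {
                        union.add(String.valueOf(raw));
                    }
                } else if (f0 instanceof byte[]) {
                    // serialized partial result from an earlier stage
                    union.addAll(deserialize((byte[]) f0));
                } else {
                    throw new IllegalArgumentException(
                        "field 0 is not a List or byte[]: " + f0.getClass().getName());
                }
            }
        }
        return serialize(union);
    }

    public static void main(final String[] args) {
        final List<Object> outerBag = new ArrayList<>();
        outerBag.add(Arrays.asList("a", "b"));
        outerBag.add(serialize(new TreeSet<>(Arrays.asList("b", "c"))));
        outerBag.add(null);
        System.out.println(deserialize(intermediate(outerBag))); // prints [a, b, c]
    }
}
```

Unlike this exact set, a real HLL union returns an approximate distinct count in bounded memory; only the control flow is meant to carry over.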

Example 2 with Union

use of com.yahoo.sketches.hll.Union in project sketches-pig by DataSketches.

the class DataToSketch method accumulate.

/**
 * An <i>Accumulator</i> version of the standard <i>exec()</i> method. Like <i>exec()</i>,
 * accumulator is called with a bag of Datum Tuples. Unlike <i>exec()</i>, it doesn't serialize the
 * result at the end. Instead, it can be called multiple times, each time with another bag of
 * Datum Tuples to be input to the sketch.
 *
 * @param inputTuple A tuple containing a single bag, containing Datum Tuples.
 * @see #exec
 * @see "org.apache.pig.Accumulator.accumulate(org.apache.pig.data.Tuple)"
 * @throws IOException by Pig
 */
@Override
public void accumulate(final Tuple inputTuple) throws IOException {
    if (isFirstCall_) {
        Logger.getLogger(getClass()).info("Accumulator was used");
        isFirstCall_ = false;
    }
    if (inputTuple == null || inputTuple.size() == 0) {
        return;
    }
    final DataBag bag = (DataBag) inputTuple.get(0);
    if (bag == null) {
        return;
    }
    if (accumUnion_ == null) {
        accumUnion_ = new Union(lgK_);
    }
    updateUnion(bag, accumUnion_);
}
Also used : DataBag(org.apache.pig.data.DataBag) Union(com.yahoo.sketches.hll.Union)
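
The Javadoc above describes an accumulator that is called repeatedly and does not serialize until asked. A minimal sketch of that lifecycle follows, assuming a HashSet<String> as a stand-in for the Union and a List<String> as a stand-in for the input bag. The class LazyAccumulatorSketch is hypothetical; the getValue() and cleanup() method names follow Pig's Accumulator interface, but their bodies here are illustrative only.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Stand-in for the lazy-initialized accumulator pattern; not the sketches-pig code.
class LazyAccumulatorSketch {

    // Plays the role of accumUnion_: created on first non-empty call only.
    private Set<String> accumUnion;

    // May be called many times, each time with another batch of values.
    void accumulate(final List<String> bag) {
        if (bag == null) {
            return; // nothing to add, and no state is allocated
        }
        if (accumUnion == null) {
            accumUnion = new HashSet<>();
        }
        accumUnion.addAll(bag);
    }

    // Called once after the last batch; an exact count here, an estimate in a real sketch.
    int getValue() {
        return accumUnion == null ? 0 : accumUnion.size();
    }

    // Resets state so the instance can be reused for the next group.
    void cleanup() {
        accumUnion = null;
    }

    public static void main(final String[] args) {
        final LazyAccumulatorSketch acc = new LazyAccumulatorSketch();
        acc.accumulate(Arrays.asList("a", "b"));
        acc.accumulate(null);
        acc.accumulate(Arrays.asList("b", "c"));
        System.out.println(acc.getValue()); // prints 3
        acc.cleanup();
        System.out.println(acc.getValue()); // prints 0
    }
}
```

Deferring allocation until the first real batch keeps instances that never receive data cheap, which appears to be the intent of the null check on accumUnion_ in the original.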

Example 3 with Union

use of com.yahoo.sketches.hll.Union in project sketches-pig by DataSketches.

the class UnionSketch method accumulate.

/**
 * An <i>Accumulator</i> version of the standard <i>exec()</i> method. Like <i>exec()</i>,
 * accumulator is called with a bag of Sketch Tuples. Unlike <i>exec()</i>, it doesn't serialize the
 * result at the end. Instead, it can be called multiple times, each time with another bag of
 * Sketch Tuples to be input to the union.
 *
 * @param inputTuple A tuple containing a single bag, containing Sketch Tuples.
 * @see #exec
 * @see "org.apache.pig.Accumulator.accumulate(org.apache.pig.data.Tuple)"
 * @throws IOException by Pig
 */
@Override
public void accumulate(final Tuple inputTuple) throws IOException {
    if (isFirstCall_) {
        Logger.getLogger(getClass()).info("Accumulator was used");
        isFirstCall_ = false;
    }
    if (inputTuple == null || inputTuple.size() == 0) {
        return;
    }
    final DataBag bag = (DataBag) inputTuple.get(0);
    if (bag == null) {
        return;
    }
    if (accumUnion_ == null) {
        accumUnion_ = new Union(lgK_);
    }
    updateUnion(bag, accumUnion_);
}
Also used : DataBag(org.apache.pig.data.DataBag) Union(com.yahoo.sketches.hll.Union)

Example 4 with Union

use of com.yahoo.sketches.hll.Union in project sketches-pig by DataSketches.

the class UnionSketch method exec.

/**
 * Top-level exec function.
 * This method accepts an input Tuple containing a Bag of one or more inner <b>Sketch Tuples</b>
 * and returns a single serialized HllSketch as a DataByteArray.
 *
 * <b>Sketch Tuple</b> is a Tuple containing a single DataByteArray (BYTEARRAY in Pig), which
 * is a serialized HllSketch.
 *
 * @param inputTuple A tuple containing a single bag, containing Sketch Tuples.
 * @return serialized HllSketch
 * @see "org.apache.pig.EvalFunc.exec(org.apache.pig.data.Tuple)"
 * @throws IOException from Pig.
 */
@Override
public DataByteArray exec(final Tuple inputTuple) throws IOException {
    if (isFirstCall_) {
        Logger.getLogger(getClass()).info("Exec was used");
        isFirstCall_ = false;
    }
    if (inputTuple == null || inputTuple.size() == 0) {
        if (emptySketch_ == null) {
            emptySketch_ = new DataByteArray(new HllSketch(lgK_, tgtHllType_).toCompactByteArray());
        }
        return emptySketch_;
    }
    final Union union = new Union(lgK_);
    final DataBag bag = (DataBag) inputTuple.get(0);
    updateUnion(bag, union);
    return new DataByteArray(union.getResult(tgtHllType_).toCompactByteArray());
}
Also used : HllSketch(com.yahoo.sketches.hll.HllSketch) DataBag(org.apache.pig.data.DataBag) DataByteArray(org.apache.pig.data.DataByteArray) Union(com.yahoo.sketches.hll.Union)

Example 5 with Union

use of com.yahoo.sketches.hll.Union in project Gaffer by gchq.

the class HllUnionSerialiserTest method testSerialiseAndDeserialise.

@Test
public void testSerialiseAndDeserialise() {
    final Union sketch = new Union(15);
    sketch.update("A");
    sketch.update("B");
    sketch.update("C");
    testSerialiser(sketch);
    final Union emptySketch = new Union(15);
    testSerialiser(emptySketch);
}
Also used : Union(com.yahoo.sketches.hll.Union) Test(org.junit.jupiter.api.Test)

Aggregations

Union (com.yahoo.sketches.hll.Union): 11
DataBag (org.apache.pig.data.DataBag): 6
DataByteArray (org.apache.pig.data.DataByteArray): 4
HllSketch (com.yahoo.sketches.hll.HllSketch): 2
Tuple (org.apache.pig.data.Tuple): 2
Test (org.junit.jupiter.api.Test): 2
SerialisationException (uk.gov.gchq.gaffer.exception.SerialisationException): 1
BinaryOperatorTest (uk.gov.gchq.koryphe.binaryoperator.BinaryOperatorTest): 1