use of com.yahoo.sketches.hll.Union in project sketches-pig by DataSketches.
the class AlgebraicIntermediate method exec.
@Override
public Tuple exec(final Tuple inputTuple) throws IOException {
  if (isFirstCall_) {
    Logger.getLogger(getClass()).info("Algebraic was used");
    isFirstCall_ = false;
  }
  if (inputTuple == null || inputTuple.size() == 0) {
    return getEmptySketchTuple();
  }
  final DataBag outerBag = (DataBag) inputTuple.get(0);
  if (outerBag == null) {
    return getEmptySketchTuple();
  }
  final Union union = new Union(lgK_);
  for (final Tuple dataTuple : outerBag) {
    // inputTuple.bag0.dataTupleN.f0
    final Object f0 = dataTuple.get(0);
    if (f0 == null) {
      continue;
    }
    if (f0 instanceof DataBag) {
      // inputTuple.bag0.dataTupleN.f0:bag
      final DataBag innerBag = (DataBag) f0;
      if (innerBag.size() == 0) {
        continue;
      }
      // If field 0 of a dataTuple is a Bag, all innerTuples of this inner bag
      // will be passed into the union.
      // It is due to system bagged outputs from multiple mapper Initial functions.
      // The Intermediate stage was bypassed.
      updateUnion(innerBag, union);
    } else if (f0 instanceof DataByteArray) {
      // inputTuple.bag0.dataTupleN.f0:DBA
      // If field 0 of a dataTuple is a DataByteArray, we assume it is a sketch
      // due to system bagged outputs from multiple mapper Intermediate functions.
      // Each dataTuple.DBA:sketch will be merged into the union.
      final DataByteArray dba = (DataByteArray) f0;
      union.update(HllSketch.wrap(Memory.wrap(dba.get())));
    } else {
      // we should never get here
      throw new IllegalArgumentException(
          "dataTuple.Field0 is not a DataBag or DataByteArray: " + f0.getClass().getName());
    }
  }
  return tupleFactory_.newTuple(
      new DataByteArray(union.getResult(tgtHllType_).toCompactByteArray()));
}
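The type dispatch on field 0 in the method above can be shown in isolation. The following is a minimal sketch that uses only java.util types, not the Pig or DataSketches APIs: a List stands in for an inner DataBag of raw values, a Set stands in for a serialized partial sketch, and the class and method names (IntermediateDispatchDemo, combine) are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the Intermediate stage: no Pig or DataSketches types are used.
final class IntermediateDispatchDemo {

  // Folds every element of the outer "bag" into one result, based on its runtime type.
  // List = raw values (stand-in for an inner DataBag),
  // Set  = partial result (stand-in for a serialized sketch).
  static Set<String> combine(final List<Object> outerBag) {
    final Set<String> union = new HashSet<>();
    if (outerBag == null) {
      return union; // nothing to merge, return the empty result
    }
    for (final Object f0 : outerBag) {
      if (f0 == null) {
        continue; // skip missing fields, as the original loop does
      }
      if (f0 instanceof List) {
        // raw values: feed each one into the union
        for (final Object raw : (List<?>) f0) {
          if (raw != null) {
            union.add(raw.toString());
          }
        }
      } else if (f0 instanceof Set) {
        // partial result: merge it as a whole
        for (final Object item : (Set<?>) f0) {
          union.add(item.toString());
        }
      } else {
        throw new IllegalArgumentException(
            "Field 0 is not a List or Set: " + f0.getClass().getName());
      }
    }
    return union;
  }

  public static void main(final String[] args) {
    final List<Object> outerBag = Arrays.<Object>asList(
        Arrays.asList("A", "B"),                       // raw values
        null,                                          // ignored
        new HashSet<String>(Arrays.asList("B", "C"))); // partial result
    System.out.println(combine(outerBag).size()); // prints 3
  }
}
```

An exact Set is used here only to keep the example self-contained; the real method merges approximate HLL sketches, so its result is an estimate, not an exact count.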
use of com.yahoo.sketches.hll.Union in project sketches-pig by DataSketches.
the class DataToSketch method accumulate.
/**
 * An <i>Accumulator</i> version of the standard <i>exec()</i> method. Like <i>exec()</i>,
 * accumulator is called with a bag of Datum Tuples. Unlike <i>exec()</i>, it doesn't serialize the
 * result at the end. Instead, it can be called multiple times, each time with another bag of
 * Datum Tuples to be input to the sketch.
 *
 * @param inputTuple A tuple containing a single bag, containing Datum Tuples.
 * @see #exec
 * @see "org.apache.pig.Accumulator.accumulate(org.apache.pig.data.Tuple)"
 * @throws IOException by Pig
 */
@Override
public void accumulate(final Tuple inputTuple) throws IOException {
  if (isFirstCall_) {
    Logger.getLogger(getClass()).info("Accumulator was used");
    isFirstCall_ = false;
  }
  if (inputTuple == null || inputTuple.size() == 0) {
    return;
  }
  final DataBag bag = (DataBag) inputTuple.get(0);
  if (bag == null) {
    return;
  }
  if (accumUnion_ == null) {
    accumUnion_ = new Union(lgK_);
  }
  updateUnion(bag, accumUnion_);
}
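The accumulator pattern described in the Javadoc above (repeated calls that update lazily created state, with the result read only at the end) can be sketched without Pig. This is a minimal illustration under stated assumptions: ExactUnion is a hypothetical exact-set stand-in for the HLL Union, and DistinctAccumulator only mirrors the accumulate / getValue / cleanup lifecycle of Pig's Accumulator interface; neither is part of sketches-pig.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for an HLL Union: counts exactly instead of estimating.
final class ExactUnion {
  private final Set<String> items = new HashSet<>();

  void update(final String item) {
    items.add(item);
  }

  long getEstimate() {
    return items.size();
  }
}

// Mirrors the accumulate / getValue / cleanup lifecycle with lazily created state.
final class DistinctAccumulator {
  private ExactUnion accumUnion; // created on the first non-null bag

  void accumulate(final List<String> bag) {
    if (bag == null) {
      return; // nothing to add; state is left untouched
    }
    if (accumUnion == null) {
      accumUnion = new ExactUnion();
    }
    for (final String datum : bag) {
      if (datum != null) {
        accumUnion.update(datum);
      }
    }
  }

  // Reads the result without resetting the state.
  long getValue() {
    return accumUnion == null ? 0L : accumUnion.getEstimate();
  }

  // Drops the state so the instance can be reused for the next group.
  void cleanup() {
    accumUnion = null;
  }
}

final class AccumulatorLifecycleDemo {
  public static void main(final String[] args) {
    final DistinctAccumulator acc = new DistinctAccumulator();
    acc.accumulate(Arrays.asList("A", "B"));
    acc.accumulate(Arrays.asList("B", "C"));
    System.out.println(acc.getValue()); // prints 3
    acc.cleanup();
    System.out.println(acc.getValue()); // prints 0
  }
}
```

Keeping the union in a field and creating it only when the first bag arrives is what lets each call stay cheap: no intermediate result is serialized between calls.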
use of com.yahoo.sketches.hll.Union in project sketches-pig by DataSketches.
the class UnionSketch method accumulate.
/**
 * An <i>Accumulator</i> version of the standard <i>exec()</i> method. Like <i>exec()</i>,
 * accumulator is called with a bag of Sketch Tuples. Unlike <i>exec()</i>, it doesn't serialize the
 * result at the end. Instead, it can be called multiple times, each time with another bag of
 * Sketch Tuples to be input to the union.
 *
 * @param inputTuple A tuple containing a single bag, containing Sketch Tuples.
 * @see #exec
 * @see "org.apache.pig.Accumulator.accumulate(org.apache.pig.data.Tuple)"
 * @throws IOException by Pig
 */
@Override
public void accumulate(final Tuple inputTuple) throws IOException {
  if (isFirstCall_) {
    Logger.getLogger(getClass()).info("Accumulator was used");
    isFirstCall_ = false;
  }
  if (inputTuple == null || inputTuple.size() == 0) {
    return;
  }
  final DataBag bag = (DataBag) inputTuple.get(0);
  if (bag == null) {
    return;
  }
  if (accumUnion_ == null) {
    accumUnion_ = new Union(lgK_);
  }
  updateUnion(bag, accumUnion_);
}
use of com.yahoo.sketches.hll.Union in project sketches-pig by DataSketches.
the class UnionSketch method exec.
/**
 * Top-level exec function.
 * This method accepts an input Tuple containing a Bag of one or more inner <b>Sketch Tuples</b>
 * and returns a single serialized HllSketch as a DataByteArray.
 *
 * <b>Sketch Tuple</b> is a Tuple containing a single DataByteArray (BYTEARRAY in Pig), which
 * is a serialized HllSketch.
 *
 * @param inputTuple A tuple containing a single bag, containing Sketch Tuples.
 * @return serialized HllSketch
 * @see "org.apache.pig.EvalFunc.exec(org.apache.pig.data.Tuple)"
 * @throws IOException from Pig.
 */
@Override
public DataByteArray exec(final Tuple inputTuple) throws IOException {
  if (isFirstCall_) {
    Logger.getLogger(getClass()).info("Exec was used");
    isFirstCall_ = false;
  }
  if (inputTuple == null || inputTuple.size() == 0) {
    if (emptySketch_ == null) {
      emptySketch_ = new DataByteArray(new HllSketch(lgK_, tgtHllType_).toCompactByteArray());
    }
    return emptySketch_;
  }
  final Union union = new Union(lgK_);
  final DataBag bag = (DataBag) inputTuple.get(0);
  updateUnion(bag, union);
  return new DataByteArray(union.getResult(tgtHllType_).toCompactByteArray());
}
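To illustrate why several serialized sketches can be folded into one, here is a toy model of the merge step. It relies on the general HyperLogLog property that two register arrays of equal size merge by taking the register-wise maximum. ToyHll and ToyUnion are hypothetical classes written for this example, with an illustrative hash; they are not the DataSketches implementation, which does considerably more.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Toy HyperLogLog-style register array; NOT the DataSketches HllSketch.
final class ToyHll {
  private final int lgK;
  private final byte[] registers;

  ToyHll(final int lgK) {
    if (lgK < 4 || lgK > 21) {
      throw new IllegalArgumentException("lgK must be in [4, 21]: " + lgK);
    }
    this.lgK = lgK;
    this.registers = new byte[1 << lgK];
  }

  // 64-bit FNV-1a followed by one mixing round (illustrative only).
  private static long hash(final String item) {
    long h = 0xcbf29ce484222325L;
    for (final byte b : item.getBytes(StandardCharsets.UTF_8)) {
      h ^= (b & 0xff);
      h *= 0x100000001b3L;
    }
    h ^= (h >>> 33);
    h *= 0xff51afd7ed558ccdL;
    h ^= (h >>> 33);
    return h;
  }

  void update(final String item) {
    final long h = hash(item);
    final int index = (int) (h >>> (64 - lgK)); // top lgK bits select the register
    final long rest = h << lgK;                 // remaining bits determine the rank
    final int rank = Math.min(Long.numberOfLeadingZeros(rest), 64 - lgK) + 1;
    if (rank > registers[index]) {
      registers[index] = (byte) rank;           // each register keeps the largest rank seen
    }
  }

  // "Serialization" here is simply a copy of the register array.
  byte[] toByteArray() {
    return registers.clone();
  }
}

// Toy union over serialized register arrays of one fixed size.
final class ToyUnion {
  private final byte[] registers;

  ToyUnion(final int lgK) {
    this.registers = new byte[1 << lgK];
  }

  void update(final byte[] serialized) {
    if (serialized.length != registers.length) {
      throw new IllegalArgumentException("register count mismatch");
    }
    for (int i = 0; i < registers.length; i++) {
      registers[i] = (byte) Math.max(registers[i], serialized[i]); // register-wise maximum
    }
  }

  byte[] getResult() {
    return registers.clone();
  }
}

final class ToyUnionDemo {
  public static void main(final String[] args) {
    final ToyHll a = new ToyHll(10);
    a.update("A");
    a.update("B");
    final ToyHll b = new ToyHll(10);
    b.update("B");
    b.update("C");
    final ToyHll all = new ToyHll(10);
    all.update("A");
    all.update("B");
    all.update("C");

    final ToyUnion union = new ToyUnion(10);
    union.update(a.toByteArray());
    union.update(b.toByteArray());
    // Merging partial sketches gives the same registers as sketching all the data at once.
    System.out.println(Arrays.equals(union.getResult(), all.toByteArray())); // prints true
  }
}
```

Because the maximum is commutative, associative, and idempotent, the order in which the partial sketches arrive does not matter, and an item seen in several partial sketches is not double counted.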
use of com.yahoo.sketches.hll.Union in project Gaffer by gchq.
the class HllUnionSerialiserTest method testSerialiseAndDeserialise.
@Test
public void testSerialiseAndDeserialise() {
  final Union sketch = new Union(15);
  sketch.update("A");
  sketch.update("B");
  sketch.update("C");
  testSerialiser(sketch);
  final Union emptySketch = new Union(15);
  testSerialiser(emptySketch);
}