
Example 1 with CompactSketch

Use of com.yahoo.sketches.theta.CompactSketch in project sketches-core by DataSketches.

From the class BoundsOnRatiosInThetaSketchedSetsTest, method checkNormalReturns.

@Test
public void checkNormalReturns() {
    // Default nominal entries: 4K (4096)
    UpdateSketch skA = Sketches.updateSketchBuilder().build();
    UpdateSketch skC = Sketches.updateSketchBuilder().build();
    int uA = 10000;
    int uC = 100000;
    for (int i = 0; i < uA; i++) {
        skA.update(i);
    }
    for (int i = 0; i < uC; i++) {
        skC.update(i + uA / 2);
    }
    Intersection inter = Sketches.setOperationBuilder().buildIntersection();
    inter.update(skA);
    inter.update(skC);
    CompactSketch skB = inter.getResult();
    double est = BoundsOnRatiosInThetaSketchedSets.getEstimateOfBoverA(skA, skB);
    double lb = BoundsOnRatiosInThetaSketchedSets.getLowerBoundForBoverA(skA, skB);
    double ub = BoundsOnRatiosInThetaSketchedSets.getUpperBoundForBoverA(skA, skB);
    assertTrue(ub > est);
    assertTrue(est > lb);
    assertEquals(est, 0.5, .03);
    println("ub : " + ub);
    println("est: " + est);
    println("lb : " + lb);
    //skA is now empty
    skA.reset();
    est = BoundsOnRatiosInThetaSketchedSets.getEstimateOfBoverA(skA, skB);
    lb = BoundsOnRatiosInThetaSketchedSets.getLowerBoundForBoverA(skA, skB);
    ub = BoundsOnRatiosInThetaSketchedSets.getUpperBoundForBoverA(skA, skB);
    println("ub : " + ub);
    println("est: " + est);
    println("lb : " + lb);
    //Now both are empty
    skC.reset();
    est = BoundsOnRatiosInThetaSketchedSets.getEstimateOfBoverA(skA, skC);
    lb = BoundsOnRatiosInThetaSketchedSets.getLowerBoundForBoverA(skA, skC);
    ub = BoundsOnRatiosInThetaSketchedSets.getUpperBoundForBoverA(skA, skC);
    println("ub : " + ub);
    println("est: " + est);
    println("lb : " + lb);
}
Also used : Intersection(com.yahoo.sketches.theta.Intersection) CompactSketch(com.yahoo.sketches.theta.CompactSketch) UpdateSketch(com.yahoo.sketches.theta.UpdateSketch) Test(org.testng.annotations.Test)
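For reference, the value that `est` is checked against can be worked out exactly. A holds 0..9,999 and C holds 5,000..104,999, so B = A ∩ C holds the 5,000 items 5,000..9,999, and |B| / |A| = 5,000 / 10,000 = 0.5. The snippet below repeats that arithmetic with plain `java.util` sets; it is not part of sketches-core, and the class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper: the exact B/A ratio for the data used in checkNormalReturns. */
class ExactOverlap {

    /** Returns |A intersect C| / |A|, where A = {0..uA-1} and C = {uA/2 .. uA/2 + uC - 1}. */
    static double ratioBoverA(int uA, int uC) {
        Set<Integer> a = new HashSet<>();
        for (int i = 0; i < uA; i++) {
            a.add(i);
        }
        // B = A intersect C: keep only the items of C that also appear in A.
        Set<Integer> b = new HashSet<>();
        for (int i = 0; i < uC; i++) {
            int item = i + uA / 2;
            if (a.contains(item)) {
                b.add(item);
            }
        }
        return (double) b.size() / a.size();
    }

    public static void main(String[] args) {
        System.out.println(ratioBoverA(10000, 100000)); // prints 0.5
    }
}
```

The sketch-based estimate is expected to land near this exact value, which is why the test allows a tolerance of 0.03.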

Example 2 with CompactSketch

Use of com.yahoo.sketches.theta.CompactSketch in project sketches-pig by DataSketches.

From the class PigUtilTest, method checkCompOrdSketchToTuple.

@Test(expectedExceptions = IllegalArgumentException.class)
public void checkCompOrdSketchToTuple() {
    UpdateSketch usk = UpdateSketch.builder().setNominalEntries(16).build();
    for (int i = 0; i < 16; i++) {
        usk.update(i);
    }
    // dstOrdered = false yields an unordered compact sketch, which the tuple conversion rejects
    CompactSketch csk = usk.compact(false, null);
    compactOrderedSketchToTuple(csk);
}
Also used : CompactSketch(com.yahoo.sketches.theta.CompactSketch) UpdateSketch(com.yahoo.sketches.theta.UpdateSketch) Test(org.testng.annotations.Test)
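The test passes `dstOrdered = false` to `compact`, producing an unordered compact sketch, and expects `compactOrderedSketchToTuple` to reject it with an `IllegalArgumentException`. The precondition being exercised can be modeled on a plain array of retained hash values. `OrderedGuard.requireOrdered` below is hypothetical; the real PigUtil check may be written differently (for instance by asking the sketch whether it is ordered rather than scanning its hashes).

```java
import java.util.Arrays;

/** Hypothetical model of an "ordered compact sketch" precondition. */
class OrderedGuard {

    /** Returns the input unchanged, or throws IllegalArgumentException if it is not ascending. */
    static long[] requireOrdered(long[] hashes) {
        for (int i = 1; i < hashes.length; i++) {
            if (hashes[i - 1] > hashes[i]) {
                throw new IllegalArgumentException("Sketch must be ordered");
            }
        }
        return hashes;
    }

    public static void main(String[] args) {
        long[] unordered = {42L, 7L, 19L};
        long[] ordered = unordered.clone();
        Arrays.sort(ordered);          // an ordered compact form keeps its hashes sorted like this
        requireOrdered(ordered);       // accepted
        try {
            requireOrdered(unordered); // rejected, mirroring the exception the test expects
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```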

Example 3 with CompactSketch

Use of com.yahoo.sketches.theta.CompactSketch in project sketches-pig by DataSketches.

From the class AexcludeB, method exec.

// @formatter:off
/**
 * Top Level Exec Function.
 * <p>
 * This method accepts a <b>Sketch AnotB Input Tuple</b> and returns a
 * <b>Sketch Tuple</b>.
 * </p>
 *
 * <b>Sketch AnotB Input Tuple</b>
 * <ul>
 *   <li>Tuple: TUPLE (Must contain 2 fields): <br>
 *   Java data type: Pig DataType: Description
 *     <ul>
 *       <li>index 0: DataByteArray: BYTEARRAY: Sketch A</li>
 *       <li>index 1: DataByteArray: BYTEARRAY: Sketch B</li>
 *     </ul>
 *   </li>
 * </ul>
 *
 * <p>
 * Any other input tuple will throw an exception!
 * </p>
 *
 * <b>Sketch Tuple</b>
 * <ul>
 *   <li>Tuple: TUPLE (Contains exactly 1 field)
 *     <ul>
 *       <li>index 0: DataByteArray: BYTEARRAY = The serialization of a Sketch object.</li>
 *     </ul>
 *   </li>
 * </ul>
 *
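 * @param inputTuple a Sketch AnotB Input Tuple
 * @return a Sketch Tuple holding the serialized result of A and not B
 * @throws ExecException from Pig.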
 */
// @formatter:on
// TOP LEVEL EXEC
@Override
public Tuple exec(final Tuple inputTuple) throws IOException {
    // The exec is a stateless function.  It operates on the input and returns a result.
    // It can only call static functions.
    final Object objA = extractFieldAtIndex(inputTuple, 0);
    Sketch sketchA = null;
    if (objA != null) {
        final DataByteArray dbaA = (DataByteArray) objA;
        final Memory srcMem = Memory.wrap(dbaA.get());
        sketchA = Sketch.wrap(srcMem, seed_);
    }
    final Object objB = extractFieldAtIndex(inputTuple, 1);
    Sketch sketchB = null;
    if (objB != null) {
        final DataByteArray dbaB = (DataByteArray) objB;
        final Memory srcMem = Memory.wrap(dbaB.get());
        sketchB = Sketch.wrap(srcMem, seed_);
    }
    final AnotB aNOTb = SetOperation.builder().setSeed(seed_).buildANotB();
    aNOTb.update(sketchA, sketchB);
    final CompactSketch compactSketch = aNOTb.getResult(true, null);
    return compactOrderedSketchToTuple(compactSketch);
}
Also used : CompactSketch(com.yahoo.sketches.theta.CompactSketch) Memory(com.yahoo.memory.Memory) AnotB(com.yahoo.sketches.theta.AnotB) Sketch(com.yahoo.sketches.theta.Sketch) DataByteArray(org.apache.pig.data.DataByteArray)
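`AnotB` estimates the set difference A \ B: the items present in Sketch A but absent from Sketch B. The snippet below shows the same operation on exact `java.util` sets. Treating a missing field (a `null` sketch) as the empty set is an assumption made for illustration; under that model a missing A gives an empty result and a missing B leaves A unchanged. `ExactAnotB` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

/** Exact-set model of A-not-B; a null input is treated as the empty set (an assumption). */
class ExactAnotB {

    static Set<Integer> aNotB(Set<Integer> a, Set<Integer> b) {
        // Start from a copy of A (or nothing, if A is missing) so the caller's set is not modified.
        Set<Integer> result = new TreeSet<>(a == null ? Collections.<Integer>emptySet() : a);
        if (b != null) {
            result.removeAll(b); // drop every item that also appears in B
        }
        return result;
    }

    public static void main(String[] args) {
        Set<Integer> a = new TreeSet<>(Arrays.asList(1, 2, 3, 4));
        Set<Integer> b = new TreeSet<>(Arrays.asList(3, 4, 5));
        System.out.println(aNotB(a, b));    // prints [1, 2]
        System.out.println(aNotB(a, null)); // prints [1, 2, 3, 4]
        System.out.println(aNotB(null, b)); // prints []
    }
}
```

With sketches the result is an estimate rather than an exact set, but the set semantics are the same.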

Example 4 with CompactSketch

Use of com.yahoo.sketches.theta.CompactSketch in project sketches-pig by DataSketches.

From the class DataToSketch, method exec.

// @formatter:off
/**
 ***********************************************************************************************
 * Top-level exec function.
 * This method accepts an input Tuple containing a Bag of one or more inner <b>Datum Tuples</b>
 * and returns a single updated <b>Sketch</b> as a <b>Sketch Tuple</b>.
 *
 * <p>If a large number of calls is anticipated, leveraging either the <i>Algebraic</i> or
 * <i>Accumulator</i> interfaces is recommended. Pig normally handles this automatically.
 *
 * <p>Internally, this method presents the inner <b>Datum Tuples</b> to a new <b>Sketch</b>,
 * which is returned as a <b>Sketch Tuple</b>.
 *
 * <p><b>Input Tuple</b>
 * <ul>
 *   <li>Tuple: TUPLE (Must contain only one field)
 *     <ul>
 *       <li>index 0: DataBag: BAG (May contain 0 or more Inner Tuples)
 *         <ul>
 *           <li>index 0: Tuple: TUPLE <b>Datum Tuple</b></li>
 *           <li>...</li>
 *           <li>index n-1: Tuple: TUPLE <b>Datum Tuple</b></li>
 *         </ul>
 *       </li>
 *     </ul>
 *   </li>
 * </ul>
 *
 * <b>Datum Tuple</b>
 * <ul>
 *   <li>Tuple: TUPLE (Must contain only one field)
 *     <ul>
 *       <li>index 0: Java data type: Pig DataType, which may be any one of:
 *         <ul>
 *           <li>Byte: BYTE</li>
 *           <li>Integer: INTEGER</li>
 *           <li>Long: LONG</li>
 *           <li>Float: FLOAT</li>
 *           <li>Double: DOUBLE</li>
 *           <li>String: CHARARRAY</li>
 *           <li>DataByteArray: BYTEARRAY</li>
 *         </ul>
 *       </li>
 *     </ul>
 *   </li>
 * </ul>
 *
 * <b>Sketch Tuple</b>
 * <ul>
 *   <li>Tuple: TUPLE (Contains exactly 1 field)
 *     <ul>
 *       <li>index 0: DataByteArray: BYTEARRAY = The serialization of a Sketch object.</li>
 *     </ul>
 *   </li>
 * </ul>
 *
 * @param inputTuple A tuple containing a single bag, containing Datum Tuples.
 * @return Sketch Tuple. If inputTuple is null or empty, returns an empty sketch (8 bytes).
 * @see "org.apache.pig.EvalFunc.exec(org.apache.pig.data.Tuple)"
 * @throws IOException from Pig.
 */
// @formatter:on
// TOP LEVEL EXEC
@Override
public Tuple exec(final Tuple inputTuple) throws IOException {
    // The throws clause is required by the EvalFunc API.
    // The exec is a stateless function.  It operates on the input and returns a result.
    // It can only call static functions.
    final Union union = newUnion(nomEntries_, p_, seed_);
    final DataBag bag = extractBag(inputTuple);
    if (bag == null) {
        // Null or empty input: return the precomputed empty sketch tuple
        return emptyCompactOrderedSketchTuple_;
    }
    // updates union with all elements of the bag
    updateUnion(bag, union);
    final CompactSketch compOrdSketch = union.getResult(true, null);
    return compactOrderedSketchToTuple(compOrdSketch);
}
Also used : CompactSketch(com.yahoo.sketches.theta.CompactSketch) DataBag(org.apache.pig.data.DataBag) Union(com.yahoo.sketches.theta.Union)
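Theta sketches build on the k-minimum-values (KMV) idea: hash every item to a value in [0, 1), retain only the k smallest distinct hashes, and estimate the distinct count from the k-th smallest one. The class below is a deliberately simplified KMV model of what `exec` does with a bag (feed every datum into one sketch, then read an estimate). It is not the DataSketches implementation, which feeds a `Union`, uses its own seeded hash, and serializes a compact ordered sketch. `KmvSketch` is a hypothetical name, and `String.hashCode` mixed through a SplitMix64-style finalizer is a stand-in hash chosen only for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

/** Simplified k-minimum-values sketch; an illustration, not the DataSketches algorithm. */
class KmvSketch {
    private final int k;
    private final TreeSet<Double> minHashes = new TreeSet<>(); // k smallest distinct hashes

    KmvSketch(int k) {
        this.k = k;
    }

    /** Stand-in hash: SplitMix64 finalizer over String.hashCode, mapped to [0, 1). */
    private static double hash(String item) {
        long z = item.hashCode() + 0x9E3779B97F4A7C15L;
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        z = z ^ (z >>> 31);
        return (z >>> 11) * 0x1.0p-53; // top 53 bits scaled into [0, 1)
    }

    /** Adds one datum, keeping only the k smallest distinct hash values. */
    void update(String item) {
        minHashes.add(hash(item)); // duplicates hash identically and are ignored by the set
        if (minHashes.size() > k) {
            minHashes.pollLast(); // evict the largest retained hash
        }
    }

    /** Exact count while fewer than k hashes are retained, else (k - 1) / theta. */
    double estimate() {
        if (minHashes.size() < k) {
            return minHashes.size();
        }
        double theta = minHashes.last(); // the k-th smallest hash
        return (k - 1) / theta;
    }

    public static void main(String[] args) {
        // Models one bag of Datum Tuples presented to a fresh sketch.
        List<String> bag = Arrays.asList("a", "b", "c", "a");
        KmvSketch sketch = new KmvSketch(16);
        for (String datum : bag) {
            sketch.update(datum);
        }
        System.out.println(sketch.estimate()); // prints 3.0 (the duplicate "a" collapses)
    }
}
```

Below k distinct items the count is exact, which matches the behavior described for small bags; above it, accuracy depends on k, which plays the role of `nomEntries_` here.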

Example 5 with CompactSketch

Use of com.yahoo.sketches.theta.CompactSketch in project sketches-pig by DataSketches.

From the class PigUtil, method emptySketchTuple.

/**
 * Returns an empty Compact Ordered Sketch Tuple. An empty sketch is only 8 bytes.
 * @param seed the given seed
 * @return an empty compact ordered sketch tuple
 */
static final Tuple emptySketchTuple(final long seed) {
    final UpdateSketch sketch = UpdateSketch.builder()
        .setSeed(seed)
        .setResizeFactor(RF)
        .setNominalEntries(16)
        .build();
    final CompactSketch compOrdSketch = sketch.compact(true, null);
    return compactOrderedSketchToTuple(compOrdSketch);
}
Also used : CompactSketch(com.yahoo.sketches.theta.CompactSketch) UpdateSketch(com.yahoo.sketches.theta.UpdateSketch)
