Example 86 with DataBag

Use of org.apache.pig.data.DataBag in project sketches-pig by DataSketches.

From the class DataToFrequentStringsSketchTest, method execWrongCountType. The test expects a ClassCastException when the count field of an input tuple is an Integer.

@Test(expectedExceptions = ClassCastException.class)
public void execWrongCountType() throws Exception {
    EvalFunc<Tuple> func = new DataToFrequentStringsSketch("8");
    DataBag bag = BagFactory.getInstance().newDefaultBag();
    // integer count is not supported
    bag.add(PigUtil.objectsToTuple("a", 1));
    Tuple inputTuple = PigUtil.objectsToTuple(bag);
    func.exec(inputTuple);
}
Also used : DataBag(org.apache.pig.data.DataBag) Tuple(org.apache.pig.data.Tuple) Test(org.testng.annotations.Test)
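The test only asserts that an Integer count fails with a ClassCastException. One plausible mechanism, assumed here rather than taken from the UDF source, is a direct cast of the count field to Long: Java autoboxes the literal `1` to an Integer, and casting an Integer reference to Long fails at run time. The class and method below are hypothetical and only illustrate that Java behavior.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a UDF that reads (item, count) pairs and
// assumes the count field holds a java.lang.Long.
class CountCastDemo {
  static long readCount(List<Object> fields) {
    // A reference cast checked at run time: it fails if the field is an Integer
    return (Long) fields.get(1);
  }

  public static void main(String[] args) {
    System.out.println(readCount(Arrays.<Object>asList("a", 1L))); // 1
    try {
      // The literal 1 is autoboxed to Integer, not Long
      readCount(Arrays.<Object>asList("a", 1));
      System.out.println("no exception");
    } catch (ClassCastException e) {
      System.out.println("ClassCastException");
    }
  }
}
```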

Example 87 with DataBag

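Use of org.apache.pig.data.DataBag in project sketches-pig by DataSketches.

From the class DataToFrequentStringsSketchTest, method algebraicInitial. The Initial stage of the algebraic implementation is expected to return a single-field tuple whose bag still holds all three input tuples, even though every field in them is null.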

@Test
public void algebraicInitial() throws Exception {
    EvalFunc<Tuple> func = new DataToFrequentStringsSketch.Initial(null);
    // The input is a single-field tuple wrapping a bag of (item, count) tuples
    Tuple inputTuple = TupleFactory.getInstance().newTuple(1);
    DataBag bag = BagFactory.getInstance().newDefaultBag();
    bag.add(PigUtil.objectsToTuple(null, null));
    bag.add(PigUtil.objectsToTuple(null, null));
    bag.add(PigUtil.objectsToTuple(null, null));
    inputTuple.set(0, bag);
    Tuple resultTuple = func.exec(inputTuple);
    // Initial should hand the bag on without dropping any tuples
    Assert.assertNotNull(resultTuple);
    Assert.assertEquals(resultTuple.size(), 1);
    DataBag resultBag = (DataBag) resultTuple.get(0);
    Assert.assertEquals(resultBag.size(), 3);
}
Also used : DataBag(org.apache.pig.data.DataBag) Tuple(org.apache.pig.data.Tuple) Test(org.testng.annotations.Test)
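The test shows only that this Initial stage keeps all three tuples. To illustrate why a pass-through first stage can make sense in a three-phase aggregation, here is a hypothetical plain-Java sketch loosely modeled on Pig's Initial / Intermed / Final split. A frequency map stands in for the real sketch, and skipping null items in the intermediate stage is an assumption, not behavior confirmed by the source.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical three-phase aggregation; a frequency map stands in for a sketch.
class ThreePhaseCount {
  // Initial: pass the raw tuples through unchanged, null items included.
  static List<String> initial(List<String> bag) {
    return bag;
  }

  // Intermed: fold several raw bags into one partial frequency map (assumed to ignore nulls).
  static Map<String, Long> intermed(List<List<String>> bags) {
    Map<String, Long> partial = new HashMap<>();
    for (List<String> bag : bags) {
      for (String item : bag) {
        if (item != null) {
          partial.merge(item, 1L, Long::sum);
        }
      }
    }
    return partial;
  }

  // Final: merge the partial maps produced by different tasks.
  static Map<String, Long> finalMerge(List<Map<String, Long>> partials) {
    Map<String, Long> result = new HashMap<>();
    for (Map<String, Long> p : partials) {
      p.forEach((item, count) -> result.merge(item, count, Long::sum));
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> allNulls = Arrays.asList(null, null, null);
    System.out.println(initial(allNulls).size()); // 3: nothing is dropped
    Map<String, Long> p1 = intermed(Arrays.asList(initial(Arrays.asList("a", "b", null))));
    Map<String, Long> p2 = intermed(Arrays.asList(initial(Arrays.asList("a"))));
    System.out.println(finalMerge(Arrays.asList(p1, p2)).get("a")); // 2
  }
}
```

Deferring all work to the later stages keeps the first stage trivially cheap and leaves the merge logic in one place.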

Example 88 with DataBag

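Use of org.apache.pig.data.DataBag in project sketches-pig by DataSketches.

From the class FrequentStringsSketchToEstimatesTest, method estimation. The test serializes a frequent-items sketch built from skewed weights and checks that the NO_FALSE_POSITIVES error type yields fewer estimate rows than NO_FALSE_NEGATIVES.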

@Test
public void estimation() throws Exception {
    ItemsSketch<String> sketch = new ItemsSketch<String>(8);
    // Skewed weights: "1" gets 1000, "2" gets 500, ... down to "10" with a single occurrence
    sketch.update("1", 1000);
    sketch.update("2", 500);
    sketch.update("3", 200);
    sketch.update("4", 100);
    sketch.update("5", 50);
    sketch.update("6", 20);
    sketch.update("7", 10);
    sketch.update("8", 5);
    sketch.update("9", 2);
    sketch.update("10");
    // The UDF reads the sketch as a serialized DataByteArray
    Tuple inputTuple = PigUtil.objectsToTuple(new DataByteArray(sketch.toByteArray(new ArrayOfStringsSerDe())));
    // Stricter mode: every returned item is guaranteed to be frequent
    EvalFunc<DataBag> func1 = new FrequentStringsSketchToEstimates("NO_FALSE_POSITIVES");
    DataBag bag1 = func1.exec(inputTuple);
    Assert.assertNotNull(bag1);
    Assert.assertTrue(bag1.size() < 10);
    // Looser mode: no frequent item is omitted
    EvalFunc<DataBag> func2 = new FrequentStringsSketchToEstimates("NO_FALSE_NEGATIVES");
    DataBag bag2 = func2.exec(inputTuple);
    Assert.assertNotNull(bag2);
    Assert.assertTrue(bag2.size() < 10);
    // The stricter mode should return fewer rows
    Assert.assertTrue(bag1.size() < bag2.size());
}
Also used : ArrayOfStringsSerDe(com.yahoo.sketches.ArrayOfStringsSerDe) DataBag(org.apache.pig.data.DataBag) ItemsSketch(com.yahoo.sketches.frequencies.ItemsSketch) DataByteArray(org.apache.pig.data.DataByteArray) Tuple(org.apache.pig.data.Tuple) Test(org.testng.annotations.Test)
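Why does NO_FALSE_POSITIVES return fewer rows? The sketch below is a simplified weighted Misra-Gries counter, not the DataSketches ItemsSketch algorithm (which purges its map differently), and all names in it are hypothetical. Each tracked item has a lower bound (its counter) and an upper bound (counter plus the total amount decremented so far). Both modes compare against the same threshold, the maximum error: the strict mode tests the lower bound, the loose mode tests the upper bound.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical weighted Misra-Gries counter (assumes positive counts); not ItemsSketch.
class SimpleFrequentItems {
  enum ErrorType { NO_FALSE_POSITIVES, NO_FALSE_NEGATIVES }

  private final int maxCounters;
  private final Map<String, Long> counters = new HashMap<>();
  private long offset = 0; // total subtracted from each tracked counter so far

  SimpleFrequentItems(int maxCounters) {
    this.maxCounters = maxCounters;
  }

  void update(String item) {
    update(item, 1L);
  }

  void update(String item, long count) {
    counters.merge(item, count, Long::sum);
    if (counters.size() > maxCounters) {
      // Subtract the smallest counter from every counter; at least one drops to zero.
      long min = Collections.min(counters.values());
      offset += min;
      Iterator<Map.Entry<String, Long>> it = counters.entrySet().iterator();
      while (it.hasNext()) {
        Map.Entry<String, Long> e = it.next();
        long remaining = e.getValue() - min;
        if (remaining == 0) {
          it.remove();
        } else {
          e.setValue(remaining);
        }
      }
    }
  }

  // The true frequency of any item lies in [lowerBound, upperBound].
  long lowerBound(String item) { return counters.getOrDefault(item, 0L); }
  long upperBound(String item) { return lowerBound(item) + offset; }
  long maximumError() { return offset; }

  List<String> frequentItems(ErrorType type) {
    List<String> result = new ArrayList<>();
    for (String item : counters.keySet()) {
      long bound = type == ErrorType.NO_FALSE_POSITIVES ? lowerBound(item) : upperBound(item);
      if (bound > maximumError()) {
        result.add(item);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    SimpleFrequentItems sketch = new SimpleFrequentItems(4);
    long[] weights = {1000, 500, 200, 100, 50, 20, 10, 5, 2};
    for (int i = 0; i < weights.length; i++) {
      sketch.update(String.valueOf(i + 1), weights[i]);
    }
    sketch.update("10");
    System.out.println(sketch.maximumError()); // 88
    System.out.println(new TreeSet<>(sketch.frequentItems(ErrorType.NO_FALSE_POSITIVES))); // [1, 2, 3]
    System.out.println(new TreeSet<>(sketch.frequentItems(ErrorType.NO_FALSE_NEGATIVES))); // [1, 2, 3, 4]
  }
}
```

With 4 counters and the same weights as the test, item "4" ends with a lower bound of 12 and an upper bound of 100, so only the loose mode reports it. Because every lower bound is at most the matching upper bound, the strict list is always a subset of the loose one.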

Example 89 with DataBag

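Use of org.apache.pig.data.DataBag in project sketches-pig by DataSketches.

From the class FrequentStringsSketchToEstimatesTest, method schema. The output schema is expected to be a bag of tuples, each with one chararray field followed by three long fields.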

@Test
public void schema() throws Exception {
    EvalFunc<DataBag> func = new FrequentStringsSketchToEstimates();
    Schema schema = func.outputSchema(null);
    Assert.assertNotNull(schema);
    // Top level: a single bag field
    Assert.assertEquals(schema.size(), 1);
    Assert.assertEquals(schema.getField(0).type, DataType.BAG);
    // Inside the bag: a single tuple field
    Assert.assertEquals(schema.getField(0).schema.size(), 1);
    Assert.assertEquals(schema.getField(0).schema.getField(0).type, DataType.TUPLE);
    // Inside the tuple: a chararray followed by three longs
    Assert.assertEquals(schema.getField(0).schema.getField(0).schema.size(), 4);
    Assert.assertEquals(schema.getField(0).schema.getField(0).schema.getField(0).type, DataType.CHARARRAY);
    Assert.assertEquals(schema.getField(0).schema.getField(0).schema.getField(1).type, DataType.LONG);
    Assert.assertEquals(schema.getField(0).schema.getField(0).schema.getField(2).type, DataType.LONG);
    Assert.assertEquals(schema.getField(0).schema.getField(0).schema.getField(3).type, DataType.LONG);
}
Also used : DataBag(org.apache.pig.data.DataBag) Schema(org.apache.pig.impl.logicalLayer.schema.Schema) Test(org.testng.annotations.Test)
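The long `getField(...).schema` chain in the test walks a three-level structure: a bag field, whose inner schema holds one tuple field, whose inner schema holds the four value fields. The sketch below mirrors that nesting with a hypothetical `Field` class (a simplification, not Pig's `Schema.FieldSchema`) so the shape is visible in one place.

```java
import java.util.Arrays;
import java.util.List;

class NestedSchemaDemo {
  enum Type { BAG, TUPLE, CHARARRAY, LONG }

  // Hypothetical, simplified field: a type plus, for BAG and TUPLE, an inner schema.
  static final class Field {
    final Type type;
    final List<Field> schema; // null for scalar fields

    Field(Type type, Field... inner) {
      this.type = type;
      this.schema = inner.length == 0 ? null : Arrays.asList(inner);
    }
  }

  // bag -> tuple -> (chararray, long, long, long)
  static List<Field> estimatesSchema() {
    Field row = new Field(Type.TUPLE,
        new Field(Type.CHARARRAY), new Field(Type.LONG), new Field(Type.LONG), new Field(Type.LONG));
    return Arrays.asList(new Field(Type.BAG, row));
  }

  public static void main(String[] args) {
    List<Field> schema = estimatesSchema();
    Field tuple = schema.get(0).schema.get(0);
    // prints "BAG of TUPLE with 4 fields"
    System.out.println(schema.get(0).type + " of " + tuple.type + " with " + tuple.schema.size() + " fields");
  }
}
```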

Example 90 with DataBag

Use of org.apache.pig.data.DataBag in project sketches-pig by DataSketches.

From the class FrequentStringsSketchToEstimatesTest, method nullInput. A null input is expected to produce a null bag rather than an exception.

@Test
public void nullInput() throws Exception {
    EvalFunc<DataBag> func = new FrequentStringsSketchToEstimates();
    DataBag bag = func.exec(null);
    Assert.assertNull(bag);
}
Also used : DataBag(org.apache.pig.data.DataBag) Test(org.testng.annotations.Test)
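The test confirms that this UDF maps a null input to a null result. A guard clause at the top of the method is one common way to get that behavior. The sketch below is hypothetical: the method name is invented, and treating an empty input or a null first field the same way is an assumption, not something the test checks.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical UDF-style method illustrating the null-in, null-out convention.
class NullGuardDemo {
  static Integer countItems(List<Object> input) {
    // Assumed guard: no input, or no payload in the first field, yields null
    // instead of an exception or a made-up default value.
    if (input == null || input.isEmpty() || input.get(0) == null) {
      return null;
    }
    return ((List<?>) input.get(0)).size();
  }

  public static void main(String[] args) {
    System.out.println(countItems(null)); // null
    System.out.println(countItems(Arrays.<Object>asList(Arrays.asList("x", "y")))); // 2
  }
}
```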