Example 41 with BufferedString

Use of water.parser.BufferedString in project h2o-3 by h2oai.

Class WordCountTask, method map.

@Override
public void map(Chunk cs) {
    _counts = new IcedHashMap<>();
    for (int i = 0; i < cs._len; i++) {
        if (cs.isNA(i))
            continue;
        BufferedString str = cs.atStr(new BufferedString(), i);
        IcedLong count = _counts.get(str);
        if (count != null)
            count._val++;
        else
            _counts.put(str, new IcedLong(1));
    }
}
Also used : IcedLong(water.util.IcedLong) BufferedString(water.parser.BufferedString)
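
The map method above allocates a new BufferedString for every row instead of reusing one buffer. One plausible reason is that each value is stored as a map key, and a mutable key that is later overwritten would corrupt the map. The following is a minimal sketch of that pitfall in plain Java. MutableKey and WordCountSketch are hypothetical stand-ins invented for illustration; they are not H2O classes and do not reproduce BufferedString's actual behavior.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical mutable key, standing in for a reusable string buffer (not H2O code).
final class MutableKey {
    String value;

    MutableKey set(String v) {
        value = v;
        return this;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof MutableKey && Objects.equals(((MutableKey) o).value, value);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(value);
    }
}

class WordCountSketch {
    // Allocates a fresh key per word, mirroring the per-row allocation in the example above.
    static Map<MutableKey, Long> countFresh(String[] words) {
        Map<MutableKey, Long> counts = new HashMap<>();
        for (String w : words) {
            counts.merge(new MutableKey().set(w), 1L, Long::sum);
        }
        return counts;
    }

    // Reuses a single key object: every stored key is the same instance,
    // so mutating it changes keys that are already inside the map.
    static Map<MutableKey, Long> countReused(String[] words) {
        Map<MutableKey, Long> counts = new HashMap<>();
        MutableKey shared = new MutableKey();
        for (String w : words) {
            counts.merge(shared.set(w), 1L, Long::sum);
        }
        return counts;
    }
}
```

With the reused key, every entry in the map refers to the same object, so per-word counts can no longer be looked up reliably.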

Example 42 with BufferedString

Use of water.parser.BufferedString in project h2o-3 by h2oai.

Class AstEntropy, method entropyStringCol.

private Vec entropyStringCol(Vec vec) {
    return new MRTask() {

        @Override
        public void map(Chunk chk, NewChunk newChk) {
            if (chk instanceof C0DChunk) // all NAs
                newChk.addNAs(chk.len());
            else if (((CStrChunk) chk)._isAllASCII) // fast-path operations
                ((CStrChunk) chk).asciiEntropy(newChk);
            else {
                // UTF requires Java string methods
                BufferedString tmpStr = new BufferedString();
                for (int i = 0; i < chk._len; i++) {
                    if (chk.isNA(i))
                        newChk.addNA();
                    else {
                        String str = chk.atStr(tmpStr, i).toString();
                        newChk.addNum(calcEntropy(str));
                    }
                }
            }
        }
    }.doAll(new byte[] { Vec.T_NUM }, vec).outputFrame().anyVec();
}
Also used : MRTask(water.MRTask) BufferedString(water.parser.BufferedString)
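
The calcEntropy helper called in the slow path is not shown in this snippet. As a rough illustration only, a character-level Shannon entropy in bits could be computed as sketched below. The class name EntropySketch and method name shannonEntropy are hypothetical, and the actual H2O implementation may differ, for example in how it treats empty strings or multi-char code points.

```java
import java.util.HashMap;
import java.util.Map;

class EntropySketch {
    // Shannon entropy (base 2) over the UTF-16 chars of str.
    // Assumption: an empty string is given entropy 0.
    static double shannonEntropy(String str) {
        if (str.isEmpty()) {
            return 0.0;
        }
        // Count how often each char occurs.
        Map<Character, Integer> freq = new HashMap<>();
        for (int i = 0; i < str.length(); i++) {
            freq.merge(str.charAt(i), 1, Integer::sum);
        }
        double n = str.length();
        double entropy = 0.0;
        for (int count : freq.values()) {
            double p = count / n;
            // Subtract p * log2(p); log2 is computed via natural logs.
            entropy -= p * (Math.log(p) / Math.log(2));
        }
        return entropy;
    }
}
```

A string made of a single repeated character has entropy 0, and the value grows as the characters become more evenly distributed.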

Example 43 with BufferedString

Use of water.parser.BufferedString in project h2o-3 by h2oai.

Class AstBinOp, method frame_op_scalar.

/**
   * Auto-widen the scalar to every element of the frame
   */
private ValFrame frame_op_scalar(Frame fr, final String str) {
    Frame res = new MRTask() {

        @Override
        public void map(Chunk[] chks, NewChunk[] cress) {
            BufferedString vstr = new BufferedString();
            for (int c = 0; c < chks.length; c++) {
                Chunk chk = chks[c];
                NewChunk cres = cress[c];
                Vec vec = chk.vec();
                // String Vectors: apply str_op as BufferedStrings to all elements
                if (vec.isString()) {
                    final BufferedString conStr = new BufferedString(str);
                    for (int i = 0; i < chk._len; i++) cres.addNum(str_op(chk.atStr(vstr, i), conStr));
                } else if (vec.isCategorical()) {
                    // categorical Vectors: convert string to domain value; apply op (not
                    // str_op).  Not sure what the "right" behavior here is, can
                    // easily argue that should instead apply str_op to the categorical
                    // string domain value - except that this whole operation only
                    // makes sense for EQ/NE, and is much faster when just comparing
                    // doubles vs comparing strings.  Note that if the string is not
                    // part of the categorical domain, the find op returns -1 which is never
                    // equal to any categorical dense integer (which are always 0+).
                    final double d = (double) ArrayUtils.find(vec.domain(), str);
                    for (int i = 0; i < chk._len; i++) cres.addNum(op(chk.atd(i), d));
                } else {
                    // mixing string and numeric
                    // false or true only
                    final double d = op(1, 2);
                    for (int i = 0; i < chk._len; i++) cres.addNum(d);
                }
            }
        }
    }.doAll(fr.numCols(), Vec.T_NUM, fr).outputFrame(fr._names, null);
    return new ValFrame(res);
}
Also used : ValFrame(water.rapids.vals.ValFrame) Frame(water.fvec.Frame) Vec(water.fvec.Vec) MRTask(water.MRTask) BufferedString(water.parser.BufferedString) Chunk(water.fvec.Chunk) NewChunk(water.fvec.NewChunk)
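
The comment in the categorical branch describes converting the scalar string to its index in the domain once, then comparing numbers per row, with -1 for a string outside the domain. A minimal plain-Java sketch of that idea for an equality op follows. The find helper here is a hypothetical linear search written for illustration, not the H2O ArrayUtils.find implementation, and the sketch ignores missing values.

```java
class CategoricalCompareSketch {
    // Hypothetical stand-in for a domain lookup: index of s in domain, or -1 if absent.
    static int find(String[] domain, String s) {
        for (int i = 0; i < domain.length; i++) {
            if (domain[i].equals(s)) {
                return i;
            }
        }
        return -1;
    }

    // Compares each row's level index with the scalar's domain index.
    // Emits 1.0 where they are equal and 0.0 otherwise.
    static double[] equalsScalar(String[] domain, int[] levels, String scalar) {
        // Looked up once, outside the per-row loop.
        double d = find(domain, scalar);
        double[] out = new double[levels.length];
        for (int i = 0; i < levels.length; i++) {
            out[i] = (levels[i] == d) ? 1.0 : 0.0;
        }
        return out;
    }
}
```

Because level indices start at 0, a scalar that is not in the domain maps to -1 and compares unequal to every row.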

Aggregations

BufferedString (water.parser.BufferedString) 43
Frame (water.fvec.Frame) 12
Test (org.junit.Test) 9
MRTask (water.MRTask) 8
Vec (water.fvec.Vec) 8
Chunk (water.fvec.Chunk) 7
NewChunk (water.fvec.NewChunk) 6
ValFrame (water.rapids.vals.ValFrame) 5
IcedLong (water.util.IcedLong) 5
IOException (java.io.IOException) 2
ByteBuffer (java.nio.ByteBuffer) 2
Random (java.util.Random) 2
DateTimeFormatter (org.joda.time.format.DateTimeFormatter) 2
TestFrameBuilder (water.fvec.TestFrameBuilder) 2
BackendModel (deepwater.backends.BackendModel) 1
BackendParams (deepwater.backends.BackendParams) 1
RuntimeOptions (deepwater.backends.RuntimeOptions) 1
ImageDataSet (deepwater.datasets.ImageDataSet) 1
GenModel (hex.genmodel.GenModel) 1
EasyPredictModelWrapper (hex.genmodel.easy.EasyPredictModelWrapper) 1