
Example 6 with RequiredField

Use of org.apache.pig.LoadPushDown.RequiredField in the elephant-bird project by Twitter.

From class TestProtoToPig, method evenFields.

private static RequiredFieldList evenFields(List<FieldDescriptor> protoFields) {
    RequiredFieldList reqList = new RequiredFieldList();
    int i = 0;
    for (FieldDescriptor fd : protoFields) {
        if (i % 2 == 0) {
            RequiredField field = new RequiredField();
            field.setAlias(fd.getName());
            field.setIndex(i);
            // field.setType() is omitted: the type is not used
            reqList.add(field);
        }
        i++;
    }
    return reqList;
}
Also used: RequiredFieldList (org.apache.pig.LoadPushDown.RequiredFieldList), RequiredField (org.apache.pig.LoadPushDown.RequiredField), FieldDescriptor (com.google.protobuf.Descriptors.FieldDescriptor)

Example 7 with RequiredField

Use of org.apache.pig.LoadPushDown.RequiredField in the elephant-bird project by Twitter.

From class TestThriftToPig, method thriftToPig.

static <M extends TBase<?, ?>> Tuple thriftToPig(M obj) throws TException {
    // it is very inefficient to create one ThriftToPig for each Thrift object,
    // but good enough for unit testing.
    TypeRef<M> typeRef = new TypeRef<M>(obj.getClass()) {
    };
    ThriftToPig<M> thriftToPig = ThriftToPig.newInstance(typeRef);
    Tuple t = thriftToPig.getPigTuple(obj);
    // Test the projected tuple: project the subset of fields whose name has an even hash code.
    List<Field> tFields = thriftToPig.getTStructDescriptor().getFields();
    List<Integer> idxList = Lists.newArrayList();
    RequiredFieldList reqFieldList = new RequiredFieldList();
    for (int i = 0; i < tFields.size(); i++) {
        String name = tFields.get(i).getName();
        if (name.hashCode() % 2 == 0) {
            RequiredField rf = new RequiredField();
            rf.setAlias(name);
            rf.setIndex(i);
            reqFieldList.add(rf);
            idxList.add(i);
        }
    }
    try {
        Tuple pt = new ProjectedThriftTupleFactory<M>(typeRef, reqFieldList).newTuple(obj);
        int pidx = 0;
        for (int idx : idxList) {
            if (t.get(idx) != pt.get(pidx)) {
                // reached unless both are the same reference (e.g. both null); compare string forms
                assertEquals(t.get(idx).toString(), pt.get(pidx).toString());
            }
            pidx++;
        }
    } catch (ExecException e) {
        // not expected
        throw new TException(e);
    }
    // return the full tuple
    return t;
}
Also used: TException (org.apache.thrift.TException), TypeRef (com.twitter.elephantbird.util.TypeRef), ExecException (org.apache.pig.backend.executionengine.ExecException), Field (com.twitter.elephantbird.thrift.TStructDescriptor.Field), RequiredFieldList (org.apache.pig.LoadPushDown.RequiredFieldList), RequiredField (org.apache.pig.LoadPushDown.RequiredField), ThriftBytesToTuple (com.twitter.elephantbird.pig.piggybank.ThriftBytesToTuple), Tuple (org.apache.pig.data.Tuple)
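
Both examples select a subset of field indexes, and the check in thriftToPig relies on position p of the projected tuple holding the value found at the p-th selected index of the full tuple. The following is a minimal sketch of that index mapping that runs without Pig on the classpath. It uses plain lists as stand-ins for Tuple and RequiredFieldList; the class ProjectionSketch and its methods are hypothetical names chosen for illustration and are not part of Pig or elephant-bird.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for illustration; not part of Pig or elephant-bird. */
final class ProjectionSketch {

    /** Returns the indexes to keep: every even position, as in evenFields. */
    static List<Integer> evenIndexes(int fieldCount) {
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < fieldCount; i++) {
            if (i % 2 == 0) {
                kept.add(i);
            }
        }
        return kept;
    }

    /** Builds the projected row: position p holds full.get(kept.get(p)). */
    static <T> List<T> project(List<T> full, List<Integer> kept) {
        List<T> projected = new ArrayList<>(kept.size());
        for (int idx : kept) {
            projected.add(full.get(idx));
        }
        return projected;
    }
}
```

With five fields, evenIndexes keeps positions 0, 2 and 4, so the projected row has three entries and its position 1 corresponds to position 2 of the full row. This is the same pairing of idx and pidx that the loop in thriftToPig walks through.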

Aggregations

RequiredField (org.apache.pig.LoadPushDown.RequiredField): 7
RequiredFieldList (org.apache.pig.LoadPushDown.RequiredFieldList): 4
FieldDescriptor (com.google.protobuf.Descriptors.FieldDescriptor): 1
ThriftBytesToTuple (com.twitter.elephantbird.pig.piggybank.ThriftBytesToTuple): 1
Field (com.twitter.elephantbird.thrift.TStructDescriptor.Field): 1
TypeRef (com.twitter.elephantbird.util.TypeRef): 1
ArrayList (java.util.ArrayList): 1
Properties (java.util.Properties): 1
GuaguaMapReduceClient (ml.shifu.guagua.mapreduce.GuaguaMapReduceClient): 1
ColumnConfig (ml.shifu.shifu.container.obj.ColumnConfig): 1
SourceType (ml.shifu.shifu.container.obj.RawSourceData.SourceType): 1
FeatureSubsetStrategy (ml.shifu.shifu.core.dtrain.FeatureSubsetStrategy): 1
BasicFloatNetwork (ml.shifu.shifu.core.dtrain.dataset.BasicFloatNetwork): 1
GridSearch (ml.shifu.shifu.core.dtrain.gs.GridSearch): 1
GuaguaParquetMapReduceClient (ml.shifu.shifu.guagua.GuaguaParquetMapReduceClient): 1
MutablePair (org.apache.commons.lang3.tuple.MutablePair): 1
Configuration (org.apache.hadoop.conf.Configuration): 1
FileSystem (org.apache.hadoop.fs.FileSystem): 1
Path (org.apache.hadoop.fs.Path): 1
Job (org.apache.hadoop.mapreduce.Job): 1