
Example 6 with Tuple

Use of cascading.tuple.Tuple in project SpyGlass by ParallelAI, from the sink method of the JDBCScheme class.

@Override
public void sink(FlowProcess<JobConf> flowProcess, SinkCall<Object[], OutputCollector> sinkCall) throws IOException {
    TupleEntry tupleEntry = sinkCall.getOutgoingEntry();
    OutputCollector outputCollector = sinkCall.getOutput();
    if (updateBy != null) {
        Tuple allValues = tupleEntry.selectTuple(updateValueFields);
        Tuple updateValues = tupleEntry.selectTuple(updateByFields);
        allValues = cleanTuple(allValues);
        TupleRecord key = new TupleRecord(allValues);
        // a null value is fine here: the collector writes nothing for it
        if (updateValues.equals(updateIfTuple))
            outputCollector.collect(key, null);
        else
            outputCollector.collect(key, key);
        return;
    }
    Tuple result = tupleEntry.selectTuple(getSinkFields());
    result = cleanTuple(result);
    outputCollector.collect(new TupleRecord(result), null);
}
Also used: OutputCollector (org.apache.hadoop.mapred.OutputCollector), TupleEntry (cascading.tuple.TupleEntry), Tuple (cascading.tuple.Tuple)
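The update-by branch above picks between two collect calls by comparing the selected update-by fields against a sentinel tuple: a matching row is emitted as (key, null), any other row as (key, key). A minimal plain-Java sketch of that decision, with List<Object> standing in for Cascading's Tuple and a map entry standing in for the collected pair (the sentinel value and field contents here are made up for illustration):

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class UpdateBranchSketch {
    // Sentinel: when the update-by fields equal this tuple,
    // the sink emits (key, null) instead of (key, key).
    static final List<Object> UPDATE_IF = Arrays.asList("existing");

    // Mirrors the branch in sink(): compare the selected update-by
    // values against the sentinel and build the pair to collect.
    static Map.Entry<List<Object>, List<Object>> emit(List<Object> allValues,
                                                      List<Object> updateValues) {
        if (updateValues.equals(UPDATE_IF)) {
            return new SimpleEntry<>(allValues, null);   // matched: null value
        }
        return new SimpleEntry<>(allValues, allValues);  // otherwise: key twice
    }

    public static void main(String[] args) {
        System.out.println(emit(Arrays.asList("row1", 42), Arrays.asList("existing")).getValue());
        System.out.println(emit(Arrays.asList("row2", 7), Arrays.asList("new")).getValue());
    }
}
```

The null/non-null value is the only signal passed downstream, so the writer behind the collector can distinguish the two cases without any extra flag.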

Example 7 with Tuple

Use of cascading.tuple.Tuple in project SpyGlass by ParallelAI, from the source method of the HBaseScheme class.

@Override
public boolean source(FlowProcess<JobConf> flowProcess, SourceCall<Object[], RecordReader> sourceCall) throws IOException {
    Tuple result = new Tuple();
    Object key = sourceCall.getContext()[0];
    Object value = sourceCall.getContext()[1];
    boolean hasNext = sourceCall.getInput().next(key, value);
    if (!hasNext) {
        return false;
    }
    // skip a null key/value pair without ending the stream
    if (key == null || value == null) {
        return true;
    }
    ImmutableBytesWritable keyWritable = (ImmutableBytesWritable) key;
    Result row = (Result) value;
    result.add(keyWritable);
    for (int i = 0; i < this.familyNames.length; i++) {
        String familyName = this.familyNames[i];
        byte[] familyNameBytes = Bytes.toBytes(familyName);
        Fields fields = this.valueFields[i];
        for (int k = 0; k < fields.size(); k++) {
            String fieldName = (String) fields.get(k);
            byte[] fieldNameBytes = Bytes.toBytes(fieldName);
            byte[] cellValue = row.getValue(familyNameBytes, fieldNameBytes);
            result.add(cellValue != null ? new ImmutableBytesWritable(cellValue) : null);
        }
    }
    sourceCall.getIncomingEntry().setTuple(result);
    return true;
}
Also used: ImmutableBytesWritable (org.apache.hadoop.hbase.io.ImmutableBytesWritable), Fields (cascading.tuple.Fields), Tuple (cascading.tuple.Tuple), Result (org.apache.hadoop.hbase.client.Result)
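The source method above flattens an HBase Result into a positional tuple: one slot for the row key, then one slot per (family, field) pair in declared order, with null kept for missing cells so positions stay aligned with the scheme's fields. A stand-alone sketch of that flattening, using nested maps in place of Result (the family and field names are invented for illustration):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RowFlattenSketch {
    // Flatten a row into [rowKey, cell, cell, ...] in declared column
    // order, keeping null for absent cells so tuple positions are stable.
    static List<Object> flatten(String rowKey,
                                Map<String, Map<String, byte[]>> row,
                                String[] familyNames,
                                String[][] fieldNames) {
        List<Object> tuple = new ArrayList<>();
        tuple.add(rowKey);
        for (int i = 0; i < familyNames.length; i++) {
            Map<String, byte[]> family = row.getOrDefault(familyNames[i], new HashMap<>());
            for (String field : fieldNames[i]) {
                tuple.add(family.get(field)); // null when the cell is missing
            }
        }
        return tuple;
    }

    public static void main(String[] args) {
        Map<String, byte[]> cf = new HashMap<>();
        cf.put("name", "alice".getBytes());
        Map<String, Map<String, byte[]>> row = new HashMap<>();
        row.put("cf", cf);
        // "age" is declared but absent, so its slot comes back null
        List<Object> t = flatten("row1", row, new String[]{"cf"},
                                 new String[][]{{"name", "age"}});
        System.out.println(t.size()); // 3: key plus two declared fields
    }
}
```

Preserving the null slot matters: dropping missing cells would shift every later field one position to the left and misalign the tuple with the declared Fields.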

Example 8 with Tuple

Use of cascading.tuple.Tuple in project SpyGlass by ParallelAI, from the source method of the HBaseRawScheme class.

@SuppressWarnings("unchecked")
@Override
public boolean source(FlowProcess<JobConf> flowProcess, SourceCall<Object[], RecordReader> sourceCall) throws IOException {
    Tuple result = new Tuple();
    Object key = sourceCall.getContext()[0];
    Object value = sourceCall.getContext()[1];
    boolean hasNext = sourceCall.getInput().next(key, value);
    if (!hasNext) {
        return false;
    }
    // skip a null key/value pair without ending the stream
    if (key == null || value == null) {
        return true;
    }
    ImmutableBytesWritable keyWritable = (ImmutableBytesWritable) key;
    Result row = (Result) value;
    result.add(keyWritable);
    result.add(row);
    sourceCall.getIncomingEntry().setTuple(result);
    return true;
}
Also used: ImmutableBytesWritable (org.apache.hadoop.hbase.io.ImmutableBytesWritable), Tuple (cascading.tuple.Tuple), Result (org.apache.hadoop.hbase.client.Result)

Example 9 with Tuple

Use of cascading.tuple.Tuple in project SpyGlass by ParallelAI, from the sink method of the HBaseRawScheme class.

@SuppressWarnings("unchecked")
@Override
public void sink(FlowProcess<JobConf> flowProcess, SinkCall<Object[], OutputCollector> sinkCall) throws IOException {
    TupleEntry tupleEntry = sinkCall.getOutgoingEntry();
    OutputCollector outputCollector = sinkCall.getOutput();
    Tuple key = tupleEntry.selectTuple(RowKeyField);
    Object okey = key.getObject(0);
    ImmutableBytesWritable keyBytes = getBytes(okey);
    Put put = new Put(keyBytes.get());
    Fields outFields = tupleEntry.getFields().subtract(RowKeyField);
    if (null != outFields) {
        TupleEntry values = tupleEntry.selectEntry(outFields);
        for (int n = 0; n < values.getFields().size(); n++) {
            Object o = values.get(n);
            ImmutableBytesWritable valueBytes = getBytes(o);
            Comparable field = outFields.get(n);
            ColumnName cn = parseColumn((String) field);
            if (null == cn.family) {
                if (n >= familyNames.length)
                    cn.family = familyNames[familyNames.length - 1];
                else
                    cn.family = familyNames[n];
            }
            if (null != o || writeNulls)
                put.add(Bytes.toBytes(cn.family), Bytes.toBytes(cn.name), valueBytes.get());
        }
    }
    outputCollector.collect(null, put);
}
Also used: OutputCollector (org.apache.hadoop.mapred.OutputCollector), ImmutableBytesWritable (org.apache.hadoop.hbase.io.ImmutableBytesWritable), Fields (cascading.tuple.Fields), TupleEntry (cascading.tuple.TupleEntry), Tuple (cascading.tuple.Tuple), Put (org.apache.hadoop.hbase.client.Put)
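The sink above resolves each output field to a column family via parseColumn: if the field name carries its own family, that wins; otherwise the family falls back to the configured familyNames by position, reusing the last family once positions run out. A plain-Java sketch of that fallback, assuming a "family:qualifier" naming convention for self-describing fields (the convention, family names, and field names here are assumptions, since parseColumn itself is not shown):

```java
public class ColumnNameSketch {
    // Configured families, mirroring the familyNames array in the sink.
    static final String[] FAMILY_NAMES = {"cf1", "cf2"};

    // Assumed convention: "family:qualifier" carries its own family;
    // a bare name falls back to FAMILY_NAMES by position, reusing the
    // last family once the position index runs past the array.
    static String[] resolve(String field, int position) {
        int colon = field.indexOf(':');
        if (colon >= 0) {
            return new String[]{field.substring(0, colon), field.substring(colon + 1)};
        }
        String family = position < FAMILY_NAMES.length
                ? FAMILY_NAMES[position]
                : FAMILY_NAMES[FAMILY_NAMES.length - 1];
        return new String[]{family, field};
    }

    public static void main(String[] args) {
        System.out.println(String.join("/", resolve("meta:ts", 0))); // meta/ts
        System.out.println(String.join("/", resolve("name", 0)));    // cf1/name
        System.out.println(String.join("/", resolve("extra", 5)));   // cf2/extra
    }
}
```

The last-family reuse matches the `n >= familyNames.length` branch in the snippet: extra unqualified fields all land in the final configured family rather than failing.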

Aggregations

Tuple (cascading.tuple.Tuple): 9
TupleEntry (cascading.tuple.TupleEntry): 5
Fields (cascading.tuple.Fields): 4
ImmutableBytesWritable (org.apache.hadoop.hbase.io.ImmutableBytesWritable): 4
OutputCollector (org.apache.hadoop.mapred.OutputCollector): 3
Put (org.apache.hadoop.hbase.client.Put): 2
Result (org.apache.hadoop.hbase.client.Result): 2
Function (cascading.operation.Function): 1
TupleListCollector (cascading.tuple.TupleListCollector): 1
ArrayList (java.util.ArrayList): 1
Test (org.junit.Test): 1