
Example 1 with LoadFuncTupleIterator

Use of com.twitter.elephantbird.pig.util.LoadFuncTupleIterator in the elephant-bird project by Twitter.

From the class TestSequenceFileStorage, the method readOutsidePig reads a SequenceFile through SequenceFileLoader without launching Pig, simulating the front-end and back-end halves of the Pig runtime by hand:

@Test
public void readOutsidePig() throws ClassCastException, ParseException, ClassNotFoundException, InstantiationException, IllegalAccessException, IOException, InterruptedException {
    // simulate Pig front-end runtime: the first "-c" argument configures the
    // key converter, the second the value converter
    final SequenceFileLoader<IntWritable, Text> storage = new SequenceFileLoader<IntWritable, Text>("-c " + IntWritableConverter.class.getName(), "-c " + TextConverter.class.getName());
    Job job = new Job();
    storage.setUDFContextSignature("12345");
    storage.setLocation(tempFilename, job);
    // simulate Pig back-end runtime: build the raw record reader and split
    // that Pig would normally hand to the loader
    RecordReader<DataInputBuffer, DataInputBuffer> reader = new RawSequenceFileRecordReader();
    FileSplit fileSplit = new FileSplit(new Path(tempFilename), 0, new File(tempFilename).length(), new String[] { "localhost" });
    TaskAttemptContext context = HadoopCompat.newTaskAttemptContext(HadoopCompat.getConfiguration(job), new TaskAttemptID());
    reader.initialize(fileSplit, context);
    // wrap the Hadoop split in a PigSplit so prepareToRead sees what the
    // Pig back end would pass
    InputSplit[] wrappedSplits = new InputSplit[] { fileSplit };
    int inputIndex = 0;
    List<OperatorKey> targetOps = Arrays.asList(new OperatorKey("54321", 0));
    int splitIndex = 0;
    PigSplit split = new PigSplit(wrappedSplits, inputIndex, targetOps, splitIndex);
    split.setConf(HadoopCompat.getConfiguration(job));
    storage.prepareToRead(reader, split);
    // read tuples and validate
    validate(new LoadFuncTupleIterator(storage));
}
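
The validate helper lives elsewhere in TestSequenceFileStorage and is not shown on this page. A minimal sketch of what such a check could look like, assuming LoadFuncTupleIterator implements java.util.Iterator<Tuple> over the loader's output and that the fixture holds sequential (int, text) pairs; the expected values here are illustrative assumptions, not the project's actual assertions:

import java.util.Iterator;
import org.apache.pig.data.Tuple;
import org.junit.Assert;

// Hypothetical stand-in for the test's validate helper: walk the tuples the
// loader produces and check that each record became a two-field tuple.
private void validate(Iterator<Tuple> tuples) throws Exception {
    int expectedKey = 0;
    while (tuples.hasNext()) {
        Tuple tuple = tuples.next();
        Assert.assertEquals(2, tuple.size());                          // (key, value) per record
        Assert.assertEquals(Integer.valueOf(expectedKey), tuple.get(0)); // IntWritableConverter yields an Integer
        Assert.assertNotNull(tuple.get(1));                             // TextConverter yields the text payload
        expectedKey++;
    }
}
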
Also used :
Path (org.apache.hadoop.fs.Path)
OperatorKey (org.apache.pig.impl.plan.OperatorKey)
RawSequenceFileRecordReader (com.twitter.elephantbird.mapreduce.input.RawSequenceFileRecordReader)
TaskAttemptID (org.apache.hadoop.mapreduce.TaskAttemptID)
Text (org.apache.hadoop.io.Text)
TaskAttemptContext (org.apache.hadoop.mapreduce.TaskAttemptContext)
FileSplit (org.apache.hadoop.mapreduce.lib.input.FileSplit)
DataInputBuffer (org.apache.hadoop.io.DataInputBuffer)
LoadFuncTupleIterator (com.twitter.elephantbird.pig.util.LoadFuncTupleIterator)
PigSplit (org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit)
SequenceFileLoader (com.twitter.elephantbird.pig.load.SequenceFileLoader)
Job (org.apache.hadoop.mapreduce.Job)
SequenceFile (org.apache.hadoop.io.SequenceFile)
File (java.io.File)
InputSplit (org.apache.hadoop.mapreduce.InputSplit)
IntWritable (org.apache.hadoop.io.IntWritable)
Test (org.junit.Test)

Aggregations

RawSequenceFileRecordReader (com.twitter.elephantbird.mapreduce.input.RawSequenceFileRecordReader) 1 use
SequenceFileLoader (com.twitter.elephantbird.pig.load.SequenceFileLoader) 1 use
LoadFuncTupleIterator (com.twitter.elephantbird.pig.util.LoadFuncTupleIterator) 1 use
File (java.io.File) 1 use
Path (org.apache.hadoop.fs.Path) 1 use
DataInputBuffer (org.apache.hadoop.io.DataInputBuffer) 1 use
IntWritable (org.apache.hadoop.io.IntWritable) 1 use
SequenceFile (org.apache.hadoop.io.SequenceFile) 1 use
Text (org.apache.hadoop.io.Text) 1 use
InputSplit (org.apache.hadoop.mapreduce.InputSplit) 1 use
Job (org.apache.hadoop.mapreduce.Job) 1 use
TaskAttemptContext (org.apache.hadoop.mapreduce.TaskAttemptContext) 1 use
TaskAttemptID (org.apache.hadoop.mapreduce.TaskAttemptID) 1 use
FileSplit (org.apache.hadoop.mapreduce.lib.input.FileSplit) 1 use
PigSplit (org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit) 1 use
OperatorKey (org.apache.pig.impl.plan.OperatorKey) 1 use
Test (org.junit.Test) 1 use
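
tempFilename in the test above points at a SequenceFile fixture created during test setup, which this page does not show. A minimal sketch of writing such a fixture with Hadoop's SequenceFile.Writer, assuming (IntWritable, Text) records; the method name, record count, and contents are illustrative assumptions:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

// Write a small (IntWritable, Text) SequenceFile that a test like the one
// above could read back through SequenceFileLoader.
public static void writeFixture(String filename) throws IOException {
    Configuration conf = new Configuration();
    SequenceFile.Writer writer = SequenceFile.createWriter(conf,
        SequenceFile.Writer.file(new Path(filename)),
        SequenceFile.Writer.keyClass(IntWritable.class),
        SequenceFile.Writer.valueClass(Text.class));
    try {
        for (int i = 0; i < 3; i++) {
            writer.append(new IntWritable(i), new Text("value " + i));
        }
    } finally {
        writer.close();
    }
}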