
Example 1 with SequenceFileLoader

Use of com.twitter.elephantbird.pig.load.SequenceFileLoader in the project elephant-bird by Twitter.

From the class AbstractTestWritableConverter, method readOutsidePig:

@Test
public void readOutsidePig() throws ClassCastException, ParseException, ClassNotFoundException, InstantiationException, IllegalAccessException, IOException, InterruptedException {
    // simulate Pig front-end runtime
    final SequenceFileLoader<IntWritable, Text> loader = new SequenceFileLoader<IntWritable, Text>(
            String.format("-c %s", IntWritableConverter.class.getName()),
            String.format("-c %s %s", writableConverterClass.getName(), writableConverterArguments));
    Job job = new Job();
    loader.setUDFContextSignature("12345");
    loader.setLocation(tempFilename, job);
    // simulate Pig back-end runtime
    final RecordReader<DataInputBuffer, DataInputBuffer> reader = new RawSequenceFileRecordReader();
    final FileSplit fileSplit = new FileSplit(new Path(tempFilename), 0, new File(tempFilename).length(), new String[] { "localhost" });
    final TaskAttemptContext context = HadoopCompat.newTaskAttemptContext(HadoopCompat.getConfiguration(job), new TaskAttemptID());
    reader.initialize(fileSplit, context);
    final InputSplit[] wrappedSplits = new InputSplit[] { fileSplit };
    final int inputIndex = 0;
    final List<OperatorKey> targetOps = Arrays.asList(new OperatorKey("54321", 0));
    final int splitIndex = 0;
    final PigSplit split = new PigSplit(wrappedSplits, inputIndex, targetOps, splitIndex);
    split.setConf(HadoopCompat.getConfiguration(job));
    loader.prepareToRead(reader, split);
    // read tuples and validate
    validate(new LoadFuncTupleIterator(loader));
}
Also used : Path(org.apache.hadoop.fs.Path) OperatorKey(org.apache.pig.impl.plan.OperatorKey) RawSequenceFileRecordReader(com.twitter.elephantbird.mapreduce.input.RawSequenceFileRecordReader) TaskAttemptID(org.apache.hadoop.mapreduce.TaskAttemptID) Text(org.apache.hadoop.io.Text) TaskAttemptContext(org.apache.hadoop.mapreduce.TaskAttemptContext) FileSplit(org.apache.hadoop.mapreduce.lib.input.FileSplit) DataInputBuffer(org.apache.hadoop.io.DataInputBuffer) PigSplit(org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit) SequenceFileLoader(com.twitter.elephantbird.pig.load.SequenceFileLoader) Job(org.apache.hadoop.mapreduce.Job) SequenceFile(org.apache.hadoop.io.SequenceFile) File(java.io.File) InputSplit(org.apache.hadoop.mapreduce.InputSplit) IntWritable(org.apache.hadoop.io.IntWritable) Test(org.junit.Test)
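
Both examples read an existing SequenceFile at tempFilename whose records are IntWritable keys paired with Text values. As a point of reference, such a fixture could be produced with Hadoop's SequenceFile.Writer along the lines of the sketch below; the record count and contents are illustrative assumptions, not the actual test fixture, and org.apache.hadoop.conf.Configuration is needed in addition to the imports listed above.

    // Hedged sketch: write a small IntWritable/Text SequenceFile at tempFilename.
    // The data written here is an assumption for illustration only.
    Configuration conf = new Configuration();
    SequenceFile.Writer writer = SequenceFile.createWriter(conf,
            SequenceFile.Writer.file(new Path(tempFilename)),
            SequenceFile.Writer.keyClass(IntWritable.class),
            SequenceFile.Writer.valueClass(Text.class));
    try {
        for (int i = 0; i < 10; i++) {
            writer.append(new IntWritable(i), new Text("value-" + i));
        }
    } finally {
        writer.close();
    }

As the two constructor calls suggest, each argument to SequenceFileLoader follows the pattern "-c <WritableConverter class> [optional converter arguments]": the first string configures the key converter and the second the value converter, which is how the loader turns raw SequenceFile records into Pig tuples.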

Example 2 with SequenceFileLoader

Use of com.twitter.elephantbird.pig.load.SequenceFileLoader in the project elephant-bird by Twitter.

From the class TestSequenceFileStorage, method readOutsidePig:

@Test
public void readOutsidePig() throws ClassCastException, ParseException, ClassNotFoundException, InstantiationException, IllegalAccessException, IOException, InterruptedException {
    // simulate Pig front-end runtime
    final SequenceFileLoader<IntWritable, Text> storage = new SequenceFileLoader<IntWritable, Text>(
            "-c " + IntWritableConverter.class.getName(),
            "-c " + TextConverter.class.getName());
    Job job = new Job();
    storage.setUDFContextSignature("12345");
    storage.setLocation(tempFilename, job);
    // simulate Pig back-end runtime
    RecordReader<DataInputBuffer, DataInputBuffer> reader = new RawSequenceFileRecordReader();
    FileSplit fileSplit = new FileSplit(new Path(tempFilename), 0, new File(tempFilename).length(), new String[] { "localhost" });
    TaskAttemptContext context = HadoopCompat.newTaskAttemptContext(HadoopCompat.getConfiguration(job), new TaskAttemptID());
    reader.initialize(fileSplit, context);
    InputSplit[] wrappedSplits = new InputSplit[] { fileSplit };
    int inputIndex = 0;
    List<OperatorKey> targetOps = Arrays.asList(new OperatorKey("54321", 0));
    int splitIndex = 0;
    PigSplit split = new PigSplit(wrappedSplits, inputIndex, targetOps, splitIndex);
    split.setConf(HadoopCompat.getConfiguration(job));
    storage.prepareToRead(reader, split);
    // read tuples and validate
    validate(new LoadFuncTupleIterator(storage));
}
Also used : Path(org.apache.hadoop.fs.Path) OperatorKey(org.apache.pig.impl.plan.OperatorKey) RawSequenceFileRecordReader(com.twitter.elephantbird.mapreduce.input.RawSequenceFileRecordReader) TaskAttemptID(org.apache.hadoop.mapreduce.TaskAttemptID) Text(org.apache.hadoop.io.Text) TaskAttemptContext(org.apache.hadoop.mapreduce.TaskAttemptContext) FileSplit(org.apache.hadoop.mapreduce.lib.input.FileSplit) DataInputBuffer(org.apache.hadoop.io.DataInputBuffer) LoadFuncTupleIterator(com.twitter.elephantbird.pig.util.LoadFuncTupleIterator) PigSplit(org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit) SequenceFileLoader(com.twitter.elephantbird.pig.load.SequenceFileLoader) Job(org.apache.hadoop.mapreduce.Job) SequenceFile(org.apache.hadoop.io.SequenceFile) File(java.io.File) InputSplit(org.apache.hadoop.mapreduce.InputSplit) IntWritable(org.apache.hadoop.io.IntWritable) Test(org.junit.Test)
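
The validate helper invoked at the end of each test is not shown on this page. Assuming LoadFuncTupleIterator exposes the loader's records as an Iterator of Pig Tuple objects, a hypothetical implementation could look roughly like this (the expected key/value pattern is an assumption tied to the illustrative fixture above; it requires java.util.Iterator, org.apache.pig.data.Tuple, org.apache.pig.backend.executionengine.ExecException, and org.junit.Assert):

    // Hypothetical validate helper; not the project's actual code.
    // Assumes the fixture holds sequential integer keys with "value-<i>" strings.
    private void validate(Iterator<Tuple> tuples) throws ExecException {
        int expected = 0;
        while (tuples.hasNext()) {
            Tuple tuple = tuples.next();
            // each tuple carries (key, value) as produced by the configured WritableConverters
            Assert.assertEquals(2, tuple.size());
            Assert.assertEquals(Integer.valueOf(expected), tuple.get(0));
            Assert.assertEquals("value-" + expected, tuple.get(1));
            expected++;
        }
    }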

Aggregations

RawSequenceFileRecordReader (com.twitter.elephantbird.mapreduce.input.RawSequenceFileRecordReader): 2
SequenceFileLoader (com.twitter.elephantbird.pig.load.SequenceFileLoader): 2
File (java.io.File): 2
Path (org.apache.hadoop.fs.Path): 2
DataInputBuffer (org.apache.hadoop.io.DataInputBuffer): 2
IntWritable (org.apache.hadoop.io.IntWritable): 2
SequenceFile (org.apache.hadoop.io.SequenceFile): 2
Text (org.apache.hadoop.io.Text): 2
InputSplit (org.apache.hadoop.mapreduce.InputSplit): 2
Job (org.apache.hadoop.mapreduce.Job): 2
TaskAttemptContext (org.apache.hadoop.mapreduce.TaskAttemptContext): 2
TaskAttemptID (org.apache.hadoop.mapreduce.TaskAttemptID): 2
FileSplit (org.apache.hadoop.mapreduce.lib.input.FileSplit): 2
PigSplit (org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit): 2
OperatorKey (org.apache.pig.impl.plan.OperatorKey): 2
Test (org.junit.Test): 2
LoadFuncTupleIterator (com.twitter.elephantbird.pig.util.LoadFuncTupleIterator): 1