
Example 1 with PigSplit

Use of org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit in project elephant-bird by twitter.

From the class AbstractTestWritableConverter, method readOutsidePig.

@Test
public void readOutsidePig() throws ClassCastException, ParseException, ClassNotFoundException, InstantiationException, IllegalAccessException, IOException, InterruptedException {
    // simulate Pig front-end runtime
    final SequenceFileLoader<IntWritable, Text> loader = new SequenceFileLoader<IntWritable, Text>(String.format("-c %s", IntWritableConverter.class.getName()), String.format("-c %s %s", writableConverterClass.getName(), writableConverterArguments));
    Job job = new Job();
    loader.setUDFContextSignature("12345");
    loader.setLocation(tempFilename, job);
    // simulate Pig back-end runtime
    final RecordReader<DataInputBuffer, DataInputBuffer> reader = new RawSequenceFileRecordReader();
    final FileSplit fileSplit = new FileSplit(new Path(tempFilename), 0, new File(tempFilename).length(), new String[] { "localhost" });
    final TaskAttemptContext context = HadoopCompat.newTaskAttemptContext(HadoopCompat.getConfiguration(job), new TaskAttemptID());
    reader.initialize(fileSplit, context);
    final InputSplit[] wrappedSplits = new InputSplit[] { fileSplit };
    final int inputIndex = 0;
    final List<OperatorKey> targetOps = Arrays.asList(new OperatorKey("54321", 0));
    final int splitIndex = 0;
    final PigSplit split = new PigSplit(wrappedSplits, inputIndex, targetOps, splitIndex);
    split.setConf(HadoopCompat.getConfiguration(job));
    loader.prepareToRead(reader, split);
    // read tuples and validate
    validate(new LoadFuncTupleIterator(loader));
}
Also used: Path (org.apache.hadoop.fs.Path), OperatorKey (org.apache.pig.impl.plan.OperatorKey), RawSequenceFileRecordReader (com.twitter.elephantbird.mapreduce.input.RawSequenceFileRecordReader), TaskAttemptID (org.apache.hadoop.mapreduce.TaskAttemptID), Text (org.apache.hadoop.io.Text), TaskAttemptContext (org.apache.hadoop.mapreduce.TaskAttemptContext), FileSplit (org.apache.hadoop.mapreduce.lib.input.FileSplit), DataInputBuffer (org.apache.hadoop.io.DataInputBuffer), PigSplit (org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit), SequenceFileLoader (com.twitter.elephantbird.pig.load.SequenceFileLoader), Job (org.apache.hadoop.mapreduce.Job), SequenceFile (org.apache.hadoop.io.SequenceFile), File (java.io.File), InputSplit (org.apache.hadoop.mapreduce.InputSplit), IntWritable (org.apache.hadoop.io.IntWritable), Test (org.junit.Test)
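
Both SequenceFileLoader examples read tempFilename, a SequenceFile that the test class writes in fixture code not shown on this page. A minimal sketch of such a fixture, assuming Hadoop's SequenceFile.Writer options API and the IntWritable/Text key and value types used above; the method name, record count, and record values are illustrative:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

// Hypothetical fixture: write a small IntWritable/Text SequenceFile for the test to read.
public static void writeFixture(String tempFilename) throws IOException {
    Configuration conf = new Configuration();
    SequenceFile.Writer writer = SequenceFile.createWriter(conf,
        SequenceFile.Writer.file(new Path(tempFilename)),
        SequenceFile.Writer.keyClass(IntWritable.class),
        SequenceFile.Writer.valueClass(Text.class));
    try {
        for (int i = 0; i < 10; i++) {
            writer.append(new IntWritable(i), new Text("value-" + i)); // illustrative records
        }
    } finally {
        writer.close();
    }
}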

Example 2 with PigSplit

Use of org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit in project elephant-bird by twitter.

From the class TestSequenceFileStorage, method readOutsidePig.

@Test
public void readOutsidePig() throws ClassCastException, ParseException, ClassNotFoundException, InstantiationException, IllegalAccessException, IOException, InterruptedException {
    // simulate Pig front-end runtime
    final SequenceFileLoader<IntWritable, Text> storage = new SequenceFileLoader<IntWritable, Text>("-c " + IntWritableConverter.class.getName(), "-c " + TextConverter.class.getName());
    Job job = new Job();
    storage.setUDFContextSignature("12345");
    storage.setLocation(tempFilename, job);
    // simulate Pig back-end runtime
    RecordReader<DataInputBuffer, DataInputBuffer> reader = new RawSequenceFileRecordReader();
    FileSplit fileSplit = new FileSplit(new Path(tempFilename), 0, new File(tempFilename).length(), new String[] { "localhost" });
    TaskAttemptContext context = HadoopCompat.newTaskAttemptContext(HadoopCompat.getConfiguration(job), new TaskAttemptID());
    reader.initialize(fileSplit, context);
    InputSplit[] wrappedSplits = new InputSplit[] { fileSplit };
    int inputIndex = 0;
    List<OperatorKey> targetOps = Arrays.asList(new OperatorKey("54321", 0));
    int splitIndex = 0;
    PigSplit split = new PigSplit(wrappedSplits, inputIndex, targetOps, splitIndex);
    split.setConf(HadoopCompat.getConfiguration(job));
    storage.prepareToRead(reader, split);
    // read tuples and validate
    validate(new LoadFuncTupleIterator(storage));
}
Also used: Path (org.apache.hadoop.fs.Path), OperatorKey (org.apache.pig.impl.plan.OperatorKey), RawSequenceFileRecordReader (com.twitter.elephantbird.mapreduce.input.RawSequenceFileRecordReader), TaskAttemptID (org.apache.hadoop.mapreduce.TaskAttemptID), Text (org.apache.hadoop.io.Text), TaskAttemptContext (org.apache.hadoop.mapreduce.TaskAttemptContext), FileSplit (org.apache.hadoop.mapreduce.lib.input.FileSplit), DataInputBuffer (org.apache.hadoop.io.DataInputBuffer), LoadFuncTupleIterator (com.twitter.elephantbird.pig.util.LoadFuncTupleIterator), PigSplit (org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit), SequenceFileLoader (com.twitter.elephantbird.pig.load.SequenceFileLoader), Job (org.apache.hadoop.mapreduce.Job), SequenceFile (org.apache.hadoop.io.SequenceFile), File (java.io.File), InputSplit (org.apache.hadoop.mapreduce.InputSplit), IntWritable (org.apache.hadoop.io.IntWritable), Test (org.junit.Test)
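
Examples 1 and 2 build the PigSplit identically, and the recipe generalizes to any single Hadoop InputSplit. A condensed sketch of that back-end wrapping step; the helper name wrapForPig and the dummy scope string are illustrative:

import java.util.Arrays;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit;
import org.apache.pig.impl.plan.OperatorKey;

// Hypothetical helper: wrap one Hadoop InputSplit so a LoadFunc can be driven outside Pig.
public static PigSplit wrapForPig(InputSplit underlying, Configuration conf) {
    // targetOps normally identifies the plan operators fed by this split;
    // a dummy OperatorKey is enough when simulating the back end in a test.
    List<OperatorKey> targetOps = Arrays.asList(new OperatorKey("test-scope", 0));
    PigSplit pigSplit = new PigSplit(
        new InputSplit[] { underlying }, // wrapped splits
        0,                               // index of this input among the query's inputs
        targetOps,
        0);                              // index of this split within the input
    pigSplit.setConf(conf);
    return pigSplit;
}

A dummy OperatorKey and zero indexes suffice in both tests above because no real Pig plan is attached to the split.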

Example 3 with PigSplit

Use of org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit in project elephant-bird by twitter.

From the class TestLocationAsTuple, method testSimpleLoad.

@Test
public void testSimpleLoad() throws IOException {
    Configuration conf = new Configuration();
    Job job = EasyMock.createMock(Job.class);
    EasyMock.expect(HadoopCompat.getConfiguration(job)).andStubReturn(conf);
    EasyMock.replay(job);
    LoadFunc loader = new LocationAsTuple();
    loader.setUDFContextSignature("foo");
    loader.setLocation("a\tb", job);
    RecordReader reader = EasyMock.createMock(RecordReader.class);
    PigSplit split = EasyMock.createMock(PigSplit.class);
    EasyMock.expect(split.getConf()).andStubReturn(conf);
    // switch the split mock to replay mode so the stubbed getConf() answer is active
    EasyMock.replay(split);
    loader.prepareToRead(reader, split);
    Tuple next = loader.getNext();
    assertEquals("a", next.get(0));
    assertEquals("b", next.get(1));
}
Also used: Configuration (org.apache.hadoop.conf.Configuration), RecordReader (org.apache.hadoop.mapreduce.RecordReader), PigSplit (org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit), Job (org.apache.hadoop.mapreduce.Job), Tuple (org.apache.pig.data.Tuple), LoadFunc (org.apache.pig.LoadFunc), Test (org.junit.Test)
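
From the assertions here and in the next example, LocationAsTuple evidently tokenizes its location string and returns the tokens as the fields of a single tuple, with tab as the default delimiter. A behavior sketch only, not the project's implementation, using java.util.StringTokenizer and Pig's TupleFactory:

import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;

// Sketch: split a location string on the given delimiter characters and
// pack the tokens into one Pig tuple, mirroring the assertions above.
public static Tuple locationToTuple(String location, String delimiters) {
    List<Object> fields = new ArrayList<Object>();
    StringTokenizer tokenizer = new StringTokenizer(location, delimiters);
    while (tokenizer.hasMoreTokens()) {
        fields.add(tokenizer.nextToken());
    }
    return TupleFactory.getInstance().newTuple(fields);
}

With delimiters "\t", the location "a\tb" yields ("a", "b"); with ",", the location "a,b\tc" yields ("a", "b\tc"), matching testTokenizedLoad below.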

Example 4 with PigSplit

Use of org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit in project elephant-bird by twitter.

From the class TestLocationAsTuple, method testTokenizedLoad.

@Test
public void testTokenizedLoad() throws IOException {
    Configuration conf = new Configuration();
    Job job = EasyMock.createMock(Job.class);
    EasyMock.expect(HadoopCompat.getConfiguration(job)).andStubReturn(conf);
    EasyMock.replay(job);
    LoadFunc loader = new LocationAsTuple(",");
    loader.setUDFContextSignature("foo");
    loader.setLocation("a,b\tc", job);
    RecordReader reader = EasyMock.createMock(RecordReader.class);
    PigSplit split = EasyMock.createMock(PigSplit.class);
    EasyMock.expect(split.getConf()).andStubReturn(conf);
    // switch the split mock to replay mode so the stubbed getConf() answer is active
    EasyMock.replay(split);
    loader.prepareToRead(reader, split);
    Tuple next = loader.getNext();
    assertEquals("a", next.get(0));
    assertEquals("b\tc", next.get(1));
}
Also used: Configuration (org.apache.hadoop.conf.Configuration), RecordReader (org.apache.hadoop.mapreduce.RecordReader), PigSplit (org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit), Job (org.apache.hadoop.mapreduce.Job), Tuple (org.apache.pig.data.Tuple), LoadFunc (org.apache.pig.LoadFunc), Test (org.junit.Test)
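
Both LocationAsTuple tests lean on the same EasyMock idiom: although HadoopCompat.getConfiguration is a static method, it presumably delegates to job.getConfiguration(), so calling it inside EasyMock.expect(...) records that instance call on the Job mock. A condensed sketch of the idiom; the delegation is an assumption about elephant-bird's HadoopCompat shim, whose import is not shown on this page:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.easymock.EasyMock;

// Hypothetical helper: build a Job mock whose configuration is stubbed.
static Job mockJobWith(Configuration conf) {
    Job job = EasyMock.createMock(Job.class);
    // The static shim forwards to job.getConfiguration(), which is what gets recorded.
    EasyMock.expect(HadoopCompat.getConfiguration(job)).andStubReturn(conf);
    // Switch to replay mode so the stub answers once the code under test runs.
    EasyMock.replay(job);
    return job;
}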

Aggregations

Job (org.apache.hadoop.mapreduce.Job): 4
PigSplit (org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit): 4
Test (org.junit.Test): 4
RawSequenceFileRecordReader (com.twitter.elephantbird.mapreduce.input.RawSequenceFileRecordReader): 2
SequenceFileLoader (com.twitter.elephantbird.pig.load.SequenceFileLoader): 2
File (java.io.File): 2
Configuration (org.apache.hadoop.conf.Configuration): 2
Path (org.apache.hadoop.fs.Path): 2
DataInputBuffer (org.apache.hadoop.io.DataInputBuffer): 2
IntWritable (org.apache.hadoop.io.IntWritable): 2
SequenceFile (org.apache.hadoop.io.SequenceFile): 2
Text (org.apache.hadoop.io.Text): 2
InputSplit (org.apache.hadoop.mapreduce.InputSplit): 2
RecordReader (org.apache.hadoop.mapreduce.RecordReader): 2
TaskAttemptContext (org.apache.hadoop.mapreduce.TaskAttemptContext): 2
TaskAttemptID (org.apache.hadoop.mapreduce.TaskAttemptID): 2
FileSplit (org.apache.hadoop.mapreduce.lib.input.FileSplit): 2
LoadFunc (org.apache.pig.LoadFunc): 2
Tuple (org.apache.pig.data.Tuple): 2
OperatorKey (org.apache.pig.impl.plan.OperatorKey): 2