
Example 1 with Serializer

Use of org.apache.hadoop.io.serializer.Serializer in project crunch by cloudera.

From the class CrunchInputSplit, the method write:

public void write(DataOutput out) throws IOException {
    out.writeInt(nodeIndex);
    out.writeInt(extraConf.size());
    for (Map.Entry<String, String> e : extraConf.entrySet()) {
        out.writeUTF(e.getKey());
        out.writeUTF(e.getValue());
    }
    Text.writeString(out, inputFormatClass.getName());
    Text.writeString(out, inputSplit.getClass().getName());
    // Delegate the split body to Hadoop's pluggable serialization framework
    SerializationFactory factory = new SerializationFactory(conf);
    Serializer serializer = factory.getSerializer(inputSplit.getClass());
    serializer.open((DataOutputStream) out);
    serializer.serialize(inputSplit);
}
Also used: SerializationFactory (org.apache.hadoop.io.serializer.SerializationFactory), Map (java.util.Map), Serializer (org.apache.hadoop.io.serializer.Serializer)
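
The write path above is only half of the contract. As a hedged sketch of the matching read side (field names and the readClass helper are illustrative, not necessarily Crunch's actual code), readFields would obtain a Deserializer from the same SerializationFactory and replay the fields in the order they were written:

@SuppressWarnings("unchecked")
public void readFields(DataInput in) throws IOException {
    nodeIndex = in.readInt();
    int entries = in.readInt();
    extraConf = new HashMap<String, String>();
    for (int i = 0; i < entries; i++) {
        extraConf.put(in.readUTF(), in.readUTF());
    }
    // Class names were written with Text.writeString, so read them back the same way
    inputFormatClass = (Class<? extends InputFormat>) readClass(in);
    Class<? extends InputSplit> splitClass = (Class<? extends InputSplit>) readClass(in);
    SerializationFactory factory = new SerializationFactory(conf);
    Deserializer deserializer = factory.getDeserializer(splitClass);
    deserializer.open((DataInputStream) in);
    // Passing null asks the deserializer to allocate a fresh instance
    inputSplit = (InputSplit) deserializer.deserialize(null);
}

// Hypothetical helper: resolve a class name written by Text.writeString
private Class<?> readClass(DataInput in) throws IOException {
    String className = Text.readString(in);
    try {
        return conf.getClassByName(className);
    } catch (ClassNotFoundException e) {
        throw new IOException("Failed to load class " + className, e);
    }
}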

Example 2 with Serializer

Use of org.apache.hadoop.io.serializer.Serializer in project hadoop by apache.

From the class TaggedInputSplit, the method write:

@SuppressWarnings("unchecked")
public void write(DataOutput out) throws IOException {
    Text.writeString(out, inputSplitClass.getName());
    Text.writeString(out, inputFormatClass.getName());
    Text.writeString(out, mapperClass.getName());
    SerializationFactory factory = new SerializationFactory(conf);
    Serializer serializer = factory.getSerializer(inputSplitClass);
    serializer.open((DataOutputStream) out);
    serializer.serialize(inputSplit);
}
Also used: SerializationFactory (org.apache.hadoop.io.serializer.SerializationFactory), Serializer (org.apache.hadoop.io.serializer.Serializer)
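
The raw Serializer in these write methods (hence the @SuppressWarnings("unchecked")) is whatever implementation the SerializationFactory picks for the split class. The factory consults the io.serializations configuration key, and WritableSerialization is registered by default, so Writable-based splits work without extra setup. A hedged sketch of configuring the list explicitly; the Avro entry is just an example of another Hadoop-provided serialization:

Configuration conf = new Configuration();
// "io.serializations" lists the Serialization implementations the factory tries in order
conf.setStrings("io.serializations",
    "org.apache.hadoop.io.serializer.WritableSerialization",
    "org.apache.hadoop.io.serializer.avro.AvroSpecificSerialization");
SerializationFactory factory = new SerializationFactory(conf);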

Example 3 with Serializer

Use of org.apache.hadoop.io.serializer.Serializer in project tez by apache.

From the class TestValuesIterator, the method createInMemStreams:

/**
 * Creates in-memory segments.
 *
 * @return the list of in-memory segments
 * @throws IOException if writing a segment fails
 */
@SuppressWarnings("unchecked")
public List<TezMerger.Segment> createInMemStreams() throws IOException {
    int numberOfStreams = Math.max(2, rnd.nextInt(10));
    LOG.info("No of streams : " + numberOfStreams);
    SerializationFactory serializationFactory = new SerializationFactory(conf);
    Serializer keySerializer = serializationFactory.getSerializer(keyClass);
    Serializer valueSerializer = serializationFactory.getSerializer(valClass);
    LocalDirAllocator localDirAllocator = new LocalDirAllocator(TezRuntimeFrameworkConfigs.LOCAL_DIRS);
    InputContext context = createTezInputContext();
    MergeManager mergeManager = new MergeManager(conf, fs, localDirAllocator, context, null, null, null, null, null, 1024 * 1024 * 10, null, false, -1);
    DataOutputBuffer keyBuf = new DataOutputBuffer();
    DataOutputBuffer valBuf = new DataOutputBuffer();
    DataInputBuffer keyIn = new DataInputBuffer();
    DataInputBuffer valIn = new DataInputBuffer();
    keySerializer.open(keyBuf);
    valueSerializer.open(valBuf);
    List<TezMerger.Segment> segments = new LinkedList<TezMerger.Segment>();
    for (int i = 0; i < numberOfStreams; i++) {
        BoundedByteArrayOutputStream bout = new BoundedByteArrayOutputStream(1024 * 1024);
        InMemoryWriter writer = new InMemoryWriter(bout);
        Map<Writable, Writable> data = createData();
        // write data
        for (Map.Entry<Writable, Writable> entry : data.entrySet()) {
            keySerializer.serialize(entry.getKey());
            valueSerializer.serialize(entry.getValue());
            keyIn.reset(keyBuf.getData(), 0, keyBuf.getLength());
            valIn.reset(valBuf.getData(), 0, valBuf.getLength());
            writer.append(keyIn, valIn);
            originalData.put(entry.getKey(), entry.getValue());
            keyBuf.reset();
            valBuf.reset();
            keyIn.reset();
            valIn.reset();
        }
        IFile.Reader reader = new InMemoryReader(mergeManager, null, bout.getBuffer(), 0, bout.getBuffer().length);
        segments.add(new TezMerger.Segment(reader, null));
        data.clear();
        writer.close();
    }
    return segments;
}
Also used: InMemoryReader (org.apache.tez.runtime.library.common.shuffle.orderedgrouped.InMemoryReader), IFile (org.apache.tez.runtime.library.common.sort.impl.IFile), InputContext (org.apache.tez.runtime.api.InputContext), SerializationFactory (org.apache.hadoop.io.serializer.SerializationFactory), Writable (org.apache.hadoop.io.Writable), LongWritable (org.apache.hadoop.io.LongWritable), IntWritable (org.apache.hadoop.io.IntWritable), BytesWritable (org.apache.hadoop.io.BytesWritable), MergeManager (org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager), LinkedList (java.util.LinkedList), TezMerger (org.apache.tez.runtime.library.common.sort.impl.TezMerger), InMemoryWriter (org.apache.tez.runtime.library.common.shuffle.orderedgrouped.InMemoryWriter), DataInputBuffer (org.apache.hadoop.io.DataInputBuffer), BoundedByteArrayOutputStream (org.apache.hadoop.io.BoundedByteArrayOutputStream), DataOutputBuffer (org.apache.hadoop.io.DataOutputBuffer), LocalDirAllocator (org.apache.hadoop.fs.LocalDirAllocator), Map (java.util.Map), TreeMap (java.util.TreeMap), Serializer (org.apache.hadoop.io.serializer.Serializer)
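
The buffer dance in the loop above (serialize into a DataOutputBuffer, reset a DataInputBuffer over its backing array, append, then reset both buffers) is a standard way to reuse buffers across records. A minimal, self-contained round trip of the same pattern, using Text purely as an example type and roundTrip as a hypothetical helper name:

public static Text roundTrip(Configuration conf) throws IOException {
    SerializationFactory factory = new SerializationFactory(conf);
    Serializer<Text> serializer = factory.getSerializer(Text.class);
    Deserializer<Text> deserializer = factory.getDeserializer(Text.class);
    DataOutputBuffer outBuf = new DataOutputBuffer();
    serializer.open(outBuf);
    serializer.serialize(new Text("hello"));
    DataInputBuffer inBuf = new DataInputBuffer();
    // Wrap the written bytes without copying them
    inBuf.reset(outBuf.getData(), 0, outBuf.getLength());
    deserializer.open(inBuf);
    // null asks the deserializer to allocate a fresh instance
    Text copy = deserializer.deserialize(null);
    serializer.close();
    deserializer.close();
    return copy;
}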

Example 4 with Serializer

Use of org.apache.hadoop.io.serializer.Serializer in project cdap by caskdata.

From the class TaggedInputSplit, the method write:

@SuppressWarnings("unchecked")
@Override
public final void write(DataOutput out) throws IOException {
    Class<? extends InputSplit> inputSplitClass = inputSplit.getClass();
    Text.writeString(out, inputSplitClass.getName());
    writeAdditionalFields(out);
    SerializationFactory factory = new SerializationFactory(conf);
    Serializer serializer = factory.getSerializer(inputSplitClass);
    serializer.open((DataOutputStream) out);
    serializer.serialize(inputSplit);
}
Also used: SerializationFactory (org.apache.hadoop.io.serializer.SerializationFactory), Serializer (org.apache.hadoop.io.serializer.Serializer)
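
Note the ordering contract shared by the write methods in Examples 1, 2, and 4: the split's class name is written before the serialized split bytes, because the reader must know which class to hand to SerializationFactory.getDeserializer before it can decode the rest of the stream.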

Example 5 with Serializer

Use of org.apache.hadoop.io.serializer.Serializer in project apex-malhar by apache.

From the class MapOperatorTest, the method testNodeProcessingSchema:

public void testNodeProcessingSchema(MapOperator<LongWritable, Text, Text, IntWritable> oper) throws IOException {
    CollectorTestSink sortSink = new CollectorTestSink();
    oper.output.setSink(sortSink);
    oper.setMapClass(WordCount.Map.class);
    oper.setCombineClass(WordCount.Reduce.class);
    oper.setDirName(testMeta.testDir);
    oper.setConfigFile(null);
    oper.setInputFormatClass(TextInputFormat.class);
    Configuration conf = new Configuration();
    JobConf jobConf = new JobConf(conf);
    FileInputFormat.setInputPaths(jobConf, new Path(testMeta.testDir));
    TextInputFormat inputFormat = new TextInputFormat();
    inputFormat.configure(jobConf);
    InputSplit[] splits = inputFormat.getSplits(jobConf, 1);
    SerializationFactory serializationFactory = new SerializationFactory(conf);
    Serializer keySerializer = serializationFactory.getSerializer(splits[0].getClass());
    keySerializer.open(oper.getOutstream());
    keySerializer.serialize(splits[0]);
    oper.setInputSplitClass(splits[0].getClass());
    keySerializer.close();
    oper.setup(null);
    oper.beginWindow(0);
    oper.emitTuples();
    oper.emitTuples();
    oper.endWindow();
    oper.beginWindow(1);
    oper.emitTuples();
    oper.endWindow();
    Assert.assertEquals("number emitted tuples", 3, sortSink.collectedTuples.size());
    for (Object o : sortSink.collectedTuples) {
        LOG.debug(o.toString());
    }
    LOG.debug("Done testing round\n");
    oper.teardown();
}
Also used: Path (org.apache.hadoop.fs.Path), Configuration (org.apache.hadoop.conf.Configuration), TextInputFormat (org.apache.hadoop.mapred.TextInputFormat), SerializationFactory (org.apache.hadoop.io.serializer.SerializationFactory), JobConf (org.apache.hadoop.mapred.JobConf), InputSplit (org.apache.hadoop.mapred.InputSplit), CollectorTestSink (org.apache.apex.malhar.lib.testbench.CollectorTestSink), Serializer (org.apache.hadoop.io.serializer.Serializer)
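
Unlike the earlier examples, this test serializes the split into the operator's own stream (oper.getOutstream()) before setup(), presumably so the operator can deserialize and replay the split when it starts. It also calls keySerializer.close() explicitly, which closes the underlying stream once the split has been written.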

Aggregations

SerializationFactory (org.apache.hadoop.io.serializer.SerializationFactory): 11
Serializer (org.apache.hadoop.io.serializer.Serializer): 11
DataOutputBuffer (org.apache.hadoop.io.DataOutputBuffer): 4
Configuration (org.apache.hadoop.conf.Configuration): 3
JobConf (org.apache.hadoop.mapred.JobConf): 3
SplitInfo (com.tencent.angel.split.SplitInfo): 2
IOException (java.io.IOException): 2
Map (java.util.Map): 2
InputSplit (org.apache.hadoop.mapred.InputSplit): 2
DefaultPartition (com.datatorrent.api.DefaultPartition): 1
ArrayList (java.util.ArrayList): 1
Collection (java.util.Collection): 1
LinkedList (java.util.LinkedList): 1
TreeMap (java.util.TreeMap): 1
CollectorTestSink (org.apache.apex.malhar.lib.testbench.CollectorTestSink): 1
LocalDirAllocator (org.apache.hadoop.fs.LocalDirAllocator): 1
Path (org.apache.hadoop.fs.Path): 1
BoundedByteArrayOutputStream (org.apache.hadoop.io.BoundedByteArrayOutputStream): 1
BytesWritable (org.apache.hadoop.io.BytesWritable): 1
DataInputBuffer (org.apache.hadoop.io.DataInputBuffer): 1