
Example 1 with TypedBytesInput

Use of org.apache.hadoop.typedbytes.TypedBytesInput in project hadoop by apache, from the class TypedBytesOutputReader, method initialize():

@Override
public void initialize(PipeMapRed pipeMapRed) throws IOException {
    super.initialize(pipeMapRed);
    clientIn = pipeMapRed.getClientInput();
    key = new TypedBytesWritable();
    value = new TypedBytesWritable();
    in = new TypedBytesInput(clientIn);
}
Also used: TypedBytesInput (org.apache.hadoop.typedbytes.TypedBytesInput), TypedBytesWritable (org.apache.hadoop.typedbytes.TypedBytesWritable)

Example 2 with TypedBytesInput

Use of org.apache.hadoop.typedbytes.TypedBytesInput in project hadoop by apache, from the class TestDumpTypedBytes, method testDumping():

@Test
public void testDumping() throws Exception {
    Configuration conf = new Configuration();
    MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf).numDataNodes(2).build();
    FileSystem fs = cluster.getFileSystem();
    PrintStream psBackup = System.out;
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    PrintStream psOut = new PrintStream(out);
    System.setOut(psOut);
    DumpTypedBytes dumptb = new DumpTypedBytes(conf);
    try {
        Path root = new Path("/typedbytestest");
        assertTrue(fs.mkdirs(root));
        assertTrue(fs.exists(root));
        OutputStreamWriter writer = new OutputStreamWriter(fs.create(new Path(root, "test.txt")));
        try {
            for (int i = 0; i < 100; i++) {
                writer.write("" + (10 * i) + "\n");
            }
        } finally {
            writer.close();
        }
        String[] args = new String[1];
        args[0] = "/typedbytestest";
        int ret = dumptb.run(args);
        assertEquals("Return value != 0.", 0, ret);
        ByteArrayInputStream in = new ByteArrayInputStream(out.toByteArray());
        TypedBytesInput tbinput = new TypedBytesInput(new DataInputStream(in));
        int counter = 0;
        Object key = tbinput.read();
        while (key != null) {
            // offset
            assertEquals(Long.class, key.getClass());
            Object value = tbinput.read();
            assertEquals(String.class, value.getClass());
            assertTrue("Invalid output.", Integer.parseInt(value.toString()) % 10 == 0);
            counter++;
            key = tbinput.read();
        }
        assertEquals("Wrong number of outputs.", 100, counter);
    } finally {
        try {
            fs.close();
        } catch (Exception e) {
            // ignore errors while closing the file system; cluster shutdown follows
        }
        System.setOut(psBackup);
        cluster.shutdown();
    }
}
Also used: Path (org.apache.hadoop.fs.Path), PrintStream (java.io.PrintStream), MiniDFSCluster (org.apache.hadoop.hdfs.MiniDFSCluster), Configuration (org.apache.hadoop.conf.Configuration), TypedBytesInput (org.apache.hadoop.typedbytes.TypedBytesInput), ByteArrayOutputStream (java.io.ByteArrayOutputStream), DataInputStream (java.io.DataInputStream), DumpTypedBytes (org.apache.hadoop.streaming.DumpTypedBytes), ByteArrayInputStream (java.io.ByteArrayInputStream), FileSystem (org.apache.hadoop.fs.FileSystem), OutputStreamWriter (java.io.OutputStreamWriter), Test (org.junit.Test)
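The test above parses the dumped stream as alternating records, each a one-byte type code followed by a payload (a long offset key, then a string value). As a Hadoop-free illustration of that framing, here is a minimal sketch using only java.io; the type codes (4 for long, 7 for a length-prefixed UTF-8 string) follow the typed bytes convention but are assumptions of this sketch, not the Hadoop implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Minimal sketch of typed-bytes-style framing: one-byte type code, then payload.
public class TypedBytesSketch {

    static void writeLong(DataOutputStream out, long v) throws IOException {
        out.writeByte(4);           // assumed type code for long
        out.writeLong(v);
    }

    static void writeString(DataOutputStream out, String s) throws IOException {
        byte[] utf8 = s.getBytes("UTF-8");
        out.writeByte(7);           // assumed type code for string
        out.writeInt(utf8.length);  // payload is length-prefixed UTF-8
        out.write(utf8);
    }

    // Reads one value, dispatching on the type code; null at end of stream.
    static Object read(DataInputStream in) throws IOException {
        int code = in.read();
        if (code == -1) {
            return null;
        }
        switch (code) {
            case 4:
                return in.readLong();
            case 7:
                byte[] buf = new byte[in.readInt()];
                in.readFully(buf);
                return new String(buf, "UTF-8");
            default:
                throw new IOException("unexpected type code " + code);
        }
    }

    // Serializes one (long offset, String line) record and reads it back.
    static Object[] roundTrip(long key, String value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        writeLong(out, key);
        writeString(out, value);
        DataInputStream in = new DataInputStream(
            new ByteArrayInputStream(bytes.toByteArray()));
        return new Object[] { read(in), read(in) };
    }

    public static void main(String[] args) throws IOException {
        Object[] kv = roundTrip(0L, "10");
        System.out.println(kv[0] + " -> " + kv[1]);
    }
}
```

This mirrors why the test can assert `Long.class` for keys and `String.class` for values: the type code, not the caller, decides what `read()` returns.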

Example 3 with TypedBytesInput

Use of org.apache.hadoop.typedbytes.TypedBytesInput in project hadoop by apache, from the class TypedBytesMapApp, method go():

public void go() throws IOException {
    TypedBytesInput tbinput = new TypedBytesInput(new DataInputStream(System.in));
    TypedBytesOutput tboutput = new TypedBytesOutput(new DataOutputStream(System.out));
    Object key = tbinput.readRaw();
    while (key != null) {
        Object value = tbinput.read();
        for (String part : value.toString().split(find)) {
            // write key
            tboutput.write(part);
            // write value
            tboutput.write(1);
        }
        System.err.println("reporter:counter:UserCounters,InputLines,1");
        key = tbinput.readRaw();
    }
    System.out.flush();
}
Also used: TypedBytesInput (org.apache.hadoop.typedbytes.TypedBytesInput), DataOutputStream (java.io.DataOutputStream), TypedBytesOutput (org.apache.hadoop.typedbytes.TypedBytesOutput), DataInputStream (java.io.DataInputStream)
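The mapper's core is a tokenize-and-emit loop: split each value on a separator and emit a (token, 1) pair per token. A Hadoop-free sketch of that logic, with the class's `find` field stood in by a hypothetical `separator` parameter:

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch of TypedBytesMapApp's inner loop, minus the typed-bytes I/O:
// each token in the line becomes a (token, 1) pair.
public class MapSketch {
    static List<Map.Entry<String, Integer>> map(String line, String separator) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String part : line.split(separator)) {
            pairs.add(new AbstractMap.SimpleEntry<>(part, 1));
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(map("to be or not to be", " "));
    }
}
```

In the real app these pairs go to the typed-bytes output stream, where the downstream sort groups equal tokens for the reducer.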

Example 4 with TypedBytesInput

Use of org.apache.hadoop.typedbytes.TypedBytesInput in project hadoop by apache, from the class TypedBytesReduceApp, method go():

public void go() throws IOException {
    TypedBytesInput tbinput = new TypedBytesInput(new DataInputStream(System.in));
    TypedBytesOutput tboutput = new TypedBytesOutput(new DataOutputStream(System.out));
    Object prevKey = null;
    int sum = 0;
    Object key = tbinput.read();
    while (key != null) {
        if (prevKey != null && !key.equals(prevKey)) {
            // write key
            tboutput.write(prevKey);
            // write value
            tboutput.write(sum);
            sum = 0;
        }
        sum += (Integer) tbinput.read();
        prevKey = key;
        key = tbinput.read();
    }
    // flush the final run (assumes at least one input pair was read)
    tboutput.write(prevKey);
    tboutput.write(sum);
    System.out.flush();
}
Also used: TypedBytesInput (org.apache.hadoop.typedbytes.TypedBytesInput), DataOutputStream (java.io.DataOutputStream), TypedBytesOutput (org.apache.hadoop.typedbytes.TypedBytesOutput), DataInputStream (java.io.DataInputStream)
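The reducer relies on its input arriving sorted by key: it sums a run of equal keys and flushes the total the moment the key changes, then flushes once more at end of input. A Hadoop-free sketch of the same grouping logic over an in-memory list (unlike the original, it guards the final flush so empty input yields no output):

```java
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of TypedBytesReduceApp's logic: input pairs must be sorted by key,
// so each run of equal keys is summed and flushed when the key changes.
public class ReduceSketch {
    static LinkedHashMap<String, Integer> reduce(List<Map.Entry<String, Integer>> sorted) {
        LinkedHashMap<String, Integer> out = new LinkedHashMap<>();
        String prevKey = null;
        int sum = 0;
        for (Map.Entry<String, Integer> e : sorted) {
            if (prevKey != null && !e.getKey().equals(prevKey)) {
                out.put(prevKey, sum);   // key changed: flush the finished run
                sum = 0;
            }
            sum += e.getValue();
            prevKey = e.getKey();
        }
        if (prevKey != null) {
            out.put(prevKey, sum);       // flush the final run
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> in = Arrays.asList(
            new AbstractMap.SimpleEntry<>("be", 1),
            new AbstractMap.SimpleEntry<>("be", 1),
            new AbstractMap.SimpleEntry<>("or", 1));
        System.out.println(reduce(in));
    }
}
```

The single `prevKey`/`sum` pair is the whole state: because keys arrive grouped, the reducer never needs to hold more than one run in memory.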

Example 5 with TypedBytesInput

Use of org.apache.hadoop.typedbytes.TypedBytesInput in project hadoop by apache, from the class LoadTypedBytes, method run():

/**
   * The main driver for <code>LoadTypedBytes</code>.
   */
public int run(String[] args) throws Exception {
    if (args.length == 0) {
        System.err.println("Too few arguments!");
        printUsage();
        return 1;
    }
    Path path = new Path(args[0]);
    FileSystem fs = path.getFileSystem(getConf());
    if (fs.exists(path)) {
        System.err.println("Given path already exists!");
        return -1;
    }
    TypedBytesInput tbinput = new TypedBytesInput(new DataInputStream(System.in));
    SequenceFile.Writer writer = SequenceFile.createWriter(fs, getConf(), path, TypedBytesWritable.class, TypedBytesWritable.class);
    try {
        TypedBytesWritable key = new TypedBytesWritable();
        TypedBytesWritable value = new TypedBytesWritable();
        byte[] rawKey = tbinput.readRaw();
        while (rawKey != null) {
            byte[] rawValue = tbinput.readRaw();
            key.set(rawKey, 0, rawKey.length);
            value.set(rawValue, 0, rawValue.length);
            writer.append(key, value);
            rawKey = tbinput.readRaw();
        }
    } finally {
        writer.close();
    }
    return 0;
}
Also used: Path (org.apache.hadoop.fs.Path), TypedBytesInput (org.apache.hadoop.typedbytes.TypedBytesInput), SequenceFile (org.apache.hadoop.io.SequenceFile), FileSystem (org.apache.hadoop.fs.FileSystem), DataInputStream (java.io.DataInputStream), TypedBytesWritable (org.apache.hadoop.typedbytes.TypedBytesWritable)

Aggregations

TypedBytesInput (org.apache.hadoop.typedbytes.TypedBytesInput): 5 uses
DataInputStream (java.io.DataInputStream): 4 uses
DataOutputStream (java.io.DataOutputStream): 2 uses
FileSystem (org.apache.hadoop.fs.FileSystem): 2 uses
Path (org.apache.hadoop.fs.Path): 2 uses
TypedBytesOutput (org.apache.hadoop.typedbytes.TypedBytesOutput): 2 uses
TypedBytesWritable (org.apache.hadoop.typedbytes.TypedBytesWritable): 2 uses
ByteArrayInputStream (java.io.ByteArrayInputStream): 1 use
ByteArrayOutputStream (java.io.ByteArrayOutputStream): 1 use
OutputStreamWriter (java.io.OutputStreamWriter): 1 use
PrintStream (java.io.PrintStream): 1 use
Configuration (org.apache.hadoop.conf.Configuration): 1 use
MiniDFSCluster (org.apache.hadoop.hdfs.MiniDFSCluster): 1 use
SequenceFile (org.apache.hadoop.io.SequenceFile): 1 use
DumpTypedBytes (org.apache.hadoop.streaming.DumpTypedBytes): 1 use
Test (org.junit.Test): 1 use