Example 6 with Scheme

Use of cascading.scheme.Scheme in project parquet-mr by Apache.

From the class TestParquetTupleScheme, method testFieldProjection.

@Test
public void testFieldProjection() throws Exception {
    createFileForRead();
    // Remove any output left over from a previous run.
    Path path = new Path(txtOutputPath);
    final FileSystem fs = path.getFileSystem(new Configuration());
    if (fs.exists(path)) {
        fs.delete(path, true);
    }
    // Source: request only the "last_name" field from the Parquet input.
    Scheme sourceScheme = new ParquetTupleScheme(new Fields("last_name"));
    Tap source = new Hfs(sourceScheme, parquetInputPath);
    // Sink: write each tuple as one line of text.
    Scheme sinkScheme = new TextLine(new Fields("last_name"));
    Tap sink = new Hfs(sinkScheme, txtOutputPath);
    Pipe assembly = new Pipe("namecp");
    assembly = new Each(assembly, new ProjectedTupleFunction());
    // Connect source, sink, and pipe assembly, then run the flow to completion.
    Flow flow = new HadoopFlowConnector().connect("namecp", source, sink, assembly);
    flow.complete();
    // Check the first output part file against the expected last names.
    String result = FileUtils.readFileToString(new File(txtOutputPath + "/part-00000"));
    assertEquals("Practice\nHope\nHorse\n", result);
}
Also used : Path(org.apache.hadoop.fs.Path) Each(cascading.pipe.Each) Tap(cascading.tap.Tap) Scheme(cascading.scheme.Scheme) Configuration(org.apache.hadoop.conf.Configuration) TextLine(cascading.scheme.hadoop.TextLine) Pipe(cascading.pipe.Pipe) HadoopFlowConnector(cascading.flow.hadoop.HadoopFlowConnector) Flow(cascading.flow.Flow) Fields(cascading.tuple.Fields) Hfs(cascading.tap.hadoop.Hfs) FileSystem(org.apache.hadoop.fs.FileSystem) File(java.io.File) Test(org.junit.Test)
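The cleanup and verification steps that bracket the flow in the test above can be tried without Hadoop or Cascading. The following is a minimal sketch under stated assumptions: it uses java.nio on the local filesystem instead of org.apache.hadoop.fs.FileSystem and FileUtils, and the class OutputDirCheck and its two methods are hypothetical names introduced only for illustration. They are not part of parquet-mr.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for illustration; not part of parquet-mr or Cascading.
final class OutputDirCheck {

    // Recursively delete the directory if it exists, so a rerun starts clean.
    // Plays the role of fs.exists(path) followed by fs.delete(path, true).
    public static void deleteIfExists(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order lists children before their parent directory.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    // Read the first part file as UTF-8 text, ready to compare with the expected output.
    public static String readFirstPart(Path outputDir) throws IOException {
        byte[] bytes = Files.readAllBytes(outputDir.resolve("part-00000"));
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

A test would call deleteIfExists on the output directory before running the flow and pass the result of readFirstPart to assertEquals afterwards.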

Example 7 with Scheme

Use of cascading.scheme.Scheme in project parquet-mr by Apache.

From the class TestParquetTupleScheme, method testReadWrite.

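// Helper taking the input path as a parameter; it carries no @Test annotation.
public void testReadWrite(String inputPath) throws Exception {
    createFileForRead();
    // Remove any output left over from a previous run.
    Path path = new Path(txtOutputPath);
    final FileSystem fs = path.getFileSystem(new Configuration());
    if (fs.exists(path)) {
        fs.delete(path, true);
    }
    // Source: read the "first_name" and "last_name" fields from the Parquet input.
    Scheme sourceScheme = new ParquetTupleScheme(new Fields("first_name", "last_name"));
    Tap source = new Hfs(sourceScheme, inputPath);
    // Sink: write the two fields, named "first" and "last", as tab-separated text.
    Scheme sinkScheme = new TextLine(new Fields("first", "last"));
    Tap sink = new Hfs(sinkScheme, txtOutputPath);
    Pipe assembly = new Pipe("namecp");
    assembly = new Each(assembly, new UnpackTupleFunction());
    // Connect source, sink, and pipe assembly, then run the flow to completion.
    Flow flow = new HadoopFlowConnector().connect("namecp", source, sink, assembly);
    flow.complete();
    // Check the first output part file against the expected name pairs.
    String result = FileUtils.readFileToString(new File(txtOutputPath + "/part-00000"));
    assertEquals("Alice\tPractice\nBob\tHope\nCharlie\tHorse\n", result);
}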
Also used : Path(org.apache.hadoop.fs.Path) Each(cascading.pipe.Each) Tap(cascading.tap.Tap) Scheme(cascading.scheme.Scheme) Configuration(org.apache.hadoop.conf.Configuration) TextLine(cascading.scheme.hadoop.TextLine) Pipe(cascading.pipe.Pipe) HadoopFlowConnector(cascading.flow.hadoop.HadoopFlowConnector) Flow(cascading.flow.Flow) Fields(cascading.tuple.Fields) Hfs(cascading.tap.hadoop.Hfs) FileSystem(org.apache.hadoop.fs.FileSystem) File(java.io.File)

Aggregations

Flow (cascading.flow.Flow): 7
HadoopFlowConnector (cascading.flow.hadoop.HadoopFlowConnector): 7
Each (cascading.pipe.Each): 7
Pipe (cascading.pipe.Pipe): 7
Scheme (cascading.scheme.Scheme): 7
TextLine (cascading.scheme.hadoop.TextLine): 7
Tap (cascading.tap.Tap): 7
Hfs (cascading.tap.hadoop.Hfs): 7
Fields (cascading.tuple.Fields): 7
Path (org.apache.hadoop.fs.Path): 7
Configuration (org.apache.hadoop.conf.Configuration): 6
FileSystem (org.apache.hadoop.fs.FileSystem): 6
File (java.io.File): 5
Test (org.junit.Test): 2
JobConf (org.apache.hadoop.mapred.JobConf): 1
Config (org.apache.parquet.cascading.ParquetValueScheme.Config): 1