Example 6 with Hfs

Use of cascading.tap.hadoop.Hfs in project parquet-mr by apache.

In class ParquetScroogeSchemeTest, method doRead.

private void doRead() throws Exception {
    // Clear any output left over from a previous run.
    Path path = new Path(txtOutputPath);
    final FileSystem fs = path.getFileSystem(new Configuration());
    if (fs.exists(path))
        fs.delete(path, true);
    // Source tap: Parquet data read through ParquetScroogeScheme as Name records.
    Scheme sourceScheme = new ParquetScroogeScheme<Name>(Name.class);
    Tap source = new Hfs(sourceScheme, parquetOutputPath);
    // Sink tap: plain text lines.
    Scheme sinkScheme = new TextLine(new Fields("first", "last"));
    Tap sink = new Hfs(sinkScheme, txtOutputPath);
    // Apply UnpackThriftFunction to every tuple, then run the flow to completion.
    Pipe assembly = new Pipe("namecp");
    assembly = new Each(assembly, new UnpackThriftFunction());
    Flow flow = new HadoopFlowConnector().connect("namecp", source, sink, assembly);
    flow.complete();
    // Compare the first part file against the expected text.
    String result = FileUtils.readFileToString(new File(txtOutputPath + "/part-00000"));
    assertEquals("0\tAlice\tPractice\n15\tBob\tHope\n24\tCharlie\tHorse\n", result);
}
Also used : Path(org.apache.hadoop.fs.Path) Each(cascading.pipe.Each) Tap(cascading.tap.Tap) Scheme(cascading.scheme.Scheme) Configuration(org.apache.hadoop.conf.Configuration) TextLine(cascading.scheme.hadoop.TextLine) Pipe(cascading.pipe.Pipe) HadoopFlowConnector(cascading.flow.hadoop.HadoopFlowConnector) Flow(cascading.flow.Flow) Fields(cascading.tuple.Fields) Hfs(cascading.tap.hadoop.Hfs) FileSystem(org.apache.hadoop.fs.FileSystem) File(java.io.File)
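
The test above follows a common shape: clear the output path, run the job, then compare part-00000 with an expected string. Below is a minimal stdlib-only sketch of that scaffolding, without Hadoop or Cascading. The class OutputDirCheckSketch and its methods deleteIfExists, writePart and readFirstPart are hypothetical names introduced for illustration. Path here is java.nio.file.Path, not org.apache.hadoop.fs.Path, and writePart merely stands in for flow.complete().

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical helper class; not part of parquet-mr or Cascading.
class OutputDirCheckSketch {

    // Recursively delete dir if it exists (local analogue of fs.delete(path, true)).
    static void deleteIfExists(Path dir) {
        if (!Files.exists(dir)) {
            return;
        }
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order visits children before their parent directories.
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Stand-in for the job: create the directory and write part-00000 by hand.
    static void writePart(Path outputDir, String content) {
        try {
            Files.createDirectories(outputDir);
            Files.write(outputDir.resolve("part-00000"),
                    content.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Read the first part file with an explicit charset.
    static String readFirstPart(Path outputDir) {
        try {
            byte[] bytes = Files.readAllBytes(outputDir.resolve("part-00000"));
            return new String(bytes, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempDirectory("sketch").resolve("out");
        deleteIfExists(out);                    // nothing there yet, so a no-op
        writePart(out, "Alice\tPractice\n");    // pretend the flow ran
        System.out.println("Alice\tPractice\n".equals(readFirstPart(out)));
        deleteIfExists(out);                    // clean up
    }
}
```

Reading the file with an explicit charset keeps the comparison independent of the platform default encoding.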

Example 7 with Hfs

Use of cascading.tap.hadoop.Hfs in project parquet-mr by apache.

In class TestParquetTupleScheme, method testFieldProjection.

@Test
public void testFieldProjection() throws Exception {
    createFileForRead();
    // Clear any output left over from a previous run.
    Path path = new Path(txtOutputPath);
    final FileSystem fs = path.getFileSystem(new Configuration());
    if (fs.exists(path))
        fs.delete(path, true);
    // Source tap: request only the last_name field from the Parquet input.
    Scheme sourceScheme = new ParquetTupleScheme(new Fields("last_name"));
    Tap source = new Hfs(sourceScheme, parquetInputPath);
    // Sink tap: plain text lines.
    Scheme sinkScheme = new TextLine(new Fields("last_name"));
    Tap sink = new Hfs(sinkScheme, txtOutputPath);
    // Apply ProjectedTupleFunction to every tuple, then run the flow to completion.
    Pipe assembly = new Pipe("namecp");
    assembly = new Each(assembly, new ProjectedTupleFunction());
    Flow flow = new HadoopFlowConnector().connect("namecp", source, sink, assembly);
    flow.complete();
    // Only the projected column should appear in the output.
    String result = FileUtils.readFileToString(new File(txtOutputPath + "/part-00000"));
    assertEquals("Practice\nHope\nHorse\n", result);
}
Also used : Path(org.apache.hadoop.fs.Path) Each(cascading.pipe.Each) Tap(cascading.tap.Tap) Scheme(cascading.scheme.Scheme) Configuration(org.apache.hadoop.conf.Configuration) TextLine(cascading.scheme.hadoop.TextLine) Pipe(cascading.pipe.Pipe) HadoopFlowConnector(cascading.flow.hadoop.HadoopFlowConnector) Flow(cascading.flow.Flow) Fields(cascading.tuple.Fields) Hfs(cascading.tap.hadoop.Hfs) FileSystem(org.apache.hadoop.fs.FileSystem) File(java.io.File) Test(org.junit.Test)

Example 8 with Hfs

Use of cascading.tap.hadoop.Hfs in project parquet-mr by apache.

In class TestParquetTupleScheme, method testReadWrite.

public void testReadWrite(String inputPath) throws Exception {
    createFileForRead();
    // Clear any output left over from a previous run.
    Path path = new Path(txtOutputPath);
    final FileSystem fs = path.getFileSystem(new Configuration());
    if (fs.exists(path))
        fs.delete(path, true);
    // Source tap: read first_name and last_name from the Parquet input.
    Scheme sourceScheme = new ParquetTupleScheme(new Fields("first_name", "last_name"));
    Tap source = new Hfs(sourceScheme, inputPath);
    // Sink tap: plain text lines.
    Scheme sinkScheme = new TextLine(new Fields("first", "last"));
    Tap sink = new Hfs(sinkScheme, txtOutputPath);
    // Apply UnpackTupleFunction to every tuple, then run the flow to completion.
    Pipe assembly = new Pipe("namecp");
    assembly = new Each(assembly, new UnpackTupleFunction());
    Flow flow = new HadoopFlowConnector().connect("namecp", source, sink, assembly);
    flow.complete();
    // Compare the first part file against the expected text.
    String result = FileUtils.readFileToString(new File(txtOutputPath + "/part-00000"));
    assertEquals("Alice\tPractice\nBob\tHope\nCharlie\tHorse\n", result);
}
Also used : Path(org.apache.hadoop.fs.Path) Each(cascading.pipe.Each) Tap(cascading.tap.Tap) Scheme(cascading.scheme.Scheme) Configuration(org.apache.hadoop.conf.Configuration) TextLine(cascading.scheme.hadoop.TextLine) Pipe(cascading.pipe.Pipe) HadoopFlowConnector(cascading.flow.hadoop.HadoopFlowConnector) Flow(cascading.flow.Flow) Fields(cascading.tuple.Fields) Hfs(cascading.tap.hadoop.Hfs) FileSystem(org.apache.hadoop.fs.FileSystem) File(java.io.File)

Example 9 with Hfs

Use of cascading.tap.hadoop.Hfs in project parquet-mr by apache.

In class ParquetTupleScheme, method readSchema (variant taking FlowProcess<? extends JobConf>).

private MessageType readSchema(FlowProcess<? extends JobConf> flowProcess, Tap tap) {
    try {
        // For a composite tap, use its first child; otherwise use the tap itself.
        Hfs hfs;
        if (tap instanceof CompositeTap)
            hfs = (Hfs) ((CompositeTap) tap).getChildTaps().next();
        else
            hfs = (Hfs) tap;
        // The schema comes from the first Parquet footer found for the tap.
        List<Footer> footers = getFooters(flowProcess, hfs);
        if (footers.isEmpty()) {
            throw new TapException("Could not read Parquet metadata at " + hfs.getPath());
        } else {
            return footers.get(0).getParquetMetadata().getFileMetaData().getSchema();
        }
    } catch (IOException e) {
        throw new TapException(e);
    }
}
Also used : Hfs(cascading.tap.hadoop.Hfs) CompositeTap(cascading.tap.CompositeTap) Footer(org.apache.parquet.hadoop.Footer) TapException(cascading.tap.TapException) IOException(java.io.IOException)

Example 10 with Hfs

Use of cascading.tap.hadoop.Hfs in project parquet-mr by apache.

In class ParquetTupleScheme, method readSchema (variant taking FlowProcess<JobConf>).

private MessageType readSchema(FlowProcess<JobConf> flowProcess, Tap tap) {
    try {
        // For a composite tap, use its first child; otherwise use the tap itself.
        Hfs hfs;
        if (tap instanceof CompositeTap)
            hfs = (Hfs) ((CompositeTap) tap).getChildTaps().next();
        else
            hfs = (Hfs) tap;
        // The schema comes from the first Parquet footer found for the tap.
        List<Footer> footers = getFooters(flowProcess, hfs);
        if (footers.isEmpty()) {
            throw new TapException("Could not read Parquet metadata at " + hfs.getPath());
        } else {
            return footers.get(0).getParquetMetadata().getFileMetaData().getSchema();
        }
    } catch (IOException e) {
        throw new TapException(e);
    }
}
Also used : Hfs(cascading.tap.hadoop.Hfs) CompositeTap(cascading.tap.CompositeTap) Footer(org.apache.parquet.hadoop.Footer) TapException(cascading.tap.TapException) IOException(java.io.IOException)
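
Both readSchema variants above call next() on the child iterator and cast the result without checking either step. The sketch below shows the same "first child of a composite" lookup with those two guards added. It does not use the Cascading API: Source, FileSource and MultiSource are hypothetical stand-ins for Tap, Hfs and CompositeTap, and firstFile is an illustrative name.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in types; NOT the Cascading Tap / Hfs / CompositeTap classes.
class FirstChildSketch {

    interface Source {
        String path();
    }

    // Plays the role of Hfs: a source backed by a single path.
    static final class FileSource implements Source {
        private final String path;

        FileSource(String path) {
            this.path = path;
        }

        public String path() {
            return path;
        }
    }

    // Plays the role of CompositeTap: a source made of child sources.
    static final class MultiSource implements Source {
        private final List<Source> children;

        MultiSource(List<Source> children) {
            this.children = children;
        }

        Iterator<Source> children() {
            return children.iterator();
        }

        public String path() {
            return "multi[" + children.size() + "]";
        }
    }

    // Return the source itself, or the first child of a composite, as a FileSource.
    static FileSource firstFile(Source source) {
        Source candidate = source;
        if (source instanceof MultiSource) {
            Iterator<Source> it = ((MultiSource) source).children();
            if (!it.hasNext()) {
                // Guard 1: avoid NoSuchElementException on an empty composite.
                throw new IllegalArgumentException("composite source has no children");
            }
            candidate = it.next();
        }
        if (!(candidate instanceof FileSource)) {
            // Guard 2: avoid ClassCastException with a clearer message.
            throw new IllegalArgumentException("not a file source: " + candidate.path());
        }
        return (FileSource) candidate;
    }

    public static void main(String[] args) {
        Source multi = new MultiSource(Arrays.<Source>asList(
                new FileSource("/data/a"), new FileSource("/data/b")));
        System.out.println(firstFile(multi).path());
        try {
            firstFile(new MultiSource(Collections.<Source>emptyList()));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Whether the extra checks are worthwhile depends on which taps can actually reach the method; the original code assumes a non-empty composite of Hfs taps.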

Aggregations

Hfs (cascading.tap.hadoop.Hfs): 11
Flow (cascading.flow.Flow): 9
Each (cascading.pipe.Each): 9
Pipe (cascading.pipe.Pipe): 9
Tap (cascading.tap.Tap): 9
Fields (cascading.tuple.Fields): 9
HadoopFlowConnector (cascading.flow.hadoop.HadoopFlowConnector): 8
Scheme (cascading.scheme.Scheme): 7
TextLine (cascading.scheme.hadoop.TextLine): 7
Path (org.apache.hadoop.fs.Path): 7
Configuration (org.apache.hadoop.conf.Configuration): 6
FileSystem (org.apache.hadoop.fs.FileSystem): 6
File (java.io.File): 5
FlowDef (cascading.flow.FlowDef): 2
Insert (cascading.operation.Insert): 2
ExpressionFunction (cascading.operation.expression.ExpressionFunction): 2
RegexFilter (cascading.operation.regex.RegexFilter): 2
RegexSplitGenerator (cascading.operation.regex.RegexSplitGenerator): 2
CoGroup (cascading.pipe.CoGroup): 2
GroupBy (cascading.pipe.GroupBy): 2