Example 1 with Config

Use of org.apache.parquet.cascading.ParquetValueScheme.Config in the Apache parquet-mr project.

From the class ParquetScroogeSchemeTest, method verifyScroogeRead.

public <T> void verifyScroogeRead(List<TBase> recordsToWrite, Class<T> readClass, String expectedStr, String projectionFilter) throws Exception {
    Configuration conf = new Configuration();
    // Start from a clean state (deleteIfExist and writeParquetFile are helpers not shown here).
    deleteIfExist(PARQUET_PATH);
    deleteIfExist(TXT_OUTPUT_PATH);
    final Path parquetFile = new Path(PARQUET_PATH);
    writeParquetFile(recordsToWrite, conf, parquetFile);
    // Source: read the Parquet file as readClass, restricted by the projection filter.
    Scheme sourceScheme = new ParquetScroogeScheme(new Config().withRecordClass(readClass).withProjectionString(projectionFilter));
    Tap source = new Hfs(sourceScheme, PARQUET_PATH);
    // Sink: plain text lines written under TXT_OUTPUT_PATH.
    Scheme sinkScheme = new TextLine(new Fields("first", "last"));
    Tap sink = new Hfs(sinkScheme, TXT_OUTPUT_PATH);
    // Pipe assembly: apply ObjectToStringFunction (helper not shown here) to each tuple.
    Pipe assembly = new Pipe("namecp");
    assembly = new Each(assembly, new ObjectToStringFunction());
    // Connect source, sink and assembly into a flow and run it to completion.
    Flow flow = new HadoopFlowConnector().connect("namecp", source, sink, assembly);
    flow.complete();
    // Compare the first output part file with the expected text.
    String result = FileUtils.readFileToString(new File(TXT_OUTPUT_PATH + "/part-00000"));
    assertEquals(expectedStr, result);
}
Also used: Path (org.apache.hadoop.fs.Path), Each (cascading.pipe.Each), Tap (cascading.tap.Tap), Scheme (cascading.scheme.Scheme), Configuration (org.apache.hadoop.conf.Configuration), TextLine (cascading.scheme.hadoop.TextLine), Config (org.apache.parquet.cascading.ParquetValueScheme.Config), Pipe (cascading.pipe.Pipe), HadoopFlowConnector (cascading.flow.hadoop.HadoopFlowConnector), Flow (cascading.flow.Flow), Fields (cascading.tuple.Fields), Hfs (cascading.tap.hadoop.Hfs), File (java.io.File)
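
The `new Config().withRecordClass(readClass).withProjectionString(projectionFilter)` call in the method above chains `with...` methods on a freshly created object. The sketch below illustrates that fluent style in isolation, under the assumption that each `with...` call returns a new, modified copy rather than mutating the receiver. `SketchConfig` and `SketchConfigDemo` are hypothetical names made up for this illustration; this is not the parquet-mr class, and the projection string `"name/*"` is only a placeholder value.

```java
// Hypothetical stand-in for illustration only; not the parquet-mr Config class.
final class SketchConfig<T> {
    private final Class<T> recordClass;      // null until set
    private final String projectionString;   // null until set

    SketchConfig() {
        this(null, null);
    }

    private SketchConfig(Class<T> recordClass, String projectionString) {
        this.recordClass = recordClass;
        this.projectionString = projectionString;
    }

    // Returns a copy with the record class replaced; this instance is unchanged.
    SketchConfig<T> withRecordClass(Class<T> recordClass) {
        return new SketchConfig<>(recordClass, this.projectionString);
    }

    // Returns a copy with the projection string replaced; this instance is unchanged.
    SketchConfig<T> withProjectionString(String projectionString) {
        return new SketchConfig<>(this.recordClass, projectionString);
    }

    Class<T> getRecordClass() {
        return recordClass;
    }

    String getProjectionString() {
        return projectionString;
    }
}

public class SketchConfigDemo {
    public static void main(String[] args) {
        SketchConfig<String> base = new SketchConfig<>();
        // Chain the calls in the same shape as the scheme construction above.
        SketchConfig<String> configured = base
                .withRecordClass(String.class)
                .withProjectionString("name/*");
        System.out.println(configured.getRecordClass() + " " + configured.getProjectionString());
        // The starting object still has no record class set.
        System.out.println(base.getRecordClass());
    }
}
```

Because every step yields a new object in this sketch, a partially configured instance can be shared and extended in different ways without one caller affecting another.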
