Example 1 with ColumnFileMetaData

Use of org.apache.trevni.ColumnFileMetaData in project trevni by cutting.

From the class AvroTrevniOutputFormat, method getRecordWriter:

@Override
public RecordWriter<AvroWrapper<T>, NullWritable> getRecordWriter(FileSystem ignore, final JobConf job, final String name, Progressable prog) throws IOException {
    boolean isMapOnly = job.getNumReduceTasks() == 0;
    final Schema schema = isMapOnly ? AvroJob.getMapOutputSchema(job) : AvroJob.getOutputSchema(job);
    final ColumnFileMetaData meta = new ColumnFileMetaData();
    // Copy job settings whose keys start with META_PREFIX into the file
    // metadata, stripping the matched prefix from the key.
    for (Map.Entry<String, String> e : job) {
        if (e.getKey().startsWith(META_PREFIX)) {
            meta.put(e.getKey().substring(META_PREFIX.length()), e.getValue().getBytes(MetaData.UTF8));
        }
    }
    final Path dir = FileOutputFormat.getTaskOutputPath(job, name);
    final FileSystem fs = dir.getFileSystem(job);
    if (!fs.mkdirs(dir))
        throw new IOException("Failed to create directory: " + dir);
    final long blockSize = fs.getDefaultBlockSize();
    return new RecordWriter<AvroWrapper<T>, NullWritable>() {

        private int part = 0;

        private AvroColumnWriter<T> writer = new AvroColumnWriter<T>(schema, meta, ReflectData.get());

        private void flush() throws IOException {
            OutputStream out = fs.create(new Path(dir, "part-" + (part++) + EXT));
            try {
                writer.writeTo(out);
            } finally {
                out.close();
            }
            writer = new AvroColumnWriter<T>(schema, meta, ReflectData.get());
        }

        public void write(AvroWrapper<T> wrapper, NullWritable ignore) throws IOException {
            writer.write(wrapper.datum());
            if (writer.sizeEstimate() >= blockSize) // block full
                flush();
        }

        public void close(Reporter reporter) throws IOException {
            flush();
        }
    };
}
Also used: Path (org.apache.hadoop.fs.Path), Schema (org.apache.avro.Schema), OutputStream (java.io.OutputStream), Reporter (org.apache.hadoop.mapred.Reporter), IOException (java.io.IOException), NullWritable (org.apache.hadoop.io.NullWritable), RecordWriter (org.apache.hadoop.mapred.RecordWriter), ColumnFileMetaData (org.apache.trevni.ColumnFileMetaData), FileSystem (org.apache.hadoop.fs.FileSystem), AvroWrapper (org.apache.avro.mapred.AvroWrapper), Map (java.util.Map)
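
For context, here is a minimal sketch of wiring this output format into a job. It is not part of the trevni sources: the job class and the "created.by" metadata key are invented for illustration, and it assumes AvroTrevniOutputFormat lives in org.apache.trevni.avro (alongside AvroColumnWriter below) with a public META_PREFIX field.

import org.apache.avro.Schema;
import org.apache.avro.mapred.AvroJob;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;
import org.apache.trevni.avro.AvroTrevniOutputFormat;

public class TrevniJobSetup {

    public static JobConf configure(Schema outputSchema, Path outputDir) {
        JobConf job = new JobConf(TrevniJobSetup.class);
        // getRecordWriter() above reads this back via AvroJob.getOutputSchema(job).
        AvroJob.setOutputSchema(job, outputSchema);
        // Route output through the Trevni writer; set this after
        // setOutputSchema so it is not overridden if AvroJob installs
        // its own default output format.
        job.setOutputFormat(AvroTrevniOutputFormat.class);
        FileOutputFormat.setOutputPath(job, outputDir);
        // Settings keyed under META_PREFIX are copied into the column
        // file's metadata by the loop in getRecordWriter() above.
        job.set(AvroTrevniOutputFormat.META_PREFIX + "created.by", "example-job");
        return job;
    }
}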

Example 2 with ColumnFileMetaData

Use of org.apache.trevni.ColumnFileMetaData in project trevni by cutting.

From the class CreateRandomTool, method run:

@Override
public int run(InputStream stdin, PrintStream out, PrintStream err, List<String> args) throws Exception {
    if (args.size() != 3) {
        err.println("Usage: schemaFile count outputFile");
        return 1;
    }
    File schemaFile = new File(args.get(0));
    int count = Integer.parseInt(args.get(1));
    File outputFile = new File(args.get(2));
    Schema schema = Schema.parse(schemaFile);
    AvroColumnWriter<Object> writer = new AvroColumnWriter<Object>(schema, new ColumnFileMetaData());
    for (Object datum : new RandomData(schema, count)) {
        writer.write(datum);
    }
    writer.writeTo(outputFile);
    return 0;
}
Also used: RandomData (org.apache.trevni.avro.RandomData), ColumnFileMetaData (org.apache.trevni.ColumnFileMetaData), Schema (org.apache.avro.Schema), File (java.io.File), AvroColumnWriter (org.apache.trevni.avro.AvroColumnWriter)
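
A sketch of driving this tool directly rather than from a command line; the schema file, count, and output name are made up, and it assumes a no-argument constructor and that the demo class sits in the same package as CreateRandomTool:

import java.util.Arrays;
import java.util.List;

public class CreateRandomDemo {

    public static void main(String[] argv) throws Exception {
        // Arguments in the order the tool expects: schemaFile count outputFile.
        List<String> args = Arrays.asList("user.avsc", "1000", "random.trv");
        int rc = new CreateRandomTool().run(System.in, System.out, System.err, args);
        System.exit(rc);
    }
}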

Example 3 with ColumnFileMetaData

Use of org.apache.trevni.ColumnFileMetaData in project trevni by cutting.

From the class TestShredder, method checkWrite:

private void checkWrite(Schema schema) throws IOException {
    AvroColumnWriter<Object> writer = new AvroColumnWriter<Object>(schema, new ColumnFileMetaData());
    for (Object datum : new RandomData(schema, COUNT)) {
        writer.write(datum);
    }
    writer.writeTo(FILE);
}
Also used: ColumnFileMetaData (org.apache.trevni.ColumnFileMetaData)
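
Every example on this page passes an empty ColumnFileMetaData, so the files are written without compression or checksums. The same object carries file-wide settings; the sketch below assumes the fluent setCodec/setChecksum setters inherited from MetaData, and "snappy" and "crc32" are assumed to be codec/checksum names registered in the build at hand:

import java.io.File;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.trevni.ColumnFileMetaData;
import org.apache.trevni.avro.AvroColumnWriter;

public class CompressedWrite {

    static void write(Schema schema, Iterable<Object> data, File file) throws IOException {
        // Compress each column block and checksum it on write.
        ColumnFileMetaData meta = new ColumnFileMetaData()
            .setCodec("snappy")
            .setChecksum("crc32");
        AvroColumnWriter<Object> writer = new AvroColumnWriter<Object>(schema, meta);
        for (Object datum : data) {
            writer.write(datum);
        }
        writer.writeTo(file);
    }
}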

Example 4 with ColumnFileMetaData

Use of org.apache.trevni.ColumnFileMetaData in project trevni by cutting.

From the class TestCases, method runCase:

private void runCase(File dir) throws Exception {
    Schema schema = Schema.parse(new File(dir, "input.avsc"));
    List<Object> data = fromJson(schema, new File(dir, "input.json"));
    // write full data
    AvroColumnWriter<Object> writer = new AvroColumnWriter<Object>(schema, new ColumnFileMetaData());
    for (Object datum : data) {
        writer.write(datum);
    }
    writer.writeTo(FILE);
    // test that the full schema reads correctly
    checkRead(schema, data);
    // test that sub-schemas read correctly
    for (File f : dir.listFiles()) {
        if (f.isDirectory()) {
            Schema s = Schema.parse(new File(f, "sub.avsc"));
            checkRead(s, fromJson(s, new File(f, "sub.json")));
        }
    }
}
Also used: ColumnFileMetaData (org.apache.trevni.ColumnFileMetaData), Schema (org.apache.avro.Schema), File (java.io.File)
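
The sub-schema pass works because a Trevni reader deserializes only the columns that the requested schema names. A sketch of such a projected read, assuming AvroColumnReader and its Params class from org.apache.trevni.avro with iterator-style access:

import java.io.File;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.trevni.avro.AvroColumnReader;

public class ProjectedRead {

    static void readProjection(File file, Schema subSchema) throws IOException {
        AvroColumnReader.Params params = new AvroColumnReader.Params(file);
        // Ask for a subset of the writer schema; columns outside it are skipped.
        params.setSchema(subSchema);
        AvroColumnReader<Object> reader = new AvroColumnReader<Object>(params);
        try {
            while (reader.hasNext()) {
                Object record = reader.next();
                // process the projected record here
            }
        } finally {
            reader.close();
        }
    }
}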

Aggregations

ColumnFileMetaData (org.apache.trevni.ColumnFileMetaData): 4
Schema (org.apache.avro.Schema): 3
File (java.io.File): 2
IOException (java.io.IOException): 1
OutputStream (java.io.OutputStream): 1
Map (java.util.Map): 1
AvroWrapper (org.apache.avro.mapred.AvroWrapper): 1
FileSystem (org.apache.hadoop.fs.FileSystem): 1
Path (org.apache.hadoop.fs.Path): 1
NullWritable (org.apache.hadoop.io.NullWritable): 1
RecordWriter (org.apache.hadoop.mapred.RecordWriter): 1
Reporter (org.apache.hadoop.mapred.Reporter): 1
AvroColumnWriter (org.apache.trevni.avro.AvroColumnWriter): 1
RandomData (org.apache.trevni.avro.RandomData): 1