
Example 11 with FsInput

Use of org.apache.avro.mapred.FsInput in the incubator-gobblin project by Apache.

Class AvroUtils, method getDirectorySchema.

/**
 * Get the latest or oldest Avro schema for a directory.
 * @param directory the input dir that contains Avro files
 * @param fs the {@link FileSystem} for the given directory
 * @param latest true to return the latest schema, false to return the oldest schema
 * @return the latest/oldest schema in the directory, or null if the directory contains no Avro files
 * @throws IOException if the schema cannot be read from the selected file
 */
public static Schema getDirectorySchema(Path directory, FileSystem fs, boolean latest) throws IOException {
    Schema schema = null;
    try (Closer closer = Closer.create()) {
        List<FileStatus> files = getDirectorySchemaHelper(directory, fs);
        if (files == null || files.size() == 0) {
            LOG.warn("There is no previous avro file in the directory: " + directory);
        } else {
            FileStatus file = latest ? files.get(0) : files.get(files.size() - 1);
            LOG.debug("Path to get the avro schema: " + file);
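            // Register the input with the closer so it is released even if opening the reader fails.
            FsInput fi = closer.register(new FsInput(file.getPath(), fs.getConf()));
            GenericDatumReader<GenericRecord> genReader = new GenericDatumReader<>();
            schema = closer.register(new DataFileReader<>(fi, genReader)).getSchema();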
        }
    } catch (IOException ioe) {
        throw new IOException("Cannot get the schema for directory " + directory, ioe);
    }
    return schema;
}
Also used: Closer (com.google.common.io.Closer), FileStatus (org.apache.hadoop.fs.FileStatus), FsInput (org.apache.avro.mapred.FsInput), GenericDatumReader (org.apache.avro.generic.GenericDatumReader), Schema (org.apache.avro.Schema), IOException (java.io.IOException), GenericRecord (org.apache.avro.generic.GenericRecord)
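
The method above picks files.get(0) for the latest schema and the last element for the oldest, which only works if the helper returns the files ordered newest first. The helper is not shown here, so that ordering is an assumption. The following stdlib-only sketch illustrates the selection logic under that assumption. SchemaFilePicker, FileEntry, sortNewestFirst and pick are hypothetical names introduced for illustration and are not part of Gobblin, Avro or Hadoop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

final class SchemaFilePicker {

    /** Hypothetical stand-in for org.apache.hadoop.fs.FileStatus: a path plus a modification time. */
    public static final class FileEntry {
        private final String path;
        private final long modificationTime;

        public FileEntry(String path, long modificationTime) {
            this.path = path;
            this.modificationTime = modificationTime;
        }

        public String getPath() {
            return path;
        }

        public long getModificationTime() {
            return modificationTime;
        }
    }

    /** Returns a sorted copy, newest first. Assumed (not confirmed) to mirror what the helper does. */
    public static List<FileEntry> sortNewestFirst(List<FileEntry> files) {
        List<FileEntry> sorted = new ArrayList<>(files);
        sorted.sort(Comparator.comparingLong(FileEntry::getModificationTime).reversed());
        return sorted;
    }

    /** Picks the first entry when latest is true, otherwise the last; empty if there are no files. */
    public static Optional<FileEntry> pick(List<FileEntry> newestFirst, boolean latest) {
        if (newestFirst == null || newestFirst.isEmpty()) {
            return Optional.empty();
        }
        int index = latest ? 0 : newestFirst.size() - 1;
        return Optional.of(newestFirst.get(index));
    }

    public static void main(String[] args) {
        List<FileEntry> files = Arrays.asList(
                new FileEntry("a.avro", 100L),
                new FileEntry("b.avro", 300L),
                new FileEntry("c.avro", 200L));
        List<FileEntry> sorted = sortNewestFirst(files);
        System.out.println(pick(sorted, true).get().getPath());   // prints b.avro
        System.out.println(pick(sorted, false).get().getPath());  // prints a.avro
    }
}
```

Returning an Optional rather than null makes the empty-directory case explicit to the caller. The original method instead logs a warning and returns a null schema.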

Aggregations

FsInput (org.apache.avro.mapred.FsInput): 11
IOException (java.io.IOException): 6
GenericRecord (org.apache.avro.generic.GenericRecord): 6
Configuration (org.apache.hadoop.conf.Configuration): 5
DataFileReader (org.apache.avro.file.DataFileReader): 4
SeekableInput (org.apache.avro.file.SeekableInput): 4
GenericDatumReader (org.apache.avro.generic.GenericDatumReader): 4
ArrayList (java.util.ArrayList): 3
FileStatus (org.apache.hadoop.fs.FileStatus): 3
Path (org.apache.hadoop.fs.Path): 3
Schema (org.apache.avro.Schema): 2
HiddenFilter (org.apache.gobblin.util.filters.HiddenFilter): 2
AbstractIterator (com.google.common.collect.AbstractIterator): 1
UnmodifiableIterator (com.google.common.collect.UnmodifiableIterator): 1
Closer (com.google.common.io.Closer): 1
URI (java.net.URI): 1
URISyntaxException (java.net.URISyntaxException): 1
Properties (java.util.Properties): 1
TreeMap (java.util.TreeMap): 1
SpecificDatumReader (org.apache.avro.specific.SpecificDatumReader): 1