Example 1 with UnmodifiableIterator

Use of com.google.common.collect.UnmodifiableIterator in project crunch by cloudera.

In class TextFileReaderFactory, method read.

@Override
public Iterator<T> read(FileSystem fs, Path path) {
    MapFn mapFn = null;
    if (String.class.equals(ptype.getTypeClass())) {
        mapFn = IdentityFn.getInstance();
    } else {
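        // Check for a composite MapFn for the PType.
        // Note that this won't work for Avro; that case still needs to be solved.
        // If the input MapFn is not a CompositeMapFn, mapFn stays null and the
        // setConfigurationForTest call below throws a NullPointerException.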
        MapFn input = ptype.getInputMapFn();
        if (input instanceof CompositeMapFn) {
            mapFn = ((CompositeMapFn) input).getSecond();
        }
    }
    mapFn.setConfigurationForTest(conf);
    mapFn.initialize();
    FSDataInputStream is = null;
    try {
        is = fs.open(path);
    } catch (IOException e) {
        LOG.info("Could not read path: " + path, e);
        return Iterators.emptyIterator();
    }
    final BufferedReader reader = new BufferedReader(new InputStreamReader(is));
    final MapFn<String, T> iterMapFn = mapFn;
    return new UnmodifiableIterator<T>() {

        private String nextLine;

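        @Override
        public boolean hasNext() {
            // Caveat: every call reads (and consumes) one line from the stream, so
            // calling hasNext() twice before next() silently skips a line.
            try {
                return (nextLine = reader.readLine()) != null;
            } catch (IOException e) {
                // Read errors are logged and treated as end of input.
                LOG.info("Exception reading text file stream", e);
                return false;
            }
        }

        @Override
        public T next() {
            // Maps the line fetched by the most recent hasNext() call; callers must
            // invoke hasNext() exactly once before each next().
            return iterMapFn.map(nextLine);
        }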
    };
}
Also used : UnmodifiableIterator (com.google.common.collect.UnmodifiableIterator), InputStreamReader (java.io.InputStreamReader), BufferedReader (java.io.BufferedReader), FSDataInputStream (org.apache.hadoop.fs.FSDataInputStream), IOException (java.io.IOException), CompositeMapFn (org.apache.crunch.fn.CompositeMapFn), MapFn (org.apache.crunch.MapFn)
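
The anonymous iterator in Example 1 advances the stream inside hasNext(), so it only behaves correctly when hasNext() and next() are called in strict alternation. A common way to avoid that constraint is to buffer one line of look-ahead. The following is a minimal sketch of that idea using only the JDK, not the code used by crunch: the class name LookAheadLineIterator is hypothetical, java.util.function.Function stands in for MapFn, and I/O errors are rethrown as UncheckedIOException rather than logged (an assumption made to keep the sketch short).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Function;

/** Hypothetical read-only line iterator that keeps one line of look-ahead. */
final class LookAheadLineIterator<T> implements Iterator<T> {
    private final BufferedReader reader;
    private final Function<String, T> mapper; // stands in for MapFn<String, T>
    private String buffered;                  // line already read but not yet returned
    private boolean done;                     // true once the stream is exhausted

    LookAheadLineIterator(BufferedReader reader, Function<String, T> mapper) {
        this.reader = reader;
        this.mapper = mapper;
    }

    @Override
    public boolean hasNext() {
        if (buffered != null) {
            return true;   // a line is already waiting; do not read again
        }
        if (done) {
            return false;
        }
        try {
            buffered = reader.readLine();
            if (buffered == null) {
                done = true;
                reader.close(); // release the stream once it is exhausted
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffered != null;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        String line = buffered;
        buffered = null;   // hand the line out exactly once
        return mapper.apply(line);
    }

    // remove() is not overridden: the default in java.util.Iterator throws
    // UnsupportedOperationException, which keeps the iterator read-only.
}
```

With this shape, repeated hasNext() calls are idempotent and next() works even if the caller never calls hasNext(). Extending Guava's UnmodifiableIterator instead of implementing Iterator directly would additionally make remove() final.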

Example 2 with UnmodifiableIterator

Use of com.google.common.collect.UnmodifiableIterator in project crunch by cloudera.

In class AvroFileReaderFactory, method read.

@Override
public Iterator<T> read(FileSystem fs, final Path path) {
    this.mapFn.setConfigurationForTest(conf);
    this.mapFn.initialize();
    try {
        FsInput fsi = new FsInput(path, fs.getConf());
        final DataFileReader<T> reader = new DataFileReader<T>(fsi, recordReader);
        return new UnmodifiableIterator<T>() {

            @Override
            public boolean hasNext() {
                return reader.hasNext();
            }

            @Override
            public T next() {
                return mapFn.map(reader.next());
            }
        };
    } catch (IOException e) {
        LOG.info("Could not read avro file at path: " + path, e);
        return Iterators.emptyIterator();
    }
}
Also used : UnmodifiableIterator (com.google.common.collect.UnmodifiableIterator), DataFileReader (org.apache.avro.file.DataFileReader), FsInput (org.apache.avro.mapred.FsInput), IOException (java.io.IOException)
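
Example 2 boils down to a read-only view that delegates hasNext() to an underlying reader and applies a mapping function in next(). The pattern can be sketched without Avro or Guava as shown here. MappedReadOnlyIterator is a hypothetical name, and java.util.function.Function is assumed in place of the project's MapFn.

```java
import java.util.Iterator;
import java.util.function.Function;

/** Hypothetical read-only view that maps each element of a source iterator. */
final class MappedReadOnlyIterator<S, T> implements Iterator<T> {
    private final Iterator<S> source;
    private final Function<? super S, ? extends T> fn;

    MappedReadOnlyIterator(Iterator<S> source, Function<? super S, ? extends T> fn) {
        this.source = source;
        this.fn = fn;
    }

    @Override
    public boolean hasNext() {
        return source.hasNext();      // pure delegation, no side effects of its own
    }

    @Override
    public T next() {
        return fn.apply(source.next());
    }

    @Override
    public void remove() {
        // Mirrors what UnmodifiableIterator guarantees: removal is never
        // forwarded to the underlying source.
        throw new UnsupportedOperationException("read-only iterator");
    }
}
```

Because remove() never reaches the source, callers can be handed the iterator without being able to mutate the collection or reader behind it.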

Aggregations

UnmodifiableIterator (com.google.common.collect.UnmodifiableIterator): 2
IOException (java.io.IOException): 2
BufferedReader (java.io.BufferedReader): 1
InputStreamReader (java.io.InputStreamReader): 1
DataFileReader (org.apache.avro.file.DataFileReader): 1
FsInput (org.apache.avro.mapred.FsInput): 1
MapFn (org.apache.crunch.MapFn): 1
CompositeMapFn (org.apache.crunch.fn.CompositeMapFn): 1
FSDataInputStream (org.apache.hadoop.fs.FSDataInputStream): 1