Example 16 with Reader

Use of org.apache.hadoop.io.SequenceFile.Reader in the Apache Nutch project.

From the class SequenceReader, method read:

@Override
public List<List<String>> read(String path) throws FileNotFoundException {
    List<List<String>> rows = new ArrayList<>();
    Path file = new Path(path);
    // try-with-resources ensures the reader is closed even if next() throws
    try (SequenceFile.Reader reader = new SequenceFile.Reader(conf, Reader.file(file))) {
        Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
        Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
        while (reader.next(key, value)) {
            List<String> row = new ArrayList<>();
            row.add(key.toString());
            row.add(value.toString());
            rows.add(row);
        }
    } catch (FileNotFoundException fne) {
        // rethrow the original exception instead of discarding its message
        throw fne;
    } catch (IOException e) {
        LOG.error("Error occurred while reading file {} : {}", file, StringUtils.stringifyException(e));
        throw new WebApplicationException();
    }
    return rows;
}
Also used: Path (org.apache.hadoop.fs.Path), Reader (org.apache.hadoop.io.SequenceFile.Reader), SequenceFile (org.apache.hadoop.io.SequenceFile), WebApplicationException (javax.ws.rs.WebApplicationException), ArrayList (java.util.ArrayList), FileNotFoundException (java.io.FileNotFoundException), Writable (org.apache.hadoop.io.Writable), List (java.util.List), IOException (java.io.IOException)

Example 17 with Reader

Use of org.apache.hadoop.io.SequenceFile.Reader in the Apache Nutch project.

From the class NodeReader, method slice:

@Override
public List slice(String path, int start, int end) throws FileNotFoundException {
    List<HashMap> rows = new ArrayList<>();
    Path file = new Path(path);
    // try-with-resources ensures the reader is closed even if next() throws
    try (SequenceFile.Reader reader = new SequenceFile.Reader(conf, Reader.file(file))) {
        Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
        Node value = new Node();
        int i = 0;
        // advance to the start position
        for (; i < start && reader.next(key, value); i++) {
        }
        // checking i < end before next() avoids reading a record past the slice
        while (i < end && reader.next(key, value)) {
            HashMap<String, String> t_row = getNodeRow(key, value);
            rows.add(t_row);
            i++;
        }
    } catch (FileNotFoundException fne) {
        // rethrow the original exception instead of discarding its message
        throw fne;
    } catch (IOException e) {
        LOG.error("Error occurred while reading file {} : {}", file, StringUtils.stringifyException(e));
        throw new WebApplicationException();
    }
    return rows;
}
Also used: Path (org.apache.hadoop.fs.Path), Reader (org.apache.hadoop.io.SequenceFile.Reader), WebApplicationException (javax.ws.rs.WebApplicationException), HashMap (java.util.HashMap), Node (org.apache.nutch.scoring.webgraph.Node), ArrayList (java.util.ArrayList), FileNotFoundException (java.io.FileNotFoundException), Writable (org.apache.hadoop.io.Writable), IOException (java.io.IOException), SequenceFile (org.apache.hadoop.io.SequenceFile)
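To make the boundary handling in slice explicit, here is a minimal pure-Java sketch of the same skip-then-collect pattern over an ordinary Iterator, independent of Hadoop. The class SliceDemo and helper slice are illustrative names, not part of Nutch:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class SliceDemo {
    // Hypothetical helper mirroring NodeReader.slice's windowing logic:
    // skip the first `start` records, then collect records until `end`
    // (exclusive), stopping early if the source runs out.
    static <T> List<T> slice(Iterator<T> records, int start, int end) {
        List<T> rows = new ArrayList<>();
        int i = 0;
        // advance to the start position, discarding skipped records
        for (; i < start && records.hasNext(); i++) {
            records.next();
        }
        // checking i < end before next() avoids consuming a record
        // that would then be thrown away
        while (i < end && records.hasNext()) {
            rows.add(records.next());
            i++;
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Integer> data = List.of(10, 20, 30, 40, 50);
        System.out.println(slice(data.iterator(), 1, 4)); // [20, 30, 40]
    }
}
```

Because the iterator is consumed as it is skipped, a slice whose start lies past the end of the data simply returns an empty list rather than failing.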

Example 18 with Reader

Use of org.apache.hadoop.io.SequenceFile.Reader in the Apache Nutch project.

From the class NodeReader, method head:

@Override
public List head(String path, int nrows) throws FileNotFoundException {
    List<HashMap> rows = new ArrayList<>();
    Path file = new Path(path);
    // try-with-resources ensures the reader is closed even if next() throws
    try (SequenceFile.Reader reader = new SequenceFile.Reader(conf, Reader.file(file))) {
        Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
        Node value = new Node();
        int i = 0;
        // checking i < nrows before next() avoids reading one extra record
        while (i < nrows && reader.next(key, value)) {
            HashMap<String, String> t_row = getNodeRow(key, value);
            rows.add(t_row);
            i++;
        }
    } catch (FileNotFoundException fne) {
        // rethrow the original exception instead of discarding its message
        throw fne;
    } catch (IOException e) {
        LOG.error("Error occurred while reading file {} : {}", file, StringUtils.stringifyException(e));
        throw new WebApplicationException();
    }
    return rows;
}
Also used: Path (org.apache.hadoop.fs.Path), Reader (org.apache.hadoop.io.SequenceFile.Reader), WebApplicationException (javax.ws.rs.WebApplicationException), HashMap (java.util.HashMap), Node (org.apache.nutch.scoring.webgraph.Node), ArrayList (java.util.ArrayList), FileNotFoundException (java.io.FileNotFoundException), Writable (org.apache.hadoop.io.Writable), IOException (java.io.IOException), SequenceFile (org.apache.hadoop.io.SequenceFile)

Example 19 with Reader

Use of org.apache.hadoop.io.SequenceFile.Reader in the Apache Nutch project.

From the class NodeReader, method count:

@Override
public int count(String path) throws FileNotFoundException {
    Path file = new Path(path);
    int i = 0;
    // try-with-resources ensures the reader is closed even if next() throws
    try (SequenceFile.Reader reader = new SequenceFile.Reader(conf, Reader.file(file))) {
        Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
        Node value = new Node();
        while (reader.next(key, value)) {
            i++;
        }
    } catch (FileNotFoundException fne) {
        // rethrow the original exception instead of discarding its message
        throw fne;
    } catch (IOException e) {
        LOG.error("Error occurred while reading file {} : {}", file, StringUtils.stringifyException(e));
        throw new WebApplicationException();
    }
    return i;
}
Also used: Path (org.apache.hadoop.fs.Path), Reader (org.apache.hadoop.io.SequenceFile.Reader), SequenceFile (org.apache.hadoop.io.SequenceFile), WebApplicationException (javax.ws.rs.WebApplicationException), Node (org.apache.nutch.scoring.webgraph.Node), FileNotFoundException (java.io.FileNotFoundException), Writable (org.apache.hadoop.io.Writable), IOException (java.io.IOException)

Example 20 with Reader

Use of org.apache.hadoop.io.SequenceFile.Reader in the Apache SystemML (incubator-systemml) project.

From the class TfUtils, method initOffsetsReader:

private Reader initOffsetsReader(JobConf job) throws IOException {
    Path path = new Path(job.get(CSVReblockMR.ROWID_FILE_NAME));
    FileSystem fs = IOUtilFunctions.getFileSystem(path, job);
    // the row-offsets counters are expected to live in exactly one file
    Path[] files = MatrixReader.getSequenceFilePaths(fs, path);
    if (files.length != 1)
        throw new IOException("Expecting a single file under counters file: " + path);
    return new SequenceFile.Reader(fs, files[0], job);
}
Also used: Path (org.apache.hadoop.fs.Path), FileSystem (org.apache.hadoop.fs.FileSystem), MatrixReader (org.apache.sysml.runtime.io.MatrixReader), Reader (org.apache.hadoop.io.SequenceFile.Reader), IOException (java.io.IOException)

Aggregations

Reader (org.apache.hadoop.io.SequenceFile.Reader) — 25
Path (org.apache.hadoop.fs.Path) — 19
SequenceFile (org.apache.hadoop.io.SequenceFile) — 17
IOException (java.io.IOException) — 14
Writable (org.apache.hadoop.io.Writable) — 14
FileNotFoundException (java.io.FileNotFoundException) — 12
WebApplicationException (javax.ws.rs.WebApplicationException) — 12
ArrayList (java.util.ArrayList) — 9
HashMap (java.util.HashMap) — 6
Text (org.apache.hadoop.io.Text) — 5
Node (org.apache.nutch.scoring.webgraph.Node) — 4
List (java.util.List) — 3
LinkDatum (org.apache.nutch.scoring.webgraph.LinkDatum) — 3
Test (org.junit.Test) — 3
HashSet (java.util.HashSet) — 2
Configuration (org.apache.hadoop.conf.Configuration) — 2
FileSystem (org.apache.hadoop.fs.FileSystem) — 2
Writer (org.apache.hadoop.io.SequenceFile.Writer) — 2
FlowFile (org.apache.nifi.flowfile.FlowFile) — 2
ProcessException (org.apache.nifi.processor.exception.ProcessException) — 2