Examples with StreamSource - org.opensolaris.opengrok.analysis.StreamSource

Example 1 with StreamSource

Use of org.opensolaris.opengrok.analysis.StreamSource in project OpenGrok by OpenGrok.

The class TroffAnalyzerTest, method testAnalyze.

/**
 * Test method for {@link org.opensolaris.opengrok.analysis.document.TroffAnalyzer#analyze(
 *     org.apache.lucene.document.Document,
 *     org.opensolaris.opengrok.analysis.StreamSource, java.io.Writer)}.
 *
 * @throws IOException
 */
@Test
public void testAnalyze() throws IOException {
    Document doc = new Document();
    StringWriter xrefOut = new StringWriter();
    analyzer.analyze(doc, new StreamSource() {

        @Override
        public InputStream getStream() throws IOException {
            return new ByteArrayInputStream(content.getBytes());
        }
    }, xrefOut);
}

Also used : StringWriter(java.io.StringWriter) ByteArrayInputStream(java.io.ByteArrayInputStream) InputStream(java.io.InputStream) StreamSource(org.opensolaris.opengrok.analysis.StreamSource) IOException(java.io.IOException) Document(org.apache.lucene.document.Document) Test(org.junit.Test)
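
The anonymous subclass in Example 1 exists only to hand the analyzer a stream over an in-memory string. The following is a minimal sketch of that pattern, not OpenGrok code: the nested `Source` interface is a hypothetical stand-in for `StreamSource` so the snippet compiles without OpenGrok on the classpath, and UTF-8 is assumed as the intended encoding, whereas `content.getBytes()` in the test uses the platform default.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class StringSourceSketch {
    /** Hypothetical stand-in for a stream source; not the OpenGrok class. */
    interface Source {
        InputStream getStream() throws IOException;
    }

    /** Returns a source that opens a new stream over the UTF-8 bytes of text on every call. */
    static Source fromString(String text) {
        final byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return () -> new ByteArrayInputStream(bytes);
    }

    /** Reads one stream from the source to the end and decodes it as UTF-8. */
    static String readAll(Source src) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        try (InputStream in = src.getStream()) {
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Source src = fromString(".TH TEST 1\n");
        // Each getStream() call starts from the beginning, so both reads match.
        System.out.println(readAll(src).equals(readAll(src)));
    }
}
```

Because the source hands out a new stream per call, a consumer can read the content more than once without resetting anything.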

Example 2 with StreamSource

Use of org.opensolaris.opengrok.analysis.StreamSource in project OpenGrok by OpenGrok.

The class LineBreakerTest, method shouldHandleDocsOfLongerLength.

@Test
public void shouldHandleDocsOfLongerLength() throws IOException {
    // Line starts: 0 ("ab"), 4 ("cde"), 9 ("efgh"), 15 ("ijk"), 20 ("lm");
    // each "\r\n" occupies two characters.
    final String INPUT = "ab\r\ncde\r\nefgh\r\nijk\r\nlm";
    StreamSource src = StreamSource.fromString(INPUT);
    brkr.reset(src);
    assertEquals("split count", 5, brkr.count());
    assertEquals("split position", 0, brkr.getPosition(0));
    assertEquals("split position", 4, brkr.getPosition(1));
    assertEquals("split position", 9, brkr.getPosition(2));
    assertEquals("split position", 15, brkr.getPosition(3));
    assertEquals("split position", 20, brkr.getPosition(4));
}

Also used : StreamSource(org.opensolaris.opengrok.analysis.StreamSource) Test(org.junit.Test)

Example 3 with StreamSource

Use of org.opensolaris.opengrok.analysis.StreamSource in project OpenGrok by OpenGrok.

The class LineBreakerTest, method shouldSplitEndingLFsIntoOneMoreLine.

@Test
public void shouldSplitEndingLFsIntoOneMoreLine() throws IOException {
    StreamSource src = StreamSource.fromString("abc\ndef\n");
    brkr.reset(src);
    assertEquals("split count", 3, brkr.count());
    assertEquals("split position", 0, brkr.getPosition(0));
    assertEquals("split position", 4, brkr.getPosition(1));
    assertEquals("split position", 8, brkr.getPosition(2));
}

Also used : StreamSource(org.opensolaris.opengrok.analysis.StreamSource) Test(org.junit.Test)
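
The positions asserted in the two LineBreakerTest examples can be reproduced by hand. The sketch below is a hypothetical helper, not OpenGrok's LineBreaker: it assumes that "\r\n", "\n" and a lone "\r" each count as one line break, and that text ending in a break yields one more, empty line.

```java
import java.util.ArrayList;
import java.util.List;

final class LineStartsSketch {
    /**
     * Returns the offset at which each line starts. Assumes "\r\n", "\n"
     * and a lone "\r" are each a single line break.
     */
    static List<Integer> lineStarts(String text) {
        List<Integer> starts = new ArrayList<>();
        starts.add(0); // the first line always starts at offset 0
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c == '\r' && i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                i += 2; // CRLF is one break that spans two characters
                starts.add(i);
            } else if (c == '\n' || c == '\r') {
                i += 1; // a lone LF or CR is one break of one character
                starts.add(i);
            } else {
                i += 1;
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        // Input from Example 2
        System.out.println(lineStarts("ab\r\ncde\r\nefgh\r\nijk\r\nlm")); // [0, 4, 9, 15, 20]
        // Input from Example 3: the trailing LF opens a third, empty line
        System.out.println(lineStarts("abc\ndef\n")); // [0, 4, 8]
    }
}
```

The size of each returned list matches the "split count" asserted in the corresponding test (5 and 3).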

Example 4 with StreamSource

Use of org.opensolaris.opengrok.analysis.StreamSource in project OpenGrok by OpenGrok.

The class StreamUtils, method sourceFromEmbedded.

/**
 * Creates a {@code StreamSource} instance that reads data from an
 * embedded resource.
 * @param resourceName a required resource name
 * @return a stream source that reads from {@code resourceName}
 */
public static StreamSource sourceFromEmbedded(String resourceName) {
    return new StreamSource() {

        @Override
        public InputStream getStream() throws IOException {
            InputStream res = StreamUtils.class.getClassLoader().getResourceAsStream(resourceName);
            assertNotNull("resource " + resourceName, res);
            return new BufferedInputStream(res);
        }
    };
}

Also used : BufferedInputStream(java.io.BufferedInputStream) InputStream(java.io.InputStream) StreamSource(org.opensolaris.opengrok.analysis.StreamSource)

Example 5 with StreamSource

Use of org.opensolaris.opengrok.analysis.StreamSource in project OpenGrok by OpenGrok.

The class GZIPAnalyzer, method analyze.

@Override
public void analyze(Document doc, StreamSource src, Writer xrefOut) throws IOException, InterruptedException {
    StreamSource gzSrc = wrap(src);
    String path = doc.get("path");
    if (path != null && (path.endsWith(".gz") || path.endsWith(".GZ") || path.endsWith(".Gz"))) {
        String newname = path.substring(0, path.length() - 3);
        // System.err.println("GZIPPED OF = " + newname);
        try (InputStream gzis = gzSrc.getStream()) {
            fa = AnalyzerGuru.getAnalyzer(gzis, newname);
        }
        if (fa == null) {
            this.g = Genre.DATA;
            LOGGER.log(Level.WARNING, "Did not analyze {0}, detected as data.", newname);
            // TODO we could probably wrap tar analyzer here, need to do research on reader coming from gzis ...
        } else {
            // simple gzipped file case captured here
            if (fa.getGenre() == Genre.PLAIN || fa.getGenre() == Genre.XREFABLE) {
                this.g = Genre.XREFABLE;
            } else {
                this.g = Genre.DATA;
            }
            fa.analyze(doc, gzSrc, xrefOut);
            if (doc.get("t") != null) {
                doc.removeField("t");
                if (g == Genre.XREFABLE) {
                    doc.add(new Field("t", g.typeName(), AnalyzerGuru.string_ft_stored_nanalyzed_norms));
                }
            }
        }
    }
}

Also used : Field(org.apache.lucene.document.Field) GZIPInputStream(java.util.zip.GZIPInputStream) BufferedInputStream(java.io.BufferedInputStream) InputStream(java.io.InputStream) StreamSource(org.opensolaris.opengrok.analysis.StreamSource)
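
The `wrap(src)` helper called in Example 5 is not shown on this page. The import list (GZIPInputStream, BufferedInputStream) suggests it layers decompression over the original source; the sketch below illustrates that idea under that assumption. It is not the OpenGrok implementation: `Source` is a hypothetical stand-in for `StreamSource`, and `gzip` and `readAll` exist only to make the demo self-contained.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class GzipWrapSketch {
    /** Hypothetical stand-in for a stream source; not the OpenGrok class. */
    interface Source {
        InputStream getStream() throws IOException;
    }

    /** Wraps src so that every getStream() call decompresses from the start. */
    static Source wrap(Source src) {
        return () -> new BufferedInputStream(new GZIPInputStream(src.getStream()));
    }

    /** Gzips text in memory; only used to build sample input. */
    static byte[] gzip(String text) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buf)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    /** Reads one stream from the source to the end and decodes it as UTF-8. */
    static String readAll(Source src) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        try (InputStream in = src.getStream()) {
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] compressed = gzip("int main() { return 0; }\n");
        Source gz = wrap(() -> new ByteArrayInputStream(compressed));
        // First read stands in for type detection, second for the real analysis.
        String sniffed = readAll(gz);
        String analyzed = readAll(gz);
        System.out.println(sniffed.equals(analyzed));
    }
}
```

This mirrors how `analyze` uses `gzSrc` twice: once in the try-with-resources block that picks the inner analyzer, and again when that analyzer processes the decompressed content.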

Aggregations

StreamSource (org.opensolaris.opengrok.analysis.StreamSource) 12
InputStream (java.io.InputStream) 6
Test (org.junit.Test) 6
BufferedInputStream (java.io.BufferedInputStream) 4
IOException (java.io.IOException) 3
Field (org.apache.lucene.document.Field) 2
BufferedReader (java.io.BufferedReader) 1
ByteArrayInputStream (java.io.ByteArrayInputStream) 1
InputStreamReader (java.io.InputStreamReader) 1
Reader (java.io.Reader) 1
StringWriter (java.io.StringWriter) 1
GZIPInputStream (java.util.zip.GZIPInputStream) 1
CharTermAttribute (org.apache.lucene.analysis.tokenattributes.CharTermAttribute) 1
OffsetAttribute (org.apache.lucene.analysis.tokenattributes.OffsetAttribute) 1
Document (org.apache.lucene.document.Document) 1
CBZip2InputStream (org.apache.tools.bzip2.CBZip2InputStream) 1
Assert.assertArrayEquals (org.junit.Assert.assertArrayEquals) 1
Assert.assertEquals (org.junit.Assert.assertEquals) 1
CtagsReader (org.opensolaris.opengrok.analysis.CtagsReader) 1
Definitions (org.opensolaris.opengrok.analysis.Definitions) 1