Example 6 with URIImpl

Use of org.ontoware.rdf2go.model.node.impl.URIImpl in the Apache Stanbol project.

In class TestMetaxaCore, method testHtmlExtraction:

/**
 * Tests the HTML extraction.
 *
 * @throws Exception if there is an error during extraction or when reading a test resource
 */
@Test
public void testHtmlExtraction() throws Exception {
    String testFile = "test.html";
    String testResultFile = "html-res.txt";
    // extract text from html; try-with-resources closes the stream after extraction
    Model m;
    try (InputStream in = getResourceAsStream(testFile)) {
        assertNotNull("failed to load resource " + testFile, in);
        m = extractor.extract(in, new URIImpl("file://" + testFile), "text/html");
    }
    String text = MetaxaCore.getText(m);
    // get expected result
    String expectedText;
    try (InputStream in2 = getResourceAsStream(testResultFile)) {
        assertNotNull("failed to load resource " + testResultFile, in2);
        expectedText = IOUtils.toString(in2, "utf-8");
    }
    // test: compare normalized expected and extracted text
    assertEquals(cleanup(expectedText), cleanup(text));
    // show triples and check how many were extracted
    int tripleCounter = this.printTriples(m);
    assertEquals(28, tripleCounter);
}
Also used: InputStream (java.io.InputStream), Model (org.ontoware.rdf2go.model.Model), URIImpl (org.ontoware.rdf2go.model.node.impl.URIImpl), Test (org.junit.Test)
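The test compares cleanup(expectedText) with cleanup(text), but the cleanup helper itself is not part of this example. The following is a hypothetical stand-in, assuming the helper's job is to make the comparison insensitive to line breaks and indentation by collapsing whitespace. The class name TextCleanup is invented for illustration, and the real Stanbol helper may do more or less than this.

```java
import java.util.Objects;

/**
 * Hypothetical stand-in for the cleanup(...) helper used in testHtmlExtraction.
 * Assumption: the helper only normalizes whitespace; the real Stanbol code is not shown.
 */
final class TextCleanup {

    private TextCleanup() {
        // static helper, no instances
    }

    /** Collapses every run of whitespace (including Unicode spaces) to one space and trims the ends. */
    static String cleanup(String text) {
        Objects.requireNonNull(text, "text");
        // (?U) enables Unicode character classes, so \s also matches the no-break space U+00A0
        return text.replaceAll("(?U)\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String expected = "Hello\r\n  World\n"; // e.g. as read from a result file
        String extracted = "Hello World";       // e.g. as returned by an extractor
        System.out.println(cleanup(expected).equals(cleanup(extracted))); // true
    }
}
```

The (?U) flag matters for HTML input: converting &nbsp; entities yields U+00A0, which Java's default \s (ASCII whitespace only) would not match.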

Example 7 with URIImpl

Use of org.ontoware.rdf2go.model.node.impl.URIImpl in the Apache Stanbol project.

In class TestMetaxaCore, method testMailExtraction:

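@Test
public void testMailExtraction() throws Exception {
    String testFile = "mail-multipart-test.eml";
    // extract the mail; try-with-resources closes the stream after extraction
    Model m;
    try (InputStream in = getResourceAsStream(testFile)) {
        assertNotNull("failed to load resource " + testFile, in);
        m = extractor.extract(in, new URIImpl("file://" + testFile), "message/rfc822");
    }
    // the model must contain at least one statement whose predicate is NMO.plainTextMessageContent
    boolean textContained = m.contains(Variable.ANY, NMO.plainTextMessageContent, Variable.ANY);
    assertTrue("no plain-text message content extracted from " + testFile, textContained);
}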
Also used: InputStream (java.io.InputStream), Model (org.ontoware.rdf2go.model.Model), URIImpl (org.ontoware.rdf2go.model.node.impl.URIImpl), Test (org.junit.Test)
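The mail test relies on triple-pattern matching: Variable.ANY acts as a wildcard for the subject and object, so contains(...) succeeds if any statement in the model uses NMO.plainTextMessageContent as its predicate. The plain-Java code below illustrates that idea only and is not RDF2Go's implementation. TinyTripleStore is a hypothetical class, terms are plain strings, null stands in for Variable.ANY, and the sample triples are made up.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical in-memory triple store illustrating wildcard matching.
 * A null argument plays the role of RDF2Go's Variable.ANY; this is not RDF2Go code.
 */
final class TinyTripleStore {

    /** One subject-predicate-object statement with plain-string terms. */
    private static final class Triple {
        final String subject;
        final String predicate;
        final String object;

        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    private final List<Triple> triples = new ArrayList<>();

    /** Adds one statement; wildcards (null) are not allowed when storing. */
    void add(String subject, String predicate, String object) {
        if (subject == null || predicate == null || object == null) {
            throw new IllegalArgumentException("stored terms must not be null");
        }
        triples.add(new Triple(subject, predicate, object));
    }

    /** Returns true if at least one stored triple matches the pattern; null matches any term. */
    boolean contains(String subject, String predicate, String object) {
        for (Triple t : triples) {
            if (matches(subject, t.subject)
                    && matches(predicate, t.predicate)
                    && matches(object, t.object)) {
                return true;
            }
        }
        return false;
    }

    /** Number of stored triples. */
    int size() {
        return triples.size();
    }

    private static boolean matches(String pattern, String value) {
        return pattern == null || pattern.equals(value);
    }

    public static void main(String[] args) {
        TinyTripleStore store = new TinyTripleStore();
        // Made-up sample data standing in for what a mail extractor might produce.
        store.add("file://mail-multipart-test.eml", "nmo:messageSubject", "Test mail");
        store.add("file://mail-multipart-test.eml", "nmo:plainTextMessageContent", "Hello, world");

        // Analogous to m.contains(Variable.ANY, NMO.plainTextMessageContent, Variable.ANY)
        System.out.println(store.contains(null, "nmo:plainTextMessageContent", null)); // true
        // A predicate that was never stored does not match.
        System.out.println(store.contains(null, "ex:notPresent", null)); // false
    }
}
```

A linear scan keeps the sketch short; real RDF stores index statements by subject, predicate, and object so that such lookups do not touch every triple.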

Aggregations

URIImpl (org.ontoware.rdf2go.model.node.impl.URIImpl): 7
InputStream (java.io.InputStream): 6
Model (org.ontoware.rdf2go.model.Model): 6
Test (org.junit.Test): 4
File (java.io.File): 2
FileInputStream (java.io.FileInputStream): 2
URI (org.ontoware.rdf2go.model.node.URI): 2
RDFContainer (org.semanticdesktop.aperture.rdf.RDFContainer): 2
RDFContainerFactory (org.semanticdesktop.aperture.rdf.RDFContainerFactory): 2
RDFContainerFactoryImpl (org.semanticdesktop.aperture.rdf.impl.RDFContainerFactoryImpl): 2
BufferedInputStream (java.io.BufferedInputStream): 1
BufferedWriter (java.io.BufferedWriter): 1
ByteArrayInputStream (java.io.ByteArrayInputStream): 1
IOException (java.io.IOException): 1
OutputStreamWriter (java.io.OutputStreamWriter): 1
Charset (java.nio.charset.Charset): 1
HashMap (java.util.HashMap): 1
BlankNode (org.apache.clerezza.commons.rdf.BlankNode): 1
BlankNodeOrIRI (org.apache.clerezza.commons.rdf.BlankNodeOrIRI): 1
Graph (org.apache.clerezza.commons.rdf.Graph): 1