Example 1 with ExtractedText

Use of org.apache.jackrabbit.oak.plugins.index.fulltext.ExtractedText in the jackrabbit-oak project by Apache.

From the class ExtractedTextCacheTest, method preExtractionReindex.

@Test
public void preExtractionReindex() throws Exception {
    ExtractedTextCache cache = new ExtractedTextCache(10 * FileUtils.ONE_MB, 100);
    // Mocked provider that returns the pre-extracted text "bar" for any blob.
    PreExtractedTextProvider provider = mock(PreExtractedTextProvider.class);
    cache.setExtractedTextProvider(provider);
    when(provider.getText(anyString(), any(Blob.class))).thenReturn(new ExtractedText(ExtractionResult.SUCCESS, "bar"));
    Blob b = new IdBlob("hello", "a");
    // With the final argument set to true, the cache returns the provider's text.
    String text = cache.get("/a", "foo", b, true);
    assertEquals("bar", text);
}
Also used: PreExtractedTextProvider (org.apache.jackrabbit.oak.plugins.index.fulltext.PreExtractedTextProvider), Blob (org.apache.jackrabbit.oak.api.Blob), ArrayBasedBlob (org.apache.jackrabbit.oak.plugins.memory.ArrayBasedBlob), Matchers.anyString (org.mockito.Matchers.anyString), ExtractedText (org.apache.jackrabbit.oak.plugins.index.fulltext.ExtractedText), Test (org.junit.Test)

Example 2 with ExtractedText

Use of org.apache.jackrabbit.oak.plugins.index.fulltext.ExtractedText in the jackrabbit-oak project by Apache.

From the class ExtractedTextCacheTest, method cacheEnabled.

@Test
public void cacheEnabled() throws Exception {
    ExtractedTextCache cache = new ExtractedTextCache(10 * FileUtils.ONE_MB, 100);
    assertNotNull(cache.getCacheStats());
    Blob b = new IdBlob("hello", "a");
    // Nothing has been cached for this blob yet, so the lookup misses.
    String text = cache.get("/a", "foo", b, false);
    assertNull(text);
    // After put, the same blob yields the cached text.
    cache.put(b, new ExtractedText(ExtractionResult.SUCCESS, "test hello"));
    text = cache.get("/a", "foo", b, false);
    assertEquals("test hello", text);
}
Also used: Blob (org.apache.jackrabbit.oak.api.Blob), ArrayBasedBlob (org.apache.jackrabbit.oak.plugins.memory.ArrayBasedBlob), Matchers.anyString (org.mockito.Matchers.anyString), ExtractedText (org.apache.jackrabbit.oak.plugins.index.fulltext.ExtractedText), Test (org.junit.Test)

Example 3 with ExtractedText

Use of org.apache.jackrabbit.oak.plugins.index.fulltext.ExtractedText in the jackrabbit-oak project by Apache.

From the class ExtractedTextCacheTest, method cacheEnabledErrorInTextExtraction.

@Test
public void cacheEnabledErrorInTextExtraction() throws Exception {
    ExtractedTextCache cache = new ExtractedTextCache(10 * FileUtils.ONE_MB, 100);
    Blob b = new IdBlob("hello", "a");
    String text = cache.get("/a", "foo", b, false);
    assertNull(text);
    // A result flagged ERROR is stored together with some text...
    cache.put(b, new ExtractedText(ExtractionResult.ERROR, "test hello"));
    text = cache.get("/a", "foo", b, false);
    // ...but the lookup returns the TEXT_EXTRACTION_ERROR marker, not that text.
    assertEquals(LuceneIndexEditor.TEXT_EXTRACTION_ERROR, text);
}
Also used: Blob (org.apache.jackrabbit.oak.api.Blob), ArrayBasedBlob (org.apache.jackrabbit.oak.plugins.memory.ArrayBasedBlob), Matchers.anyString (org.mockito.Matchers.anyString), ExtractedText (org.apache.jackrabbit.oak.plugins.index.fulltext.ExtractedText), Test (org.junit.Test)

Example 4 with ExtractedText

Use of org.apache.jackrabbit.oak.plugins.index.fulltext.ExtractedText in the jackrabbit-oak project by Apache.

From the class ExtractedTextCacheTest, method cacheEnabledNonIdBlob.

@Test
public void cacheEnabledNonIdBlob() throws Exception {
    ExtractedTextCache cache = new ExtractedTextCache(10 * FileUtils.ONE_MB, 100);
    // An ArrayBasedBlob is used here instead of the IdBlob from the other tests.
    Blob b = new ArrayBasedBlob("hello".getBytes());
    String text = cache.get("/a", "foo", b, false);
    assertNull(text);
    // For this blob type, put has no visible effect: the lookup still returns null.
    cache.put(b, new ExtractedText(ExtractionResult.SUCCESS, "test hello"));
    text = cache.get("/a", "foo", b, false);
    assertNull(text);
}
Also used: Blob (org.apache.jackrabbit.oak.api.Blob), ArrayBasedBlob (org.apache.jackrabbit.oak.plugins.memory.ArrayBasedBlob), Matchers.anyString (org.mockito.Matchers.anyString), ExtractedText (org.apache.jackrabbit.oak.plugins.index.fulltext.ExtractedText), Test (org.junit.Test)
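
The behaviour asserted by the cacheEnabled, cacheEnabledErrorInTextExtraction and cacheEnabledNonIdBlob tests can be summarised in a small stand-alone model. The sketch below is not Oak's ExtractedTextCache: the class ExtractedTextCacheSketch, the SketchBlob type and the ERROR_MARKER value are hypothetical names introduced for illustration, and the assumption that a blob without an identifier is simply never cached is inferred from the tests rather than taken from the library source.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified, hypothetical model of the cache behaviour the tests assert. */
final class ExtractedTextCacheSketch {

    /** Placeholder for LuceneIndexEditor.TEXT_EXTRACTION_ERROR; the real value is not shown in the tests. */
    static final String ERROR_MARKER = "<<extraction-error>>";

    /** Mirrors the two ExtractionResult values used in the tests. */
    enum Result { SUCCESS, ERROR }

    /** Hypothetical stand-in for a blob; id is null when the blob has no identifier. */
    static final class SketchBlob {
        final String id;

        SketchBlob(String id) {
            this.id = id;
        }
    }

    // Cached text (or the error marker), keyed by blob identifier.
    private final Map<String, String> entries = new HashMap<>();

    /** Caches the outcome for blobs that carry an identifier; other blobs are ignored (assumption). */
    void put(SketchBlob blob, Result result, String text) {
        if (blob.id == null) {
            return; // no key to cache under
        }
        // A failed extraction is remembered as the marker, not as the supplied text.
        entries.put(blob.id, result == Result.SUCCESS ? text : ERROR_MARKER);
    }

    /** Returns the cached value, or null on a miss or when the blob has no identifier. */
    String get(SketchBlob blob) {
        return blob.id == null ? null : entries.get(blob.id);
    }
}
```

In this model, a SketchBlob built with a non-null id plays the role of the IdBlob test helper, and one built with null plays the role of the ArrayBasedBlob in cacheEnabledNonIdBlob.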

Example 5 with ExtractedText

Use of org.apache.jackrabbit.oak.plugins.index.fulltext.ExtractedText in the jackrabbit-oak project by Apache.

From the class DataStoreTextWriterTest, method nonExistingEntry.

@Test
public void nonExistingEntry() throws Exception {
    // Add a record to a file data store to obtain a blob identifier.
    File fdsDir = temporaryFolder.newFolder();
    FileDataStore fds = DataStoreUtils.createFDS(fdsDir, 0);
    ByteArrayInputStream is = new ByteArrayInputStream("hello".getBytes());
    DataRecord dr = fds.addRecord(is);
    File writerDir = temporaryFolder.newFolder();
    DataStoreTextWriter w = new DataStoreTextWriter(writerDir, false);
    String id = dr.getIdentifier().toString();
    // Before anything is written, the id is unprocessed and has no text.
    assertFalse(w.isProcessed(id));
    assertNull(w.getText("/a", new IdBlob("foo", id)));
    // After write, the text is available with a SUCCESS result.
    w.write(id, "foo");
    assertTrue(w.isProcessed(id));
    ExtractedText et = w.getText("/a", new IdBlob("foo", id));
    assertEquals("foo", et.getExtractedText());
    assertEquals(ExtractionResult.SUCCESS, et.getExtractionResult());
    // markEmpty also marks an id as processed.
    w.markEmpty("a");
    assertTrue(w.isProcessed("a"));
}
Also used: ByteArrayInputStream (java.io.ByteArrayInputStream), DataRecord (org.apache.jackrabbit.core.data.DataRecord), File (java.io.File), FileDataStore (org.apache.jackrabbit.core.data.FileDataStore), ExtractedText (org.apache.jackrabbit.oak.plugins.index.fulltext.ExtractedText), Test (org.junit.Test)

Aggregations

ExtractedText (org.apache.jackrabbit.oak.plugins.index.fulltext.ExtractedText): 9
Test (org.junit.Test): 6
Blob (org.apache.jackrabbit.oak.api.Blob): 5
ArrayBasedBlob (org.apache.jackrabbit.oak.plugins.memory.ArrayBasedBlob): 5
Matchers.anyString (org.mockito.Matchers.anyString): 5
File (java.io.File): 2
IOException (java.io.IOException): 2
PreExtractedTextProvider (org.apache.jackrabbit.oak.plugins.index.fulltext.PreExtractedTextProvider): 2
CountingInputStream (com.google.common.io.CountingInputStream): 1
ByteArrayInputStream (java.io.ByteArrayInputStream): 1
TimeoutException (java.util.concurrent.TimeoutException): 1
CheckForNull (javax.annotation.CheckForNull): 1
DataRecord (org.apache.jackrabbit.core.data.DataRecord): 1
FileDataStore (org.apache.jackrabbit.core.data.FileDataStore): 1
LazyInputStream (org.apache.jackrabbit.oak.commons.io.LazyInputStream): 1
TikaException (org.apache.tika.exception.TikaException): 1
ParseContext (org.apache.tika.parser.ParseContext): 1
WriteOutContentHandler (org.apache.tika.sax.WriteOutContentHandler): 1
SAXException (org.xml.sax.SAXException): 1