
Example 61 with Document

use of org.icij.datashare.text.Document in project datashare by ICIJ.

the class BatchSearchRunnerIntTest method test_search_phrase_matches_with_slop.

@Test
public void test_search_phrase_matches_with_slop() throws Exception {
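    // In a phrase match, swapping two adjacent terms (Elasticsearch calls this a transposition) costs a slop of 2,
    // so the query "find mydoc" needs a slop of 2 to match the indexed text "mydoc find".
    // https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query-phrase.html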
    Document mydoc = createDoc("docId").with("mydoc find").build();
    indexer.add(TEST_INDEX, mydoc);
    BatchSearch search = new BatchSearch(project(TEST_INDEX), "name", "desc", asSet("find mydoc"), User.local(), false, null, null, 2, true);
    new BatchSearchRunner(indexer, new PropertiesProvider(), search, resultConsumer).call();
    verify(resultConsumer).apply(search.uuid, "find mydoc", singletonList(mydoc));
}
Also used : PropertiesProvider(org.icij.datashare.PropertiesProvider) BatchSearch(org.icij.datashare.batch.BatchSearch) Document(org.icij.datashare.text.Document)
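
The comment in the test above says a transposition costs a slop of 2. A minimal sketch of why, assuming a simplified model in which each query term occurs at most once in the text and tokens are split on whitespace: for every term, take its position in the text minus its position in the query, and the required slop is the spread between the largest and smallest of those shifts. The class SlopSketch and the method requiredSlop are hypothetical names for illustration only; this is not the code Elasticsearch or Lucene actually runs.

```java
import java.util.Arrays;
import java.util.List;

public class SlopSketch {

    // Returns the smallest slop under which `query` would match `text` as a phrase
    // in this simplified model, or -1 if a query term is missing from the text.
    // Assumption: each query term occurs at most once in the text.
    static int requiredSlop(String query, String text) {
        List<String> queryTerms = Arrays.asList(query.split("\\s+"));
        List<String> textTerms = Arrays.asList(text.split("\\s+"));
        int minShift = Integer.MAX_VALUE;
        int maxShift = Integer.MIN_VALUE;
        for (int queryPos = 0; queryPos < queryTerms.size(); queryPos++) {
            int textPos = textTerms.indexOf(queryTerms.get(queryPos));
            if (textPos < 0) {
                return -1; // term absent: no slop value can make the phrase match
            }
            // How far this term sits from where the query expects it.
            int shift = textPos - queryPos;
            minShift = Math.min(minShift, shift);
            maxShift = Math.max(maxShift, shift);
        }
        return maxShift - minShift;
    }

    public static void main(String[] args) {
        System.out.println(requiredSlop("find mydoc", "mydoc find"));     // prints 2 (transposition)
        System.out.println(requiredSlop("find mydoc", "find mydoc"));     // prints 0 (exact phrase)
        System.out.println(requiredSlop("find mydoc", "find the mydoc")); // prints 1 (one word in between)
    }
}
```

In the transposed case "find" shifts by +1 and "mydoc" by -1, giving a spread of 2, which is consistent with the value 2 used in the test.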

Example 62 with Document

use of org.icij.datashare.text.Document in project datashare by ICIJ.

the class BatchSearchRunnerIntTest method test_search_with_file_types_ko.

@Test
public void test_search_with_file_types_ko() throws Exception {
    Document mydoc = createDoc("mydoc").build();
    indexer.add(TEST_INDEX, mydoc);
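    // The search is restricted to the application/pdf file type, so the indexed document is expected not to match.
    BatchSearch searchKo = new BatchSearch(project(TEST_INDEX), "name", "desc", asSet("mydoc"), User.local(), false, singletonList("application/pdf"), null, 0);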
    new BatchSearchRunner(indexer, new PropertiesProvider(), searchKo, resultConsumer).call();
    verify(resultConsumer, never()).apply(eq(searchKo.uuid), eq("mydoc"), anyList());
}
Also used : PropertiesProvider(org.icij.datashare.PropertiesProvider) BatchSearch(org.icij.datashare.batch.BatchSearch) Document(org.icij.datashare.text.Document)

Example 63 with Document

use of org.icij.datashare.text.Document in project datashare by ICIJ.

the class NerResourceTest method test_post_text_returns_NamedEntity_list.

@Test
public void test_post_text_returns_NamedEntity_list() throws Exception {
    Document doc = DocumentBuilder.createDoc("inline").with("This the 'foù' file content.").with(ENGLISH).build();
    final Annotations annotations = new Annotations("inline", CORENLP, ENGLISH);
    annotations.add(NlpStage.NER, 10, 13, NamedEntity.Category.PERSON);
    doReturn(asList(NamedEntity.create(NamedEntity.Category.PERSON, "foù", asList(10L), doc.getId(), "root", CORENLP, ENGLISH))).when(pipeline).process(eq(doc));
    Response response = post("/api/ner/findNames/CORENLP", doc.getContent()).response();
    List actualNerList = TypeConvert.fromJson(response.content(), List.class);
    assertThat(actualNerList).hasSize(1);
    assertThat(actualNerList.get(0)).isInstanceOf(HashMap.class);
    assertThat((Map) actualNerList.get(0)).includes(entry("mention", "foù"), entry("extractor", "CORENLP"), entry("mentionNorm", "fou"), entry("offsets", asList(10)));
}
Also used : Response(net.codestory.rest.Response) Annotations(org.icij.datashare.text.nlp.Annotations) Arrays.asList(java.util.Arrays.asList) Collections.emptyList(java.util.Collections.emptyList) List(java.util.List) Document(org.icij.datashare.text.Document) HashMap(java.util.HashMap) Map(java.util.Map) AbstractProdWebServerTest(org.icij.datashare.web.testhelpers.AbstractProdWebServerTest) Test(org.junit.Test)

Aggregations

Document (org.icij.datashare.text.Document) 63
Test (org.junit.Test) 48
PropertiesProvider (org.icij.datashare.PropertiesProvider) 19
BatchSearch (org.icij.datashare.batch.BatchSearch) 15
NamedEntity (org.icij.datashare.text.NamedEntity) 11
TikaDocument (org.icij.extract.document.TikaDocument) 10
HashMap (java.util.HashMap) 9
Path (java.nio.file.Path) 6
Date (java.util.Date) 5
Indexer (org.icij.datashare.text.indexing.Indexer) 5
File (java.io.File) 4
IOException (java.io.IOException) 4
InputStream (java.io.InputStream) 4
IntStream (java.util.stream.IntStream) 4
DocumentBuilder.createDoc (org.icij.datashare.text.DocumentBuilder.createDoc) 4
Project.project (org.icij.datashare.text.Project.project) 4
User (org.icij.datashare.user.User) 4
Rule (org.junit.Rule) 4
Arrays.asList (java.util.Arrays.asList) 3
List (java.util.List) 3