
Example 1 with DigestIdentifier

Use of org.icij.extract.document.DigestIdentifier in project datashare by ICIJ.

Class IndexerHelper, method indexEmbeddedFile.

File indexEmbeddedFile(String project, String docPath) throws IOException {
    // Resolve the classpath resource to a filesystem path.
    Path path = get(getClass().getResource(docPath).getPath());
    // Build an extractor whose documents are identified by a SHA-384 digest.
    Extractor extractor = new Extractor(new DocumentFactory().withIdentifier(new DigestIdentifier("SHA-384", Charset.defaultCharset())));
    extractor.setDigester(new UpdatableDigester(project, Entity.HASHER.toString()));
    TikaDocument document = extractor.extract(path);
    // Write the extracted document to the "test-datashare" index with an immediate refresh.
    ElasticsearchSpewer elasticsearchSpewer = new ElasticsearchSpewer(client, l -> ENGLISH, new FieldNames(), mock(Publisher.class), new PropertiesProvider())
            .withRefresh(IMMEDIATE)
            .withIndex("test-datashare");
    elasticsearchSpewer.write(document);
    return path.toFile();
}
Also used: Path (java.nio.file.Path), PropertiesProvider (org.icij.datashare.PropertiesProvider), ElasticsearchSpewer (org.icij.datashare.text.indexing.elasticsearch.ElasticsearchSpewer), DocumentFactory (org.icij.extract.document.DocumentFactory), UpdatableDigester (org.icij.extract.extractor.UpdatableDigester), FieldNames (org.icij.spewer.FieldNames), DigestIdentifier (org.icij.extract.document.DigestIdentifier), TikaDocument (org.icij.extract.document.TikaDocument), Extractor (org.icij.extract.extractor.Extractor), Publisher (org.icij.datashare.com.Publisher)
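
Example 1 turns a classpath resource into a Path through URL.getPath(). That method returns the path in percent-encoded form, so a directory name containing a space would come back with "%20" in it. The following is a minimal sketch, using only the JDK, of converting the URL through a URI instead. The class ResourcePathSketch and its methods toPath and fileUrl are hypothetical names for illustration and are not part of datashare or extract.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URISyntaxException;
import java.net.URL;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ResourcePathSketch {

    // Convert a file: URL to a Path via its URI, which decodes percent-escapes.
    public static Path toPath(URL resource) {
        try {
            return Paths.get(resource.toURI());
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Not a valid URI: " + resource, e);
        }
    }

    // Helper for the demo: build a file: URL for a name in the working directory.
    public static URL fileUrl(String name) {
        try {
            return new File(name).toURI().toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Cannot build URL for: " + name, e);
        }
    }

    public static void main(String[] args) {
        URL url = fileUrl("a b.txt");
        System.out.println(url.getPath());   // percent-encoded form of the path
        System.out.println(toPath(url));     // decoded filesystem path
    }
}
```

In a test like the one above, the same conversion would apply to the URL returned by getClass().getResource(docPath), assuming the resource lives on the filesystem rather than inside a jar.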

Example 2 with DigestIdentifier

Use of org.icij.extract.document.DigestIdentifier in project datashare by ICIJ.

Class DatashareExtractIntegrationTest, method createExtractor.

Extractor createExtractor() {
    // Same setup as Example 1: SHA-384 digest identifiers, with a digester for the "test" project.
    Extractor extractor = new Extractor(new DocumentFactory().withIdentifier(new DigestIdentifier("SHA-384", Charset.defaultCharset())));
    extractor.setDigester(new UpdatableDigester("test", Entity.HASHER.toString()));
    return extractor;
}
Also used: DocumentFactory (org.icij.extract.document.DocumentFactory), UpdatableDigester (org.icij.extract.extractor.UpdatableDigester), DigestIdentifier (org.icij.extract.document.DigestIdentifier), Extractor (org.icij.extract.extractor.Extractor)
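
Both examples construct DigestIdentifier from an algorithm name ("SHA-384") and a charset. The sketch below is not the library's implementation. It only illustrates, with the JDK's MessageDigest, what hashing a string under a named algorithm and charset looks like. The class DigestSketch and the method hexDigest are hypothetical names introduced for this illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class DigestSketch {

    // Hash the input's bytes (encoded with the given charset) and return lowercase hex.
    public static String hexDigest(String algorithm, String input, Charset charset) {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance(algorithm);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unknown digest algorithm: " + algorithm, e);
        }
        byte[] hash = digest.digest(input.getBytes(charset));
        StringBuilder hex = new StringBuilder(hash.length * 2);
        for (byte b : hash) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        // SHA-384 produces 48 bytes, so the hex string has 96 characters.
        System.out.println(hexDigest("SHA-384", "abc", StandardCharsets.UTF_8));
    }
}
```

Charset.defaultCharset() depends on the platform and JVM settings, so the same string can encode to different bytes on different machines. Passing an explicit charset such as StandardCharsets.UTF_8, as in this sketch, keeps the output reproducible.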

Aggregations

DigestIdentifier (org.icij.extract.document.DigestIdentifier): 2
DocumentFactory (org.icij.extract.document.DocumentFactory): 2
Extractor (org.icij.extract.extractor.Extractor): 2
UpdatableDigester (org.icij.extract.extractor.UpdatableDigester): 2
Path (java.nio.file.Path): 1
PropertiesProvider (org.icij.datashare.PropertiesProvider): 1
Publisher (org.icij.datashare.com.Publisher): 1
ElasticsearchSpewer (org.icij.datashare.text.indexing.elasticsearch.ElasticsearchSpewer): 1
TikaDocument (org.icij.extract.document.TikaDocument): 1
FieldNames (org.icij.spewer.FieldNames): 1