
Example 1 with EntityMungingRdfHandler

Use of org.wikidata.query.rdf.tool.rdf.EntityMungingRdfHandler in project wikidata-query-rdf by wikimedia.

From the class Munge, method run:

public void run() throws RDFHandlerException, IOException, RDFParseException, InterruptedException {
    try {
        AsyncRDFHandler chunkWriter = AsyncRDFHandler.processAsync(new RDFChunkWriter(chunkFileFormat), false, BUFFER_SIZE);
        AtomicLong actualChunk = new AtomicLong(0);
        EntityMungingRdfHandler.EntityCountListener chunker = (entities) -> {
            long currentChunk = entities / chunkSize;
            if (currentChunk != actualChunk.get()) {
                actualChunk.set(currentChunk);
                // endRDF will cause RDFChunkWriter to start writing a new chunk
                chunkWriter.endRDF();
            }
        };
        EntityMungingRdfHandler munger = new EntityMungingRdfHandler(uris, this.munger, chunkWriter, chunker);
        RDFParser parser = RDFParserSuppliers.defaultRdfParser().get(AsyncRDFHandler.processAsync(new NormalizingRdfHandler(munger), true, BUFFER_SIZE));
        parser.parse(from, uris.root());
        // thread:main: parser -> AsyncRDFHandler -> queue
        // thread:replayer1: Normalizing/Munging -> AsyncRDFHandler -> queue
        // thread:replayer2: RDFChunkWriter -> RDFWriter -> IO
        chunkWriter.waitForCompletion();
    } finally {
        try {
            from.close();
        } catch (IOException e) {
            log.error("Error closing input", e);
        }
    }
}
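
The chunk rollover logic in the listener above can be tried in isolation. The following is a minimal sketch, not the project's code: the class name ChunkRolloverSketch and the method rolloverPoints are hypothetical, a plain LongConsumer stands in for EntityMungingRdfHandler.EntityCountListener, and it assumes the listener is called once per entity with the running entity count starting at 1. Instead of calling chunkWriter.endRDF(), it records the count at which a rollover would be triggered.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongConsumer;

public class ChunkRolloverSketch {

    /**
     * Returns the entity counts at which a new chunk would be started.
     * Hypothetical helper for illustration only.
     */
    static List<Long> rolloverPoints(long totalEntities, long chunkSize) {
        List<Long> rollovers = new ArrayList<>();
        AtomicLong actualChunk = new AtomicLong(0);

        // Stand-in for the entity count listener (assumption: it receives
        // the running number of entities seen so far).
        LongConsumer chunker = entities -> {
            long currentChunk = entities / chunkSize;
            if (currentChunk != actualChunk.get()) {
                actualChunk.set(currentChunk);
                // The original calls chunkWriter.endRDF() here;
                // this sketch only records where that would happen.
                rollovers.add(entities);
            }
        };

        // Simulate the handler reporting each processed entity.
        for (long entities = 1; entities <= totalEntities; entities++) {
            chunker.accept(entities);
        }
        return rollovers;
    }

    public static void main(String[] args) {
        // prints [3, 6, 9]
        System.out.println(rolloverPoints(10, 3));
    }
}
```

Under these assumptions, integer division places entities 1 and 2 in chunk 0, so the first rollover fires when the count reaches chunkSize, and a new chunk starts at every multiple of chunkSize after that.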
Also used: Statement(org.openrdf.model.Statement) Munger(org.wikidata.query.rdf.tool.rdf.Munger) LoggerFactory(org.slf4j.LoggerFactory) NormalizingRdfHandler(org.wikidata.query.rdf.tool.rdf.NormalizingRdfHandler) LinkedHashMap(java.util.LinkedHashMap) RDFFormat(org.openrdf.rio.RDFFormat) Locale(java.util.Locale) Map(java.util.Map) MungeOptions(org.wikidata.query.rdf.tool.options.MungeOptions) BasicWriterSettings(org.openrdf.rio.helpers.BasicWriterSettings) AsyncRDFHandler(org.wikidata.query.rdf.tool.rdf.AsyncRDFHandler) OptionsUtils.mungerFromOptions(org.wikidata.query.rdf.tool.options.OptionsUtils.mungerFromOptions) FALSE(java.lang.Boolean.FALSE) Logger(org.slf4j.Logger) RDFHandlerException(org.openrdf.rio.RDFHandlerException) OptionsUtils(org.wikidata.query.rdf.tool.options.OptionsUtils) RDFParserSuppliers(org.wikidata.query.rdf.tool.rdf.RDFParserSuppliers) WriterConfig(org.openrdf.rio.WriterConfig) IOException(java.io.IOException) Rio(org.openrdf.rio.Rio) Reader(java.io.Reader) PrefixRecordingRdfHandler(org.wikidata.query.rdf.tool.rdf.PrefixRecordingRdfHandler) AtomicLong(java.util.concurrent.atomic.AtomicLong) RDFParser(org.openrdf.rio.RDFParser) OptionsUtils.handleOptions(org.wikidata.query.rdf.tool.options.OptionsUtils.handleOptions) RDFParseException(org.openrdf.rio.RDFParseException) UrisScheme(org.wikidata.query.rdf.common.uri.UrisScheme) Writer(java.io.Writer) EntityMungingRdfHandler(org.wikidata.query.rdf.tool.rdf.EntityMungingRdfHandler) RDFHandler(org.openrdf.rio.RDFHandler) RDFWriter(org.openrdf.rio.RDFWriter)

Aggregations

IOException (java.io.IOException): 1
Reader (java.io.Reader): 1
Writer (java.io.Writer): 1
FALSE (java.lang.Boolean.FALSE): 1
LinkedHashMap (java.util.LinkedHashMap): 1
Locale (java.util.Locale): 1
Map (java.util.Map): 1
AtomicLong (java.util.concurrent.atomic.AtomicLong): 1
Statement (org.openrdf.model.Statement): 1
RDFFormat (org.openrdf.rio.RDFFormat): 1
RDFHandler (org.openrdf.rio.RDFHandler): 1
RDFHandlerException (org.openrdf.rio.RDFHandlerException): 1
RDFParseException (org.openrdf.rio.RDFParseException): 1
RDFParser (org.openrdf.rio.RDFParser): 1
RDFWriter (org.openrdf.rio.RDFWriter): 1
Rio (org.openrdf.rio.Rio): 1
WriterConfig (org.openrdf.rio.WriterConfig): 1
BasicWriterSettings (org.openrdf.rio.helpers.BasicWriterSettings): 1
Logger (org.slf4j.Logger): 1
LoggerFactory (org.slf4j.LoggerFactory): 1