
Example 1 with NormalizingRdfHandler

Use of org.wikidata.query.rdf.tool.rdf.NormalizingRdfHandler in the wikidata-query-rdf project by wikimedia.

From the class WikibaseRepository, method fetchRdfForEntity.

/**
 * Fetch the RDF for some entity.
 *
 * @throws RetryableException thrown if there is an error communicating with
 *             wikibase
 */
public Collection<Statement> fetchRdfForEntity(String entityId) throws RetryableException {
    // TODO handle ?flavor=dump or whatever parameters we need
    URI uri = uris.rdf(entityId);
    long start = System.currentTimeMillis();
    log.debug("Fetching rdf from {}", uri);
    RDFParser parser = Rio.createParser(RDFFormat.TURTLE);
    StatementCollector collector = new StatementCollector();
    parser.setRDFHandler(new NormalizingRdfHandler(collector));
    HttpGet request = new HttpGet(uri);
    request.setConfig(configWithTimeout);
    try {
        try (CloseableHttpResponse response = client.execute(request)) {
            if (response.getStatusLine().getStatusCode() == 404) {
                // A deleted or nonexistent page
                return Collections.emptyList();
            }
            if (response.getStatusLine().getStatusCode() >= 300) {
                throw new ContainedException("Unexpected status code fetching RDF for " + uri + ":  " + response.getStatusLine().getStatusCode());
            }
            parser.parse(new InputStreamReader(response.getEntity().getContent(), Charsets.UTF_8), uri.toString());
        }
    } catch (UnknownHostException | SocketException | SSLHandshakeException e) {
        // We want to bail on this, since it happens to be sticky for some reason
        throw new RuntimeException(e);
    } catch (IOException e) {
        throw new RetryableException("Error fetching RDF for " + uri, e);
    } catch (RDFParseException | RDFHandlerException e) {
        throw new ContainedException("RDF parsing error for " + uri, e);
    }
    log.debug("Done in {} ms", System.currentTimeMillis() - start);
    return collector.getStatements();
}
Also used: SocketException (java.net.SocketException), InputStreamReader (java.io.InputStreamReader), UnknownHostException (java.net.UnknownHostException), StatementCollector (org.openrdf.rio.helpers.StatementCollector), HttpGet (org.apache.http.client.methods.HttpGet), ContainedException (org.wikidata.query.rdf.tool.exception.ContainedException), InterruptedIOException (java.io.InterruptedIOException), IOException (java.io.IOException), RDFParser (org.openrdf.rio.RDFParser), URI (java.net.URI), SSLHandshakeException (javax.net.ssl.SSLHandshakeException), NormalizingRdfHandler (org.wikidata.query.rdf.tool.rdf.NormalizingRdfHandler), RetryableException (org.wikidata.query.rdf.tool.exception.RetryableException), RDFHandlerException (org.openrdf.rio.RDFHandlerException), CloseableHttpResponse (org.apache.http.client.methods.CloseableHttpResponse), RDFParseException (org.openrdf.rio.RDFParseException)
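
Example 1 wraps the StatementCollector in a NormalizingRdfHandler before handing it to the parser, so every parsed statement passes through the wrapper on its way to the collector. The following is a minimal, self-contained sketch of that wrapping (decorator) pattern only. StatementHandler, CollectingHandler, NormalizingHandler and collectNormalized are hypothetical stand-ins that use plain strings instead of RDF statements; they are not the openrdf or wikidata-query-rdf classes, and the trim rule is a placeholder because the real normalization rules are not shown in the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

public class HandlerDecoratorSketch {
    /** Hypothetical stand-in for an RDF handler; not the openrdf RDFHandler interface. */
    interface StatementHandler {
        void handleStatement(String statement);
    }

    /** Terminal handler that stores every statement it receives. */
    static class CollectingHandler implements StatementHandler {
        private final List<String> statements = new ArrayList<>();

        @Override
        public void handleStatement(String statement) {
            statements.add(statement);
        }

        List<String> getStatements() {
            return statements;
        }
    }

    /** Decorator that rewrites each statement before passing it to the wrapped handler. */
    static class NormalizingHandler implements StatementHandler {
        private final StatementHandler next;
        private final UnaryOperator<String> normalizer;

        NormalizingHandler(StatementHandler next, UnaryOperator<String> normalizer) {
            this.next = next;
            this.normalizer = normalizer;
        }

        @Override
        public void handleStatement(String statement) {
            next.handleStatement(normalizer.apply(statement));
        }
    }

    /** Pushes the input through a normalizing handler and returns what the collector saw. */
    static List<String> collectNormalized(List<String> input, UnaryOperator<String> normalizer) {
        CollectingHandler collector = new CollectingHandler();
        StatementHandler handler = new NormalizingHandler(collector, normalizer);
        for (String statement : input) {
            handler.handleStatement(statement);
        }
        return collector.getStatements();
    }

    public static void main(String[] args) {
        // Placeholder rule (trim whitespace); the real normalization rules are an assumption here.
        System.out.println(collectNormalized(Arrays.asList("  a b c ", "d e f"), String::trim));
    }
}
```

Because the wrapper and the collector share one interface, the producer (the parser in Example 1) does not need to know whether it is talking to the collector directly or through one or more wrappers.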

Example 2 with NormalizingRdfHandler

Use of org.wikidata.query.rdf.tool.rdf.NormalizingRdfHandler in the wikidata-query-rdf project by wikimedia.

From the class Munge, method run.

public void run() throws RDFHandlerException, IOException, RDFParseException, InterruptedException {
    try {
        AsyncRDFHandler chunkWriter = AsyncRDFHandler.processAsync(new RDFChunkWriter(chunkFileFormat), false, BUFFER_SIZE);
        AtomicLong actualChunk = new AtomicLong(0);
        EntityMungingRdfHandler.EntityCountListener chunker = (entities) -> {
            long currentChunk = entities / chunkSize;
            if (currentChunk != actualChunk.get()) {
                actualChunk.set(currentChunk);
                // endRDF will cause RDFChunkWriter to start writing a new chunk
                chunkWriter.endRDF();
            }
        };
        EntityMungingRdfHandler munger = new EntityMungingRdfHandler(uris, this.munger, chunkWriter, chunker);
        RDFParser parser = RDFParserSuppliers.defaultRdfParser().get(AsyncRDFHandler.processAsync(new NormalizingRdfHandler(munger), true, BUFFER_SIZE));
        parser.parse(from, uris.root());
        // thread:main: parser -> AsyncRDFHandler -> queue
        // thread:replayer1: Normalizing/Munging -> AsyncRDFHandler -> queue
        // thread:replayer2: RDFChunkWriter -> RDFWriter -> IO
        chunkWriter.waitForCompletion();
    } finally {
        try {
            from.close();
        } catch (IOException e) {
            log.error("Error closing input", e);
        }
    }
}
Also used: Statement (org.openrdf.model.Statement), Munger (org.wikidata.query.rdf.tool.rdf.Munger), LoggerFactory (org.slf4j.LoggerFactory), NormalizingRdfHandler (org.wikidata.query.rdf.tool.rdf.NormalizingRdfHandler), LinkedHashMap (java.util.LinkedHashMap), RDFFormat (org.openrdf.rio.RDFFormat), Locale (java.util.Locale), Map (java.util.Map), MungeOptions (org.wikidata.query.rdf.tool.options.MungeOptions), BasicWriterSettings (org.openrdf.rio.helpers.BasicWriterSettings), AsyncRDFHandler (org.wikidata.query.rdf.tool.rdf.AsyncRDFHandler), OptionsUtils.mungerFromOptions (org.wikidata.query.rdf.tool.options.OptionsUtils.mungerFromOptions), FALSE (java.lang.Boolean.FALSE), Logger (org.slf4j.Logger), RDFHandlerException (org.openrdf.rio.RDFHandlerException), OptionsUtils (org.wikidata.query.rdf.tool.options.OptionsUtils), RDFParserSuppliers (org.wikidata.query.rdf.tool.rdf.RDFParserSuppliers), WriterConfig (org.openrdf.rio.WriterConfig), IOException (java.io.IOException), Rio (org.openrdf.rio.Rio), Reader (java.io.Reader), PrefixRecordingRdfHandler (org.wikidata.query.rdf.tool.rdf.PrefixRecordingRdfHandler), AtomicLong (java.util.concurrent.atomic.AtomicLong), RDFParser (org.openrdf.rio.RDFParser), OptionsUtils.handleOptions (org.wikidata.query.rdf.tool.options.OptionsUtils.handleOptions), RDFParseException (org.openrdf.rio.RDFParseException), UrisScheme (org.wikidata.query.rdf.common.uri.UrisScheme), Writer (java.io.Writer), EntityMungingRdfHandler (org.wikidata.query.rdf.tool.rdf.EntityMungingRdfHandler), RDFHandler (org.openrdf.rio.RDFHandler), RDFWriter (org.openrdf.rio.RDFWriter)
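
The chunker lambda in Example 2 decides when to start a new output chunk by integer-dividing the running entity count by chunkSize and comparing the result with the last chunk index it saw. The sketch below isolates that rollover check so it can be run on its own. ChunkRolloverSketch, chunker and rolloverPoints are hypothetical names, a Runnable stands in for chunkWriter.endRDF(), and the sketch assumes the listener is called once per entity with the running total, which the example itself does not confirm.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongConsumer;

public class ChunkRolloverSketch {
    /**
     * Returns a listener that takes the running entity count and runs onRollover
     * whenever the count maps to a different chunk index than the previous call.
     */
    static LongConsumer chunker(long chunkSize, Runnable onRollover) {
        AtomicLong actualChunk = new AtomicLong(0);
        return entities -> {
            long currentChunk = entities / chunkSize;
            if (currentChunk != actualChunk.get()) {
                actualChunk.set(currentChunk);
                // Stand-in for chunkWriter.endRDF() in the original method
                onRollover.run();
            }
        };
    }

    /** Feeds counts 1..totalEntities to the listener and records the counts at which a rollover fired. */
    static List<Long> rolloverPoints(long chunkSize, long totalEntities) {
        List<Long> points = new ArrayList<>();
        long[] current = new long[1];
        LongConsumer listener = chunker(chunkSize, () -> points.add(current[0]));
        for (long n = 1; n <= totalEntities; n++) {
            current[0] = n;
            listener.accept(n);
        }
        return points;
    }

    public static void main(String[] args) {
        System.out.println(rolloverPoints(3, 10)); // prints [3, 6, 9]
    }
}
```

The AtomicLong is used here for the same reason as in the original: a lambda can only capture effectively final locals, so the last chunk index needs a mutable holder.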

Aggregations

IOException (java.io.IOException): 2, RDFHandlerException (org.openrdf.rio.RDFHandlerException): 2, RDFParseException (org.openrdf.rio.RDFParseException): 2, RDFParser (org.openrdf.rio.RDFParser): 2, NormalizingRdfHandler (org.wikidata.query.rdf.tool.rdf.NormalizingRdfHandler): 2, InputStreamReader (java.io.InputStreamReader): 1, InterruptedIOException (java.io.InterruptedIOException): 1, Reader (java.io.Reader): 1, Writer (java.io.Writer): 1, FALSE (java.lang.Boolean.FALSE): 1, SocketException (java.net.SocketException): 1, URI (java.net.URI): 1, UnknownHostException (java.net.UnknownHostException): 1, LinkedHashMap (java.util.LinkedHashMap): 1, Locale (java.util.Locale): 1, Map (java.util.Map): 1, AtomicLong (java.util.concurrent.atomic.AtomicLong): 1, SSLHandshakeException (javax.net.ssl.SSLHandshakeException): 1, CloseableHttpResponse (org.apache.http.client.methods.CloseableHttpResponse): 1, HttpGet (org.apache.http.client.methods.HttpGet): 1