Example 1 with LabelToNode

use of org.apache.jena.riot.lang.LabelToNode in project jena by apache.

the class AbstractLineBasedNodeTupleReader method initialize.

@Override
public final void initialize(InputSplit genericSplit, TaskAttemptContext context) throws IOException {
    LOG.debug("initialize({}, {})", genericSplit, context);
    // Assuming file split
    if (!(genericSplit instanceof FileSplit))
        throw new IOException("This record reader only supports FileSplit inputs");
    FileSplit split = (FileSplit) genericSplit;
    // Intermediate: RDFParser, but we need to make an Iterator<Quad/Triple>
    LabelToNode labelToNode = RdfIOUtils.createLabelToNode(context, split.getPath());
    maker = new ParserProfileStd(RiotLib.factoryRDF(labelToNode), ErrorHandlerFactory.errorHandlerStd, IRIResolver.create(), PrefixMapFactory.createForInput(), null, true, false);
    Configuration config = context.getConfiguration();
    this.ignoreBadTuples = config.getBoolean(RdfIOConstants.INPUT_IGNORE_BAD_TUPLES, true);
    if (this.ignoreBadTuples)
        LOG.warn("Configured to ignore bad tuples, parsing errors will be logged and the bad line skipped but no errors will be thrown. Consider setting {} to false to disable this behaviour", RdfIOConstants.INPUT_IGNORE_BAD_TUPLES);
    // Figure out what portion of the file to read
    this.maxLineLength = config.getInt(HadoopIOConstants.MAX_LINE_LENGTH, Integer.MAX_VALUE);
    start = split.getStart();
    end = start + split.getLength();
    final Path file = split.getPath();
    long totalLength = file.getFileSystem(context.getConfiguration()).getFileStatus(file).getLen();
    compressionCodecs = new CompressionCodecFactory(config);
    final CompressionCodec codec = compressionCodecs.getCodec(file);
    LOG.info("Got split with start {} and length {} for file with total length of {}", start, split.getLength(), totalLength);
    // Open the file and seek to the start of the split
    FileSystem fs = file.getFileSystem(config);
    FSDataInputStream fileIn = fs.open(file);
    boolean skipFirstLine = false;
    if (codec != null) {
        // Add 1 and verify we got complete split
        if (totalLength > split.getLength() + 1)
            throw new IOException("This record reader can only be used with compressed input where the split covers the whole file");
        in = new LineReader(codec.createInputStream(fileIn), config);
        estLength = end;
        end = Long.MAX_VALUE;
    } else {
        // Uncompressed input
        if (start != 0) {
            skipFirstLine = true;
            --start;
            fileIn.seek(start);
        }
        in = new LineReader(fileIn, config);
    }
    // NLineInputFormat will provide the split information to use
    if (skipFirstLine) {
        start += in.readLine(new Text(), 0, (int) Math.min(Integer.MAX_VALUE, end - start));
    }
    this.pos = start;
}
Also used : Path(org.apache.hadoop.fs.Path) Configuration(org.apache.hadoop.conf.Configuration) LabelToNode(org.apache.jena.riot.lang.LabelToNode) Text(org.apache.hadoop.io.Text) IOException(java.io.IOException) FileSplit(org.apache.hadoop.mapreduce.lib.input.FileSplit) CompressionCodecFactory(org.apache.hadoop.io.compress.CompressionCodecFactory) FileSystem(org.apache.hadoop.fs.FileSystem) LineReader(org.apache.hadoop.util.LineReader) FSDataInputStream(org.apache.hadoop.fs.FSDataInputStream) CompressionCodec(org.apache.hadoop.io.compress.CompressionCodec)
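
The uncompressed branch of initialize above backs up one byte from the split start and discards the first line it reads, so that a line straddling a split boundary is read by exactly one reader. The following is a minimal, stdlib-only sketch of that technique. It works on an in-memory byte array rather than Hadoop streams, and the names SplitLineSketch, readSplit and nextLineStart are hypothetical, not part of Jena or Hadoop. It also assumes the read loop stops once the position reaches the split end, which the snippet above does not show.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class SplitLineSketch {

    // Returns the index just past the next '\n' at or after 'from',
    // or data.length if no newline remains.
    static int nextLineStart(byte[] data, int from) {
        int i = from;
        while (i < data.length && data[i] != '\n') {
            i++;
        }
        return Math.min(i + 1, data.length);
    }

    // Reads the lines owned by the byte range [start, end): every line that
    // begins inside the range, even if it finishes after 'end'.
    static List<String> readSplit(byte[] data, int start, int end) {
        int pos = start;
        if (start != 0) {
            // Back up one byte and discard up to and including the next newline.
            // If the byte before 'start' is a newline, only that byte is discarded
            // and the line beginning exactly at 'start' is kept.
            pos = nextLineStart(data, start - 1);
        }
        List<String> lines = new ArrayList<>();
        while (pos < end && pos < data.length) {
            int next = nextLineStart(data, pos);
            // Exclude the trailing newline, if there is one, from the line text
            int lineEnd = (data[next - 1] == '\n') ? next - 1 : next;
            lines.add(new String(data, pos, lineEnd - pos, StandardCharsets.UTF_8));
            pos = next;
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] data = "aaa\nbbb\nccc\n".getBytes(StandardCharsets.UTF_8);
        // The boundary at byte 5 falls in the middle of the second line
        System.out.println(readSplit(data, 0, 5));
        System.out.println(readSplit(data, 5, data.length));
    }
}
```

In this sketch the first split reads the whole of the line it started, and the second split skips the remainder of that line, so the two splits together return every line once.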

Example 2 with LabelToNode

use of org.apache.jena.riot.lang.LabelToNode in project jena by apache.

the class RdfIOUtils method createParserProfile.

/**
 * Creates a parser profile for the given job context
 *
 * @param context
 *            Context
 * @param path
 *            File path
 * @return Parser profile
 * @deprecated Legacy - use {@link #createRDFParserBuilder}.
 */
@Deprecated
public static ParserProfile createParserProfile(JobContext context, Path path) {
    LabelToNode labelMapping = createLabelToNode(context, path);
    ParserProfile profile = RiotLib.createParserProfile(RiotLib.factoryRDF(labelMapping), ErrorHandlerFactory.errorHandlerStd, IRIResolver.createNoResolve(), false);
    return profile;
}
Also used : LabelToNode(org.apache.jena.riot.lang.LabelToNode)

Example 3 with LabelToNode

use of org.apache.jena.riot.lang.LabelToNode in project jena by apache.

the class RdfIOUtils method createRDFParserBuilder.

public static RDFParserBuilder createRDFParserBuilder(JobContext context, Path path) {
    LabelToNode labelMapping = createLabelToNode(context, path);
    RDFParserBuilder builder = RDFParser.create().labelToNode(labelMapping).errorHandler(ErrorHandlerFactory.errorHandlerStd);
    return builder;
}
Also used : RDFParserBuilder(org.apache.jena.riot.RDFParserBuilder) LabelToNode(org.apache.jena.riot.lang.LabelToNode)

Example 4 with LabelToNode

use of org.apache.jena.riot.lang.LabelToNode in project jena by apache.

the class RdfIOUtils method createLabelToNode.

public static LabelToNode createLabelToNode(JobContext context, Path path) {
    UUID seed = RdfIOUtils.getSeed(context, path);
    LabelToNode labelMapping = LabelToNode.createScopeByDocumentHash(seed);
    return labelMapping;
}
Also used : LabelToNode(org.apache.jena.riot.lang.LabelToNode) UUID(java.util.UUID)
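
createLabelToNode above builds its label mapping from a seed obtained for the job context and file path. A plausible reason, stated here as an assumption rather than something the snippet confirms, is that separate readers handling different splits of the same file should map the same blank node label to the same node. The sketch below illustrates that general idea with the standard library only. SeededLabelSketch and nodeIdFor are hypothetical names, the seed strings are made up, and the hashing scheme is an illustration, not Jena's actual implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

class SeededLabelSketch {

    // Derives a stable identifier from a seed and a blank node label.
    // No state is kept, so two independent processes given the same seed
    // and label compute the same identifier.
    static String nodeIdFor(UUID seed, String label) {
        byte[] input = (seed + "|" + label).getBytes(StandardCharsets.UTF_8);
        return UUID.nameUUIDFromBytes(input).toString();
    }

    public static void main(String[] args) {
        // Hypothetical seeds: one per (job, file) pair
        UUID seedA = UUID.nameUUIDFromBytes("job-1:/data/a.nt".getBytes(StandardCharsets.UTF_8));
        UUID seedB = UUID.nameUUIDFromBytes("job-1:/data/b.nt".getBytes(StandardCharsets.UTF_8));

        // Same seed and label: identical identifiers, as if computed by two readers
        System.out.println(nodeIdFor(seedA, "b0").equals(nodeIdFor(seedA, "b0")));
        // Same label in a different file: a different identifier
        System.out.println(nodeIdFor(seedA, "b0").equals(nodeIdFor(seedB, "b0")));
    }
}
```

Deriving the identifier by hashing, rather than allocating it from a per-reader table, is what lets readers agree without sharing any state.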

Example 5 with LabelToNode

use of org.apache.jena.riot.lang.LabelToNode in project jena by apache.

the class RiotLib method profile.

/**
 * Create a parser profile for the given setup
 */
private static ParserProfile profile(String baseIRI, boolean resolveIRIs, boolean checking, ErrorHandler handler) {
    LabelToNode labelToNode = SyntaxLabels.createLabelToNode();
    IRIx base = resolveIRIs ? IRIs.resolveIRI(baseIRI) : IRIx.create(baseIRI);
    IRIxResolver resolver = IRIxResolver.create(base).resolve(resolveIRIs).allowRelative(false).build();
    return RiotLib.createParserProfile(factoryRDF(labelToNode), handler, resolver, checking);
}
Also used : LabelToNode(org.apache.jena.riot.lang.LabelToNode) IRIx(org.apache.jena.irix.IRIx) IRIxResolver(org.apache.jena.irix.IRIxResolver)

Aggregations

LabelToNode (org.apache.jena.riot.lang.LabelToNode) 5
IOException (java.io.IOException) 1
UUID (java.util.UUID) 1
Configuration (org.apache.hadoop.conf.Configuration) 1
FSDataInputStream (org.apache.hadoop.fs.FSDataInputStream) 1
FileSystem (org.apache.hadoop.fs.FileSystem) 1
Path (org.apache.hadoop.fs.Path) 1
Text (org.apache.hadoop.io.Text) 1
CompressionCodec (org.apache.hadoop.io.compress.CompressionCodec) 1
CompressionCodecFactory (org.apache.hadoop.io.compress.CompressionCodecFactory) 1
FileSplit (org.apache.hadoop.mapreduce.lib.input.FileSplit) 1
LineReader (org.apache.hadoop.util.LineReader) 1
IRIx (org.apache.jena.irix.IRIx) 1
IRIxResolver (org.apache.jena.irix.IRIxResolver) 1
RDFParserBuilder (org.apache.jena.riot.RDFParserBuilder) 1