Example 1 with BibTeXEntry

Use of org.jbibtex.BibTeXEntry in project jabref by JabRef.

In class CitationStyleGenerator, method bibEntryToCSLItemData.

/**
     * Converts the {@link BibEntry} into {@link CSLItemData}.
     */
private static CSLItemData bibEntryToCSLItemData(BibEntry bibEntry) {
    String citeKey = bibEntry.getCiteKeyOptional().orElse("");
    BibTeXEntry bibTeXEntry = new BibTeXEntry(new Key(bibEntry.getType()), new Key(citeKey));
    // Not every field is already generated into latex free fields
    for (String key : bibEntry.getFieldMap().keySet()) {
        Optional<String> latexFreeField = bibEntry.getLatexFreeField(key);
        latexFreeField.ifPresent(value -> bibTeXEntry.addField(new Key(key), new DigitStringValue(value)));
    }
    return BIBTEX_CONVERTER.toItemData(bibTeXEntry);
}
Also used : DigitStringValue(org.jbibtex.DigitStringValue) BibTeXEntry(org.jbibtex.BibTeXEntry) Key(org.jbibtex.Key)
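
The loop in Example 1 copies a field only when a LaTeX-free value is available for it. Below is a minimal stand-alone sketch of that "copy if present" pattern using only the Java standard library. The class LatexFreeCopy and both of its methods are hypothetical names introduced for illustration; the brace-stripping in latexFree is an assumed stand-in and not how JabRef computes getLatexFreeField.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Stand-alone sketch; LatexFreeCopy and its methods are hypothetical, not JabRef or JBibTeX API.
class LatexFreeCopy {

    // Assumed stand-in for a LaTeX-free lookup: strips braces, empty if nothing is left.
    static Optional<String> latexFree(String raw) {
        if (raw == null) {
            return Optional.empty();
        }
        String cleaned = raw.replace("{", "").replace("}", "").trim();
        return cleaned.isEmpty() ? Optional.empty() : Optional.of(cleaned);
    }

    // Copies every field that has a LaTeX-free value; fields without one are skipped.
    static Map<String, String> copyLatexFreeFields(Map<String, String> fields) {
        Map<String, String> target = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            latexFree(e.getValue()).ifPresent(value -> target.put(e.getKey(), value));
        }
        return target;
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("title", "{The} {B}ib{T}e{X} Format");
        fields.put("note", "{}");
        System.out.println(copyLatexFreeFields(fields));
    }
}
```

Using Optional.ifPresent keeps the loop body free of explicit null or emptiness checks, which is the same design choice the JabRef method makes.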

Example 2 with BibTeXEntry

Use of org.jbibtex.BibTeXEntry in project Anserini by castorini.

In class BibtexGenerator, method createDocument.

@Override
public Document createDocument(BibtexCollection.Document bibtexDoc) throws GeneratorException {
    String id = bibtexDoc.id();
    String content = bibtexDoc.contents();
    String type = bibtexDoc.type();
    BibTeXEntry bibtexEntry = bibtexDoc.bibtexEntry();
    if (content == null || content.trim().isEmpty()) {
        throw new EmptyDocumentException();
    }
    Document doc = new Document();
    // Store the collection docid.
    doc.add(new StringField(IndexArgs.ID, id, Field.Store.YES));
    // This is needed to break score ties by docid.
    doc.add(new SortedDocValuesField(IndexArgs.ID, new BytesRef(id)));
    // Store the collection's bibtex type
    doc.add(new StringField(TYPE, type, Field.Store.YES));
    if (args.storeRaw) {
        doc.add(new StoredField(IndexArgs.RAW, bibtexDoc.raw()));
    }
    FieldType fieldType = new FieldType();
    fieldType.setStored(args.storeContents);
    // Are we storing document vectors?
    if (args.storeDocvectors) {
        fieldType.setStoreTermVectors(true);
        fieldType.setStoreTermVectorPositions(true);
    }
    // Are we building a "positional" or "count" index?
    if (args.storePositions) {
        fieldType.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS);
    } else {
        fieldType.setIndexOptions(IndexOptions.DOCS_AND_FREQS);
    }
    doc.add(new Field(IndexArgs.CONTENTS, content, fieldType));
    for (Map.Entry<Key, Value> fieldEntry : bibtexEntry.getFields().entrySet()) {
        String fieldKey = fieldEntry.getKey().toString();
        String fieldValue = fieldEntry.getValue().toUserString();
        // not worth trying to parse/normalize all numbers at the moment
        if (fieldKey.equals(BibtexField.NUMBER.name)) {
            continue;
        }
        if (STRING_FIELD_NAMES.contains(fieldKey)) {
            // index field as single token
            doc.add(new StringField(fieldKey, fieldValue, Field.Store.YES));
        } else if (FIELDS_WITHOUT_STEMMING.contains(fieldKey)) {
            // index field without stemming but store original string value
            FieldType nonStemmedType = new FieldType(fieldType);
            nonStemmedType.setStored(true);
            // token stream to be indexed
            Analyzer nonStemmingAnalyzer = DefaultEnglishAnalyzer.newNonStemmingInstance(CharArraySet.EMPTY_SET);
            StringReader reader = new StringReader(fieldValue);
            TokenStream stream = nonStemmingAnalyzer.tokenStream(null, reader);
            Field field = new Field(fieldKey, fieldValue, nonStemmedType);
            field.setTokenStream(stream);
            doc.add(field);
            nonStemmingAnalyzer.close();
        } else if (fieldKey.equals(BibtexField.YEAR.name)) {
            if (!fieldValue.isEmpty()) {
                // index as numeric value to allow range queries
                doc.add(new IntPoint(fieldKey, Integer.parseInt(fieldValue)));
            }
            doc.add(new StoredField(fieldKey, fieldValue));
        } else {
            // default to normal Field with tokenization and stemming
            doc.add(new Field(fieldKey, fieldValue, fieldType));
        }
    }
    return doc;
}
Also used : TokenStream(org.apache.lucene.analysis.TokenStream) BibTeXEntry(org.jbibtex.BibTeXEntry) Document(org.apache.lucene.document.Document) Analyzer(org.apache.lucene.analysis.Analyzer) DefaultEnglishAnalyzer(io.anserini.analysis.DefaultEnglishAnalyzer) FieldType(org.apache.lucene.document.FieldType) StringField(org.apache.lucene.document.StringField) StoredField(org.apache.lucene.document.StoredField) SortedDocValuesField(org.apache.lucene.document.SortedDocValuesField) Field(org.apache.lucene.document.Field) IntPoint(org.apache.lucene.document.IntPoint) Value(org.jbibtex.Value) StringReader(java.io.StringReader) Map(java.util.Map) BytesRef(org.apache.lucene.util.BytesRef) Key(org.jbibtex.Key)
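
The per-field loop in Example 2 decides how each BibTeX field is indexed by checking conditions in a fixed order. The following stand-alone sketch mirrors that branch order without Lucene so it can be run on its own. Everything in it is an assumption made for illustration: the class FieldRouting, the Kind enum, the contents of the two sets (the excerpt does not show the real STRING_FIELD_NAMES and FIELDS_WITHOUT_STEMMING), and the literal names "number" and "year" standing in for BibtexField.NUMBER.name and BibtexField.YEAR.name. Unlike the original, parseYear returns an empty result for non-numeric input instead of letting Integer.parseInt throw.

```java
import java.util.OptionalInt;
import java.util.Set;

// Stand-alone sketch; class, enum, and set contents are illustrative assumptions, not Anserini's.
class FieldRouting {

    enum Kind { SKIPPED, SINGLE_TOKEN, NON_STEMMED, YEAR, DEFAULT }

    // Assumed example values only.
    static final Set<String> STRING_FIELD_NAMES = Set.of("doi", "url");
    static final Set<String> FIELDS_WITHOUT_STEMMING = Set.of("author", "journal");

    // Same branch order as the loop: number is skipped first, then the two sets, then year.
    static Kind route(String fieldKey) {
        if (fieldKey.equals("number")) {
            return Kind.SKIPPED;
        }
        if (STRING_FIELD_NAMES.contains(fieldKey)) {
            return Kind.SINGLE_TOKEN;
        }
        if (FIELDS_WITHOUT_STEMMING.contains(fieldKey)) {
            return Kind.NON_STEMMED;
        }
        if (fieldKey.equals("year")) {
            return Kind.YEAR;
        }
        return Kind.DEFAULT;
    }

    // Parses a year for numeric indexing; empty when the value is blank or not an integer.
    static OptionalInt parseYear(String fieldValue) {
        if (fieldValue == null || fieldValue.trim().isEmpty()) {
            return OptionalInt.empty();
        }
        try {
            return OptionalInt.of(Integer.parseInt(fieldValue.trim()));
        } catch (NumberFormatException e) {
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        for (String key : new String[] {"number", "doi", "author", "year", "title"}) {
            System.out.println(key + " -> " + route(key));
        }
        System.out.println(parseYear("2019"));
    }
}
```

The emptiness test uses isEmpty() on the string's contents; comparing strings with == or != in Java compares object references and is not a reliable check for an empty value.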

Aggregations

BibTeXEntry (org.jbibtex.BibTeXEntry) 2
Key (org.jbibtex.Key) 2
DefaultEnglishAnalyzer (io.anserini.analysis.DefaultEnglishAnalyzer) 1
StringReader (java.io.StringReader) 1
Map (java.util.Map) 1
Analyzer (org.apache.lucene.analysis.Analyzer) 1
TokenStream (org.apache.lucene.analysis.TokenStream) 1
Document (org.apache.lucene.document.Document) 1
Field (org.apache.lucene.document.Field) 1
FieldType (org.apache.lucene.document.FieldType) 1
IntPoint (org.apache.lucene.document.IntPoint) 1
SortedDocValuesField (org.apache.lucene.document.SortedDocValuesField) 1
StoredField (org.apache.lucene.document.StoredField) 1
StringField (org.apache.lucene.document.StringField) 1
BytesRef (org.apache.lucene.util.BytesRef) 1
DigitStringValue (org.jbibtex.DigitStringValue) 1
Value (org.jbibtex.Value) 1