
Example 1 with TermFreqVector

Use of org.apache.lucene.index.TermFreqVector in the Apache jackrabbit project.

Class MoreLikeThis, method retrieveTerms.

/**
 * Find words for a more-like-this query former.
 *
 * @param docNum the id of the lucene document from which to find terms
 */
public PriorityQueue retrieveTerms(int docNum) throws IOException {
    Map<String, Int> termFreqMap = new HashMap<String, Int>();
    for (int i = 0; i < fieldNames.length; i++) {
        String fieldName = fieldNames[i];
        TermFreqVector vector = ir.getTermFreqVector(docNum, fieldName);
        // field does not store term vector info
        if (vector == null) {
            Document d = ir.document(docNum);
            String[] text = d.getValues(fieldName);
            if (text != null) {
                for (int j = 0; j < text.length; j++) {
                    addTermFrequencies(new StringReader(text[j]), termFreqMap, fieldName);
                }
            }
        } else {
            addTermFrequencies(termFreqMap, vector);
        }
    }
    return createQueue(termFreqMap);
}
Also used: TermFreqVector (org.apache.lucene.index.TermFreqVector), HashMap (java.util.HashMap), StringReader (java.io.StringReader), Document (org.apache.lucene.document.Document)
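retrieveTerms() builds one term-to-frequency map across all configured fields: it reads the stored term vector when a field has one, otherwise it re-tokenizes the field's stored values, and finally hands the map to createQueue(). The following is a minimal, Lucene-free sketch of that control flow, assuming plain Java collections only. The names addFromVector, addFromText and topTerms are hypothetical, the regex split stands in for a Lucene Analyzer, and ranking is by raw frequency, whereas Lucene's MoreLikeThis also factors inverse document frequency into its queue.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical sketch of the retrieveTerms() flow; not the Jackrabbit or Lucene implementation.
final class TermFreqSketch {

    /** Adds counts from a precomputed vector given as parallel term/frequency arrays. */
    static void addFromVector(Map<String, Integer> freqs, String[] terms, int[] counts) {
        for (int i = 0; i < terms.length; i++) {
            freqs.merge(terms[i], counts[i], Integer::sum);
        }
    }

    /** Fallback when no vector is stored: naive lowercase/non-word split (an assumption, not an Analyzer). */
    static void addFromText(Map<String, Integer> freqs, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                freqs.merge(token, 1, Integer::sum);
            }
        }
    }

    /** Returns up to maxTerms terms, most frequent first; ties broken alphabetically for determinism. */
    static List<String> topTerms(Map<String, Integer> freqs, int maxTerms) {
        Comparator<Map.Entry<String, Integer>> byFreqDesc = (a, b) -> {
            int cmp = Integer.compare(b.getValue(), a.getValue());
            return cmp != 0 ? cmp : a.getKey().compareTo(b.getKey());
        };
        PriorityQueue<Map.Entry<String, Integer>> queue = new PriorityQueue<>(byFreqDesc);
        queue.addAll(freqs.entrySet());
        List<String> result = new ArrayList<>();
        while (!queue.isEmpty() && result.size() < maxTerms) {
            result.add(queue.poll().getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> freqs = new HashMap<>();
        // A field that stores a term vector contributes its counts directly.
        addFromVector(freqs, new String[] {"lucene", "index"}, new int[] {2, 1});
        // A field without a term vector is tokenized from its stored text instead.
        addFromText(freqs, "Lucene term vectors: Lucene stores term frequencies");
        System.out.println(topTerms(freqs, 3)); // prints [lucene, term, frequencies]
    }
}
```

Merging both sources into one map mirrors how the original passes the same termFreqMap to both addTermFrequencies overloads, so a term seen in several fields accumulates a single count.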

Example 2 with TermFreqVector

Use of org.apache.lucene.index.TermFreqVector in the Apache jackrabbit project.

Class AbstractExcerpt, method getExcerpt.

/**
 * {@inheritDoc}
 */
public String getExcerpt(NodeId id, int maxFragments, int maxFragmentSize) throws IOException {
    IndexReader reader = index.getIndexReader();
    try {
        checkRewritten(reader);
        Term idTerm = TermFactory.createUUIDTerm(id.toString());
        TermDocs tDocs = reader.termDocs(idTerm);
        int docNumber;
        Document doc;
        try {
            if (tDocs.next()) {
                docNumber = tDocs.doc();
                doc = reader.document(docNumber);
            } else {
                // node not found in index
                return null;
            }
        } finally {
            tDocs.close();
        }
        Fieldable[] fields = doc.getFieldables(FieldNames.FULLTEXT);
        if (fields.length == 0) {
            log.debug("Fulltext field not stored, using {}", SimpleExcerptProvider.class.getName());
            SimpleExcerptProvider exProvider = new SimpleExcerptProvider();
            exProvider.init(query, index);
            return exProvider.getExcerpt(id, maxFragments, maxFragmentSize);
        }
        StringBuffer text = new StringBuffer();
        String separator = "";
        for (int i = 0; i < fields.length; i++) {
            if (fields[i].stringValue().length() == 0) {
                continue;
            }
            text.append(separator);
            text.append(fields[i].stringValue());
            separator = " ";
        }
        TermFreqVector tfv = reader.getTermFreqVector(docNumber, FieldNames.FULLTEXT);
        if (tfv instanceof TermPositionVector) {
            return createExcerpt((TermPositionVector) tfv, text.toString(), maxFragments, maxFragmentSize);
        } else {
            log.debug("No TermPositionVector on Fulltext field.");
            return null;
        }
    } finally {
        Util.closeOrRelease(reader);
    }
}
Also used: TermFreqVector (org.apache.lucene.index.TermFreqVector), Fieldable (org.apache.lucene.document.Fieldable), TermDocs (org.apache.lucene.index.TermDocs), IndexReader (org.apache.lucene.index.IndexReader), Term (org.apache.lucene.index.Term), Document (org.apache.lucene.document.Document), TermPositionVector (org.apache.lucene.index.TermPositionVector)
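getExcerpt() finds the node's document through its UUID term, concatenates the non-empty stored values of FieldNames.FULLTEXT with single spaces, and only builds an excerpt when the field's term vector is a TermPositionVector, which is what carries term positions (and, when indexed, character offsets). The sketch below isolates two plain-Java pieces of that idea: the concatenation loop, and wrapping hit ranges given as character offsets. joinFulltext and highlight are hypothetical helpers, the offsets are supplied by hand rather than read from a TermPositionVector, and the <strong> markers are an assumption; this is not Jackrabbit's createExcerpt().

```java
// Hypothetical sketch; not Jackrabbit's AbstractExcerpt/createExcerpt() implementation.
final class ExcerptSketch {

    /** Mirrors the getExcerpt() loop: skip empty values, separate the rest with a single space. */
    static String joinFulltext(String[] values) {
        StringBuilder text = new StringBuilder();
        String separator = "";
        for (String value : values) {
            if (value.isEmpty()) {
                continue;
            }
            text.append(separator).append(value);
            separator = " ";
        }
        return text.toString();
    }

    /**
     * Wraps each [start, end) character range in open/close markers.
     * Ranges stand in for offsets a position vector might report; they must be sorted and non-overlapping.
     */
    static String highlight(String text, int[][] ranges, String open, String close) {
        StringBuilder out = new StringBuilder();
        int last = 0;
        for (int[] range : ranges) {
            out.append(text, last, range[0])        // plain text before the hit
               .append(open)
               .append(text, range[0], range[1])    // the hit itself
               .append(close);
            last = range[1];
        }
        out.append(text.substring(last));           // remaining text after the last hit
        return out.toString();
    }

    public static void main(String[] args) {
        String text = joinFulltext(new String[] {"Apache Jackrabbit", "", "content repository"});
        System.out.println(text); // prints Apache Jackrabbit content repository
        System.out.println(highlight(text, new int[][] {{7, 17}}, "<strong>", "</strong>"));
        // prints Apache <strong>Jackrabbit</strong> content repository
    }
}
```

The helper does no fragment selection; a real excerpt provider would also pick the best fragments and cap their length, which is what the maxFragments and maxFragmentSize parameters of getExcerpt() control.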

Aggregations (number of examples above that use each type)

Document (org.apache.lucene.document.Document): 2; TermFreqVector (org.apache.lucene.index.TermFreqVector): 2; StringReader (java.io.StringReader): 1; HashMap (java.util.HashMap): 1; Fieldable (org.apache.lucene.document.Fieldable): 1; IndexReader (org.apache.lucene.index.IndexReader): 1; Term (org.apache.lucene.index.Term): 1; TermDocs (org.apache.lucene.index.TermDocs): 1; TermPositionVector (org.apache.lucene.index.TermPositionVector): 1