
Example 1 with FeatureField

use of org.apache.lucene.document.FeatureField in project OpenSearch by opensearch-project.

the class RankFeaturesFieldMapper method parse.

@Override
public void parse(ParseContext context) throws IOException {
    if (context.externalValueSet()) {
        throw new IllegalArgumentException("[rank_features] fields can't be used in multi-fields");
    }
    if (context.parser().currentToken() != Token.START_OBJECT) {
        throw new IllegalArgumentException("[rank_features] fields must be json objects, expected a START_OBJECT but got: " + context.parser().currentToken());
    }
    String feature = null;
    for (Token token = context.parser().nextToken(); token != Token.END_OBJECT; token = context.parser().nextToken()) {
        if (token == Token.FIELD_NAME) {
            feature = context.parser().currentName();
        } else if (token == Token.VALUE_NULL) {
            // ignore feature, this is consistent with numeric fields
        } else if (token == Token.VALUE_NUMBER || token == Token.VALUE_STRING) {
            final String key = name() + "." + feature;
            float value = context.parser().floatValue(true);
            if (context.doc().getByKey(key) != null) {
                throw new IllegalArgumentException("[rank_features] fields do not support indexing multiple values for the same " + "rank feature [" + key + "] in the same document");
            }
            context.doc().addWithKey(key, new FeatureField(name(), feature, value));
        } else {
            throw new IllegalArgumentException("[rank_features] fields take hashes that map a feature to a strictly positive " + "float, but got unexpected token " + token);
        }
    }
}
Also used : Token(org.opensearch.common.xcontent.XContentParser.Token) FeatureField(org.apache.lucene.document.FeatureField)

Example 2 with FeatureField

use of org.apache.lucene.document.FeatureField in project OpenSearch by opensearch-project.

the class RankFeatureFieldMapperTests method testNegativeScoreImpact.

public void testNegativeScoreImpact() throws Exception {
    DocumentMapper mapper = createDocumentMapper(fieldMapping(b -> b.field("type", "rank_feature").field("positive_score_impact", false)));
    ParsedDocument doc1 = mapper.parse(source(b -> b.field("field", 10)));
    IndexableField[] fields = doc1.rootDoc().getFields("_feature");
    assertEquals(1, fields.length);
    assertThat(fields[0], instanceOf(FeatureField.class));
    FeatureField featureField1 = (FeatureField) fields[0];
    ParsedDocument doc2 = mapper.parse(source(b -> b.field("field", 12)));
    FeatureField featureField2 = (FeatureField) doc2.rootDoc().getFields("_feature")[0];
    int freq1 = getFrequency(featureField1.tokenStream(null, null));
    int freq2 = getFrequency(featureField2.tokenStream(null, null));
    assertTrue(freq1 > freq2);
}
Also used : Query(org.apache.lucene.search.Query) Arrays(java.util.Arrays) TokenStream(org.apache.lucene.analysis.TokenStream) IndexableField(org.apache.lucene.index.IndexableField) Collection(java.util.Collection) TermFrequencyAttribute(org.apache.lucene.analysis.tokenattributes.TermFrequencyAttribute) IOException(java.io.IOException) Plugin(org.opensearch.plugins.Plugin) Strings(org.opensearch.common.Strings) FeatureField(org.apache.lucene.document.FeatureField) XContentBuilder(org.opensearch.common.xcontent.XContentBuilder) Matchers.instanceOf(org.hamcrest.Matchers.instanceOf) TermQuery(org.apache.lucene.search.TermQuery) List(org.opensearch.common.collect.List)

Example 3 with FeatureField

use of org.apache.lucene.document.FeatureField in project OpenSearch by opensearch-project.

the class RankFeatureFieldMapperTests method testDefaults.

public void testDefaults() throws Exception {
    DocumentMapper mapper = createDocumentMapper(fieldMapping(this::minimalMapping));
    assertEquals(Strings.toString(fieldMapping(this::minimalMapping)), mapper.mappingSource().toString());
    ParsedDocument doc1 = mapper.parse(source(b -> b.field("field", 10)));
    IndexableField[] fields = doc1.rootDoc().getFields("_feature");
    assertEquals(1, fields.length);
    assertThat(fields[0], instanceOf(FeatureField.class));
    FeatureField featureField1 = (FeatureField) fields[0];
    ParsedDocument doc2 = mapper.parse(source(b -> b.field("field", 12)));
    FeatureField featureField2 = (FeatureField) doc2.rootDoc().getFields("_feature")[0];
    int freq1 = getFrequency(featureField1.tokenStream(null, null));
    int freq2 = getFrequency(featureField2.tokenStream(null, null));
    assertTrue(freq1 < freq2);
}
Also used : Query(org.apache.lucene.search.Query) Arrays(java.util.Arrays) TokenStream(org.apache.lucene.analysis.TokenStream) IndexableField(org.apache.lucene.index.IndexableField) Collection(java.util.Collection) TermFrequencyAttribute(org.apache.lucene.analysis.tokenattributes.TermFrequencyAttribute) IOException(java.io.IOException) Plugin(org.opensearch.plugins.Plugin) Strings(org.opensearch.common.Strings) FeatureField(org.apache.lucene.document.FeatureField) XContentBuilder(org.opensearch.common.xcontent.XContentBuilder) Matchers.instanceOf(org.hamcrest.Matchers.instanceOf) TermQuery(org.apache.lucene.search.TermQuery) List(org.opensearch.common.collect.List)
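
The two rank_feature tests above compare term frequencies rather than raw values, which works because FeatureField stores the feature value in the term frequency of its single token. The following is a minimal stdlib-only sketch of that idea, not the Lucene source. It assumes the frequency is the float's bit pattern shifted right by 15 bits, and that the mapper indexes 1/value when positive_score_impact is false. The class and method names (FeatureFreqSketch, encode, decode, encodeWithImpact) are hypothetical.

```java
// Sketch only: illustrates why freq1 < freq2 for values 10 and 12, and why the
// comparison flips when positive_score_impact is false. Names are hypothetical.
final class FeatureFreqSketch {

    // Assumption: the frequency keeps the sign bit, the 8 exponent bits, and the
    // top 8 mantissa bits of the float, dropping the low 15 mantissa bits.
    static int encode(float featureValue) {
        return Float.floatToIntBits(featureValue) >>> 15;
    }

    // Inverse of encode, up to the precision lost in the shift.
    static float decode(int freq) {
        return Float.intBitsToFloat(freq << 15);
    }

    // Assumption: with positive_score_impact=false the indexed value is 1/value,
    // so larger inputs produce smaller frequencies.
    static int encodeWithImpact(float value, boolean positiveScoreImpact) {
        return encode(positiveScoreImpact ? value : 1f / value);
    }

    public static void main(String[] args) {
        // Positive impact: ordering of values is preserved in the frequencies.
        System.out.println(encodeWithImpact(10f, true) < encodeWithImpact(12f, true));
        // Negative impact: ordering is reversed.
        System.out.println(encodeWithImpact(10f, false) > encodeWithImpact(12f, false));
        // Values with short mantissas survive the round trip unchanged.
        System.out.println(decode(encode(10f)));
    }
}
```

For positive floats the bit pattern grows with the value, so the shift preserves ordering. That is why the tests can assert on frequencies alone.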

Example 4 with FeatureField

use of org.apache.lucene.document.FeatureField in project OpenSearch by opensearch-project.

the class RankFeaturesFieldMapperTests method testDefaults.

public void testDefaults() throws Exception {
    DocumentMapper mapper = createDocumentMapper(fieldMapping(this::minimalMapping));
    assertEquals(Strings.toString(fieldMapping(this::minimalMapping)), mapper.mappingSource().toString());
    ParsedDocument doc1 = mapper.parse(source(this::writeField));
    IndexableField[] fields = doc1.rootDoc().getFields("field");
    assertEquals(2, fields.length);
    assertThat(fields[0], Matchers.instanceOf(FeatureField.class));
    FeatureField featureField1 = (FeatureField) fields[0];
    assertThat(featureField1.stringValue(), Matchers.equalTo("foo"));
    FeatureField featureField2 = (FeatureField) fields[1];
    assertThat(featureField2.stringValue(), Matchers.equalTo("bar"));
    int freq1 = RankFeatureFieldMapperTests.getFrequency(featureField1.tokenStream(null, null));
    int freq2 = RankFeatureFieldMapperTests.getFrequency(featureField2.tokenStream(null, null));
    assertTrue(freq1 < freq2);
}
Also used : IndexableField(org.apache.lucene.index.IndexableField) FeatureField(org.apache.lucene.document.FeatureField)

Example 5 with FeatureField

use of org.apache.lucene.document.FeatureField in project BootForum by chipolaris.

the class IndexService method createCommentDocument.

/**
 * Utility method to create a Lucene Document from a Comment entity.
 * @param comment the comment to index
 * @return the Document to add to the index
 */
private Document createCommentDocument(Comment comment) {
    Document document = new Document();
    // store id as a String field
    document.add(new StringField("id", comment.getId() + "", Store.YES));
    // also use id as a FeatureField to be factored in the scoring process during search
    document.add(new FeatureField("features", "MoreRecent", comment.getId()));
    // use StoredField for attributes that do not get queried
    document.add(new StoredField("createBy", comment.getCreateBy()));
    document.add(new StoredField("createDate", comment.getCreateDate().getTime()));
    // note: TextField vs. StringField: the former gets tokenized while the latter does not
    document.add(new TextField("title", comment.getTitle(), Store.YES));
    // comment.content contains HTML content so first extract the text content
    document.add(new TextField("content", new TextExtractor(new Source(comment.getContent())).toString(), Store.YES));
    // discussion fields
    document.add(new StringField("discussionId", comment.getDiscussion().getId() + "", Store.YES));
    return document;
}
Also used : StoredField(org.apache.lucene.document.StoredField) StringField(org.apache.lucene.document.StringField) TextExtractor(net.htmlparser.jericho.TextExtractor) TextField(org.apache.lucene.document.TextField) Document(org.apache.lucene.document.Document) FeatureField(org.apache.lucene.document.FeatureField) Source(net.htmlparser.jericho.Source)
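
The "MoreRecent" feature above only affects ranking when a feature query is combined with the text query at search time. As a rough illustration of how such a feature could contribute to the score, here is a stdlib-only sketch of a saturation function of the form weight * S / (S + pivot), the shape Lucene documents for FeatureField saturation queries. The class name SaturationSketch, the pivot of 1000, and the sample ids are hypothetical. The real scorer works on the decoded (precision-reduced) feature value rather than the exact id.

```java
// Sketch only: shows that a larger feature value (here, a newer comment id)
// yields a higher but bounded score contribution. All names are hypothetical.
final class SaturationSketch {

    // Saturation function: grows with featureValue, never reaches weight,
    // and equals weight / 2 when featureValue == pivot.
    static float saturation(float weight, float pivot, float featureValue) {
        return weight * featureValue / (featureValue + pivot);
    }

    public static void main(String[] args) {
        float pivot = 1000f; // hypothetical pivot; in practice tuned per index
        for (long id : new long[] {10L, 1_000L, 100_000L}) {
            System.out.printf("id=%d contribution=%.3f%n", id, saturation(1f, pivot, id));
        }
    }
}
```

Because the contribution is bounded by the weight, a very large id cannot swamp the text relevance score. It only nudges newer comments upward among similarly relevant hits.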

Aggregations

FeatureField (org.apache.lucene.document.FeatureField): 7
IndexableField (org.apache.lucene.index.IndexableField): 3
IOException (java.io.IOException): 2
Arrays (java.util.Arrays): 2
Collection (java.util.Collection): 2
TokenStream (org.apache.lucene.analysis.TokenStream): 2
TermFrequencyAttribute (org.apache.lucene.analysis.tokenattributes.TermFrequencyAttribute): 2
Document (org.apache.lucene.document.Document): 2
StoredField (org.apache.lucene.document.StoredField): 2
StringField (org.apache.lucene.document.StringField): 2
TextField (org.apache.lucene.document.TextField): 2
Query (org.apache.lucene.search.Query): 2
TermQuery (org.apache.lucene.search.TermQuery): 2
Matchers.instanceOf (org.hamcrest.Matchers.instanceOf): 2
Strings (org.opensearch.common.Strings): 2
List (org.opensearch.common.collect.List): 2
XContentBuilder (org.opensearch.common.xcontent.XContentBuilder): 2
Plugin (org.opensearch.plugins.Plugin): 2
Tag (com.github.chipolaris.bootforum.domain.Tag): 1
Source (net.htmlparser.jericho.Source): 1