
Example 1 with GND

use of org.opensearch.search.aggregations.bucket.terms.heuristic.GND in project OpenSearch by opensearch-project.

the class SignificantTermsSignificanceScoreIT method testBackgroundVsSeparateSet.

public void testBackgroundVsSeparateSet() throws Exception {
    String type = randomBoolean() ? "text" : "long";
    String settings = "{\"index.number_of_shards\": 1, \"index.number_of_replicas\": 0}";
    SharedSignificantTermsTestMethods.index01Docs(type, settings, this);
    testBackgroundVsSeparateSet(new MutualInformation(true, true), new MutualInformation(true, false), type);
    testBackgroundVsSeparateSet(new ChiSquare(true, true), new ChiSquare(true, false), type);
    testBackgroundVsSeparateSet(new GND(true), new GND(false), type);
}
Also used : ChiSquare(org.opensearch.search.aggregations.bucket.terms.heuristic.ChiSquare) MutualInformation(org.opensearch.search.aggregations.bucket.terms.heuristic.MutualInformation) GND(org.opensearch.search.aggregations.bucket.terms.heuristic.GND)

Example 2 with GND

use of org.opensearch.search.aggregations.bucket.terms.heuristic.GND in project OpenSearch by opensearch-project.

the class SignificanceHeuristicTests method testGNDCornerCases.

public void testGNDCornerCases() throws Exception {
    GND gnd = new GND(true);
    // term is only in the subset, not at all in the other set but that is because the other set is empty.
    // this should actually not happen because only terms that are in the subset are considered now,
    // however, in this case the score should be 0 because a term that does not exist cannot be relevant...
    assertThat(gnd.getScore(0, randomIntBetween(1, 2), 0, randomIntBetween(2, 3)), equalTo(0.0));
    // the terms do not co-occur at all - should be 0
    assertThat(gnd.getScore(0, randomIntBetween(1, 2), randomIntBetween(2, 3), randomIntBetween(5, 6)), equalTo(0.0));
    // comparison between two terms that do not exist - probably not relevant
    assertThat(gnd.getScore(0, 0, 0, randomIntBetween(1, 2)), equalTo(0.0));
    // terms co-occur perfectly - should be 1
    assertThat(gnd.getScore(1, 1, 1, 1), equalTo(1.0));
    gnd = new GND(false);
    assertThat(gnd.getScore(0, 0, 0, 0), equalTo(0.0));
}
Also used : GND(org.opensearch.search.aggregations.bucket.terms.heuristic.GND)
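
The corner cases asserted above (a score of 0 when the terms never co-occur, 1 when they co-occur perfectly) can be illustrated with a small standalone sketch of a Normalized-Google-Distance-style score. This is not the OpenSearch implementation: the class name GndSketch, the method score, and the parameters fx, fy, fxy and n are hypothetical, and the sketch takes the co-occurrence counts directly rather than deriving them from the four arguments of getScore (that mapping, and how it depends on backgroundIsSuperset, is not shown on this page).

```java
// Hypothetical, simplified sketch; names and signature are assumptions, not the OpenSearch API.
final class GndSketch {

    /**
     * @param fxy number of documents containing both x and y
     * @param fx  number of documents containing x
     * @param fy  number of documents containing y
     * @param n   total number of documents
     * @return a score in [0, 1]; higher means the terms are more strongly associated
     */
    static double score(long fxy, long fx, long fy, long n) {
        if (fxy == 0) {
            // no co-occurrence at all: treat as not relevant
            return 0.0;
        }
        if (fx == fy && fx == fxy) {
            // perfect co-occurrence
            return 1.0;
        }
        double logFx = Math.log(fx);
        double logFy = Math.log(fy);
        // Normalized Google Distance: small distance means closely related terms
        double distance = (Math.max(logFx, logFy) - Math.log(fxy))
            / (Math.log(n) - Math.min(logFx, logFy));
        // invert so that closely related terms get a high score
        return Math.exp(-distance);
    }

    public static void main(String[] args) {
        System.out.println(score(0, 2, 3, 6)); // terms never co-occur
        System.out.println(score(1, 1, 1, 1)); // terms co-occur perfectly
        System.out.println(score(2, 4, 2, 8)); // partial co-occurrence
    }
}
```

The two early returns mirror the cases the test checks with equalTo(0.0) and equalTo(1.0); everything between is mapped into the open interval by the final exponential, assuming consistent counts (fxy no larger than fx or fy, and both no larger than n).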

Example 3 with GND

use of org.opensearch.search.aggregations.bucket.terms.heuristic.GND in project OpenSearch by opensearch-project.

the class SignificanceHeuristicTests method getRandomSignificanceheuristic.

public static SignificanceHeuristic getRandomSignificanceheuristic() {
    List<SignificanceHeuristic> heuristics = new ArrayList<>();
    heuristics.add(new JLHScore());
    heuristics.add(new MutualInformation(randomBoolean(), randomBoolean()));
    heuristics.add(new GND(randomBoolean()));
    heuristics.add(new ChiSquare(randomBoolean(), randomBoolean()));
    return heuristics.get(randomInt(3));
}
Also used : JLHScore(org.opensearch.search.aggregations.bucket.terms.heuristic.JLHScore) ChiSquare(org.opensearch.search.aggregations.bucket.terms.heuristic.ChiSquare) ArrayList(java.util.ArrayList) SignificanceHeuristic(org.opensearch.search.aggregations.bucket.terms.heuristic.SignificanceHeuristic) MutualInformation(org.opensearch.search.aggregations.bucket.terms.heuristic.MutualInformation) GND(org.opensearch.search.aggregations.bucket.terms.heuristic.GND)

Example 4 with GND

use of org.opensearch.search.aggregations.bucket.terms.heuristic.GND in project OpenSearch by opensearch-project.

the class SignificanceHeuristicTests method testAssertions.

public void testAssertions() throws Exception {
    testBackgroundAssertions(new MutualInformation(true, true), new MutualInformation(true, false));
    testBackgroundAssertions(new ChiSquare(true, true), new ChiSquare(true, false));
    testBackgroundAssertions(new GND(true), new GND(false));
    testAssertions(new PercentageScore());
    testAssertions(new JLHScore());
}
Also used : JLHScore(org.opensearch.search.aggregations.bucket.terms.heuristic.JLHScore) ChiSquare(org.opensearch.search.aggregations.bucket.terms.heuristic.ChiSquare) MutualInformation(org.opensearch.search.aggregations.bucket.terms.heuristic.MutualInformation) GND(org.opensearch.search.aggregations.bucket.terms.heuristic.GND) PercentageScore(org.opensearch.search.aggregations.bucket.terms.heuristic.PercentageScore)

Example 5 with GND

use of org.opensearch.search.aggregations.bucket.terms.heuristic.GND in project OpenSearch by opensearch-project.

the class SignificanceHeuristicTests method testBuilderAndParser.

// test that
// 1. The output of the builders can actually be parsed
// 2. The parser does not swallow parameters after a significance heuristic was defined
public void testBuilderAndParser() throws Exception {
    // test jlh with string
    assertTrue(parseFromString("\"jlh\":{}") instanceof JLHScore);
    // test gnd with string
    assertTrue(parseFromString("\"gnd\":{}") instanceof GND);
    // test mutual information with string
    boolean includeNegatives = randomBoolean();
    boolean backgroundIsSuperset = randomBoolean();
    String mutual = "\"mutual_information\":{\"include_negatives\": " + includeNegatives + ", \"background_is_superset\":" + backgroundIsSuperset + "}";
    assertEquals(new MutualInformation(includeNegatives, backgroundIsSuperset), parseFromString(mutual));
    String chiSquare = "\"chi_square\":{\"include_negatives\": " + includeNegatives + ", \"background_is_superset\":" + backgroundIsSuperset + "}";
    assertEquals(new ChiSquare(includeNegatives, backgroundIsSuperset), parseFromString(chiSquare));
    // test with builders
    assertThat(parseFromBuilder(new JLHScore()), instanceOf(JLHScore.class));
    assertThat(parseFromBuilder(new GND(backgroundIsSuperset)), instanceOf(GND.class));
    assertEquals(new MutualInformation(includeNegatives, backgroundIsSuperset), parseFromBuilder(new MutualInformation(includeNegatives, backgroundIsSuperset)));
    assertEquals(new ChiSquare(includeNegatives, backgroundIsSuperset), parseFromBuilder(new ChiSquare(includeNegatives, backgroundIsSuperset)));
    // test exceptions
    String expectedError = "unknown field [unknown_field]";
    checkParseException("\"mutual_information\":{\"include_negatives\": false, \"unknown_field\": false}", expectedError);
    checkParseException("\"chi_square\":{\"unknown_field\": true}", expectedError);
    checkParseException("\"jlh\":{\"unknown_field\": true}", expectedError);
    checkParseException("\"gnd\":{\"unknown_field\": true}", expectedError);
}
Also used : JLHScore(org.opensearch.search.aggregations.bucket.terms.heuristic.JLHScore) ChiSquare(org.opensearch.search.aggregations.bucket.terms.heuristic.ChiSquare) MutualInformation(org.opensearch.search.aggregations.bucket.terms.heuristic.MutualInformation) Matchers.containsString(org.hamcrest.Matchers.containsString) GND(org.opensearch.search.aggregations.bucket.terms.heuristic.GND)

Aggregations

GND (org.opensearch.search.aggregations.bucket.terms.heuristic.GND) 6
ChiSquare (org.opensearch.search.aggregations.bucket.terms.heuristic.ChiSquare) 5
MutualInformation (org.opensearch.search.aggregations.bucket.terms.heuristic.MutualInformation) 5
JLHScore (org.opensearch.search.aggregations.bucket.terms.heuristic.JLHScore) 4
PercentageScore (org.opensearch.search.aggregations.bucket.terms.heuristic.PercentageScore) 2
ArrayList (java.util.ArrayList) 1
Matchers.containsString (org.hamcrest.Matchers.containsString) 1
SignificanceHeuristic (org.opensearch.search.aggregations.bucket.terms.heuristic.SignificanceHeuristic) 1