Example 1 with FuzzyTokenSourcePredicate

Use of edu.uci.ics.texera.dataflow.fuzzytokenmatcher.FuzzyTokenSourcePredicate in project textdb by TextDB.

From class PredicateBaseTest, method testFuzzyToken.

@Test
public void testFuzzyToken() throws Exception {
    FuzzyTokenPredicate fuzzyTokenPredicate = new FuzzyTokenPredicate("token1 token2 token3", attributeNames, "standard", 0.8, "spanListName");
    testPredicate(fuzzyTokenPredicate);
    FuzzyTokenSourcePredicate fuzzyTokenSourcePredicate = new FuzzyTokenSourcePredicate("token1 token2 token3", attributeNames, "standard", 0.8, "tableName", "spanListName");
    testPredicate(fuzzyTokenSourcePredicate);
}
Also used: FuzzyTokenSourcePredicate (edu.uci.ics.texera.dataflow.fuzzytokenmatcher.FuzzyTokenSourcePredicate), FuzzyTokenPredicate (edu.uci.ics.texera.dataflow.fuzzytokenmatcher.FuzzyTokenPredicate), Test (org.junit.Test)

Example 2 with FuzzyTokenSourcePredicate

Use of edu.uci.ics.texera.dataflow.fuzzytokenmatcher.FuzzyTokenSourcePredicate in project textdb by TextDB.

From class FuzzyTokenMatcherPerformanceTest, method match.

/*
 * Runs a fuzzy token match for each query in the list, recording the
 * elapsed time per query and the total number of matched spans.
 */
public static void match(ArrayList<String> queryList, double threshold, String luceneAnalyzerStr, String tableName, boolean bool) throws TexeraException, IOException {
    List<String> attributeNames = Arrays.asList(MedlineIndexWriter.ABSTRACT);
    for (String query : queryList) {
        FuzzyTokenSourcePredicate predicate = new FuzzyTokenSourcePredicate(query, attributeNames, luceneAnalyzerStr, threshold, tableName, SchemaConstants.SPAN_LIST);
        FuzzyTokenMatcherSourceOperator fuzzyTokenSource = new FuzzyTokenMatcherSourceOperator(predicate);
        long startMatchTime = System.currentTimeMillis();
        fuzzyTokenSource.open();
        int counter = 0;
        Tuple nextTuple = null;
        while ((nextTuple = fuzzyTokenSource.getNextTuple()) != null) {
            ListField<Span> spanListField = nextTuple.getField(SchemaConstants.SPAN_LIST);
            List<Span> spanList = spanListField.getValue();
            counter += spanList.size();
        }
        fuzzyTokenSource.close();
        long endMatchTime = System.currentTimeMillis();
        double matchTime = (endMatchTime - startMatchTime) / 1000.0;
        timeResults.add(Double.parseDouble(String.format("%.4f", matchTime)));
        totalResultCount += counter;
    }
}
Also used: FuzzyTokenMatcherSourceOperator (edu.uci.ics.texera.dataflow.fuzzytokenmatcher.FuzzyTokenMatcherSourceOperator), FuzzyTokenSourcePredicate (edu.uci.ics.texera.dataflow.fuzzytokenmatcher.FuzzyTokenSourcePredicate), Span (edu.uci.ics.texera.api.span.Span), Tuple (edu.uci.ics.texera.api.tuple.Tuple)
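
The timing and accumulation pattern in the match method can be tried without a Texera installation. The following is a minimal sketch, not the project's code: the class MatchTimingSketch, the helpers toRoundedSeconds and timeQuery, and the toy matcher are hypothetical stand-ins, and the static fields only mimic the timeResults and totalResultCount accumulators that the original method writes to. One deliberate difference is the use of Locale.ROOT in String.format, which keeps the decimal separator a period so that Double.parseDouble does not fail on systems whose default locale formats decimals with a comma.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.ToIntFunction;

public class MatchTimingSketch {
    // Hypothetical stand-ins for the static accumulators used by the test class.
    static final List<Double> timeResults = new ArrayList<>();
    static int totalResultCount = 0;

    // Converts a millisecond duration to seconds, rounded to four decimal places.
    // Locale.ROOT guarantees a '.' separator, so parseDouble can read the result.
    static double toRoundedSeconds(long elapsedMillis) {
        double seconds = elapsedMillis / 1000.0;
        return Double.parseDouble(String.format(Locale.ROOT, "%.4f", seconds));
    }

    // Times a single query. countMatches is a placeholder for the
    // open / getNextTuple / close loop that counts spans in the original.
    static void timeQuery(String query, ToIntFunction<String> countMatches) {
        long start = System.currentTimeMillis();
        int counter = countMatches.applyAsInt(query);
        long end = System.currentTimeMillis();
        timeResults.add(toRoundedSeconds(end - start));
        totalResultCount += counter;
    }

    public static void main(String[] args) {
        // Toy matcher (assumption): counts whitespace-separated tokens instead of spans.
        ToIntFunction<String> toyMatcher = q -> q.trim().split("\\s+").length;
        timeQuery("token1 token2 token3", toyMatcher);
        timeQuery("this writer writes well", toyMatcher);
        System.out.println(totalResultCount + " results, times (s): " + timeResults);
    }
}
```

Passing the matcher in as a function keeps the measurement code separate from the operator under test, so the same loop could wrap any source operator.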

Example 3 with FuzzyTokenSourcePredicate

Use of edu.uci.ics.texera.dataflow.fuzzytokenmatcher.FuzzyTokenSourcePredicate in project textdb by TextDB.

From class JoinDistanceTest, method testOneOfTheOperatorResultContainsNoSpan.

// This case tests the scenario where the result list of one of the operators contains no span.
// If one of the operators does not produce a span, an exception is thrown.
// Test result: DataflowException is thrown
@Test(expected = DataflowException.class)
public void testOneOfTheOperatorResultContainsNoSpan() throws Exception {
    JoinTestHelper.insertToTable(BOOK_TABLE, JoinTestConstants.bookGroup1.get(0));
    KeywordMatcherSourceOperator keywordSourceOuter = JoinTestHelper.getKeywordSource(BOOK_TABLE, "special", conjunction);
    String fuzzyTokenQuery = "this writer writes well";
    double thresholdRatio = 0.25;
    List<String> textAttributeNames = JoinTestConstants.BOOK_SCHEMA.getAttributes().stream().filter(attr -> attr.getType() != AttributeType.TEXT).map(Attribute::getName).collect(Collectors.toList());
    FuzzyTokenSourcePredicate fuzzySourcePredicateInner = new FuzzyTokenSourcePredicate(fuzzyTokenQuery, textAttributeNames, LuceneAnalyzerConstants.standardAnalyzerString(), thresholdRatio, BOOK_TABLE, SchemaConstants.SPAN_LIST);
    FuzzyTokenMatcherSourceOperator fuzzyMatcherInner = new FuzzyTokenMatcherSourceOperator(fuzzySourcePredicateInner);
    ProjectionPredicate removeSpanListPredicate = new ProjectionPredicate(JoinTestConstants.BOOK_SCHEMA.getAttributeNames());
    ProjectionOperator removeSpanListProjection = new ProjectionOperator(removeSpanListPredicate);
    removeSpanListProjection.setInputOperator(fuzzyMatcherInner);
    JoinTestHelper.getJoinDistanceResults(keywordSourceOuter, removeSpanListProjection, new JoinDistancePredicate(JoinTestConstants.REVIEW, 20), Integer.MAX_VALUE, 0);
}
Also used: FuzzyTokenMatcherSourceOperator (edu.uci.ics.texera.dataflow.fuzzytokenmatcher.FuzzyTokenMatcherSourceOperator), ProjectionOperator (edu.uci.ics.texera.dataflow.projection.ProjectionOperator), FuzzyTokenSourcePredicate (edu.uci.ics.texera.dataflow.fuzzytokenmatcher.FuzzyTokenSourcePredicate), ProjectionPredicate (edu.uci.ics.texera.dataflow.projection.ProjectionPredicate), JoinDistancePredicate (edu.uci.ics.texera.dataflow.join.JoinDistancePredicate), KeywordMatcherSourceOperator (edu.uci.ics.texera.dataflow.keywordmatcher.KeywordMatcherSourceOperator), Test (org.junit.Test)

Aggregations

FuzzyTokenSourcePredicate (edu.uci.ics.texera.dataflow.fuzzytokenmatcher.FuzzyTokenSourcePredicate) 3
FuzzyTokenMatcherSourceOperator (edu.uci.ics.texera.dataflow.fuzzytokenmatcher.FuzzyTokenMatcherSourceOperator) 2
Test (org.junit.Test) 2
Span (edu.uci.ics.texera.api.span.Span) 1
Tuple (edu.uci.ics.texera.api.tuple.Tuple) 1
FuzzyTokenPredicate (edu.uci.ics.texera.dataflow.fuzzytokenmatcher.FuzzyTokenPredicate) 1
JoinDistancePredicate (edu.uci.ics.texera.dataflow.join.JoinDistancePredicate) 1
KeywordMatcherSourceOperator (edu.uci.ics.texera.dataflow.keywordmatcher.KeywordMatcherSourceOperator) 1
ProjectionOperator (edu.uci.ics.texera.dataflow.projection.ProjectionOperator) 1
ProjectionPredicate (edu.uci.ics.texera.dataflow.projection.ProjectionPredicate) 1