
Example 1 with RegexSourcePredicate

Use of edu.uci.ics.texera.dataflow.regexmatcher.RegexSourcePredicate in project textdb by TextDB.

From the class PredicateBaseTest, method testRegexMatcher.

@Test
public void testRegexMatcher() throws Exception {
    RegexPredicate regexPredicate = new RegexPredicate("regex", attributeNames, "spanListName");
    testPredicate(regexPredicate);
    RegexSourcePredicate regexSourcePredicate = new RegexSourcePredicate("regex", attributeNames, "tableName", "spanListName");
    testPredicate(regexSourcePredicate);
}
Also used: RegexSourcePredicate (edu.uci.ics.texera.dataflow.regexmatcher.RegexSourcePredicate), RegexPredicate (edu.uci.ics.texera.dataflow.regexmatcher.RegexPredicate), Test (org.junit.Test)

Example 2 with RegexSourcePredicate

Use of edu.uci.ics.texera.dataflow.regexmatcher.RegexSourcePredicate in project textdb by TextDB.

From the class RegexMatcherPerformanceTest, method matchRegex.

/*
 * Runs each regex query in the list against the given table and
 * accumulates the total matching time and the total number of matches.
 */
public static void matchRegex(List<String> regexes, String tableName) throws TexeraException, IOException {
    List<String> attributeNames = Arrays.asList(MedlineIndexWriter.ABSTRACT);
    for (String regex : regexes) {
        // The analyzer should generate all grams in lower case so that the
        // index it builds is lower case.
        RegexSourcePredicate predicate = new RegexSourcePredicate(regex, attributeNames, tableName, SchemaConstants.SPAN_LIST);
        RegexMatcherSourceOperator regexSource = new RegexMatcherSourceOperator(predicate);
        long startMatchTime = System.currentTimeMillis();
        regexSource.open();
        int counter = 0;
        Tuple nextTuple = null;
        while ((nextTuple = regexSource.getNextTuple()) != null) {
            ListField<Span> spanListField = nextTuple.getField(SchemaConstants.SPAN_LIST);
            List<Span> spanList = spanListField.getValue();
            counter += spanList.size();
        }
        regexSource.close();
        long endMatchTime = System.currentTimeMillis();
        double matchTime = (endMatchTime - startMatchTime) / 1000.0;
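        // totalMatchingTime and totalRegexResultCount are declared outside
        // this method (not shown in this excerpt).
        totalMatchingTime += matchTime;
        totalRegexResultCount += counter;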
    }
}
Also used: RegexSourcePredicate (edu.uci.ics.texera.dataflow.regexmatcher.RegexSourcePredicate), RegexMatcherSourceOperator (edu.uci.ics.texera.dataflow.regexmatcher.RegexMatcherSourceOperator), Span (edu.uci.ics.texera.api.span.Span), Tuple (edu.uci.ics.texera.api.tuple.Tuple)
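
The matchRegex method above depends on Texera classes and on a prebuilt index, so it cannot be run on its own. As a rough, self-contained illustration of the same pattern (run each regex, collect the match spans, count them, and time the whole pass), here is a minimal sketch that uses only java.util.regex. The class RegexSpanCounter, its nested Span type, and the methods findSpans and countMatches are hypothetical names introduced for this sketch; they are not part of Texera, and the sketch scans plain strings rather than an indexed table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-alone sketch; none of these names come from Texera.
class RegexSpanCounter {

    /** Stand-in for a match span: start (inclusive) and end (exclusive) offsets. */
    static final class Span {
        final int start;
        final int end;

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Returns every non-overlapping match of regex in text, in order. */
    static List<Span> findSpans(String regex, String text) {
        List<Span> spans = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            spans.add(new Span(matcher.start(), matcher.end()));
        }
        return spans;
    }

    /** Sums the number of matches of every regex over every document. */
    static int countMatches(List<String> regexes, List<String> documents) {
        int total = 0;
        for (String regex : regexes) {
            for (String document : documents) {
                total += findSpans(regex, document).size();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> documents = List.of("aspirin and ibuprofen", "no drugs here");
        List<String> regexes = List.of("[a-z]+in\\b", "drug");

        long startNanos = System.nanoTime();
        int count = countMatches(regexes, documents);
        double seconds = (System.nanoTime() - startNanos) / 1e9;

        System.out.println(count + " matches in " + seconds + " s");
    }
}
```

The sketch times the pass with System.nanoTime() instead of System.currentTimeMillis(), since nanoTime is intended for measuring elapsed time; the original method uses currentTimeMillis, which is adequate for runs that last seconds.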

Aggregations

RegexSourcePredicate (edu.uci.ics.texera.dataflow.regexmatcher.RegexSourcePredicate): 2
Span (edu.uci.ics.texera.api.span.Span): 1
Tuple (edu.uci.ics.texera.api.tuple.Tuple): 1
RegexMatcherSourceOperator (edu.uci.ics.texera.dataflow.regexmatcher.RegexMatcherSourceOperator): 1
RegexPredicate (edu.uci.ics.texera.dataflow.regexmatcher.RegexPredicate): 1
Test (org.junit.Test): 1