Example 1 with RegexMatcher

Use of edu.uci.ics.textdb.exp.regexmatcher.RegexMatcher in project textdb by TextDB.

From the class LogicalPlanTest, method testLogicalPlan3.

/*
 * Test a valid operator graph.
 *
 *                  --> RegexMatcher ------->
 *                  |                         >-- Join1 -->
 * KeywordSource --<--> NlpEntityOperator -->              >-- Join2 --> TupleSink
 *                  |                                     /
 *                  --> FuzzyTokenMatcher --------------->
 *
 */
@Test
public void testLogicalPlan3() throws Exception {
    LogicalPlan logicalPlan = getLogicalPlan3();
    Plan queryPlan = logicalPlan.buildQueryPlan();
    ISink tupleSink = queryPlan.getRoot();
    Assert.assertTrue(tupleSink instanceof TupleSink);
    IOperator join2 = ((TupleSink) tupleSink).getInputOperator();
    Assert.assertTrue(join2 instanceof Join);
    IOperator join2Input1 = ((Join) join2).getOuterInputOperator();
    Assert.assertTrue(join2Input1 instanceof Join);
    IOperator join2Input2 = ((Join) join2).getInnerInputOperator();
    Assert.assertTrue(join2Input2 instanceof FuzzyTokenMatcher);
    IOperator join1Input1 = ((Join) join2Input1).getInnerInputOperator();
    Assert.assertTrue(join1Input1 instanceof RegexMatcher);
    IOperator join1Input2 = ((Join) join2Input1).getOuterInputOperator();
    Assert.assertTrue(join1Input2 instanceof NlpEntityOperator);
    IOperator connectorOut1 = ((RegexMatcher) join1Input1).getInputOperator();
    Assert.assertTrue(connectorOut1 instanceof ConnectorOutputOperator);
    IOperator connectorOut2 = ((NlpEntityOperator) join1Input2).getInputOperator();
    Assert.assertTrue(connectorOut2 instanceof ConnectorOutputOperator);
    IOperator connectorOut3 = ((FuzzyTokenMatcher) join2Input2).getInputOperator();
    Assert.assertTrue(connectorOut3 instanceof ConnectorOutputOperator);
    HashSet<Integer> connectorIndices = new HashSet<>();
    connectorIndices.add(((ConnectorOutputOperator) connectorOut1).getOutputIndex());
    connectorIndices.add(((ConnectorOutputOperator) connectorOut2).getOutputIndex());
    connectorIndices.add(((ConnectorOutputOperator) connectorOut3).getOutputIndex());
    // JUnit's assertEquals takes (expected, actual); the three outputs must have distinct indices.
    Assert.assertEquals(3, connectorIndices.size());
    OneToNBroadcastConnector connector1 = ((ConnectorOutputOperator) connectorOut1).getOwnerConnector();
    OneToNBroadcastConnector connector2 = ((ConnectorOutputOperator) connectorOut2).getOwnerConnector();
    OneToNBroadcastConnector connector3 = ((ConnectorOutputOperator) connectorOut3).getOwnerConnector();
    Assert.assertSame(connector1, connector2);
    Assert.assertSame(connector1, connector3);
    IOperator keywordSource = connector1.getInputOperator();
    Assert.assertTrue(keywordSource instanceof KeywordMatcherSourceOperator);
}
Also used : TupleSink(edu.uci.ics.textdb.exp.sink.tuple.TupleSink) IOperator(edu.uci.ics.textdb.api.dataflow.IOperator) Join(edu.uci.ics.textdb.exp.join.Join) Plan(edu.uci.ics.textdb.api.engine.Plan) FuzzyTokenMatcher(edu.uci.ics.textdb.exp.fuzzytokenmatcher.FuzzyTokenMatcher) KeywordMatcherSourceOperator(edu.uci.ics.textdb.exp.keywordmatcher.KeywordMatcherSourceOperator) ISink(edu.uci.ics.textdb.api.dataflow.ISink) ConnectorOutputOperator(edu.uci.ics.textdb.exp.connector.OneToNBroadcastConnector.ConnectorOutputOperator) NlpEntityOperator(edu.uci.ics.textdb.exp.nlp.entity.NlpEntityOperator) RegexMatcher(edu.uci.ics.textdb.exp.regexmatcher.RegexMatcher) OneToNBroadcastConnector(edu.uci.ics.textdb.exp.connector.OneToNBroadcastConnector) HashSet(java.util.HashSet) Test(org.junit.Test)

Example 2 with RegexMatcher

Use of edu.uci.ics.textdb.exp.regexmatcher.RegexMatcher in project textdb by TextDB.

From the class LogicalPlanTest, method testLogicalPlan1.

/*
 * Test a valid operator graph.
 *
 * KeywordSource --> RegexMatcher --> TupleSink
 *
 */
@Test
public void testLogicalPlan1() throws Exception {
    LogicalPlan logicalPlan = getLogicalPlan1();
    Plan queryPlan = logicalPlan.buildQueryPlan();
    ISink tupleSink = queryPlan.getRoot();
    Assert.assertTrue(tupleSink instanceof TupleSink);
    IOperator regexMatcher = ((TupleSink) tupleSink).getInputOperator();
    Assert.assertTrue(regexMatcher instanceof RegexMatcher);
    IOperator keywordSource = ((RegexMatcher) regexMatcher).getInputOperator();
    Assert.assertTrue(keywordSource instanceof KeywordMatcherSourceOperator);
}
Also used : ISink(edu.uci.ics.textdb.api.dataflow.ISink) TupleSink(edu.uci.ics.textdb.exp.sink.tuple.TupleSink) IOperator(edu.uci.ics.textdb.api.dataflow.IOperator) RegexMatcher(edu.uci.ics.textdb.exp.regexmatcher.RegexMatcher) Plan(edu.uci.ics.textdb.api.engine.Plan) KeywordMatcherSourceOperator(edu.uci.ics.textdb.exp.keywordmatcher.KeywordMatcherSourceOperator) Test(org.junit.Test)

Example 3 with RegexMatcher

Use of edu.uci.ics.textdb.exp.regexmatcher.RegexMatcher in project textdb by TextDB.

From the class SimilarityJoinTest, method test1.

/*
 * Tests the Similarity Join Predicate on two similar words:
 *   Donald J. Trump
 *   Donald Trump
 * Under the condition of similarity (NormalizedLevenshtein) > 0.8, these two words should match.
 *
 */
@Test
public void test1() throws TextDBException {
    JoinTestHelper.insertToTable(NEWS_TABLE_OUTER, JoinTestConstants.getNewsTuples().get(0));
    JoinTestHelper.insertToTable(NEWS_TABLE_INNER, JoinTestConstants.getNewsTuples().get(1));
    String trumpRegex = "[Dd]onald.{1,5}[Tt]rump";
    RegexMatcher regexMatcherInner = JoinTestHelper.getRegexMatcher(JoinTestHelper.NEWS_TABLE_INNER, trumpRegex, JoinTestConstants.NEWS_BODY);
    RegexMatcher regexMatcherOuter = JoinTestHelper.getRegexMatcher(JoinTestHelper.NEWS_TABLE_OUTER, trumpRegex, JoinTestConstants.NEWS_BODY);
    SimilarityJoinPredicate similarityJoinPredicate = new SimilarityJoinPredicate(JoinTestConstants.NEWS_BODY, 0.8);
    List<Tuple> results = JoinTestHelper.getJoinDistanceResults(regexMatcherInner, regexMatcherOuter, similarityJoinPredicate, Integer.MAX_VALUE, 0);
    Schema joinInputSchema = Utils.addAttributeToSchema(JoinTestConstants.NEWS_SCHEMA, SchemaConstants.SPAN_LIST_ATTRIBUTE);
    Schema resultSchema = similarityJoinPredicate.generateOutputSchema(joinInputSchema, joinInputSchema);
    List<Span> resultSpanList = Arrays.asList(new Span("inner_" + JoinTestConstants.NEWS_BODY, 5, 20, trumpRegex, "Donald J. Trump", -1), new Span("outer_" + JoinTestConstants.NEWS_BODY, 18, 30, trumpRegex, "Donald Trump", -1));
    Tuple resultTuple = new Tuple(resultSchema, new IDField(UUID.randomUUID().toString()), new IntegerField(2), new TextField("Alternative Facts and the Costs of Trump-Branded Reality"), new TextField("When Donald J. Trump swore the presidential oath on Friday, he assumed " + "responsibility not only for the levers of government but also for one of " + "the United States’ most valuable assets, battered though it may be: its credibility. " + "The country’s sentimental reverence for truth and its jealously guarded press freedoms, " + "while never perfect, have been as important to its global standing as the strength of " + "its military and the reliability of its currency. It’s the bedrock of that " + "American exceptionalism we’ve heard so much about for so long."), new IntegerField(1), new TextField("UCI marchers protest as Trump begins his presidency"), new TextField("a few hours after Donald Trump was sworn in Friday as the nation’s 45th president, " + "a line of more than 100 UC Irvine faculty members and students took to the campus " + "in pouring rain to demonstrate their opposition to his policies on immigration and " + "other issues and urge other opponents to keep organizing during Trump’s presidency."), new ListField<>(resultSpanList));
    Assert.assertTrue(TestUtils.equals(Arrays.asList(resultTuple), results));
}
Also used : IDField(edu.uci.ics.textdb.api.field.IDField) SimilarityJoinPredicate(edu.uci.ics.textdb.exp.join.SimilarityJoinPredicate) Schema(edu.uci.ics.textdb.api.schema.Schema) TextField(edu.uci.ics.textdb.api.field.TextField) RegexMatcher(edu.uci.ics.textdb.exp.regexmatcher.RegexMatcher) IntegerField(edu.uci.ics.textdb.api.field.IntegerField) Span(edu.uci.ics.textdb.api.span.Span) Tuple(edu.uci.ics.textdb.api.tuple.Tuple) Test(org.junit.Test)
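The expected Span offsets in test1 (5 to 20 and 18 to 30) can be checked independently of TextDB. The following is a minimal sketch that uses java.util.regex as a stand-in; how TextDB's RegexMatcher evaluates the pattern internally is not shown on this page, and the class SpanOffsets and helper firstSpan are hypothetical names introduced only for illustration. The input strings are the opening words of the two news bodies from the test.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of TextDB.
final class SpanOffsets {

    /** Returns {start, end} of the first match of regex in text, or null if nothing matches. */
    static int[] firstSpan(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? new int[] { m.start(), m.end() } : null;
    }

    public static void main(String[] args) {
        String trumpRegex = "[Dd]onald.{1,5}[Tt]rump";

        // "When " is 5 characters long, so the match starts at offset 5.
        int[] inner = firstSpan(trumpRegex, "When Donald J. Trump swore the presidential oath on Friday");
        // "a few hours after " is 18 characters long, so the match starts at offset 18.
        int[] outer = firstSpan(trumpRegex, "a few hours after Donald Trump was sworn in Friday");

        System.out.println(inner[0] + "-" + inner[1]); // 5-20
        System.out.println(outer[0] + "-" + outer[1]); // 18-30
    }
}
```

The end offset is exclusive, as with Matcher.end(), which is why "Donald J. Trump" (15 characters) spans 5 to 20.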

Example 4 with RegexMatcher

Use of edu.uci.ics.textdb.exp.regexmatcher.RegexMatcher in project textdb by TextDB.

From the class SimilarityJoinTest, method test2.

/*
 * Tests the Similarity Join Predicate on two similar words:
 *   Donald J. Trump
 *   Donald Trump
 * Under the condition of similarity (NormalizedLevenshtein) > 0.9, these two words should NOT match.
 *
 */
@Test
public void test2() throws TextDBException {
    JoinTestHelper.insertToTable(NEWS_TABLE_OUTER, JoinTestConstants.getNewsTuples().get(0));
    JoinTestHelper.insertToTable(NEWS_TABLE_INNER, JoinTestConstants.getNewsTuples().get(1));
    String trumpRegex = "[Dd]onald.{1,5}[Tt]rump";
    RegexMatcher regexMatcherInner = JoinTestHelper.getRegexMatcher(JoinTestHelper.NEWS_TABLE_INNER, trumpRegex, JoinTestConstants.NEWS_BODY);
    RegexMatcher regexMatcherOuter = JoinTestHelper.getRegexMatcher(JoinTestHelper.NEWS_TABLE_OUTER, trumpRegex, JoinTestConstants.NEWS_BODY);
    SimilarityJoinPredicate similarityJoinPredicate = new SimilarityJoinPredicate(JoinTestConstants.NEWS_BODY, 0.9);
    List<Tuple> results = JoinTestHelper.getJoinDistanceResults(regexMatcherInner, regexMatcherOuter, similarityJoinPredicate, Integer.MAX_VALUE, 0);
    Assert.assertTrue(results.isEmpty());
}
Also used : SimilarityJoinPredicate(edu.uci.ics.textdb.exp.join.SimilarityJoinPredicate) RegexMatcher(edu.uci.ics.textdb.exp.regexmatcher.RegexMatcher) Tuple(edu.uci.ics.textdb.api.tuple.Tuple) Test(org.junit.Test)

Example 5 with RegexMatcher

Use of edu.uci.ics.textdb.exp.regexmatcher.RegexMatcher in project textdb by TextDB.

From the class SimilarityJoinTest, method test3.

/*
 * Tests the Similarity Join Predicate on two similar words:
 *   Galaxy S8
 *   Galaxy Note 7
 * Under the condition of similarity (NormalizedLevenshtein) > 0.5, these two words should match.
 *
 */
@Test
public void test3() throws TextDBException {
    JoinTestHelper.insertToTable(NEWS_TABLE_OUTER, JoinTestConstants.getNewsTuples().get(2));
    JoinTestHelper.insertToTable(NEWS_TABLE_INNER, JoinTestConstants.getNewsTuples().get(3));
    String phoneRegex = "[Gg]alaxy.{1,6}\\d";
    RegexMatcher regexMatcherInner = JoinTestHelper.getRegexMatcher(JoinTestHelper.NEWS_TABLE_INNER, phoneRegex, JoinTestConstants.NEWS_BODY);
    RegexMatcher regexMatcherOuter = JoinTestHelper.getRegexMatcher(JoinTestHelper.NEWS_TABLE_OUTER, phoneRegex, JoinTestConstants.NEWS_BODY);
    SimilarityJoinPredicate similarityJoinPredicate = new SimilarityJoinPredicate(JoinTestConstants.NEWS_BODY, 0.5);
    List<Tuple> results = JoinTestHelper.getJoinDistanceResults(regexMatcherInner, regexMatcherOuter, similarityJoinPredicate, Integer.MAX_VALUE, 0);
    Schema joinInputSchema = Utils.addAttributeToSchema(JoinTestConstants.NEWS_SCHEMA, SchemaConstants.SPAN_LIST_ATTRIBUTE);
    Schema resultSchema = similarityJoinPredicate.generateOutputSchema(joinInputSchema, joinInputSchema);
    List<Span> resultSpanList = Arrays.asList(new Span("inner_" + JoinTestConstants.NEWS_BODY, 327, 336, phoneRegex, "Galaxy S8", -1), new Span("outer_" + JoinTestConstants.NEWS_BODY, 21, 34, phoneRegex, "Galaxy Note 7", -1));
    Tuple resultTuple = new Tuple(resultSchema, new IDField(UUID.randomUUID().toString()), new IntegerField(4), new TextField("This is how Samsung plans to prevent future phones from catching fire"), new TextField("Samsung said that it has implemented a new eight-step testing process for " + "its lithium ion batteries, and that it’s forming a battery advisory board as well, " + "comprised of academics from Cambridge, Berkeley, and Stanford. " + "Note, this is for all lithium ion batteries in Samsung products, " + "not just Note phablets or the anticipated Galaxy S8 phone."), new IntegerField(3), new TextField("Samsung Explains Note 7 Battery Explosions, And Turns Crisis Into Opportunity"), new TextField("Samsung launched the Galaxy Note 7 to record preorders and sales in August, " + "but the rosy start soon turned sour. Samsung had to initiate a recall in September of " + "the first version of the Note 7 due to faulty batteries that overheated and exploded. " + "By October it had to recall over 2 million devices and discontinue the product. " + "It’s estimated that the recall will cost Samsung $5.3 billion."), new ListField<>(resultSpanList));
    Assert.assertTrue(TestUtils.equals(Arrays.asList(resultTuple), results));
}
Also used : IDField(edu.uci.ics.textdb.api.field.IDField) SimilarityJoinPredicate(edu.uci.ics.textdb.exp.join.SimilarityJoinPredicate) Schema(edu.uci.ics.textdb.api.schema.Schema) TextField(edu.uci.ics.textdb.api.field.TextField) RegexMatcher(edu.uci.ics.textdb.exp.regexmatcher.RegexMatcher) IntegerField(edu.uci.ics.textdb.api.field.IntegerField) Span(edu.uci.ics.textdb.api.span.Span) Tuple(edu.uci.ics.textdb.api.tuple.Tuple) Test(org.junit.Test)
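The thresholds used in the three SimilarityJoinTest cases (0.8, 0.9 and 0.5) are easier to follow with the scores worked out. The sketch below assumes the common definition of normalized Levenshtein similarity, 1 - distance / length of the longer string. The page does not show how SimilarityJoinPredicate computes its score, so the class NormalizedLevenshteinSketch and its methods are hypothetical and only illustrate the arithmetic.

```java
// Hypothetical illustration, not TextDB's implementation.
final class NormalizedLevenshteinSketch {

    /** Classic dynamic-programming edit distance; insert, delete and substitute each cost 1. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning the empty prefix of a into b[0..j) takes j inserts
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into the empty string takes i deletes
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Similarity in [0, 1]: 1 - distance / length of the longer string (assumed definition). */
    static double similarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        return maxLen == 0 ? 1.0 : 1.0 - (double) levenshtein(a, b) / maxLen;
    }

    public static void main(String[] args) {
        // Deleting "J. " turns the first string into the second: 3 edits over 15 characters.
        System.out.println(levenshtein("Donald J. Trump", "Donald Trump")); // 3
        // Only "Galaxy " is shared; "S8" must become "Note 7": 6 edits over 13 characters.
        System.out.println(levenshtein("Galaxy S8", "Galaxy Note 7")); // 6

        System.out.println(similarity("Donald J. Trump", "Donald Trump") >= 0.9); // false
        System.out.println(similarity("Galaxy S8", "Galaxy Note 7") >= 0.5); // true
    }
}
```

Under this definition the Trump pair scores 1 - 3/15 = 0.8, which sits exactly on the threshold used in test1 and below the 0.9 used in test2; the Galaxy pair scores 1 - 6/13, roughly 0.54, which clears the 0.5 used in test3. Since test1 expects a match at 0.8, the comparison is presumably inclusive, but that is an inference from the test rather than something the snippets confirm.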

Aggregations

RegexMatcher (edu.uci.ics.textdb.exp.regexmatcher.RegexMatcher) 8
Test (org.junit.Test) 7
Tuple (edu.uci.ics.textdb.api.tuple.Tuple) 4
SimilarityJoinPredicate (edu.uci.ics.textdb.exp.join.SimilarityJoinPredicate) 4
IOperator (edu.uci.ics.textdb.api.dataflow.IOperator) 3
ISink (edu.uci.ics.textdb.api.dataflow.ISink) 3
Plan (edu.uci.ics.textdb.api.engine.Plan) 3
KeywordMatcherSourceOperator (edu.uci.ics.textdb.exp.keywordmatcher.KeywordMatcherSourceOperator) 3
TupleSink (edu.uci.ics.textdb.exp.sink.tuple.TupleSink) 3
IDField (edu.uci.ics.textdb.api.field.IDField) 2
IntegerField (edu.uci.ics.textdb.api.field.IntegerField) 2
TextField (edu.uci.ics.textdb.api.field.TextField) 2
Schema (edu.uci.ics.textdb.api.schema.Schema) 2
Span (edu.uci.ics.textdb.api.span.Span) 2
OneToNBroadcastConnector (edu.uci.ics.textdb.exp.connector.OneToNBroadcastConnector) 2
ConnectorOutputOperator (edu.uci.ics.textdb.exp.connector.OneToNBroadcastConnector.ConnectorOutputOperator) 2
Join (edu.uci.ics.textdb.exp.join.Join) 2
NlpEntityOperator (edu.uci.ics.textdb.exp.nlp.entity.NlpEntityOperator) 2
HashSet (java.util.HashSet) 2
DataFlowException (edu.uci.ics.textdb.api.exception.DataFlowException) 1