Example 1 with IntegerField

use of edu.uci.ics.textdb.api.field.IntegerField in project textdb by TextDB.

the class TestConstantsChinese method getSamplePeopleTuples.

public static List<Tuple> getSamplePeopleTuples() {
    try {
        IField[] fields1 = { new StringField("无忌"), new StringField("长孙"), new IntegerField(46), new DoubleField(5.50), new DateField(new SimpleDateFormat("MM-dd-yyyy").parse("01-14-1970")), new TextField("北京大学电气工程学院") };
        IField[] fields2 = { new StringField("孔明"), new StringField("洛克贝尔"), new IntegerField(42), new DoubleField(5.99), new DateField(new SimpleDateFormat("MM-dd-yyyy").parse("01-13-1974")), new TextField("北京大学计算机学院") };
        IField[] fields3 = { new StringField("宋江"), new StringField("建筑"), new IntegerField(42), new DoubleField(5.99), new DateField(new SimpleDateFormat("MM-dd-yyyy").parse("01-13-1974")), new TextField("伟大的建筑是历史的坐标,具有传承的价值。") };
        Tuple tuple1 = new Tuple(SCHEMA_PEOPLE, fields1);
        Tuple tuple2 = new Tuple(SCHEMA_PEOPLE, fields2);
        Tuple tuple3 = new Tuple(SCHEMA_PEOPLE, fields3);
        return Arrays.asList(tuple1, tuple2, tuple3);
    } catch (ParseException e) {
        // exception should not happen because we know the data is correct
        e.printStackTrace();
        return Arrays.asList();
    }
}
Also used : StringField(edu.uci.ics.textdb.api.field.StringField) TextField(edu.uci.ics.textdb.api.field.TextField) IntegerField(edu.uci.ics.textdb.api.field.IntegerField) DateField(edu.uci.ics.textdb.api.field.DateField) ParseException(java.text.ParseException) IField(edu.uci.ics.textdb.api.field.IField) SimpleDateFormat(java.text.SimpleDateFormat) DoubleField(edu.uci.ics.textdb.api.field.DoubleField)
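
The catch block in the method above relies on every date literal matching the "MM-dd-yyyy" pattern. As a minimal sketch of that assumption, using only the JDK (the class name DateParseSketch and the helper parseOrNull are hypothetical and not part of TextDB), the following shows that a literal in that pattern parses while non-date text raises ParseException:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

public class DateParseSketch {

    // Same pattern as the sample tuples: month-day-year separated by dashes.
    public static final String PATTERN = "MM-dd-yyyy";

    // Hypothetical helper: returns the parsed date, or null when the text
    // cannot be parsed with the pattern.
    public static Date parseOrNull(String text) {
        try {
            // SimpleDateFormat is not thread-safe, so a fresh instance is created per call.
            return new SimpleDateFormat(PATTERN).parse(text);
        } catch (ParseException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // A literal taken from the sample data parses successfully.
        System.out.println(parseOrNull("01-14-1970") != null);
        // Text that does not start with a month number is rejected.
        System.out.println(parseOrNull("not-a-date") == null);
    }
}
```

Because the sample data is hard-coded, the failure branch is only a safeguard; returning an empty list there keeps the method signature free of a checked exception.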

Example 2 with IntegerField

use of edu.uci.ics.textdb.api.field.IntegerField in project textdb by TextDB.

the class TestConstantsChineseWordCount method getSamplePeopleTuples.

public static List<Tuple> getSamplePeopleTuples() {
    try {
        IField[] fields1 = { new StringField("bruce"), new StringField("john Lee"), new IntegerField(46), new DoubleField(5.50), new DateField(new SimpleDateFormat("MM-dd-yyyy").parse("01-14-1970")), new TextField("中新社北京4月26日电 (记者 刘育英)“中国制造2025”政策措施实施以来,“为稳定工业增长、加快制造业转型升级发" + "挥了重要作用”,效果初步显现。") };
        IField[] fields2 = { new StringField("tom hanks"), new StringField("cruise"), new IntegerField(45), new DoubleField(5.95), new DateField(new SimpleDateFormat("MM-dd-yyyy").parse("01-13-1971")), new TextField("  中国2015年发布了“中国制造2025”通知。中国工业和信息化部运行监测协调局副" + "局长黄利斌26日在国新办新闻发布会上表示,自“中国制造2025”实施以来,国家制造业创新中心建设、智能制造" + "、工业强基、绿色制造、高端装备创新等“五大工程”扎实推进;2016年度15个重大标志性项目中,7个完全落实" + ",4个基本落实,其余正在推进。") };
        IField[] fields3 = { new StringField("brad lie angelina"), new StringField("pitt"), new IntegerField(44), new DoubleField(6.10), new DateField(new SimpleDateFormat("MM-dd-yyyy").parse("01-12-1972")), new TextField("  2017年,工信部将重点推进六方面工作:加大“五大工程”实施力度," + "积极推进创新中心建设;扩大试点示范城市(群)覆盖面;实施新一轮重大技术改造升级工程;" + "推进" + "制造业与互联网融合发展;优化制造业发展环境。") };
        IField[] fields4 = { new StringField("george lin lin"), new StringField("lin clooney"), new IntegerField(43), new DoubleField(6.06), new DateField(new SimpleDateFormat("MM-dd-yyyy").parse("01-13-1973")), new TextField("  黄利斌说,今年继续开展“互联" + "网+”制造业试点示范,加快工业互联网基础设施改造升级。现在,47%的大企业" + "搭建了运营协同创新平台,两化融合(信息化和工业化融合)管理体系贯标企业运" + "营成本平均下降了8.8%,经营利润平均增长了6.9%。") };
        IField[] fields5 = { new StringField("christian john wayne"), new StringField("rock bale"), new IntegerField(42), new DoubleField(5.99), new DateField(new SimpleDateFormat("MM-dd-yyyy").parse("01-13-1974")), new TextField("  工信部今年还将" + "选取20-30个城市(群)继续开展“中国制造2025”试点示范创建," + "指导试点示范城市(群),在落实新发展理念等方面先行先试。") };
        IField[] fields6 = { new StringField("Mary brown"), new StringField("Lake Forest"), new IntegerField(42), new DoubleField(5.99), new DateField(new SimpleDateFormat("MM-dd-yyyy").parse("01-13-1974")), new TextField("资料图:由驻日中" + "资太阳能企业开发、承建并运营维护的日本岛根县滨田市第二期12兆瓦光伏电站项目(滨田MS太阳能发电站),4月25日在当" + "地举行竣工典礼。该大型太阳能电站的九成设备来自“中国制造”。在日本并网发电的特高压太阳能电站中,这是中国产设备占" + "比最高的项目。中新社记者 王健 摄") };
        Tuple tuple1 = new Tuple(SCHEMA_PEOPLE, fields1);
        Tuple tuple2 = new Tuple(SCHEMA_PEOPLE, fields2);
        Tuple tuple3 = new Tuple(SCHEMA_PEOPLE, fields3);
        Tuple tuple4 = new Tuple(SCHEMA_PEOPLE, fields4);
        Tuple tuple5 = new Tuple(SCHEMA_PEOPLE, fields5);
        Tuple tuple6 = new Tuple(SCHEMA_PEOPLE, fields6);
        return Arrays.asList(tuple1, tuple2, tuple3, tuple4, tuple5, tuple6);
    } catch (ParseException e) {
        // exception should not happen because we know the data is correct
        e.printStackTrace();
        return Arrays.asList();
    }
}
Also used : StringField(edu.uci.ics.textdb.api.field.StringField) TextField(edu.uci.ics.textdb.api.field.TextField) IntegerField(edu.uci.ics.textdb.api.field.IntegerField) DateField(edu.uci.ics.textdb.api.field.DateField) ParseException(java.text.ParseException) IField(edu.uci.ics.textdb.api.field.IField) SimpleDateFormat(java.text.SimpleDateFormat) DoubleField(edu.uci.ics.textdb.api.field.DoubleField)

Example 3 with IntegerField

use of edu.uci.ics.textdb.api.field.IntegerField in project textdb by TextDB.

the class TestConstants method getSamplePeopleTuples.

public static List<Tuple> getSamplePeopleTuples() {
    try {
        IField[] fields1 = { new StringField("bruce"), new StringField("john Lee"), new IntegerField(46), new DoubleField(5.50), new DateField(new SimpleDateFormat("MM-dd-yyyy").parse("01-14-1970")), new TextField("Tall Angry") };
        IField[] fields2 = { new StringField("tom hanks"), new StringField("cruise"), new IntegerField(45), new DoubleField(5.95), new DateField(new SimpleDateFormat("MM-dd-yyyy").parse("01-13-1971")), new TextField("Short Brown") };
        IField[] fields3 = { new StringField("brad lie angelina"), new StringField("pitt"), new IntegerField(44), new DoubleField(6.10), new DateField(new SimpleDateFormat("MM-dd-yyyy").parse("01-12-1972")), new TextField("White Angry") };
        IField[] fields4 = { new StringField("george lin lin"), new StringField("lin clooney"), new IntegerField(43), new DoubleField(6.06), new DateField(new SimpleDateFormat("MM-dd-yyyy").parse("01-13-1973")), new TextField("Lin Clooney is Short and lin clooney is Angry") };
        IField[] fields5 = { new StringField("christian john wayne"), new StringField("rock bale"), new IntegerField(42), new DoubleField(5.99), new DateField(new SimpleDateFormat("MM-dd-yyyy").parse("01-13-1974")), new TextField("Tall Fair") };
        IField[] fields6 = { new StringField("Mary brown"), new StringField("Lake Forest"), new IntegerField(42), new DoubleField(5.99), new DateField(new SimpleDateFormat("MM-dd-yyyy").parse("01-13-1974")), new TextField("Short angry") };
        Tuple tuple1 = new Tuple(SCHEMA_PEOPLE, fields1);
        Tuple tuple2 = new Tuple(SCHEMA_PEOPLE, fields2);
        Tuple tuple3 = new Tuple(SCHEMA_PEOPLE, fields3);
        Tuple tuple4 = new Tuple(SCHEMA_PEOPLE, fields4);
        Tuple tuple5 = new Tuple(SCHEMA_PEOPLE, fields5);
        Tuple tuple6 = new Tuple(SCHEMA_PEOPLE, fields6);
        return Arrays.asList(tuple1, tuple2, tuple3, tuple4, tuple5, tuple6);
    } catch (ParseException e) {
        // exception should not happen because we know the data is correct
        e.printStackTrace();
        return Arrays.asList();
    }
}
Also used : StringField(edu.uci.ics.textdb.api.field.StringField) TextField(edu.uci.ics.textdb.api.field.TextField) IntegerField(edu.uci.ics.textdb.api.field.IntegerField) DateField(edu.uci.ics.textdb.api.field.DateField) ParseException(java.text.ParseException) IField(edu.uci.ics.textdb.api.field.IField) SimpleDateFormat(java.text.SimpleDateFormat) DoubleField(edu.uci.ics.textdb.api.field.DoubleField)

Example 4 with IntegerField

use of edu.uci.ics.textdb.api.field.IntegerField in project textdb by TextDB.

the class JsonSerializationTest method testIntegerField.

@Test
public void testIntegerField() {
    IntegerField integerField = new IntegerField(100);
    JsonNode jsonNode = TestUtils.testJsonSerialization(integerField);
    Assert.assertTrue(jsonNode.get(JsonConstants.FIELD_VALUE).isInt());
}
Also used : IntegerField(edu.uci.ics.textdb.api.field.IntegerField) JsonNode(com.fasterxml.jackson.databind.JsonNode) Test(org.junit.Test)

Example 5 with IntegerField

use of edu.uci.ics.textdb.api.field.IntegerField in project textdb by TextDB.

the class SimilarityJoinTest method test1.

/*
 * Tests the Similarity Join Predicate on two similar words:
 *   Donald J. Trump
 *   Donald Trump
 * Under the condition of similarity (NormalizedLevenshtein) > 0.8, these two words should match.
 */
@Test
public void test1() throws TextDBException {
    JoinTestHelper.insertToTable(NEWS_TABLE_OUTER, JoinTestConstants.getNewsTuples().get(0));
    JoinTestHelper.insertToTable(NEWS_TABLE_INNER, JoinTestConstants.getNewsTuples().get(1));
    String trumpRegex = "[Dd]onald.{1,5}[Tt]rump";
    RegexMatcher regexMatcherInner = JoinTestHelper.getRegexMatcher(JoinTestHelper.NEWS_TABLE_INNER, trumpRegex, JoinTestConstants.NEWS_BODY);
    RegexMatcher regexMatcherOuter = JoinTestHelper.getRegexMatcher(JoinTestHelper.NEWS_TABLE_OUTER, trumpRegex, JoinTestConstants.NEWS_BODY);
    SimilarityJoinPredicate similarityJoinPredicate = new SimilarityJoinPredicate(JoinTestConstants.NEWS_BODY, 0.8);
    List<Tuple> results = JoinTestHelper.getJoinDistanceResults(regexMatcherInner, regexMatcherOuter, similarityJoinPredicate, Integer.MAX_VALUE, 0);
    Schema joinInputSchema = Utils.addAttributeToSchema(JoinTestConstants.NEWS_SCHEMA, SchemaConstants.SPAN_LIST_ATTRIBUTE);
    Schema resultSchema = similarityJoinPredicate.generateOutputSchema(joinInputSchema, joinInputSchema);
    List<Span> resultSpanList = Arrays.asList(new Span("inner_" + JoinTestConstants.NEWS_BODY, 5, 20, trumpRegex, "Donald J. Trump", -1), new Span("outer_" + JoinTestConstants.NEWS_BODY, 18, 30, trumpRegex, "Donald Trump", -1));
    Tuple resultTuple = new Tuple(resultSchema, new IDField(UUID.randomUUID().toString()), new IntegerField(2), new TextField("Alternative Facts and the Costs of Trump-Branded Reality"), new TextField("When Donald J. Trump swore the presidential oath on Friday, he assumed " + "responsibility not only for the levers of government but also for one of " + "the United States’ most valuable assets, battered though it may be: its credibility. " + "The country’s sentimental reverence for truth and its jealously guarded press freedoms, " + "while never perfect, have been as important to its global standing as the strength of " + "its military and the reliability of its currency. It’s the bedrock of that " + "American exceptionalism we’ve heard so much about for so long."), new IntegerField(1), new TextField("UCI marchers protest as Trump begins his presidency"), new TextField("a few hours after Donald Trump was sworn in Friday as the nation’s 45th president, " + "a line of more than 100 UC Irvine faculty members and students took to the campus " + "in pouring rain to demonstrate their opposition to his policies on immigration and " + "other issues and urge other opponents to keep organizing during Trump’s presidency."), new ListField<>(resultSpanList));
    Assert.assertTrue(TestUtils.equals(Arrays.asList(resultTuple), results));
}
Also used : IDField(edu.uci.ics.textdb.api.field.IDField) SimilarityJoinPredicate(edu.uci.ics.textdb.exp.join.SimilarityJoinPredicate) Schema(edu.uci.ics.textdb.api.schema.Schema) TextField(edu.uci.ics.textdb.api.field.TextField) RegexMatcher(edu.uci.ics.textdb.exp.regexmatcher.RegexMatcher) IntegerField(edu.uci.ics.textdb.api.field.IntegerField) Span(edu.uci.ics.textdb.api.span.Span) Tuple(edu.uci.ics.textdb.api.tuple.Tuple) Test(org.junit.Test)
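
The test comment above refers to a normalized Levenshtein similarity. As a rough illustration, and not TextDB's own implementation, the sketch below assumes the common definition, similarity = 1 - editDistance / max(length), and applies it to the two span values from the test. The class name NormalizedLevenshteinSketch and its methods are hypothetical.

```java
public class NormalizedLevenshteinSketch {

    // Classic dynamic-programming edit distance: insertions, deletions and
    // substitutions each cost 1. Only two rows of the table are kept.
    public static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning the empty prefix of a into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into the empty string takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed definition: 1 - distance / length of the longer string.
    public static double similarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) {
            return 1.0; // two empty strings are treated as identical
        }
        return 1.0 - (double) distance(a, b) / maxLen;
    }

    public static void main(String[] args) {
        String inner = "Donald J. Trump";
        String outer = "Donald Trump";
        System.out.println(distance(inner, outer));
        System.out.println(similarity(inner, outer));
    }
}
```

Under this definition the two strings differ by three deletions ("J. ") over a maximum length of 15, so the similarity lands at the 0.8 threshold itself. Whether the pair matches therefore depends on whether the predicate compares inclusively, which the listing above does not show.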

Aggregations

IntegerField (edu.uci.ics.textdb.api.field.IntegerField) 85
StringField (edu.uci.ics.textdb.api.field.StringField) 81
IField (edu.uci.ics.textdb.api.field.IField) 80
TextField (edu.uci.ics.textdb.api.field.TextField) 80
Tuple (edu.uci.ics.textdb.api.tuple.Tuple) 80
ArrayList (java.util.ArrayList) 76
Test (org.junit.Test) 74
DoubleField (edu.uci.ics.textdb.api.field.DoubleField) 68
DateField (edu.uci.ics.textdb.api.field.DateField) 64
SimpleDateFormat (java.text.SimpleDateFormat) 63
Schema (edu.uci.ics.textdb.api.schema.Schema) 62
Span (edu.uci.ics.textdb.api.span.Span) 60
Attribute (edu.uci.ics.textdb.api.schema.Attribute) 57
Dictionary (edu.uci.ics.textdb.exp.dictionarymatcher.Dictionary) 24
JoinDistancePredicate (edu.uci.ics.textdb.exp.join.JoinDistancePredicate) 9
KeywordMatcherSourceOperator (edu.uci.ics.textdb.exp.keywordmatcher.KeywordMatcherSourceOperator) 9
ParseException (java.text.ParseException) 4
JsonNode (com.fasterxml.jackson.databind.JsonNode) 2
IOperator (edu.uci.ics.textdb.api.dataflow.IOperator) 2
IDField (edu.uci.ics.textdb.api.field.IDField) 2