Example 1 with TwitterJsonConverterPredicate

Use of edu.uci.ics.texera.dataflow.twitter.TwitterJsonConverterPredicate in project textdb by TextDB.

From the class TwitterJsonConverterTest, method getAllSampleTwitterTuples.

public static List<Tuple> getAllSampleTwitterTuples() throws Exception {
    // read the JSON file into a list of JSON string tuples
    JsonNode jsonNode = new ObjectMapper().readTree(new File(twitterFilePath));
    ArrayList<Tuple> jsonStringTupleList = new ArrayList<>();
    Schema tupleSourceSchema = new Schema(SchemaConstants._ID_ATTRIBUTE, new Attribute("twitterJson", AttributeType.STRING));
    for (JsonNode tweet : jsonNode) {
        Tuple tuple = new Tuple(tupleSourceSchema, IDField.newRandomID(), new StringField(tweet.toString()));
        jsonStringTupleList.add(tuple);
    }
    // set up the Twitter converter DAG:
    // TupleSource --> TwitterJsonConverter --> TupleSink
    TupleSourceOperator tupleSource = new TupleSourceOperator(jsonStringTupleList, tupleSourceSchema);
    TwitterJsonConverter twitterJsonConverter = new TwitterJsonConverterPredicate("twitterJson").newOperator();
    TupleSink tupleSink = new TupleSinkPredicate(null, null).newOperator();
    twitterJsonConverter.setInputOperator(tupleSource);
    tupleSink.setInputOperator(twitterJsonConverter);
    tupleSink.open();
    List<Tuple> tuples = tupleSink.collectAllTuples();
    tupleSink.close();
    return tuples;
}
Also used: TupleSink (edu.uci.ics.texera.dataflow.sink.tuple.TupleSink), Attribute (edu.uci.ics.texera.api.schema.Attribute), TwitterJsonConverterPredicate (edu.uci.ics.texera.dataflow.twitter.TwitterJsonConverterPredicate), Schema (edu.uci.ics.texera.api.schema.Schema), ArrayList (java.util.ArrayList), JsonNode (com.fasterxml.jackson.databind.JsonNode), TupleSourceOperator (edu.uci.ics.texera.dataflow.source.tuple.TupleSourceOperator), TwitterJsonConverter (edu.uci.ics.texera.dataflow.twitter.TwitterJsonConverter), TupleSinkPredicate (edu.uci.ics.texera.dataflow.sink.tuple.TupleSinkPredicate), StringField (edu.uci.ics.texera.api.field.StringField), File (java.io.File), ObjectMapper (com.fasterxml.jackson.databind.ObjectMapper), Tuple (edu.uci.ics.texera.api.tuple.Tuple)
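
The example above wires three operators into a pull-based chain: opening the sink opens everything upstream, and each tuple is pulled through the converter on demand. The following is a minimal sketch of that pattern in plain Java. The Operator interface and the ListSource, MapOperator and CollectSink classes are hypothetical stand-ins invented for illustration, not Texera's API, and plain strings stand in for tuples.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-ins for illustration only; these are NOT Texera classes.
final class PipelineSketch {

    /** Minimal pull-based operator: open, pull items until null, close. */
    interface Operator {
        void open();
        String getNext(); // returns null once the input is exhausted
        void close();
    }

    /** Source that replays an in-memory list. */
    static final class ListSource implements Operator {
        private final List<String> items;
        private Iterator<String> cursor;

        ListSource(List<String> items) { this.items = items; }

        public void open() { cursor = items.iterator(); }
        public String getNext() { return cursor.hasNext() ? cursor.next() : null; }
        public void close() { cursor = null; }
    }

    /** Applies a function to every item pulled from its input operator. */
    static final class MapOperator implements Operator {
        private final Operator input;
        private final Function<String, String> fn;

        MapOperator(Operator input, Function<String, String> fn) {
            this.input = input;
            this.fn = fn;
        }

        public void open() { input.open(); }
        public String getNext() {
            String item = input.getNext();
            return item == null ? null : fn.apply(item);
        }
        public void close() { input.close(); }
    }

    /** Sink that drains its input operator into a list. */
    static final class CollectSink implements Operator {
        private final Operator input;

        CollectSink(Operator input) { this.input = input; }

        public void open() { input.open(); }
        public String getNext() { return input.getNext(); }
        public void close() { input.close(); }

        List<String> collectAll() {
            List<String> out = new ArrayList<>();
            String item;
            while ((item = getNext()) != null) {
                out.add(item);
            }
            return out;
        }
    }

    /** Builds source --> converter --> sink and returns the converted items. */
    static List<String> run(List<String> rawItems) {
        Operator source = new ListSource(rawItems);
        Operator converter = new MapOperator(source, String::trim);
        CollectSink sink = new CollectSink(converter);
        sink.open(); // opening the sink opens the whole chain
        try {
            return sink.collectAll();
        } finally {
            sink.close(); // closing the sink closes the whole chain
        }
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList(" first ", "second ")));
    }
}
```

In this sketch only the sink is opened and closed by the caller; each operator forwards the call to its input, which is the same shape as the setInputOperator wiring in the example.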

Example 2 with TwitterJsonConverterPredicate

Use of edu.uci.ics.texera.dataflow.twitter.TwitterJsonConverterPredicate in project textdb by TextDB.

From the class TwitterSample, method createTwitterTable.

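/**
 * A helper function that creates a table and writes Twitter data into it.
 * If a table with the same name already exists, it is deleted first.
 *
 * @param tableName the name of the table to create
 * @param twitterJsonSourceOperator a source operator that provides the raw Twitter JSON string tuples as input
 * @return the number of tuples written to the table
 */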
public static int createTwitterTable(String tableName, ISourceOperator twitterJsonSourceOperator) {
    TwitterJsonConverter twitterJsonConverter = new TwitterJsonConverterPredicate("twitterJson").newOperator();
    TupleSink tupleSink = new TupleSinkPredicate(null, null).newOperator();
    twitterJsonConverter.setInputOperator(twitterJsonSourceOperator);
    tupleSink.setInputOperator(twitterJsonConverter);
    // open the workflow plan and get the output schema
    tupleSink.open();
    // create the table with TupleSink's output schema
    RelationManager relationManager = RelationManager.getInstance();
    if (relationManager.checkTableExistence(tableName)) {
        relationManager.deleteTable(tableName);
    }
    relationManager.createTable(tableName, Utils.getDefaultIndexDirectory().resolve(tableName), tupleSink.getOutputSchema(), LuceneAnalyzerConstants.standardAnalyzerString());
    DataWriter dataWriter = relationManager.getTableDataWriter(tableName);
    dataWriter.open();
    Tuple tuple;
    int counter = 0;
    while ((tuple = tupleSink.getNextTuple()) != null) {
        dataWriter.insertTuple(tuple);
        counter++;
    }
    dataWriter.close();
    tupleSink.close();
    return counter;
}
Also used: TupleSink (edu.uci.ics.texera.dataflow.sink.tuple.TupleSink), TwitterJsonConverterPredicate (edu.uci.ics.texera.dataflow.twitter.TwitterJsonConverterPredicate), TupleSinkPredicate (edu.uci.ics.texera.dataflow.sink.tuple.TupleSinkPredicate), TwitterJsonConverter (edu.uci.ics.texera.dataflow.twitter.TwitterJsonConverter), Tuple (edu.uci.ics.texera.api.tuple.Tuple), RelationManager (edu.uci.ics.texera.storage.RelationManager), DataWriter (edu.uci.ics.texera.storage.DataWriter)
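
In createTwitterTable, dataWriter.close() and tupleSink.close() run only when the insert loop finishes normally. One possible variation, sketched here under assumptions, is to wrap both resources in try-with-resources so that they are closed even if an insert throws. RowReader and RowWriter are hypothetical stand-ins invented for this sketch; whether Texera's TupleSink or DataWriter implement AutoCloseable is not stated in the example and is not assumed here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT Texera classes.
final class DrainSketch {

    /** Pull-style reader that returns null once it is exhausted. */
    static final class RowReader implements AutoCloseable {
        private final Iterator<String> cursor;
        boolean closed = false;

        RowReader(List<String> rows) { this.cursor = rows.iterator(); }

        String getNext() { return cursor.hasNext() ? cursor.next() : null; }

        @Override
        public void close() { closed = true; }
    }

    /** Writer that stores rows in memory and rejects empty rows. */
    static final class RowWriter implements AutoCloseable {
        final List<String> stored = new ArrayList<>();
        boolean closed = false;

        void insert(String row) {
            if (row.isEmpty()) {
                throw new IllegalArgumentException("empty row");
            }
            stored.add(row);
        }

        @Override
        public void close() { closed = true; }
    }

    /**
     * Copies every row from the reader to the writer and returns the count.
     * Both resources are closed on success and on failure.
     */
    static int copyAll(RowReader reader, RowWriter writer) {
        try (RowReader r = reader; RowWriter w = writer) {
            int counter = 0;
            String row;
            while ((row = r.getNext()) != null) {
                w.insert(row);
                counter++;
            }
            return counter;
        }
    }

    public static void main(String[] args) {
        RowReader reader = new RowReader(Arrays.asList("x", "y", "z"));
        RowWriter writer = new RowWriter();
        System.out.println(copyAll(reader, writer) + " rows written");
    }
}
```

The counting loop keeps the same shape as the original; only the cleanup moves from the end of the method into the try-with-resources header.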

Aggregations

Tuple (edu.uci.ics.texera.api.tuple.Tuple): 2
TupleSink (edu.uci.ics.texera.dataflow.sink.tuple.TupleSink): 2
TupleSinkPredicate (edu.uci.ics.texera.dataflow.sink.tuple.TupleSinkPredicate): 2
TwitterJsonConverter (edu.uci.ics.texera.dataflow.twitter.TwitterJsonConverter): 2
TwitterJsonConverterPredicate (edu.uci.ics.texera.dataflow.twitter.TwitterJsonConverterPredicate): 2
JsonNode (com.fasterxml.jackson.databind.JsonNode): 1
ObjectMapper (com.fasterxml.jackson.databind.ObjectMapper): 1
StringField (edu.uci.ics.texera.api.field.StringField): 1
Attribute (edu.uci.ics.texera.api.schema.Attribute): 1
Schema (edu.uci.ics.texera.api.schema.Schema): 1
TupleSourceOperator (edu.uci.ics.texera.dataflow.source.tuple.TupleSourceOperator): 1
DataWriter (edu.uci.ics.texera.storage.DataWriter): 1
RelationManager (edu.uci.ics.texera.storage.RelationManager): 1
File (java.io.File): 1
ArrayList (java.util.ArrayList): 1