Use of edu.uci.ics.texera.dataflow.sink.tuple.TupleSinkPredicate in project textdb by TextDB: class TwitterJsonConverterTest, method getAllSampleTwitterTuples.
public static List<Tuple> getAllSampleTwitterTuples() throws Exception {
    // read the JSON file into a list of JSON string tuples
    JsonNode jsonNode = new ObjectMapper().readTree(new File(twitterFilePath));
    ArrayList<Tuple> jsonStringTupleList = new ArrayList<>();
    Schema tupleSourceSchema = new Schema(SchemaConstants._ID_ATTRIBUTE, new Attribute("twitterJson", AttributeType.STRING));
    for (JsonNode tweet : jsonNode) {
        Tuple tuple = new Tuple(tupleSourceSchema, IDField.newRandomID(), new StringField(tweet.toString()));
        jsonStringTupleList.add(tuple);
    }
    // setup the twitter converter DAG
    // TupleSource --> TwitterJsonConverter --> TupleSink
    TupleSourceOperator tupleSource = new TupleSourceOperator(jsonStringTupleList, tupleSourceSchema);
    TwitterJsonConverter twitterJsonConverter = new TwitterJsonConverterPredicate("twitterJson").newOperator();
    TupleSink tupleSink = new TupleSinkPredicate(null, null).newOperator();
    twitterJsonConverter.setInputOperator(tupleSource);
    tupleSink.setInputOperator(twitterJsonConverter);
    tupleSink.open();
    List<Tuple> tuples = tupleSink.collectAllTuples();
    tupleSink.close();
    return tuples;
}
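For context, a caller might consume these sample tuples in a unit test as sketched below. This is a minimal sketch, not code from the project: the test name and the assertion that the sample file contains at least one tweet are assumptions, and it presumes JUnit's Test and Assert are on the classpath as in the surrounding test class.
@Test
public void getSampleTuplesSketch() throws Exception {
    // hypothetical check: the sample twitter file should yield at least one converted tuple
    List<Tuple> tuples = TwitterJsonConverterTest.getAllSampleTwitterTuples();
    Assert.assertFalse(tuples.isEmpty());
}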
Use of edu.uci.ics.texera.dataflow.sink.tuple.TupleSinkPredicate in project textdb by TextDB: class TwitterSample, method createTwitterTable.
/**
 * A helper function to create a table and write twitter data into it.
 *
 * @param tableName, the name of the table to create and populate
 * @param twitterJsonSourceOperator, a source operator that provides the input raw twitter JSON string tuples
 * @return the number of tuples written into the table
 */
public static int createTwitterTable(String tableName, ISourceOperator twitterJsonSourceOperator) {
    TwitterJsonConverter twitterJsonConverter = new TwitterJsonConverterPredicate("twitterJson").newOperator();
    TupleSink tupleSink = new TupleSinkPredicate(null, null).newOperator();
    twitterJsonConverter.setInputOperator(twitterJsonSourceOperator);
    tupleSink.setInputOperator(twitterJsonConverter);
    // open the workflow plan and get the output schema
    tupleSink.open();
    // create the table with TupleSink's output schema
    RelationManager relationManager = RelationManager.getInstance();
    if (relationManager.checkTableExistence(tableName)) {
        relationManager.deleteTable(tableName);
    }
    relationManager.createTable(tableName, Utils.getDefaultIndexDirectory().resolve(tableName),
            tupleSink.getOutputSchema(), LuceneAnalyzerConstants.standardAnalyzerString());
    DataWriter dataWriter = relationManager.getTableDataWriter(tableName);
    dataWriter.open();
    Tuple tuple;
    int counter = 0;
    while ((tuple = tupleSink.getNextTuple()) != null) {
        dataWriter.insertTuple(tuple);
        counter++;
    }
    dataWriter.close();
    tupleSink.close();
    return counter;
}
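A minimal sketch of how this helper could be driven, reusing the jsonStringTupleList and tupleSourceSchema built in the first example; the table name "twitter_sample" is a hypothetical choice, and it assumes TupleSourceOperator implements ISourceOperator as its use as a source above suggests.
// hypothetical usage: feed the raw JSON string tuples through the converter pipeline into a new table
TupleSourceOperator twitterJsonSource = new TupleSourceOperator(jsonStringTupleList, tupleSourceSchema);
int tupleCount = TwitterSample.createTwitterTable("twitter_sample", twitterJsonSource);
System.out.println("wrote " + tupleCount + " tweets into table twitter_sample");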
Use of edu.uci.ics.texera.dataflow.sink.tuple.TupleSinkPredicate in project textdb by TextDB: class PredicateBaseTest, method testTupleSink.
@Test
public void testTupleSink() throws Exception {
    TupleSinkPredicate tupleSinkPredicate = new TupleSinkPredicate();
    testPredicate(tupleSinkPredicate);
}
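The body of testPredicate is not shown above; predicate tests of this kind typically round-trip the predicate through JSON. The lines below are a purely illustrative sketch of such a check, assuming the predicate is Jackson-serializable and overrides equals, not the project's actual implementation.
// illustrative round-trip: serialize the predicate to JSON, read it back, and compare
ObjectMapper mapper = new ObjectMapper();
String json = mapper.writeValueAsString(tupleSinkPredicate);
TupleSinkPredicate deserialized = mapper.readValue(json, TupleSinkPredicate.class);
Assert.assertEquals(tupleSinkPredicate, deserialized);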