Example 26 with DataFlowException

Use of edu.uci.ics.textdb.api.exception.DataFlowException in project textdb by TextDB.

Class ExcelSink, method close.

@Override
public void close() throws TextDBException {
    if (cursor == CLOSED) {
        return;
    }
    inputOperator.close();
    try {
        wb.write(fileOut);
        fileOut.close();
        cursor = CLOSED;
    } catch (IOException e) {
        throw new DataFlowException(e);
    }
}
Also used : DataFlowException(edu.uci.ics.textdb.api.exception.DataFlowException) IOException(java.io.IOException)
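
The close method above writes and closes the stream inside one try block, so a failing write would skip fileOut.close(). A minimal sketch of one alternative, using try-with-resources so the stream is closed on both paths. SinkCloseSketch and SketchDataFlowException are hypothetical stand-ins written for this illustration, not TextDB classes, and marking the sink CLOSED even after a failed write is an assumption of the sketch rather than the project's behavior.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sink used only for illustration; it is not the project's ExcelSink.
class SinkCloseSketch {
    static final int CLOSED = -1;
    static final int OPENED = 0;

    private final OutputStream out;
    private final String payload;
    private int cursor = OPENED;

    SinkCloseSketch(OutputStream out, String payload) {
        this.out = out;
        this.payload = payload;
    }

    void close() {
        if (cursor == CLOSED) {
            return; // closing twice is a no-op
        }
        // try-with-resources closes the stream whether or not write succeeds
        try (OutputStream stream = out) {
            stream.write(payload.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            // keep the original exception as the cause
            throw new SketchDataFlowException(e);
        } finally {
            // assumption of this sketch: the sink counts as closed even after a failed write
            cursor = CLOSED;
        }
    }

    boolean isClosed() {
        return cursor == CLOSED;
    }

    public static void main(String[] args) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        SinkCloseSketch sink = new SinkCloseSketch(buffer, "row1\n");
        sink.close();
        sink.close(); // second call returns immediately
        System.out.println("bytes written: " + buffer.size());
    }
}

// Hypothetical stand-in for the project's DataFlowException; unchecked here for brevity.
class SketchDataFlowException extends RuntimeException {
    SketchDataFlowException(Throwable cause) {
        super(cause);
    }
}
```

Whether a sink should stay reopenable after a failed write is a design decision; the sketch picks the simpler option of never retrying.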

Example 27 with DataFlowException

Use of edu.uci.ics.textdb.api.exception.DataFlowException in project textdb by TextDB.

Class KeywordMatcher, method computeSubstringMatchingResult.

private List<Span> computeSubstringMatchingResult(Tuple inputTuple) throws DataFlowException {
    List<Span> matchingResults = new ArrayList<>();
    for (String attributeName : this.predicate.getAttributeNames()) {
        AttributeType attributeType = this.inputSchema.getAttribute(attributeName).getAttributeType();
        String fieldValue = inputTuple.getField(attributeName).getValue().toString();
        // types other than TEXT and STRING: throw Exception for now
        if (attributeType != AttributeType.STRING && attributeType != AttributeType.TEXT) {
            throw new DataFlowException("KeywordMatcher: Fields other than STRING and TEXT are not supported yet");
        }
        // for STRING type, the query should match the fieldValue completely
        if (attributeType == AttributeType.STRING) {
            if (fieldValue.equals(predicate.getQuery())) {
                matchingResults.add(new Span(attributeName, 0, predicate.getQuery().length(), predicate.getQuery(), fieldValue));
            }
        }
        if (attributeType == AttributeType.TEXT) {
            String regex = predicate.getQuery().toLowerCase();
            Pattern pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
            Matcher matcher = pattern.matcher(fieldValue.toLowerCase());
            while (matcher.find()) {
                int start = matcher.start();
                int end = matcher.end();
                matchingResults.add(new Span(attributeName, start, end, predicate.getQuery(), fieldValue.substring(start, end)));
            }
        }
    }
    return matchingResults;
}
Also used : Pattern(java.util.regex.Pattern) Matcher(java.util.regex.Matcher) AttributeType(edu.uci.ics.textdb.api.schema.AttributeType) ArrayList(java.util.ArrayList) DataFlowException(edu.uci.ics.textdb.api.exception.DataFlowException) Span(edu.uci.ics.textdb.api.span.Span)
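
In the TEXT branch above, the query string is passed to Pattern.compile unchanged, so characters such as `+`, `.` or `(` are interpreted as regular-expression syntax rather than matched literally. If literal substring matching is the intent (an assumption; the project may rely on regex behavior), a sketch along the following lines quotes the query first. LiteralSubstringSketch and findSpans are hypothetical names, and plain int pairs stand in for the project's Span class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of TextDB.
class LiteralSubstringSketch {

    // Returns {start, end} offsets of every case-insensitive literal occurrence of query in text.
    static List<int[]> findSpans(String query, String text) {
        List<int[]> spans = new ArrayList<>();
        if (query.isEmpty()) {
            return spans; // an empty query would otherwise match at every position
        }
        // Pattern.quote makes regex metacharacters in the query match literally
        Pattern pattern = Pattern.compile(Pattern.quote(query),
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        // Match against the original text so offsets index into it directly
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            spans.add(new int[] { matcher.start(), matcher.end() });
        }
        return spans;
    }

    public static void main(String[] args) {
        String text = "C++ and c++";
        for (int[] span : findSpans("c++", text)) {
            System.out.println(span[0] + "-" + span[1] + ": " + text.substring(span[0], span[1]));
        }
    }
}
```

Because the CASE_INSENSITIVE flag already handles case, the sketch does not lowercase either string, which keeps the offsets valid for the original field value.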

Example 28 with DataFlowException

Use of edu.uci.ics.textdb.api.exception.DataFlowException in project textdb by TextDB.

Class KeywordMatcherSourceOperator, method buildPhraseQuery.

private Query buildPhraseQuery() throws DataFlowException {
    BooleanQuery.Builder booleanQueryBuilder = new BooleanQuery.Builder();
    for (String attributeName : this.predicate.getAttributeNames()) {
        AttributeType attributeType = this.inputSchema.getAttribute(attributeName).getAttributeType();
        // types other than TEXT and STRING: throw Exception for now
        if (attributeType != AttributeType.STRING && attributeType != AttributeType.TEXT) {
            throw new DataFlowException("KeywordPredicate: Fields other than STRING and TEXT are not supported yet");
        }
        if (attributeType == AttributeType.STRING) {
            Query termQuery = new TermQuery(new Term(attributeName, predicate.getQuery()));
            booleanQueryBuilder.add(termQuery, BooleanClause.Occur.SHOULD);
        }
        if (attributeType == AttributeType.TEXT) {
            if (queryTokenList.size() == 1) {
                Query termQuery = new TermQuery(new Term(attributeName, predicate.getQuery().toLowerCase()));
                booleanQueryBuilder.add(termQuery, BooleanClause.Occur.SHOULD);
            } else {
                PhraseQuery.Builder phraseQueryBuilder = new PhraseQuery.Builder();
                for (int i = 0; i < queryTokensWithStopwords.size(); i++) {
                    if (!StandardAnalyzer.STOP_WORDS_SET.contains(queryTokensWithStopwords.get(i))) {
                        phraseQueryBuilder.add(new Term(attributeName, queryTokensWithStopwords.get(i).toLowerCase()), i);
                    }
                }
                PhraseQuery phraseQuery = phraseQueryBuilder.build();
                booleanQueryBuilder.add(phraseQuery, BooleanClause.Occur.SHOULD);
            }
        }
    }
    return booleanQueryBuilder.build();
}
Also used : BooleanQuery(org.apache.lucene.search.BooleanQuery) TermQuery(org.apache.lucene.search.TermQuery) Query(org.apache.lucene.search.Query) PhraseQuery(org.apache.lucene.search.PhraseQuery) MatchAllDocsQuery(org.apache.lucene.search.MatchAllDocsQuery) AttributeType(edu.uci.ics.textdb.api.schema.AttributeType) DataFlowException(edu.uci.ics.textdb.api.exception.DataFlowException) Term(org.apache.lucene.index.Term)

Example 29 with DataFlowException

Use of edu.uci.ics.textdb.api.exception.DataFlowException in project textdb by TextDB.

Class Join, method open.

@Override
public void open() throws TextDBException {
    if (cursor != CLOSED) {
        return;
    }
    if (innerOperator == null) {
        throw new DataFlowException("Inner Input Operator is not set.");
    }
    if (outerOperator == null) {
        throw new DataFlowException("Outer Input Operator is not set.");
    }
    // generate output schema from schema of inner and outer operator
    innerOperator.open();
    Schema innerOperatorSchema = innerOperator.getOutputSchema();
    outerOperator.open();
    Schema outerOperatorSchema = outerOperator.getOutputSchema();
    this.outputSchema = joinPredicate.generateOutputSchema(innerOperatorSchema, outerOperatorSchema);
    cursor = OPENED;
}
Also used : Schema(edu.uci.ics.textdb.api.schema.Schema) DataFlowException(edu.uci.ics.textdb.api.exception.DataFlowException)
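
The open method above combines two ideas: a cursor guard that makes repeated calls harmless, and precondition checks that fail before any input is touched. A self-contained sketch of that shape follows. JoinOpenSketch and its nested Operator interface are hypothetical and much smaller than the project's Join and IOperator, and IllegalStateException stands in for DataFlowException.

```java
// Hypothetical two-input operator for illustration; not the project's Join class.
class JoinOpenSketch {
    static final int CLOSED = -1;
    static final int OPENED = 0;

    // Minimal stand-in for an input operator
    interface Operator {
        void open();
    }

    private Operator inner;
    private Operator outer;
    private int cursor = CLOSED;

    void setInner(Operator inner) {
        this.inner = inner;
    }

    void setOuter(Operator outer) {
        this.outer = outer;
    }

    void open() {
        if (cursor != CLOSED) {
            return; // already open: do nothing
        }
        // validate both inputs before opening either one
        if (inner == null) {
            throw new IllegalStateException("Inner Input Operator is not set.");
        }
        if (outer == null) {
            throw new IllegalStateException("Outer Input Operator is not set.");
        }
        inner.open();
        outer.open();
        // the cursor changes only after both inputs opened successfully
        cursor = OPENED;
    }

    boolean isOpen() {
        return cursor == OPENED;
    }

    public static void main(String[] args) {
        JoinOpenSketch join = new JoinOpenSketch();
        join.setInner(() -> System.out.println("inner opened"));
        join.setOuter(() -> System.out.println("outer opened"));
        join.open();
        join.open(); // no-op: the cursor is already OPENED
    }
}
```

One limitation of this shape: if the outer input throws while opening, the inner input has already been opened and is left for the caller to clean up.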

Example 30 with DataFlowException

Use of edu.uci.ics.textdb.api.exception.DataFlowException in project textdb by TextDB.

Class RelationManager, method getTableAnalyzer.

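/**
 * Gets the Lucene analyzer of a table.
 *
 * @param tableName the name of the table, case insensitive
 * @return the Lucene analyzer configured for the table
 * @throws StorageException if the table's analyzer string cannot be converted to an analyzer
 */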
public Analyzer getTableAnalyzer(String tableName) throws StorageException {
    String analyzerString = getTableAnalyzerString(tableName);
    // convert a lucene analyzer string to an analyzer object
    Analyzer luceneAnalyzer = null;
    try {
        luceneAnalyzer = LuceneAnalyzerConstants.getLuceneAnalyzer(analyzerString);
    } catch (DataFlowException e) {
        throw new StorageException(e);
    }
    return luceneAnalyzer;
}
Also used : DataFlowException(edu.uci.ics.textdb.api.exception.DataFlowException) Analyzer(org.apache.lucene.analysis.Analyzer) StorageException(edu.uci.ics.textdb.api.exception.StorageException)

Aggregations

DataFlowException (edu.uci.ics.textdb.api.exception.DataFlowException) 34
TextDBException (edu.uci.ics.textdb.api.exception.TextDBException) 13
AttributeType (edu.uci.ics.textdb.api.schema.AttributeType) 12
Schema (edu.uci.ics.textdb.api.schema.Schema) 11
Tuple (edu.uci.ics.textdb.api.tuple.Tuple) 10
Attribute (edu.uci.ics.textdb.api.schema.Attribute) 8
Span (edu.uci.ics.textdb.api.span.Span) 7
ArrayList (java.util.ArrayList) 7
SchemaConstants (edu.uci.ics.textdb.api.constants.SchemaConstants) 6
List (java.util.List) 6
Collectors (java.util.stream.Collectors) 6
StorageException (edu.uci.ics.textdb.api.exception.StorageException) 5
ListField (edu.uci.ics.textdb.api.field.ListField) 5
IOException (java.io.IOException) 5
IField (edu.uci.ics.textdb.api.field.IField) 4
Utils (edu.uci.ics.textdb.api.utils.Utils) 4
AbstractSingleInputOperator (edu.uci.ics.textdb.exp.common.AbstractSingleInputOperator) 4
Iterator (java.util.Iterator) 4
ErrorMessages (edu.uci.ics.textdb.api.constants.ErrorMessages) 3
IOperator (edu.uci.ics.textdb.api.dataflow.IOperator) 3