Example 16 with DataFlowException

use of edu.uci.ics.textdb.api.exception.DataFlowException in project textdb by TextDB.

the class KeywordMatcher method computeConjunctionMatchingResult.

private List<Span> computeConjunctionMatchingResult(Tuple inputTuple) throws DataFlowException {
    ListField<Span> payloadField = inputTuple.getField(SchemaConstants.PAYLOAD);
    List<Span> payload = payloadField.getValue();
    List<Span> relevantSpans = filterRelevantSpans(payload);
    List<Span> matchingResults = new ArrayList<>();
    for (String attributeName : this.predicate.getAttributeNames()) {
        AttributeType attributeType = this.inputSchema.getAttribute(attributeName).getAttributeType();
        String fieldValue = inputTuple.getField(attributeName).getValue().toString();
        // types other than TEXT and STRING: throw Exception for now
        if (attributeType != AttributeType.STRING && attributeType != AttributeType.TEXT) {
            throw new DataFlowException("KeywordMatcher: Fields other than STRING and TEXT are not supported yet");
        }
        // for STRING type, the query should match the fieldValue completely
        if (attributeType == AttributeType.STRING) {
            if (fieldValue.equals(predicate.getQuery())) {
                Span span = new Span(attributeName, 0, predicate.getQuery().length(), predicate.getQuery(), fieldValue);
                matchingResults.add(span);
            }
        }
        // for TEXT type, every token in the query should be present in the span list for this field
        if (attributeType == AttributeType.TEXT) {
            List<Span> fieldSpanList = relevantSpans.stream().filter(span -> span.getAttributeName().equals(attributeName)).collect(Collectors.toList());
            if (isAllQueryTokensPresent(fieldSpanList, queryTokenSet)) {
                matchingResults.addAll(fieldSpanList);
            }
        }
    }
    return matchingResults;
}
Also used : SchemaConstants(edu.uci.ics.textdb.api.constants.SchemaConstants) Attribute(edu.uci.ics.textdb.api.schema.Attribute) Iterator(java.util.Iterator) ErrorMessages(edu.uci.ics.textdb.api.constants.ErrorMessages) AbstractSingleInputOperator(edu.uci.ics.textdb.exp.common.AbstractSingleInputOperator) DataFlowException(edu.uci.ics.textdb.api.exception.DataFlowException) Set(java.util.Set) Utils(edu.uci.ics.textdb.api.utils.Utils) Collectors(java.util.stream.Collectors) ArrayList(java.util.ArrayList) AttributeType(edu.uci.ics.textdb.api.schema.AttributeType) HashSet(java.util.HashSet) Schema(edu.uci.ics.textdb.api.schema.Schema) List(java.util.List) ListField(edu.uci.ics.textdb.api.field.ListField) Matcher(java.util.regex.Matcher) TextDBException(edu.uci.ics.textdb.api.exception.TextDBException) Pattern(java.util.regex.Pattern) Span(edu.uci.ics.textdb.api.span.Span) Collections(java.util.Collections) DataflowUtils(edu.uci.ics.textdb.exp.utils.DataflowUtils) Tuple(edu.uci.ics.textdb.api.tuple.Tuple)
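
The TEXT branch of this example calls isAllQueryTokensPresent, whose body is not shown here. The following is a minimal sketch of what such a check could look like, not the TextDB implementation. It assumes a simplified stand-in class SimpleSpan (hypothetical, holding only an attribute name and a key) and assumes a field matches only when every query token appears as the key of some span on that field.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class ConjunctionSketch {

    // Hypothetical stand-in for a span: only the parts this sketch needs.
    static final class SimpleSpan {
        final String attributeName;
        final String key;

        SimpleSpan(String attributeName, String key) {
            this.attributeName = attributeName;
            this.key = key;
        }
    }

    // Keeps only the spans that belong to the given attribute.
    static List<SimpleSpan> spansForAttribute(List<SimpleSpan> spans, String attributeName) {
        return spans.stream()
                .filter(span -> span.attributeName.equals(attributeName))
                .collect(Collectors.toList());
    }

    // Assumed semantics: true when every query token occurs as a span key.
    static boolean isAllQueryTokensPresent(List<SimpleSpan> fieldSpans, Set<String> queryTokens) {
        Set<String> seenKeys = fieldSpans.stream()
                .map(span -> span.key)
                .collect(Collectors.toSet());
        return seenKeys.containsAll(queryTokens);
    }

    public static void main(String[] args) {
        List<SimpleSpan> spans = Arrays.asList(
                new SimpleSpan("content", "keyword"),
                new SimpleSpan("content", "search"),
                new SimpleSpan("title", "keyword"));
        Set<String> queryTokens = new HashSet<>(Arrays.asList("keyword", "search"));

        // "content" carries both tokens, "title" only one of them.
        System.out.println(isAllQueryTokensPresent(spansForAttribute(spans, "content"), queryTokens));
        System.out.println(isAllQueryTokensPresent(spansForAttribute(spans, "title"), queryTokens));
    }
}
```

Under these assumptions, a field with only some of the query tokens contributes no spans to the result, which is what makes the match a conjunction.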

Example 17 with DataFlowException

use of edu.uci.ics.textdb.api.exception.DataFlowException in project textdb by TextDB.

the class AbstractSingleInputOperator method open.

@Override
public void open() throws TextDBException {
    if (cursor != CLOSED) {
        return;
    }
    try {
        if (this.inputOperator == null) {
            throw new DataFlowException(ErrorMessages.INPUT_OPERATOR_NOT_SPECIFIED);
        }
        inputOperator.open();
        setUp();
    } catch (Exception e) {
        throw new DataFlowException(e.getMessage(), e);
    }
    cursor = OPENED;
}
Also used : DataFlowException(edu.uci.ics.textdb.api.exception.DataFlowException) TextDBException(edu.uci.ics.textdb.api.exception.TextDBException)
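
The open method in this example combines two patterns: a cursor guard that makes repeated calls a no-op, and a catch block that rewraps any failure in a single exception type. The following self-contained sketch isolates both. The class OpenGuardSketch, the exception SketchDataFlowException, and the numeric values of CLOSED and OPENED are assumptions made for illustration; they are not taken from TextDB.

```java
public class OpenGuardSketch {

    // Stand-in unchecked exception for this sketch; not the TextDB class.
    static class SketchDataFlowException extends RuntimeException {
        SketchDataFlowException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Assumed cursor states; the real constant values are not shown in the example.
    static final int CLOSED = -1;
    static final int OPENED = 0;

    private int cursor = CLOSED;
    private int setUpCalls = 0;
    private final Runnable upstreamOpen; // plays the role of inputOperator.open()

    OpenGuardSketch(Runnable upstreamOpen) {
        this.upstreamOpen = upstreamOpen;
    }

    void open() {
        if (cursor != CLOSED) {
            return; // already opened: do nothing
        }
        try {
            if (upstreamOpen == null) {
                throw new IllegalStateException("input operator not specified");
            }
            upstreamOpen.run();
            setUpCalls++; // plays the role of setUp()
        } catch (RuntimeException e) {
            // Rewrap so callers see one exception type, keeping the cause.
            throw new SketchDataFlowException(e.getMessage(), e);
        }
        cursor = OPENED; // only reached when everything above succeeded
    }

    int getSetUpCalls() {
        return setUpCalls;
    }

    boolean isOpen() {
        return cursor == OPENED;
    }

    public static void main(String[] args) {
        OpenGuardSketch operator = new OpenGuardSketch(() -> { });
        operator.open();
        operator.open(); // second call is skipped by the guard
        System.out.println("setUp calls: " + operator.getSetUpCalls());
    }
}
```

Because the cursor is updated only after the try block completes, an operator whose open fails stays in the CLOSED state and can be opened again later.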

Example 18 with DataFlowException

use of edu.uci.ics.textdb.api.exception.DataFlowException in project textdb by TextDB.

the class ComparableMatcher method compareInt.

private boolean compareInt(Tuple inputTuple) {
    Object compareToObject = predicate.getCompareToValue();
    Class<?> compareToType = compareToObject.getClass();
    Integer value = inputTuple.getField(predicate.getAttributeName(), IntegerField.class).getValue();
    if (compareToType.equals(Integer.class)) {
        return compareValues(value, (int) compareToObject, predicate.getComparisonType());
    } else if (compareToType.equals(Double.class)) {
        return compareValues((double) value, (double) compareToObject, predicate.getComparisonType());
    } else if (compareToType.equals(String.class)) {
        try {
            Double compareToValue = Double.parseDouble((String) predicate.getCompareToValue());
            return compareValues((double) value, compareToValue, predicate.getComparisonType());
        } catch (NumberFormatException e) {
            throw new DataFlowException("Unable to parse to number: " + e.getMessage(), e);
        }
    } else {
        throw new DataFlowException("Value " + predicate.getCompareToValue() + " is not a valid number type");
    }
}
Also used : DataFlowException(edu.uci.ics.textdb.api.exception.DataFlowException) IntegerField(edu.uci.ics.textdb.api.field.IntegerField)
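
The type dispatch in compareInt can be tried in isolation with the sketch below. It is an illustration under stated assumptions, not the TextDB code: the Comparison enum and the compare helper are hypothetical stand-ins for predicate.getComparisonType() and compareValues, and IllegalArgumentException stands in for DataFlowException. All values are widened to double, which is exact for int inputs.

```java
public class CompareIntSketch {

    // Hypothetical comparison kinds for this sketch.
    enum Comparison { EQUAL_TO, GREATER_THAN, LESS_THAN }

    // Compares two doubles according to the requested comparison kind.
    static boolean compare(double value, double compareTo, Comparison type) {
        switch (type) {
            case EQUAL_TO:
                return Double.compare(value, compareTo) == 0;
            case GREATER_THAN:
                return value > compareTo;
            case LESS_THAN:
                return value < compareTo;
            default:
                throw new IllegalArgumentException("Unsupported comparison: " + type);
        }
    }

    // Dispatches on the runtime type of compareTo: Integer, Double, or numeric String.
    static boolean compareInt(int value, Object compareTo, Comparison type) {
        if (compareTo instanceof Integer || compareTo instanceof Double) {
            return compare(value, ((Number) compareTo).doubleValue(), type);
        }
        if (compareTo instanceof String) {
            try {
                return compare(value, Double.parseDouble((String) compareTo), type);
            } catch (NumberFormatException e) {
                throw new IllegalArgumentException("Unable to parse to number: " + compareTo, e);
            }
        }
        throw new IllegalArgumentException("Value " + compareTo + " is not a valid number type");
    }

    public static void main(String[] args) {
        System.out.println(compareInt(5, 5, Comparison.EQUAL_TO));
        System.out.println(compareInt(5, 4.5, Comparison.GREATER_THAN));
        System.out.println(compareInt(5, "7", Comparison.LESS_THAN));
    }
}
```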

Example 19 with DataFlowException

use of edu.uci.ics.textdb.api.exception.DataFlowException in project textdb by TextDB.

the class RunTests method main.

/*
 * Writes the indices and runs all performance tests.
 *
 * Arguments, in order:
 *   1. file folder path (where the data set is stored)
 *   2. result folder path (where performance test results are stored)
 *   3. standard index folder path (where the standard index is stored)
 *   4. trigram index folder path (where the trigram index is stored)
 *   5. queries folder path (where query files are stored)
 *
 * If these arguments are not passed in, default paths are used (refer to
 * PerfTestUtils.java). If an argument is not applicable, pass it as an
 * empty string.
 *
 * Adjust the query file name, threshold list, and regexQueries as needed.
 */
public static void main(String[] args) {
    try {
        PerfTestUtils.setFileFolder(args[0]);
        PerfTestUtils.setResultFolder(args[1]);
        PerfTestUtils.setStandardIndexFolder(args[2]);
        PerfTestUtils.setTrigramIndexFolder(args[3]);
        PerfTestUtils.setQueryFolder(args[4]);
    } catch (ArrayIndexOutOfBoundsException e) {
        System.out.println("missing arguments will be set to default");
    }
    try {
        PerfTestUtils.deleteDirectory(new File(PerfTestUtils.standardIndexFolder));
        PerfTestUtils.deleteDirectory(new File(PerfTestUtils.trigramIndexFolder));
        PerfTestUtils.writeStandardAnalyzerIndices();
        PerfTestUtils.writeTrigramIndices();
        List<Double> thresholds = Arrays.asList(0.8, 0.65, 0.5, 0.35);
        List<String> regexQueries = Arrays.asList("mosquitos?", "v[ir]{2}[us]{2}", "market(ing)?", "medic(ine|al|ation|are|aid)?", "[A-Z][aeiou|AEIOU][A-Za-z]*");
        KeywordMatcherPerformanceTest.runTest("sample_queries.txt");
        DictionaryMatcherPerformanceTest.runTest("sample_queries.txt");
        FuzzyTokenMatcherPerformanceTest.runTest("sample_queries.txt", thresholds);
        RegexMatcherPerformanceTest.runTest(regexQueries);
        NlpExtractorPerformanceTest.runTest();
    } catch (StorageException | DataFlowException | IOException e) {
        e.printStackTrace();
    } catch (Exception e) {
        e.printStackTrace();
    }
}
Also used : DataFlowException(edu.uci.ics.textdb.api.exception.DataFlowException) IOException(java.io.IOException) File(java.io.File) StorageException(edu.uci.ics.textdb.api.exception.StorageException)
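
The main method in this example falls back to default paths by catching ArrayIndexOutOfBoundsException. An alternative, sketched here, checks the array length explicitly for each positional argument. The helper argOrDefault and the placeholder default paths are hypothetical and are not part of PerfTestUtils.

```java
public class ArgsDefaultSketch {

    // Returns args[index] when that position was supplied, otherwise the fallback.
    static String argOrDefault(String[] args, int index, String fallback) {
        return index < args.length ? args[index] : fallback;
    }

    public static void main(String[] args) {
        // "./data/" and "./results/" are placeholder defaults for this sketch only.
        String fileFolder = argOrDefault(args, 0, "./data/");
        String resultFolder = argOrDefault(args, 1, "./results/");
        System.out.println("file folder: " + fileFolder);
        System.out.println("result folder: " + resultFolder);
    }
}
```

With this approach each argument is resolved independently, so no exception is needed for control flow.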

Example 20 with DataFlowException

use of edu.uci.ics.textdb.api.exception.DataFlowException in project textdb by TextDB.

the class JoinTestHelper method getRegexMatcher.

public static RegexMatcher getRegexMatcher(String tableName, String query, String attrName) {
    try {
        ScanBasedSourceOperator scanBasedSourceOperator = new ScanBasedSourceOperator(new ScanSourcePredicate(tableName));
        RegexMatcher regexMatcher = new RegexMatcher(new RegexPredicate(query, Arrays.asList(attrName), SchemaConstants.SPAN_LIST));
        regexMatcher.setInputOperator(scanBasedSourceOperator);
        return regexMatcher;
    } catch (DataFlowException e) {
        e.printStackTrace();
        return null;
    }
}
Also used : RegexPredicate(edu.uci.ics.textdb.exp.regexmatcher.RegexPredicate) DataFlowException(edu.uci.ics.textdb.api.exception.DataFlowException) RegexMatcher(edu.uci.ics.textdb.exp.regexmatcher.RegexMatcher) ScanBasedSourceOperator(edu.uci.ics.textdb.exp.source.scan.ScanBasedSourceOperator) ScanSourcePredicate(edu.uci.ics.textdb.exp.source.scan.ScanSourcePredicate)

Aggregations

DataFlowException (edu.uci.ics.textdb.api.exception.DataFlowException) 34
TextDBException (edu.uci.ics.textdb.api.exception.TextDBException) 13
AttributeType (edu.uci.ics.textdb.api.schema.AttributeType) 12
Schema (edu.uci.ics.textdb.api.schema.Schema) 11
Tuple (edu.uci.ics.textdb.api.tuple.Tuple) 10
Attribute (edu.uci.ics.textdb.api.schema.Attribute) 8
Span (edu.uci.ics.textdb.api.span.Span) 7
ArrayList (java.util.ArrayList) 7
SchemaConstants (edu.uci.ics.textdb.api.constants.SchemaConstants) 6
List (java.util.List) 6
Collectors (java.util.stream.Collectors) 6
StorageException (edu.uci.ics.textdb.api.exception.StorageException) 5
ListField (edu.uci.ics.textdb.api.field.ListField) 5
IOException (java.io.IOException) 5
IField (edu.uci.ics.textdb.api.field.IField) 4
Utils (edu.uci.ics.textdb.api.utils.Utils) 4
AbstractSingleInputOperator (edu.uci.ics.textdb.exp.common.AbstractSingleInputOperator) 4
Iterator (java.util.Iterator) 4
ErrorMessages (edu.uci.ics.textdb.api.constants.ErrorMessages) 3
IOperator (edu.uci.ics.textdb.api.dataflow.IOperator) 3