Example 1 with StringValue

use of org.wikidata.wdtk.datamodel.interfaces.StringValue in project OpenRefine by OpenRefine.

the class LaxValueMatcherTests method testUrls.

@Test
public void testUrls() {
    StringValue value1 = Datamodel.makeStringValue("https://gnu.org");
    StringValue value2 = Datamodel.makeStringValue("http://gnu.org/");
    StringValue value3 = Datamodel.makeStringValue("http://gnu.org/page");
    assertTrue(SUT.match(value1, value2));
    assertTrue(SUT.match(value1, value1));
    assertFalse(SUT.match(value2, value3));
}
Also used : StringValue(org.wikidata.wdtk.datamodel.interfaces.StringValue) Test(org.testng.annotations.Test)

Example 2 with StringValue

use of org.wikidata.wdtk.datamodel.interfaces.StringValue in project OpenRefine by OpenRefine.

the class FormatScrutinizer method scrutinize.

@Override
public void scrutinize(Snak snak, EntityIdValue entityId, boolean added) {
    if (snak instanceof ValueSnak && ((ValueSnak) snak).getValue() instanceof StringValue) {
        String value = ((StringValue) ((ValueSnak) snak).getValue()).getString();
        PropertyIdValue pid = snak.getPropertyId();
        Set<Pattern> patterns = getPattern(pid);
        for (Pattern pattern : patterns) {
            if (!pattern.matcher(value).matches()) {
                if (added) {
                    QAWarning issue = new QAWarning(type, pid.getId(), QAWarning.Severity.IMPORTANT, 1);
                    issue.setProperty("property_entity", pid);
                    issue.setProperty("regex", pattern.toString());
                    issue.setProperty("example_value", value);
                    issue.setProperty("example_item_entity", entityId);
                    addIssue(issue);
                } else {
                    info("remove-statements-with-invalid-format");
                }
            }
        }
    }
}
Also used : PropertyIdValue(org.wikidata.wdtk.datamodel.interfaces.PropertyIdValue) Pattern(java.util.regex.Pattern) ValueSnak(org.wikidata.wdtk.datamodel.interfaces.ValueSnak) StringValue(org.wikidata.wdtk.datamodel.interfaces.StringValue) QAWarning(org.openrefine.wikidata.qa.QAWarning)
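
The format check in scrutinize relies on Matcher.matches(), which succeeds only when the whole string matches the pattern, not just a substring of it. The sketch below isolates that behaviour. The class FormatCheckSketch and the method conformsTo are hypothetical names introduced for illustration and are not part of OpenRefine.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of OpenRefine: isolates the full-match check
// that scrutinize applies to each string value.
final class FormatCheckSketch {

    // True only if the entire value matches the pattern,
    // mirroring pattern.matcher(value).matches() in the example above.
    static boolean conformsTo(Pattern pattern, String value) {
        return pattern.matcher(value).matches();
    }

    // For contrast: true as soon as any substring matches the pattern.
    static boolean containsMatch(Pattern pattern, String value) {
        return pattern.matcher(value).find();
    }
}
```

With a four-digit pattern such as [0-9]{4}, the value "12345" contains a match but does not conform, so a check of this kind would flag it.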

Example 3 with StringValue

use of org.wikidata.wdtk.datamodel.interfaces.StringValue in project OpenRefine by OpenRefine.

the class WbQuantityExpr method evaluate.

@Override
public QuantityValue evaluate(ExpressionContext ctxt) throws SkipSchemaExpressionException {
    StringValue amount = getAmountExpr().evaluate(ctxt);
    // we know the amount is nonnull, nonempty here
    BigDecimal parsedAmount = null;
    BigDecimal lowerBound = null;
    BigDecimal upperBound = null;
    String originalAmount = amount.getString().toUpperCase();
    try {
        parsedAmount = new BigDecimal(originalAmount);
        if (originalAmount.contains("E")) {
            // scientific (E) notation: we derive the uncertainty bounds from
            // the number of digits given in the expression (feature!)
            BigDecimal uncertainty = new BigDecimal("0.5").scaleByPowerOfTen(-parsedAmount.scale());
            lowerBound = new BigDecimal(parsedAmount.subtract(uncertainty).toPlainString());
            upperBound = new BigDecimal(parsedAmount.add(uncertainty).toPlainString());
        }
        // workaround for https://github.com/Wikidata/Wikidata-Toolkit/issues/341
        parsedAmount = new BigDecimal(parsedAmount.toPlainString());
    } catch (NumberFormatException e) {
        if (!originalAmount.isEmpty()) {
            QAWarning issue = new QAWarning("ignored-amount", null, QAWarning.Severity.WARNING, 1);
            issue.setProperty("example_value", originalAmount);
            ctxt.addWarning(issue);
        }
        throw new SkipSchemaExpressionException();
    }
    if (getUnitExpr() != null) {
        ItemIdValue unit = getUnitExpr().evaluate(ctxt);
        return Datamodel.makeQuantityValue(parsedAmount, lowerBound, upperBound, unit);
    }
    return Datamodel.makeQuantityValue(parsedAmount, lowerBound, upperBound);
}
Also used : ItemIdValue(org.wikidata.wdtk.datamodel.interfaces.ItemIdValue) SkipSchemaExpressionException(org.openrefine.wikidata.schema.exceptions.SkipSchemaExpressionException) StringValue(org.wikidata.wdtk.datamodel.interfaces.StringValue) QAWarning(org.openrefine.wikidata.qa.QAWarning) BigDecimal(java.math.BigDecimal)
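
The bound computation in evaluate can be tried in isolation with nothing but java.math.BigDecimal. The following is a minimal sketch of that step only. The class QuantityBoundsSketch and the method impliedBounds are hypothetical names, and returning null for amounts without an exponent is an assumption made for this illustration.

```java
import java.math.BigDecimal;

// Hypothetical helper, not part of OpenRefine: reproduces only the
// bound derivation from the evaluate method above.
final class QuantityBoundsSketch {

    // Returns {lowerBound, upperBound} for amounts written with an exponent,
    // or null when the amount has no exponent (assumption for this sketch).
    // Throws NumberFormatException if the amount is not a valid number.
    static BigDecimal[] impliedBounds(String amount) {
        String normalized = amount.toUpperCase();
        BigDecimal parsed = new BigDecimal(normalized);
        if (!normalized.contains("E")) {
            return null;
        }
        // Half a unit of the last digit that was written out.
        BigDecimal uncertainty = new BigDecimal("0.5").scaleByPowerOfTen(-parsed.scale());
        return new BigDecimal[] {
            new BigDecimal(parsed.subtract(uncertainty).toPlainString()),
            new BigDecimal(parsed.add(uncertainty).toPlainString())
        };
    }
}
```

For "1.23E3" the parsed value has scale -1, so the uncertainty is 5 and the bounds are 1225 and 1235.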

Example 4 with StringValue

use of org.wikidata.wdtk.datamodel.interfaces.StringValue in project OpenRefine by OpenRefine.

the class LaxValueMatcher method match.

@Override
public boolean match(Value existing, Value added) {
    if (existing instanceof EntityIdValue && added instanceof EntityIdValue) {
        // compare ids only, assuming entities from different Wikibases are never mixed up in the same data slot
        return ((EntityIdValue) existing).getId().equals(((EntityIdValue) added).getId());
    } else if (existing instanceof StringValue && added instanceof StringValue) {
        // disregard leading and trailing whitespace differences
        String existingStr = ((StringValue) existing).getString().trim();
        String addedStr = ((StringValue) added).getString().trim();
        // if they look like URLs, then http(s) and trailing slashes do not matter
        try {
            URI existingUrl = extraURINormalize(new URI(existingStr).normalize());
            URI addedUrl = extraURINormalize(new URI(addedStr).normalize());
            return existingUrl.equals(addedUrl);
        } catch (URISyntaxException e) {
            // not a valid URI: fall back on basic comparison below
        }
        return existingStr.equals(addedStr);
    } else if (existing instanceof MonolingualTextValue && added instanceof MonolingualTextValue) {
        // ignore differences of leading and trailing whitespace
        MonolingualTextValue existingMTV = (MonolingualTextValue) existing;
        MonolingualTextValue addedMTV = (MonolingualTextValue) added;
        return (existingMTV.getLanguageCode().equals(addedMTV.getLanguageCode()) && existingMTV.getText().trim().equals(addedMTV.getText().trim()));
    } else if (existing instanceof QuantityValue && added instanceof QuantityValue) {
        QuantityValue existingQuantity = (QuantityValue) existing;
        QuantityValue addedQuantity = (QuantityValue) added;
        BigDecimal existingLowerBound = existingQuantity.getLowerBound();
        BigDecimal addedLowerBound = addedQuantity.getLowerBound();
        BigDecimal existingUpperBound = existingQuantity.getUpperBound();
        BigDecimal addedUpperBound = addedQuantity.getUpperBound();
        // artificially set bounds for quantities which have neither lower nor upper bounds
        if (existingLowerBound == null && existingUpperBound == null) {
            existingLowerBound = existingQuantity.getNumericValue();
            existingUpperBound = existingQuantity.getNumericValue();
        }
        if (addedLowerBound == null && addedUpperBound == null) {
            addedLowerBound = addedQuantity.getNumericValue();
            addedUpperBound = addedQuantity.getNumericValue();
        }
        if (existingQuantity.getUnit().equals(addedQuantity.getUnit()) && (existingLowerBound != null) && (addedLowerBound != null) && (existingUpperBound != null) && (addedUpperBound != null)) {
            // Consider the two values to be equal when their confidence intervals overlap
            return ((existingLowerBound.compareTo(addedLowerBound) <= 0 && addedLowerBound.compareTo(existingUpperBound) <= 0) || (addedLowerBound.compareTo(existingLowerBound) <= 0 && existingLowerBound.compareTo(addedUpperBound) <= 0));
        }
    } else if (existing instanceof GlobeCoordinatesValue && added instanceof GlobeCoordinatesValue) {
        GlobeCoordinatesValue addedCoords = (GlobeCoordinatesValue) added;
        GlobeCoordinatesValue existingCoords = (GlobeCoordinatesValue) existing;
        if (!addedCoords.getGlobeItemId().getId().equals(existingCoords.getGlobeItemId().getId())) {
            return false;
        }
        double addedMinLon = addedCoords.getLongitude() - addedCoords.getPrecision();
        double addedMaxLon = addedCoords.getLongitude() + addedCoords.getPrecision();
        double addedMinLat = addedCoords.getLatitude() - addedCoords.getPrecision();
        double addedMaxLat = addedCoords.getLatitude() + addedCoords.getPrecision();
        double existingMinLon = existingCoords.getLongitude() - existingCoords.getPrecision();
        double existingMaxLon = existingCoords.getLongitude() + existingCoords.getPrecision();
        double existingMinLat = existingCoords.getLatitude() - existingCoords.getPrecision();
        double existingMaxLat = existingCoords.getLatitude() + existingCoords.getPrecision();
        // return true when the two "rectangles" (in coordinate space) overlap (not strictly)
        return ((addedMinLon <= existingMinLon && addedMinLat <= existingMinLat && existingMinLon <= addedMaxLon && existingMinLat <= addedMaxLat) || (existingMinLon <= addedMinLon && existingMinLat <= addedMinLat && addedMinLon <= existingMaxLon && addedMinLat <= existingMaxLat));
    } else if (existing instanceof TimeValue && added instanceof TimeValue) {
        TimeValue existingTime = (TimeValue) existing;
        TimeValue addedTime = (TimeValue) added;
        if (!existingTime.getPreferredCalendarModel().equals(addedTime.getPreferredCalendarModel())) {
            return false;
        }
        int minPrecision = Math.min(existingTime.getPrecision(), addedTime.getPrecision());
        if (minPrecision <= 9) {
            // the precision is a multiple of years
            long yearPrecision = (long) Math.pow(10, 9 - minPrecision);
            long addedValue = addedTime.getYear() / yearPrecision;
            long existingValue = existingTime.getYear() / yearPrecision;
            return addedValue == existingValue;
        } else if (minPrecision == 10) {
            // month precision
            return (addedTime.getYear() == existingTime.getYear() && addedTime.getMonth() == existingTime.getMonth());
        } else if (minPrecision == 11) {
            // day precision
            return (addedTime.getYear() == existingTime.getYear() && addedTime.getMonth() == existingTime.getMonth() && addedTime.getDay() == existingTime.getDay());
        }
        // TODO possible improvements: bounds support, timezone support
    }
    // fall back to exact comparison for other datatypes
    return existing.equals(added);
}
Also used : GlobeCoordinatesValue(org.wikidata.wdtk.datamodel.interfaces.GlobeCoordinatesValue) QuantityValue(org.wikidata.wdtk.datamodel.interfaces.QuantityValue) EntityIdValue(org.wikidata.wdtk.datamodel.interfaces.EntityIdValue) MonolingualTextValue(org.wikidata.wdtk.datamodel.interfaces.MonolingualTextValue) URISyntaxException(java.net.URISyntaxException) StringValue(org.wikidata.wdtk.datamodel.interfaces.StringValue) URI(java.net.URI) BigDecimal(java.math.BigDecimal) TimeValue(org.wikidata.wdtk.datamodel.interfaces.TimeValue)
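
The match method calls extraURINormalize, whose body is not shown in this excerpt. The sketch below is one possible reading of the comment "http(s) and trailing slashes do not matter", and it is not OpenRefine's actual implementation. The class UrlLaxCompare and the methods normalizeForComparison and laxEquals are hypothetical names.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical stand-in for the string branch of LaxValueMatcher.match;
// the real extraURINormalize may behave differently.
final class UrlLaxCompare {

    // Treats https as http and drops trailing slashes from the path.
    // URIs without a scheme or host (plain words, opaque URIs) are returned unchanged.
    static URI normalizeForComparison(URI uri) throws URISyntaxException {
        String scheme = uri.getScheme();
        if (scheme == null || uri.getHost() == null) {
            return uri;
        }
        if ("https".equalsIgnoreCase(scheme)) {
            scheme = "http";
        }
        String path = uri.getPath() == null ? "" : uri.getPath();
        while (path.endsWith("/")) {
            path = path.substring(0, path.length() - 1);
        }
        return new URI(scheme, uri.getUserInfo(), uri.getHost(), uri.getPort(),
                path, uri.getQuery(), uri.getFragment());
    }

    // Compares two strings leniently: surrounding whitespace is ignored, and
    // strings that parse as URIs are compared after normalization.
    static boolean laxEquals(String existing, String added) {
        String existingStr = existing.trim();
        String addedStr = added.trim();
        try {
            URI existingUrl = normalizeForComparison(new URI(existingStr).normalize());
            URI addedUrl = normalizeForComparison(new URI(addedStr).normalize());
            return existingUrl.equals(addedUrl);
        } catch (URISyntaxException e) {
            // not a valid URI: compare the trimmed strings directly
            return existingStr.equals(addedStr);
        }
    }
}
```

Under these assumptions, "https://gnu.org" and "http://gnu.org/" compare as equal while "http://gnu.org/" and "http://gnu.org/page" do not, which is the behaviour the testUrls example expects.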

Example 5 with StringValue

use of org.wikidata.wdtk.datamodel.interfaces.StringValue in project OpenRefine by OpenRefine.

the class LaxValueMatcherTests method testWhitespace.

@Test
public void testWhitespace() {
    StringValue value1 = Datamodel.makeStringValue("foo");
    StringValue value2 = Datamodel.makeStringValue("\tfoo ");
    StringValue value3 = Datamodel.makeStringValue("bar");
    assertTrue(SUT.match(value1, value2));
    assertFalse(SUT.match(value1, value3));
}
Also used : StringValue(org.wikidata.wdtk.datamodel.interfaces.StringValue) Test(org.testng.annotations.Test)

Aggregations

StringValue (org.wikidata.wdtk.datamodel.interfaces.StringValue): 5
BigDecimal (java.math.BigDecimal): 2
QAWarning (org.openrefine.wikidata.qa.QAWarning): 2
Test (org.testng.annotations.Test): 2
URI (java.net.URI): 1
URISyntaxException (java.net.URISyntaxException): 1
Pattern (java.util.regex.Pattern): 1
SkipSchemaExpressionException (org.openrefine.wikidata.schema.exceptions.SkipSchemaExpressionException): 1
EntityIdValue (org.wikidata.wdtk.datamodel.interfaces.EntityIdValue): 1
GlobeCoordinatesValue (org.wikidata.wdtk.datamodel.interfaces.GlobeCoordinatesValue): 1
ItemIdValue (org.wikidata.wdtk.datamodel.interfaces.ItemIdValue): 1
MonolingualTextValue (org.wikidata.wdtk.datamodel.interfaces.MonolingualTextValue): 1
PropertyIdValue (org.wikidata.wdtk.datamodel.interfaces.PropertyIdValue): 1
QuantityValue (org.wikidata.wdtk.datamodel.interfaces.QuantityValue): 1
TimeValue (org.wikidata.wdtk.datamodel.interfaces.TimeValue): 1
ValueSnak (org.wikidata.wdtk.datamodel.interfaces.ValueSnak): 1