Example 6 with UnicodeSet

Use of com.ibm.icu.text.UnicodeSet in project antlr4 by antlr.

Class UnicodeDataTemplateController, method addIntPropertyRanges.

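import java.util.Map;
import com.ibm.icu.lang.UCharacter;
import com.ibm.icu.lang.UProperty;
import com.ibm.icu.text.UnicodeSet;
import org.antlr.v4.runtime.misc.IntervalSet;

private static void addIntPropertyRanges(int property, String namePrefix, Map<String, IntervalSet> propertyCodePointRanges) {
    int minValue = UCharacter.getIntPropertyMinValue(property);
    int maxValue = UCharacter.getIntPropertyMaxValue(property);
    for (int propertyValue = minValue; propertyValue <= maxValue; propertyValue++) {
        // Collect every code point for which the property has this value.
        UnicodeSet set = new UnicodeSet();
        set.applyIntPropertyValue(property, propertyValue);
        // The map key is the caller's prefix followed by the short name of the property value.
        String propertyName = namePrefix + UCharacter.getPropertyValueName(property, propertyValue, UProperty.NameChoice.SHORT);
        // Reuse the IntervalSet already registered under this name, if any.
        IntervalSet intervalSet = propertyCodePointRanges.computeIfAbsent(propertyName, k -> new IntervalSet());
        addUnicodeSetToIntervalSet(set, intervalSet);
    }
}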
Also used: IntervalSet (org.antlr.v4.runtime.misc.IntervalSet), UnicodeSet (com.ibm.icu.text.UnicodeSet)
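
The method above produces a map from a property name to the code point ranges that carry that property value. The following is a minimal sketch of that shape using only the JDK, so it runs without ICU4J or ANTLR on the classpath. It is not the antlr4 implementation: Character.getType stands in for the ICU property lookup, a list of int[] {start, end} pairs stands in for IntervalSet, and the names PropertyRangeSketch and rangesByType are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class PropertyRangeSketch {
    /**
     * Groups the code points in [lo, hi] by their JDK general category number
     * and stores each group as a list of inclusive {start, end} ranges.
     * Assumption: Character.getType is only a stand-in for an ICU int property.
     */
    static Map<String, List<int[]>> rangesByType(String namePrefix, int lo, int hi) {
        Map<String, List<int[]>> result = new TreeMap<>();
        for (int cp = lo; cp <= hi; cp++) {
            String key = namePrefix + Character.getType(cp);
            // Create the range list for this key on first use, as the excerpt does with IntervalSet.
            List<int[]> ranges = result.computeIfAbsent(key, k -> new ArrayList<>());
            int[] last = ranges.isEmpty() ? null : ranges.get(ranges.size() - 1);
            if (last != null && last[1] == cp - 1) {
                last[1] = cp;                      // extend the current run
            } else {
                ranges.add(new int[] {cp, cp});    // start a new run
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<int[]>> ascii = rangesByType("type=", 0x00, 0x7F);
        List<int[]> upper = ascii.get("type=" + Character.UPPERCASE_LETTER);
        for (int[] r : upper) {
            System.out.printf("U+%04X..U+%04X%n", r[0], r[1]);
        }
    }
}
```

Storing ranges instead of individual code points keeps the table small, since Unicode properties tend to apply to long contiguous runs.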

Example 7 with UnicodeSet

Use of com.ibm.icu.text.UnicodeSet in project antlr4 by antlr.

Class UnicodeDataTemplateController, method addEmojiPresentationPropertyCodesToCodePointRanges.

import java.util.Map;
import com.ibm.icu.text.UnicodeSet;
import org.antlr.v4.runtime.misc.IntervalSet;

private static void addEmojiPresentationPropertyCodesToCodePointRanges(Map<String, IntervalSet> propertyCodePointRanges) {
    // Emoji that default to emoji presentation: Emoji=Yes intersected with Emoji_Presentation=Yes.
    UnicodeSet emojiDefaultUnicodeSet = new UnicodeSet("[[\\p{Emoji=Yes}]&[\\p{Emoji_Presentation=Yes}]]");
    IntervalSet emojiDefaultIntervalSet = new IntervalSet();
    addUnicodeSetToIntervalSet(emojiDefaultUnicodeSet, emojiDefaultIntervalSet);
    propertyCodePointRanges.put("EmojiPresentation=EmojiDefault", emojiDefaultIntervalSet);
    // Emoji that default to text presentation: Emoji=Yes intersected with Emoji_Presentation=No.
    UnicodeSet textDefaultUnicodeSet = new UnicodeSet("[[\\p{Emoji=Yes}]&[\\p{Emoji_Presentation=No}]]");
    IntervalSet textDefaultIntervalSet = new IntervalSet();
    addUnicodeSetToIntervalSet(textDefaultUnicodeSet, textDefaultIntervalSet);
    propertyCodePointRanges.put("EmojiPresentation=TextDefault", textDefaultIntervalSet);
    // Code points that are not emoji at all.
    UnicodeSet textUnicodeSet = new UnicodeSet("[\\p{Emoji=No}]");
    IntervalSet textIntervalSet = new IntervalSet();
    addUnicodeSetToIntervalSet(textUnicodeSet, textIntervalSet);
    propertyCodePointRanges.put("EmojiPresentation=Text", textIntervalSet);
}
Also used: IntervalSet (org.antlr.v4.runtime.misc.IntervalSet), UnicodeSet (com.ibm.icu.text.UnicodeSet)
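
The three patterns above split the code point space into "A and B", "A and not B", and "not A", where A is Emoji and B is Emoji_Presentation. The sketch below shows the same three-way split with java.util.BitSet so that it runs on the JDK alone. It makes no claim about real emoji data: Character.isLetter and Character.isUpperCase are stand-in predicates, and the names PartitionSketch, matching, and partition are hypothetical.

```java
import java.util.BitSet;
import java.util.function.IntPredicate;

final class PartitionSketch {
    /** Returns the code points in [lo, hi] accepted by the predicate. */
    static BitSet matching(IntPredicate predicate, int lo, int hi) {
        BitSet bits = new BitSet(hi + 1);
        for (int cp = lo; cp <= hi; cp++) {
            if (predicate.test(cp)) {
                bits.set(cp);
            }
        }
        return bits;
    }

    /** Returns {A and B, A and not B, not A}, each restricted to [lo, hi]. */
    static BitSet[] partition(IntPredicate a, IntPredicate b, int lo, int hi) {
        BitSet inA = matching(a, lo, hi);
        BitSet inB = matching(b, lo, hi);

        BitSet both = (BitSet) inA.clone();
        both.and(inB);                 // counterpart of [[A]&[B]]

        BitSet onlyA = (BitSet) inA.clone();
        onlyA.andNot(inB);             // counterpart of [[A]&[B=No]]

        BitSet notA = new BitSet(hi + 1);
        notA.set(lo, hi + 1);          // start from the whole range
        notA.andNot(inA);              // counterpart of [A=No]

        return new BitSet[] {both, onlyA, notA};
    }

    public static void main(String[] args) {
        // Stand-in predicates over ASCII; they are not emoji properties.
        BitSet[] parts = partition(Character::isLetter, Character::isUpperCase, 0x00, 0x7F);
        System.out.println("A and B:     " + parts[0].cardinality());
        System.out.println("A and not B: " + parts[1].cardinality());
        System.out.println("not A:       " + parts[2].cardinality());
    }
}
```

Because the three sets are disjoint and together cover the whole range, every code point ends up under exactly one key.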

Example 8 with UnicodeSet

Use of com.ibm.icu.text.UnicodeSet in project lucene-solr by apache.

Class TestICUTransformFilter, method testOptimizer.

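import java.io.StringReader;
import com.ibm.icu.text.Transliterator;
import com.ibm.icu.text.UnicodeSet;
import org.apache.lucene.analysis.core.KeywordTokenizer;

public void testOptimizer() throws Exception {
    // convert a's to b's and b's to c's
    String rules = "a > b; b > c;";
    Transliterator custom = Transliterator.createFromRules("test", rules, Transliterator.FORWARD);
    // A freshly created rule-based transliterator has no filter.
    assertNull(custom.getFilter());
    final KeywordTokenizer input = new KeywordTokenizer();
    input.setReader(new StringReader(""));
    // Constructing the token filter is expected to install a filter on the transliterator.
    new ICUTransformFilter(input, custom);
    // Only 'a' and 'b' appear on the left-hand side of a rule, so the filter should be [ab].
    assertEquals(new UnicodeSet("[ab]"), custom.getFilter());
}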
Also used: StringReader (java.io.StringReader), KeywordTokenizer (org.apache.lucene.analysis.core.KeywordTokenizer), UnicodeSet (com.ibm.icu.text.UnicodeSet), Transliterator (com.ibm.icu.text.Transliterator)
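
The test above expects the filter to be [ab], which matches the characters on the left-hand side of the two rules. The sketch below illustrates that idea with a toy parser on the JDK alone. It is an assumption-laden illustration, not how ICU or Lucene compute the set: it only understands rules of the form "x > y;" with literal characters, and the names SourceSetSketch and sourceCodePoints are hypothetical.

```java
import java.util.Set;
import java.util.TreeSet;

final class SourceSetSketch {
    /**
     * Collects the code points that appear on the left-hand side of simple
     * "source > target;" rules. Quoting, variables, contexts, and reverse
     * rules from the real transliterator rule syntax are not handled.
     */
    static Set<Integer> sourceCodePoints(String rules) {
        Set<Integer> sources = new TreeSet<>();
        for (String rule : rules.split(";")) {
            String[] sides = rule.split(">", 2);
            if (sides.length < 2) {
                continue; // blank segment or not a conversion rule
            }
            sides[0].trim().codePoints().forEach(sources::add);
        }
        return sources;
    }

    public static void main(String[] args) {
        for (int cp : sourceCodePoints("a > b; b > c;")) {
            System.out.println(new String(Character.toChars(cp)));
        }
    }
}
```

Restricting a transform to the characters its rules can actually change lets unaffected text skip the transform, which is presumably the optimization the test name refers to.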

Aggregations

UnicodeSet (com.ibm.icu.text.UnicodeSet): 8
IntervalSet (org.antlr.v4.runtime.misc.IntervalSet): 4
Transliterator (com.ibm.icu.text.Transliterator): 2
StringReader (java.io.StringReader): 2
KeywordTokenizer (org.apache.lucene.analysis.core.KeywordTokenizer): 2
FilteredNormalizer2 (com.ibm.icu.text.FilteredNormalizer2): 1
Normalizer2 (com.ibm.icu.text.Normalizer2): 1
UnicodeSetIterator (com.ibm.icu.text.UnicodeSetIterator): 1
ICUFoldingFilter (org.apache.lucene.analysis.icu.ICUFoldingFilter): 1