
Example 1 with OriginalWordAttribute

Use of peltomaa.sukija.attributes.OriginalWordAttribute in project sukija by ahomansikka.

Class SuggestionTester, method analyze.

public static void analyze(Reader reader, Writer writer, Voikko voikko, String suggestionFile, boolean stopOnSuccess, boolean useHyphenFilter, TokenStream t) throws IOException {
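    // Declared in the original source but never used in this method.
    List<Analysis> analysis = null;
    // The caller must pass a Tokenizer; any other TokenStream makes this cast throw ClassCastException.
    ((Tokenizer) t).setReader(reader);
    //    t = new VoikkoFilter (t, voikko);
    t = new SuggestionFilter(t, voikko, suggestionFile, false);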
    CharTermAttribute termAtt = t.addAttribute(CharTermAttribute.class);
    BaseFormAttribute baseFormAtt = t.addAttribute(BaseFormAttribute.class);
    FlagsAttribute flagsAtt = t.addAttribute(FlagsAttribute.class);
    OriginalWordAttribute originalWordAtt = t.addAttribute(OriginalWordAttribute.class);
    try {
        t.reset();
        while (t.incrementToken()) {
            writer.write("Sana: " + originalWordAtt.getOriginalWord() + " | " + termAtt.toString() + " | ");
            writer.write(Constants.toString(flagsAtt));
            writer.write("\n");
            writer.flush();
        }
        t.end();
    } finally {
        t.close();
    }
}
Also used: FlagsAttribute (org.apache.lucene.analysis.tokenattributes.FlagsAttribute), CharTermAttribute (org.apache.lucene.analysis.tokenattributes.CharTermAttribute), BaseFormAttribute (peltomaa.sukija.attributes.BaseFormAttribute), Analysis (org.puimula.libvoikko.Analysis), OriginalWordAttribute (peltomaa.sukija.attributes.OriginalWordAttribute), Tokenizer (org.apache.lucene.analysis.Tokenizer), HVTokenizer (peltomaa.sukija.finnish.HVTokenizer)

Example 2 with OriginalWordAttribute

Use of peltomaa.sukija.attributes.OriginalWordAttribute in project sukija by ahomansikka.

Class BaseFormTester, method test.

public static void test(Reader reader, Writer writer, Voikko voikko, boolean successOnly) throws IOException {
    TokenStream t = new HVTokenizer();
    ((Tokenizer) t).setReader(reader);
    t = new BaseFormFilter(t, voikko, successOnly);
    CharTermAttribute termAtt = t.addAttribute(CharTermAttribute.class);
    BaseFormAttribute baseFormAtt = t.addAttribute(BaseFormAttribute.class);
    FlagsAttribute flagsAtt = t.addAttribute(FlagsAttribute.class);
    OriginalWordAttribute originalWordAtt = t.addAttribute(OriginalWordAttribute.class);
    String orig = "";
    TreeSet<String> tset = new TreeSet<String>();
    FlagsAttribute flagsA = new FlagsAttributeImpl();
    try {
        t.reset();
        while (t.incrementToken()) {
            // A new original word has started: write out what was collected for the previous one.
            if (!orig.equals("") && !orig.equals(originalWordAtt.getOriginalWord())) {
                writer.write("Sana: " + orig);
                if (Constants.hasFlag(flagsA, Constants.FOUND)) {
                    writer.write(" M " + toString(tset));
                }
                writer.write("\n");
                writer.flush();
                tset.clear();
            }
            // Accumulate base forms for the current word and remember the flags of its latest token.
            orig = originalWordAtt.getOriginalWord();
            tset.addAll(baseFormAtt.getBaseForms());
            flagsA.setFlags(flagsAtt.getFlags());
        }
        // Write out the last word; note that this also runs when the stream produced no tokens.
        writer.write("Sana: " + orig);
        if (Constants.hasFlag(flagsA, Constants.FOUND)) {
            writer.write(" M " + toString(tset));
        }
        writer.write("\n");
        writer.flush();
        t.end();
    } finally {
        t.close();
    }
/*
    try {
      t.reset();
      while (t.incrementToken()) {
        writer.write ("Sana: " + originalWordAtt.getOriginalWord()
                      + " " + termAtt.toString()
                      + " " + Constants.toString (flagsAtt)
                      + " " + baseFormAtt.getBaseForms().toString()
                      + "\n");
        writer.flush();
      }
      t.end();
    }
    finally {
      t.close();
    }
*/
}
Also used: HVTokenizer (peltomaa.sukija.finnish.HVTokenizer), TokenStream (org.apache.lucene.analysis.TokenStream), FlagsAttribute (org.apache.lucene.analysis.tokenattributes.FlagsAttribute), CharTermAttribute (org.apache.lucene.analysis.tokenattributes.CharTermAttribute), FlagsAttributeImpl (org.apache.lucene.analysis.tokenattributes.FlagsAttributeImpl), BaseFormAttribute (peltomaa.sukija.attributes.BaseFormAttribute), TreeSet (java.util.TreeSet), OriginalWordAttribute (peltomaa.sukija.attributes.OriginalWordAttribute), Tokenizer (org.apache.lucene.analysis.Tokenizer)
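
The grouping logic in BaseFormTester.test (collect base forms while the original word stays the same, write one line when it changes) can be tried without Lucene or Voikko. The following is a minimal sketch under stated assumptions, not the project's code: the Token class, the group method, and the sample words are hypothetical stand-ins for the attribute values, the boolean found replaces the Constants.FOUND flag check, and TreeSet's own toString is used in place of the project's toString helper, whose output format is not shown above. Unlike the original, the sketch writes nothing when there are no tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class GroupByOriginalWord {
    // Hypothetical stand-in for the attribute values of one token.
    static final class Token {
        final String originalWord;
        final List<String> baseForms;
        final boolean found; // stands in for the Constants.FOUND flag

        Token(String originalWord, List<String> baseForms, boolean found) {
            this.originalWord = originalWord;
            this.baseForms = baseForms;
            this.found = found;
        }
    }

    // Produces one output line per run of consecutive tokens sharing an original word.
    static List<String> group(List<Token> tokens) {
        List<String> lines = new ArrayList<>();
        String orig = null;
        TreeSet<String> forms = new TreeSet<>();
        boolean found = false;
        for (Token tok : tokens) {
            // The original word changed: emit the line for the previous word.
            if (orig != null && !orig.equals(tok.originalWord)) {
                lines.add(format(orig, forms, found));
                forms.clear();
            }
            orig = tok.originalWord;
            forms.addAll(tok.baseForms);
            found = tok.found; // like the original, only the latest token's flag is kept
        }
        // Emit the last word, but only if at least one token was seen.
        if (orig != null) {
            lines.add(format(orig, forms, found));
        }
        return lines;
    }

    private static String format(String orig, TreeSet<String> forms, boolean found) {
        return "Sana: " + orig + (found ? " M " + forms : "");
    }

    public static void main(String[] args) {
        // Invented sample data, not actual Voikko output.
        List<Token> tokens = List.of(
            new Token("voi", List.of("voi"), true),
            new Token("voi", List.of("voida"), true),
            new Token("xyzzy", List.of(), false));
        group(tokens).forEach(System.out::println);
    }
}
```

Returning the lines instead of writing them to a Writer keeps the sketch easy to check; the original flushes after every line because it is an interactive test tool.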

Example 3 with OriginalWordAttribute

Use of peltomaa.sukija.attributes.OriginalWordAttribute in project sukija by ahomansikka.

Class KeepFilterTester, method test.

public static void test(Reader reader, Writer writer, Voikko voikko, CharArraySet wordSet, String from, String to, Suggestion[] suggestion, boolean stopOnSuccess) throws IOException {
    Set<String> set = new TreeSet<String>();
    TokenStream t = new HVTokenizer();
    ((Tokenizer) t).setReader(reader);
    t = new KeepFilter(t, voikko, wordSet, from, to, suggestion);
    CharTermAttribute termAtt = t.addAttribute(CharTermAttribute.class);
    BaseFormAttribute baseFormAtt = t.addAttribute(BaseFormAttribute.class);
    FlagsAttribute flagsAtt = t.addAttribute(FlagsAttribute.class);
    OriginalWordAttribute originalWordAtt = t.addAttribute(OriginalWordAttribute.class);
    try {
        t.reset();
        while (t.incrementToken()) {
            writer.write("Sana: " + originalWordAtt.getOriginalWord() + " " + termAtt.toString() + " " + Constants.toString(flagsAtt) + " " + baseFormAtt.getBaseForms().toString() + "\n");
            writer.flush();
        }
        t.end();
    } finally {
        t.close();
    }
}
Also used: HVTokenizer (peltomaa.sukija.finnish.HVTokenizer), TokenStream (org.apache.lucene.analysis.TokenStream), FlagsAttribute (org.apache.lucene.analysis.tokenattributes.FlagsAttribute), CharTermAttribute (org.apache.lucene.analysis.tokenattributes.CharTermAttribute), BaseFormAttribute (peltomaa.sukija.attributes.BaseFormAttribute), TreeSet (java.util.TreeSet), OriginalWordAttribute (peltomaa.sukija.attributes.OriginalWordAttribute), Tokenizer (org.apache.lucene.analysis.Tokenizer)

Aggregations

Tokenizer (org.apache.lucene.analysis.Tokenizer): 3
CharTermAttribute (org.apache.lucene.analysis.tokenattributes.CharTermAttribute): 3
FlagsAttribute (org.apache.lucene.analysis.tokenattributes.FlagsAttribute): 3
BaseFormAttribute (peltomaa.sukija.attributes.BaseFormAttribute): 3
OriginalWordAttribute (peltomaa.sukija.attributes.OriginalWordAttribute): 3
HVTokenizer (peltomaa.sukija.finnish.HVTokenizer): 3
TreeSet (java.util.TreeSet): 2
TokenStream (org.apache.lucene.analysis.TokenStream): 2
FlagsAttributeImpl (org.apache.lucene.analysis.tokenattributes.FlagsAttributeImpl): 1
Analysis (org.puimula.libvoikko.Analysis): 1