Examples with TextAnnotationBuilder - edu.illinois.cs.cogcomp.annotation.TextAnnotationBuilder

Example 16 with TextAnnotationBuilder

Use of edu.illinois.cs.cogcomp.annotation.TextAnnotationBuilder in project cogcomp-nlp by CogComp.

Class Demo, method main.

public static void main(String[] args) throws IOException, AnnotatorException {
    Options options = new Options();
    Option inputtext = new Option("t", "text", true, "input text to be processed");
    inputtext.setRequired(false);
    options.addOption(inputtext);
    CommandLineParser parser = new DefaultParser();
    HelpFormatter formatter = new HelpFormatter();
    try {
        CommandLine cmd = parser.parse(options, args);
        String defaultText = "The flu season is winding down, and it has killed 105 children so far - about the average toll.\n\n"
                + "The season started about a month earlier than usual, sparking concerns it might turn into the worst in "
                + "a decade. It ended up being very hard on the elderly, but was moderately severe overall, according to "
                + "the Centers for Disease Control and Prevention.\n\n"
                + "Six of the pediatric deaths were reported in the last week, and it's possible there will be more, said "
                + "the CDC's Dr. Michael Jhung said Friday.\n\n"
                + "Roughly 100 children die in an average flu season. One exception was the swine flu pandemic of "
                + "2009-2010, when 348 children died.\n\n"
                + "The CDC recommends that all children ages 6 months and older be vaccinated against flu each season, "
                + "though only about half get a flu shot or nasal spray.\n\n"
                + "All but four of the children who died were old enough to be vaccinated, but 90 percent of them did "
                + "not get vaccinated, CDC officials said.\n\n"
                + "This year's vaccine was considered effective in children, though it didn't work very well in older "
                + "people. And the dominant flu strain early in the season was one that tends to "
                + "cause more severe illness.\n\n"
                + "The government only does a national flu death count for children. But it does track hospitalization "
                + "rates for people 65 and older, and those statistics have been grim.\n\n"
                + "In that group, 177 out of every 100,000 were hospitalized with flu-related illness in the past "
                + "several months. That's more than 2 1/2 times higher than any other recent season.\n\n"
                + "This flu season started in early December, a month earlier than usual, and peaked by the end "
                + "of year. Since then, flu reports have been dropping off throughout the country.\n\n"
                + "\"We appear to be getting close to the end of flu season,\" Jhung said.";
        String text = cmd.getOptionValue("text", defaultText);
        TextAnnotationBuilder tab = new TokenizerTextAnnotationBuilder(new StatefulTokenizer());
        TextAnnotation ta = tab.createTextAnnotation("corpus", "id", text);
        POSAnnotator annotator = new POSAnnotator();
        try {
            annotator.getView(ta);
        } catch (AnnotatorException e) {
            // Note: fail(...) is presumably JUnit's Assert.fail, statically imported outside this snippet.
            fail("AnnotatorException thrown!\n" + e.getMessage());
        }
        Properties rmProps = new TemporalChunkerConfigurator().getDefaultConfig().getProperties();
        rmProps.setProperty("useHeidelTime", "False");
        TemporalChunkerAnnotator tca = new TemporalChunkerAnnotator(new ResourceManager(rmProps));
        tca.addView(ta);
        View temporalViews = ta.getView(ViewNames.TIMEX3);
        List<Constituent> constituents = temporalViews.getConstituents();
        System.out.printf("There are %d time expressions (TIMEX) in total.%n", constituents.size());
        // Iterate by index: indexOf(c) rescans the list and returns the first equal element.
        for (int i = 0; i < constituents.size(); i++) {
            Constituent c = constituents.get(i);
            System.out.printf("TIMEX #%d: Text=%s, Type=%s, Value=%s%n", i, c, c.getAttribute("type"), c.getAttribute("value"));
        }
    } catch (ParseException e) {
        System.out.println(e.getMessage());
        formatter.printHelp("Temporal Normalizer Demo", options);
        System.exit(1);
    }
}

Also used: TextAnnotationBuilder(edu.illinois.cs.cogcomp.annotation.TextAnnotationBuilder) TokenizerTextAnnotationBuilder(edu.illinois.cs.cogcomp.nlp.utility.TokenizerTextAnnotationBuilder) POSAnnotator(edu.illinois.cs.cogcomp.pos.POSAnnotator) AnnotatorException(edu.illinois.cs.cogcomp.annotation.AnnotatorException) ResourceManager(edu.illinois.cs.cogcomp.core.utilities.configuration.ResourceManager) Properties(java.util.Properties) View(edu.illinois.cs.cogcomp.core.datastructures.textannotation.View) StatefulTokenizer(edu.illinois.cs.cogcomp.nlp.tokenizer.StatefulTokenizer) TextAnnotation(edu.illinois.cs.cogcomp.core.datastructures.textannotation.TextAnnotation) Constituent(edu.illinois.cs.cogcomp.core.datastructures.textannotation.Constituent)

Example 17 with TextAnnotationBuilder

Use of edu.illinois.cs.cogcomp.annotation.TextAnnotationBuilder in project cogcomp-nlp by CogComp.

Class NerOntonotesTest, method testOntonotesNer.

@Test
public void testOntonotesNer() {
    TextAnnotationBuilder tab = new TokenizerTextAnnotationBuilder(new StatefulTokenizer());
    Properties props = new Properties();
    NERAnnotator nerOntonotes = NerAnnotatorManager.buildNerAnnotator(new ResourceManager(props), ViewNames.NER_ONTONOTES);
    TextAnnotation taOnto = tab.createTextAnnotation("", "", TEST_INPUT);
    try {
        nerOntonotes.getView(taOnto);
    } catch (AnnotatorException e) {
        e.printStackTrace();
        fail(e.getMessage());
    }
    View v = taOnto.getView(nerOntonotes.getViewName());
    assertEquals(3, v.getConstituents().size());
}

Also used: TextAnnotationBuilder(edu.illinois.cs.cogcomp.annotation.TextAnnotationBuilder) TokenizerTextAnnotationBuilder(edu.illinois.cs.cogcomp.nlp.utility.TokenizerTextAnnotationBuilder) StatefulTokenizer(edu.illinois.cs.cogcomp.nlp.tokenizer.StatefulTokenizer) AnnotatorException(edu.illinois.cs.cogcomp.annotation.AnnotatorException) ResourceManager(edu.illinois.cs.cogcomp.core.utilities.configuration.ResourceManager) Properties(java.util.Properties) TextAnnotation(edu.illinois.cs.cogcomp.core.datastructures.textannotation.TextAnnotation) View(edu.illinois.cs.cogcomp.core.datastructures.textannotation.View) Test(org.junit.Test)

Example 18 with TextAnnotationBuilder

Use of edu.illinois.cs.cogcomp.annotation.TextAnnotationBuilder in project cogcomp-nlp by CogComp.

Class MainClass, method annotate.

private static void annotate(String filepath) throws IOException {
    DepAnnotator annotator = new DepAnnotator();
    TextAnnotationBuilder taBuilder = new TokenizerTextAnnotationBuilder(new StatefulTokenizer(true, false));
    Preprocessor preprocessor = new Preprocessor();
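    // Files.lines keeps the file open until the stream is closed, so close it with
    // try-with-resources (requires java.util.stream.Stream).
    try (Stream<String> lines = Files.lines(Paths.get(filepath))) {
        lines.forEach(line -> {
            TextAnnotation ta = taBuilder.createTextAnnotation(line);
            try {
                preprocessor.annotate(ta);
                annotator.addView(ta);
                System.out.println(ta.getView(annotator.getViewName()).toString());
            } catch (AnnotatorException e) {
                e.printStackTrace();
            }
        });
    }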
}

Also used: TextAnnotationBuilder(edu.illinois.cs.cogcomp.annotation.TextAnnotationBuilder) TokenizerTextAnnotationBuilder(edu.illinois.cs.cogcomp.nlp.utility.TokenizerTextAnnotationBuilder) StatefulTokenizer(edu.illinois.cs.cogcomp.nlp.tokenizer.StatefulTokenizer) AnnotatorException(edu.illinois.cs.cogcomp.annotation.AnnotatorException) Preprocessor(edu.illinois.cs.cogcomp.depparse.io.Preprocessor) TextAnnotation(edu.illinois.cs.cogcomp.core.datastructures.textannotation.TextAnnotation)
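
The per-line loop in annotate above reads a file with Files.lines and keeps going when a single line fails. The following is a minimal sketch of that pattern using only the JDK, so it runs without the CogComp libraries. The class name LineProcessingSketch, the method processLines, and the annotate function are hypothetical stand-ins for the Preprocessor/DepAnnotator pair; they are not part of cogcomp-nlp.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Stream;

public class LineProcessingSketch {

    // Applies `annotate` to every line of `file` and collects the results.
    // `annotate` is a hypothetical placeholder for the real annotation step.
    static List<String> processLines(Path file, Function<String, String> annotate) throws IOException {
        List<String> results = new ArrayList<>();
        // Files.lines keeps the file handle open until the stream is closed,
        // so try-with-resources releases it even if processing throws.
        try (Stream<String> lines = Files.lines(file)) {
            lines.forEach(line -> {
                try {
                    results.add(annotate.apply(line));
                } catch (RuntimeException e) {
                    // As in the original loop: report the failure, continue with the next line.
                    System.err.println("Failed on line: " + e.getMessage());
                }
            });
        }
        return results;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("sketch", ".txt");
        Files.write(tmp, List.of("first sentence", "second sentence"));
        System.out.println(processLines(tmp, String::toUpperCase));
        Files.delete(tmp);
    }
}
```

Collecting results into a list rather than printing inside the lambda is a choice made here only to keep the sketch easy to check; the original prints each view directly.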

Example 19 with TextAnnotationBuilder

Use of edu.illinois.cs.cogcomp.annotation.TextAnnotationBuilder in project cogcomp-nlp by CogComp.

Class MultiLingualTokenizer, method main.

public static void main(String[] args) {
    TextAnnotationBuilder tokenizer = MultiLingualTokenizer.getTokenizer("ja");
    String text = "\"ペンシルベニアドイツ語\",\"text\":\"ペンシルベニアドイツ語（標準ドイ" + "ツ語：Pennsylvania-Dutch, Pennsilfaani-Deitsch、アレマン語：Pennsylvania-Ditsch、英語：Pennsylvania-German）" + "は、北アメリカのカナダおよびアメリカ中西部でおよそ15万から25万人の人びとに話されているドイツ語の系統である。高地ドイツ語の" + "うち上部ドイツ語の一派アレマン語の一方言である。ペンシルベニアアレマン語(Pennsilfaani-Alemanisch, Pennsylvania-Alemannic)" + "とも呼ばれる。";
    TextAnnotation ta = tokenizer.createTextAnnotation(text);
    for (int i = 0; i < ta.getNumberOfSentences(); i++) {
        System.out.println(ta.getSentence(i).getTokenizedText());
    }
}

Example 20 with TextAnnotationBuilder

Use of edu.illinois.cs.cogcomp.annotation.TextAnnotationBuilder in project cogcomp-nlp by CogComp.

Class MultiLingualTokenizer, method getTokenizer.

public static TextAnnotationBuilder getTokenizer(String lang) {
    if (tokenizerMap == null)
        tokenizerMap = new HashMap<>();
    if (!tokenizerMap.containsKey(lang)) {
        TextAnnotationBuilder tokenizer = null;
        if (lang.equals("en"))
            tokenizer = new TokenizerTextAnnotationBuilder(new StatefulTokenizer());
        else if (lang.equals("es"))
            tokenizer = new TokenizerTextAnnotationBuilder(new StanfordAnalyzer());
        else if (lang.equals("zh"))
            tokenizer = new TokenizerTextAnnotationBuilder(new CharacterTokenizer());
        else if (lang.equals("th"))
            tokenizer = new TokenizerTextAnnotationBuilder(new ThaiTokenizer());
        else if (lang.equals("ja"))
            tokenizer = new TokenizerTextAnnotationBuilder(new JapaneseTokenizer());
        else
            tokenizer = new TokenizerTextAnnotationBuilder(new WhiteSpaceTokenizer());
        tokenizerMap.put(lang, tokenizer);
    }
    return tokenizerMap.get(lang);
}

Also used: TextAnnotationBuilder(edu.illinois.cs.cogcomp.annotation.TextAnnotationBuilder) TokenizerTextAnnotationBuilder(edu.illinois.cs.cogcomp.nlp.utility.TokenizerTextAnnotationBuilder) HashMap(java.util.HashMap) StatefulTokenizer(edu.illinois.cs.cogcomp.nlp.tokenizer.StatefulTokenizer)
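
getTokenizer above builds one builder per language code on first request and reuses it afterwards, falling back to whitespace tokenization for unknown codes. As a rough sketch of the same lazy-cache idea, the code below uses Map.computeIfAbsent on a ConcurrentHashMap, which also makes the lookup safe under concurrent calls (the original HashMap version is not). Everything here is an assumption for illustration: TokenizerRegistrySketch and its nested Builder class are hypothetical stand-ins that record only a tokenizer name, not the real TextAnnotationBuilder API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

public class TokenizerRegistrySketch {

    // Hypothetical stand-in for TextAnnotationBuilder; it only remembers
    // which tokenizer the real code would have wrapped.
    static final class Builder {
        final String tokenizerName;

        Builder(String tokenizerName) {
            this.tokenizerName = tokenizerName;
        }
    }

    // Language code -> factory, mirroring the if/else chain in getTokenizer.
    private static final Map<String, Supplier<Builder>> FACTORIES = Map.of(
            "en", () -> new Builder("StatefulTokenizer"),
            "es", () -> new Builder("StanfordAnalyzer"),
            "zh", () -> new Builder("CharacterTokenizer"),
            "th", () -> new Builder("ThaiTokenizer"),
            "ja", () -> new Builder("JapaneseTokenizer"));

    // One cached builder per requested language code.
    private static final Map<String, Builder> CACHE = new ConcurrentHashMap<>();

    static Builder getTokenizer(String lang) {
        // computeIfAbsent runs the factory at most once per key;
        // unknown codes fall back to whitespace tokenization.
        return CACHE.computeIfAbsent(lang, key ->
                FACTORIES.getOrDefault(key, () -> new Builder("WhiteSpaceTokenizer")).get());
    }

    public static void main(String[] args) {
        System.out.println(getTokenizer("ja").tokenizerName);
        System.out.println(getTokenizer("ja") == getTokenizer("ja"));
        System.out.println(getTokenizer("xx").tokenizerName);
    }
}
```

Whether the real tokenizers are cheap enough to construct eagerly, or safe to share across threads, is not something this sketch settles.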

Aggregations

TextAnnotationBuilder (edu.illinois.cs.cogcomp.annotation.TextAnnotationBuilder): 22
TokenizerTextAnnotationBuilder (edu.illinois.cs.cogcomp.nlp.utility.TokenizerTextAnnotationBuilder): 20
StatefulTokenizer (edu.illinois.cs.cogcomp.nlp.tokenizer.StatefulTokenizer): 16
TextAnnotation (edu.illinois.cs.cogcomp.core.datastructures.textannotation.TextAnnotation): 15
AnnotatorException (edu.illinois.cs.cogcomp.annotation.AnnotatorException): 7
POSAnnotator (edu.illinois.cs.cogcomp.pos.POSAnnotator): 7
Constituent (edu.illinois.cs.cogcomp.core.datastructures.textannotation.Constituent): 6
ResourceManager (edu.illinois.cs.cogcomp.core.utilities.configuration.ResourceManager): 6
Properties (java.util.Properties): 6
Test (org.junit.Test): 5
ChunkerAnnotator (edu.illinois.cs.cogcomp.chunker.main.ChunkerAnnotator): 3
ChunkerConfigurator (edu.illinois.cs.cogcomp.chunker.main.ChunkerConfigurator): 3
IntPair (edu.illinois.cs.cogcomp.core.datastructures.IntPair): 3
View (edu.illinois.cs.cogcomp.core.datastructures.textannotation.View): 3
StanfordDepHandler (edu.illinois.cs.cogcomp.pipeline.handlers.StanfordDepHandler): 3
POSTaggerAnnotator (edu.stanford.nlp.pipeline.POSTaggerAnnotator): 3
ParserAnnotator (edu.stanford.nlp.pipeline.ParserAnnotator): 3
File (java.io.File): 3
ArrayList (java.util.ArrayList): 3
EREEventReader (edu.illinois.cs.cogcomp.nlp.corpusreaders.ereReader.EREEventReader): 2