use of edu.illinois.cs.cogcomp.annotation.TextAnnotationBuilder in project cogcomp-nlp by CogComp.
the class Demo method main.
public static void main(String[] args) throws IOException, AnnotatorException {
Options options = new Options();
Option inputtext = new Option("t", "text", true, "input text to be processed");
inputtext.setRequired(false);
options.addOption(inputtext);
CommandLineParser parser = new DefaultParser();
HelpFormatter formatter = new HelpFormatter();
try {
CommandLine cmd = parser.parse(options, args);
String defaultText = "The flu season is winding down, and it has killed 105 children so far - about the average toll.\n" + "\n" + "The season started about a month earlier than usual, sparking concerns it might turn into the worst in " + "a decade. It ended up being very hard on the elderly, but was moderately severe overall, according to " + "the Centers for Disease Control and Prevention.\n" + "\n" + "Six of the pediatric deaths were reported in the last week, and it's possible there will be more, said " + "the CDC's Dr. Michael Jhung said Friday.\n" + "\n" + "Roughly 100 children die in an average flu season. One exception was the swine flu pandemic of " + "2009-2010, when 348 children died.\n" + "\n" + "The CDC recommends that all children ages 6 months and older be vaccinated against flu each season, " + "though only about half get a flu shot or nasal spray.\n" + "\n" + "All but four of the children who died were old enough to be vaccinated, but 90 percent of them did " + "not get vaccinated, CDC officials said.\n" + "\n" + "This year's vaccine was considered effective in children, though it didn't work very well in older " + "people. And the dominant flu strain early in the season was one that tends to " + "cause more severe illness.\n" + "\n" + "The government only does a national flu death count for children. But it does track hospitalization " + "rates for people 65 and older, and those statistics have been grim.\n" + "\n" + "In that group, 177 out of every 100,000 were hospitalized with flu-related illness in the past " + "several months. That's more than 2 1/2 times higher than any other recent season.\n" + "\n" + "This flu season started in early December, a month earlier than usual, and peaked by the end " + "of year. Since then, flu reports have been dropping off throughout the country.\n" + "\n" + "\"We appear to be getting close to the end of flu season,\" Jhung said.";
String text = cmd.getOptionValue("text", defaultText);
TextAnnotationBuilder tab = new TokenizerTextAnnotationBuilder(new StatefulTokenizer());
TextAnnotation ta = tab.createTextAnnotation("corpus", "id", text);
POSAnnotator annotator = new POSAnnotator();
// main already declares AnnotatorException, so let it propagate instead of calling JUnit's fail() outside a test.
annotator.getView(ta);
Properties rmProps = new TemporalChunkerConfigurator().getDefaultConfig().getProperties();
rmProps.setProperty("useHeidelTime", "False");
TemporalChunkerAnnotator tca = new TemporalChunkerAnnotator(new ResourceManager(rmProps));
tca.addView(ta);
View temporalViews = ta.getView(ViewNames.TIMEX3);
List<Constituent> constituents = temporalViews.getConstituents();
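System.out.printf("There are %d time expressions (TIMEX) in total.%n", constituents.size());
// Indexed loop: indexOf(c) is a linear scan and returns the first equal element, which can mislabel duplicates.
for (int i = 0; i < constituents.size(); i++) {
Constituent c = constituents.get(i);
System.out.printf("TIMEX #%d: Text=%s, Type=%s, Value=%s%n", i, c, c.getAttribute("type"), c.getAttribute("value"));
}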
} catch (ParseException e) {
System.out.println(e.getMessage());
formatter.printHelp("Temporal Normalizer Demo", options);
System.exit(1);
}
}
use of edu.illinois.cs.cogcomp.annotation.TextAnnotationBuilder in project cogcomp-nlp by CogComp.
the class NerOntonotesTest method testOntonotesNer.
@Test
public void testOntonotesNer() {
TextAnnotationBuilder tab = new TokenizerTextAnnotationBuilder(new StatefulTokenizer());
Properties props = new Properties();
NERAnnotator nerOntonotes = NerAnnotatorManager.buildNerAnnotator(new ResourceManager(props), ViewNames.NER_ONTONOTES);
TextAnnotation taOnto = tab.createTextAnnotation("", "", TEST_INPUT);
try {
nerOntonotes.getView(taOnto);
} catch (AnnotatorException e) {
e.printStackTrace();
fail(e.getMessage());
}
View v = taOnto.getView(nerOntonotes.getViewName());
assertEquals(3, v.getConstituents().size());
}
use of edu.illinois.cs.cogcomp.annotation.TextAnnotationBuilder in project cogcomp-nlp by CogComp.
the class MainClass method annotate.
private static void annotate(String filepath) throws IOException {
DepAnnotator annotator = new DepAnnotator();
TextAnnotationBuilder taBuilder = new TokenizerTextAnnotationBuilder(new StatefulTokenizer(true, false));
Preprocessor preprocessor = new Preprocessor();
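// Files.lines keeps the file open until the stream is closed, so use try-with-resources (needs java.util.stream.Stream).
try (Stream<String> lines = Files.lines(Paths.get(filepath))) {
lines.forEach(line -> {
TextAnnotation ta = taBuilder.createTextAnnotation(line);
try {
preprocessor.annotate(ta);
annotator.addView(ta);
System.out.println(ta.getView(annotator.getViewName()).toString());
} catch (AnnotatorException e) {
e.printStackTrace();
}
});
}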
}
use of edu.illinois.cs.cogcomp.annotation.TextAnnotationBuilder in project cogcomp-nlp by CogComp.
the class MultiLingualTokenizer method main.
public static void main(String[] args) {
TextAnnotationBuilder tokenizer = MultiLingualTokenizer.getTokenizer("ja");
String text = "\"ペンシルベニアドイツ語\",\"text\":\"ペンシルベニアドイツ語(標準ドイ" + "ツ語:Pennsylvania-Dutch, Pennsilfaani-Deitsch、アレマン語:Pennsylvania-Ditsch、英語:Pennsylvania-German)" + "は、北アメリカのカナダおよびアメリカ中西部でおよそ15万から25万人の人びとに話されているドイツ語の系統である。高地ドイツ語の" + "うち上部ドイツ語の一派アレマン語の一方言である。ペンシルベニアアレマン語(Pennsilfaani-Alemanisch, Pennsylvania-Alemannic)" + "とも呼ばれる。";
TextAnnotation ta = tokenizer.createTextAnnotation(text);
for (int i = 0; i < ta.getNumberOfSentences(); i++) {
System.out.println(ta.getSentence(i).getTokenizedText());
}
}
use of edu.illinois.cs.cogcomp.annotation.TextAnnotationBuilder in project cogcomp-nlp by CogComp.
the class MultiLingualTokenizer method getTokenizer.
public static TextAnnotationBuilder getTokenizer(String lang) {
if (tokenizerMap == null)
tokenizerMap = new HashMap<>();
if (!tokenizerMap.containsKey(lang)) {
TextAnnotationBuilder tokenizer = null;
if (lang.equals("en"))
tokenizer = new TokenizerTextAnnotationBuilder(new StatefulTokenizer());
else if (lang.equals("es"))
tokenizer = new TokenizerTextAnnotationBuilder(new StanfordAnalyzer());
else if (lang.equals("zh"))
tokenizer = new TokenizerTextAnnotationBuilder(new CharacterTokenizer());
else if (lang.equals("th"))
tokenizer = new TokenizerTextAnnotationBuilder(new ThaiTokenizer());
else if (lang.equals("ja"))
tokenizer = new TokenizerTextAnnotationBuilder(new JapaneseTokenizer());
else
tokenizer = new TokenizerTextAnnotationBuilder(new WhiteSpaceTokenizer());
tokenizerMap.put(lang, tokenizer);
}
return tokenizerMap.get(lang);
}
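The getTokenizer method above lazily fills a shared HashMap, which is not safe if several threads ask for a tokenizer at once. A minimal sketch of the same per-language cache using ConcurrentHashMap.computeIfAbsent follows. The Builder interface, the NamedBuilder class, and the BuilderCache class are hypothetical stand-ins so the example runs without cogcomp-nlp; they are not part of the library.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class BuilderCache {
    /** Hypothetical stand-in for TextAnnotationBuilder; not the cogcomp-nlp interface. */
    public interface Builder {
        String name();
    }

    /** Hypothetical builder that only remembers which tokenizer it would wrap. */
    static final class NamedBuilder implements Builder {
        private final String name;

        NamedBuilder(String name) {
            this.name = name;
        }

        @Override
        public String name() {
            return name;
        }
    }

    // One builder per language code, created on first request.
    private static final Map<String, Builder> CACHE = new ConcurrentHashMap<>();

    // Mirrors the if/else chain above: unknown languages fall back to whitespace.
    private static Builder create(String lang) {
        switch (lang) {
            case "en":
                return new NamedBuilder("stateful");
            case "zh":
                return new NamedBuilder("character");
            default:
                return new NamedBuilder("whitespace");
        }
    }

    /** Returns the cached builder for lang, creating it atomically if absent. */
    public static Builder getBuilder(String lang) {
        return CACHE.computeIfAbsent(lang, BuilderCache::create);
    }

    public static void main(String[] args) {
        // Two lookups for the same language share one instance.
        System.out.println(getBuilder("en") == getBuilder("en")); // prints "true"
        System.out.println(getBuilder("xx").name()); // prints "whitespace"
    }
}
```

Unlike the HashMap version, a null language code fails fast here, because ConcurrentHashMap rejects null keys.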