Examples with StatefulTokenizer - edu.illinois.cs.cogcomp.nlp.tokenizer.StatefulTokenizer

Example 11 with StatefulTokenizer

Use of edu.illinois.cs.cogcomp.nlp.tokenizer.StatefulTokenizer in project cogcomp-nlp by CogComp.

From class ExampleUsage, method AnnotatorExample.

public static void AnnotatorExample() {
    String text = "He went to Chicago after his Father moved there.";
    String corpus = "story";
    String textId = "001";
    // Create a TextAnnotation From Text
    TextAnnotationBuilder stab = new TokenizerTextAnnotationBuilder(new StatefulTokenizer());
    TextAnnotation ta = stab.createTextAnnotation(corpus, textId, text);
    // Illinois POS tagger and shallow parser (chunker)
    POSAnnotator pos_annotator = new POSAnnotator();
    ChunkerAnnotator chunker = new ChunkerAnnotator(true);
    chunker.initialize(new ChunkerConfigurator().getDefaultConfig());
    // Stanford POS tagger and parser, wrapped to produce a dependency view
    Properties stanfordProps = new Properties();
    stanfordProps.put("annotators", "pos, parse");
    stanfordProps.put("parse.originalDependencies", true);
    stanfordProps.put("parse.maxlen", Stanford331Configurator.STFRD_MAX_SENTENCE_LENGTH);
    stanfordProps.put("parse.maxtime", Stanford331Configurator.STFRD_TIME_PER_SENTENCE);
    POSTaggerAnnotator posAnnotator = new POSTaggerAnnotator("pos", stanfordProps);
    ParserAnnotator parseAnnotator = new ParserAnnotator("parse", stanfordProps);
    StanfordDepHandler stanfordDepHandler = new StanfordDepHandler(posAnnotator, parseAnnotator);
    // Relation extractor; its mentions and relations are read from the MENTION view below
    RelationAnnotator relationAnnotator = new RelationAnnotator();
    try {
        // Order matters: an annotator may rely on views added before it
        pos_annotator.addView(ta);
        chunker.addView(ta);
        stanfordDepHandler.addView(ta);
        relationAnnotator.addView(ta);
    } catch (Exception e) {
        e.printStackTrace();
        // Without the views above, the MENTION view below would be missing
        return;
    }
    View mentionView = ta.getView(ViewNames.MENTION);
    List<Constituent> predictedMentions = mentionView.getConstituents();
    List<Relation> predictedRelations = mentionView.getRelations();
    for (Relation r : predictedRelations) {
        IOHelper.printRelation(r);
    }
}

Also used : ChunkerConfigurator(edu.illinois.cs.cogcomp.chunker.main.ChunkerConfigurator) TextAnnotationBuilder(edu.illinois.cs.cogcomp.annotation.TextAnnotationBuilder) TokenizerTextAnnotationBuilder(edu.illinois.cs.cogcomp.nlp.utility.TokenizerTextAnnotationBuilder) ParserAnnotator(edu.stanford.nlp.pipeline.ParserAnnotator) POSAnnotator(edu.illinois.cs.cogcomp.pos.POSAnnotator) Properties(java.util.Properties) ChunkerAnnotator(edu.illinois.cs.cogcomp.chunker.main.ChunkerAnnotator) POSTaggerAnnotator(edu.stanford.nlp.pipeline.POSTaggerAnnotator) StatefulTokenizer(edu.illinois.cs.cogcomp.nlp.tokenizer.StatefulTokenizer) StanfordDepHandler(edu.illinois.cs.cogcomp.pipeline.handlers.StanfordDepHandler)
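The stanfordProps setup in Example 11 stores one value with `put(..., true)`, that is, as a `Boolean` object. `java.util.Properties` inherits `put` from `Hashtable`, but `getProperty` only returns values that are `String`s, so a non-String value reads back as `null`. Whether the Stanford annotators read these keys through `getProperty`, and whether the two `Stanford331Configurator` constants are Strings, is not shown here and is worth checking; the demo below (class name illustrative) only shows the plain `Properties` behaviour.

```java
import java.util.Properties;

// Demonstrates a java.util.Properties pitfall: values stored with put()
// that are not Strings are invisible to getProperty().
class PropertiesValueDemo {

    // Stores the flag as a Boolean object via Hashtable.put (compiles fine).
    static Properties withPut() {
        Properties props = new Properties();
        props.put("parse.originalDependencies", true); // autoboxed Boolean
        return props;
    }

    // Stores the flag as a String via setProperty.
    static Properties withSetProperty() {
        Properties props = new Properties();
        props.setProperty("parse.originalDependencies", "true");
        return props;
    }

    public static void main(String[] args) {
        // getProperty returns only String values; the Boolean reads back as null.
        System.out.println(withPut().getProperty("parse.originalDependencies"));         // null
        System.out.println(withSetProperty().getProperty("parse.originalDependencies")); // true
    }
}
```

Using `setProperty` with String values keeps every key visible to `getProperty`.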

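The `try` block in Example 11 adds views in a fixed order (POS, chunker, Stanford dependencies, relations) before reading the MENTION view. A plausible reading is that later annotators consume views produced by earlier ones. The sketch below illustrates that idea with hypothetical, self-contained types (`ToyPipeline`, `Doc`, `ToyAnnotator`); they are not cogcomp-nlp classes, and cogcomp's own prerequisite handling may differ.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical, simplified view-based pipeline (NOT the cogcomp-nlp API).
class ToyPipeline {

    // A document holding named "views" (here just a list of labels per view).
    static final class Doc {
        final String text;
        final Map<String, List<String>> views = new LinkedHashMap<>();
        Doc(String text) { this.text = text; }
    }

    // Each annotator declares which views must exist before it adds its own.
    interface ToyAnnotator {
        String viewName();
        List<String> requiredViews();
        List<String> annotate(Doc doc);

        default void addView(Doc doc) {
            for (String req : requiredViews()) {
                if (!doc.views.containsKey(req)) {
                    throw new IllegalStateException(viewName() + " needs view " + req);
                }
            }
            doc.views.put(viewName(), annotate(doc));
        }
    }

    // Builds an annotator from a view name, its prerequisites and a labelling function.
    static ToyAnnotator annotator(String name, List<String> requires, Function<Doc, List<String>> fn) {
        return new ToyAnnotator() {
            public String viewName() { return name; }
            public List<String> requiredViews() { return requires; }
            public List<String> annotate(Doc doc) { return fn.apply(doc); }
        };
    }

    static final ToyAnnotator TOKENS = annotator("TOKENS", List.of(),
            d -> Arrays.asList(d.text.split("\\s+")));
    static final ToyAnnotator POS = annotator("POS", List.of("TOKENS"),
            d -> Collections.nCopies(d.views.get("TOKENS").size(), "X")); // dummy tags
    static final ToyAnnotator CHUNK = annotator("CHUNK", List.of("TOKENS", "POS"),
            d -> List.of("NP")); // dummy chunk

    public static void main(String[] args) {
        Doc doc = new Doc("He went to Chicago");
        for (ToyAnnotator a : List.of(TOKENS, POS, CHUNK)) {
            a.addView(doc); // running CHUNK first would throw IllegalStateException
        }
        System.out.println(doc.views.keySet()); // [TOKENS, POS, CHUNK]
    }
}
```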
Example 12 with StatefulTokenizer

Use of edu.illinois.cs.cogcomp.nlp.tokenizer.StatefulTokenizer in project cogcomp-nlp by CogComp.

From class TemporalNormalizerBenchmark, method testTemporalChunker.

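/**
 * Normalizes the test dataset, using our chunker to extract temporal phrases.
 * @param outputFolder folder where one .tml file per document is written
 * @param verbose if true, prints each document ID, its TIMEX3 expressions and the timings
 * @throws Exception
 */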
public void testTemporalChunker(String outputFolder, boolean verbose) throws Exception {
    TextAnnotationBuilder tab = new TokenizerTextAnnotationBuilder(new StatefulTokenizer(false, false));
    ResourceManager nerRm = new TemporalChunkerConfigurator().getDefaultConfig();
    IOUtilities.existsInClasspath(TemporalChunkerAnnotator.class, nerRm.getString("modelDirPath"));
    java.util.logging.Logger.getLogger("HeidelTimeStandalone").setLevel(Level.OFF);
    List<TextAnnotation> taList = new ArrayList<>();
    long preprocessTime = System.currentTimeMillis();
    POSAnnotator annotator = new POSAnnotator();
    for (int j = 0; j < testText.size(); j++) {
        TextAnnotation ta = tab.createTextAnnotation("corpus", "id", testText.get(j));
        try {
            annotator.getView(ta);
        } catch (AnnotatorException e) {
            fail("AnnotatorException thrown!\n" + e.getMessage());
        }
        taList.add(ta);
    }
    if (verbose) {
        System.out.println("Start");
    }
    long startTime = System.currentTimeMillis();
    File outDir = new File(outputFolder);
    if (!outDir.exists()) {
        // mkdirs() also creates missing parent folders
        outDir.mkdirs();
    }
    for (int j = 0; j < testText.size(); j++) {
        tca.addDocumentCreationTime(DCTs.get(j));
        TextAnnotation ta = taList.get(j);
        try {
            tca.addView(ta);
        } catch (AnnotatorException e) {
            fail("Exception while adding TIMEX3 VIEW: " + e.getMessage());
        }
        // Resolve against outDir so that absolute output folders also work
        String outputFileName = new File(outDir, docIDs.get(j) + ".tml").getPath();
        if (verbose) {
            System.out.println(docIDs.get(j));
            for (TimexChunk tc : tca.getTimex()) {
                System.out.println(tc.toTIMEXString());
            }
            System.out.println("\n");
        }
        tca.write2Text(outputFileName, docIDs.get(j), testText.get(j));
        tca.deleteTimex();
    }
    long endTime = System.currentTimeMillis();
    long totalTime = endTime - startTime;
    if (verbose) {
        System.out.println("Process time: " + totalTime);
        System.out.println("Preprocess + process time: " + (endTime - preprocessTime));
    }
}

Also used : TextAnnotationBuilder(edu.illinois.cs.cogcomp.annotation.TextAnnotationBuilder) TokenizerTextAnnotationBuilder(edu.illinois.cs.cogcomp.nlp.utility.TokenizerTextAnnotationBuilder) TimexChunk(edu.illinois.cs.cogcomp.temporal.normalizer.main.timex2interval.TimexChunk) POSAnnotator(edu.illinois.cs.cogcomp.pos.POSAnnotator) ArrayList(java.util.ArrayList) AnnotatorException(edu.illinois.cs.cogcomp.annotation.AnnotatorException) ResourceManager(edu.illinois.cs.cogcomp.core.utilities.configuration.ResourceManager) StatefulTokenizer(edu.illinois.cs.cogcomp.nlp.tokenizer.StatefulTokenizer) TextAnnotation(edu.illinois.cs.cogcomp.core.datastructures.textannotation.TextAnnotation)
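Example 12 builds each output path as `"./" + outputFolder + "/" + ...` and creates the folder with `File.mkdir()`. Two caveats: prefixing `"./"` turns an absolute folder such as `/tmp/out` into the relative path `.//tmp/out`, and `mkdir()` does not create missing parent folders. A sketch using `java.nio.file` that avoids both (the class and method names are illustrative):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Builds <outputFolder>/<docId>.tml without string concatenation.
class OutputPaths {

    static Path tmlPath(String outputFolder, String docId) {
        Path dir = Paths.get(outputFolder); // absolute or relative, used as given
        try {
            // Creates missing parents; does not fail if the folder already exists.
            Files.createDirectories(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return dir.resolve(docId + ".tml");
    }

    public static void main(String[] args) {
        String tmp = System.getProperty("java.io.tmpdir");
        System.out.println(tmlPath(Paths.get(tmp, "timex-out", "run1").toString(), "doc1"));
    }
}
```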

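Examples 12 and 14 time the run with `System.currentTimeMillis()`, which reads the wall clock and can jump if the system clock is adjusted. `System.nanoTime()` is documented as suitable only for measuring elapsed time, which is what a benchmark needs. A minimal stopwatch sketch (names are illustrative, not part of cogcomp-nlp):

```java
import java.util.concurrent.TimeUnit;

// Elapsed-time measurement based on the monotonic System.nanoTime() clock.
class Stopwatch {
    private final long start = System.nanoTime();

    long elapsedMillis() {
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    // Runs a task and returns how long it took, in milliseconds.
    static long timeMillis(Runnable task) {
        Stopwatch sw = new Stopwatch();
        task.run();
        return sw.elapsedMillis();
    }

    public static void main(String[] args) {
        Stopwatch total = new Stopwatch();                   // preprocess + process
        long pre = timeMillis(() -> { /* e.g. tokenize and POS-tag */ });
        long proc = timeMillis(() -> { /* e.g. temporal chunking */ });
        System.out.println("Preprocess: " + pre + " ms, process: " + proc + " ms");
        System.out.println("Preprocess + process: " + total.elapsedMillis() + " ms");
    }
}
```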
Example 13 with StatefulTokenizer

Use of edu.illinois.cs.cogcomp.nlp.tokenizer.StatefulTokenizer in project cogcomp-nlp by CogComp.

From class ExternalAnnotatorServiceFactory, method buildPipeline.

/**
 * Creates an AnnotatorService with components specified by the ResourceManager (to override
 * defaults in {@link ExternalToolsConfigurator}).
 *
 * @param rm non-default config options
 * @return BasicAnnotatorService with the specified NLP components (a SentencePipeline if the
 *         sentence-pipeline option is set)
 * @throws IOException
 * @throws AnnotatorException
 */
public static BasicAnnotatorService buildPipeline(ResourceManager rm) throws IOException, AnnotatorException {
    // Merges default configuration with the user-specified overrides.
    ResourceManager fullRm = (new ExternalToolsConfigurator()).getConfig(rm);
    Boolean splitOnDash = fullRm.getBoolean(ExternalToolsConfigurator.SPLIT_ON_DASH);
    boolean isSentencePipeline = fullRm.getBoolean(ExternalToolsConfigurator.USE_SENTENCE_PIPELINE.key);
    TextAnnotationBuilder taBldr = new TokenizerTextAnnotationBuilder(new StatefulTokenizer(splitOnDash));
    Map<String, Annotator> annotators = buildAnnotators();
    return isSentencePipeline
            ? new SentencePipeline(taBldr, annotators, fullRm)
            : new BasicAnnotatorService(taBldr, annotators, fullRm);
}

Also used : TokenizerTextAnnotationBuilder(edu.illinois.cs.cogcomp.nlp.utility.TokenizerTextAnnotationBuilder) StatefulTokenizer(edu.illinois.cs.cogcomp.nlp.tokenizer.StatefulTokenizer) ExternalToolsConfigurator(edu.illinois.cs.cogcomp.pipeline.common.ExternalToolsConfigurator) ResourceManager(edu.illinois.cs.cogcomp.core.utilities.configuration.ResourceManager)
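`buildPipeline` first merges the defaults from `ExternalToolsConfigurator` with the caller's overrides and then reads two boolean options. The internals of `ResourceManager` are not shown here; as an analogy only, the same "overrides fall back to defaults" layering can be expressed with `java.util.Properties` and its defaults constructor. The class and key names below are hypothetical.

```java
import java.util.Properties;

// Analogy only: layering user overrides over defaults with java.util.Properties.
// This is NOT how ExternalToolsConfigurator/ResourceManager are implemented.
class LayeredConfig {

    static Properties defaults() {
        Properties d = new Properties();
        d.setProperty("splitOnDash", "true");          // hypothetical key
        d.setProperty("useSentencePipeline", "false"); // hypothetical key
        return d;
    }

    // Keys absent from the overrides fall back to the defaults on lookup.
    static Properties merge(Properties overrides) {
        Properties merged = new Properties(defaults());
        for (String key : overrides.stringPropertyNames()) {
            merged.setProperty(key, overrides.getProperty(key));
        }
        return merged;
    }

    static boolean getBoolean(Properties p, String key) {
        return Boolean.parseBoolean(p.getProperty(key));
    }

    public static void main(String[] args) {
        Properties user = new Properties();
        user.setProperty("useSentencePipeline", "true");
        Properties full = merge(user);
        System.out.println(getBoolean(full, "splitOnDash"));         // true (default)
        System.out.println(getBoolean(full, "useSentencePipeline")); // true (override)
    }
}
```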

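`buildPipeline` also passes a `splitOnDash` flag to `StatefulTokenizer`. The tokenizer's actual rules are not shown on this page, so the toy tokenizer below is purely a hypothetical illustration of the kind of difference such a switch can make for a token like `2009-2010` (from Example 15's text). It splits only on whitespace and, optionally, on `-`.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration of a "split on dash" switch. This is NOT StatefulTokenizer's
// algorithm: it splits on whitespace and, if requested, on '-'.
class ToyDashTokenizer {

    static List<String> tokenize(String text, boolean splitOnDash) {
        List<String> tokens = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isWhitespace(c)) {
                flush(cur, tokens);
            } else if (c == '-' && splitOnDash) {
                flush(cur, tokens);
                tokens.add("-"); // keep the dash as its own token
            } else {
                cur.append(c);
            }
        }
        flush(cur, tokens);
        return tokens;
    }

    // Emits the pending token, if any, and resets the buffer.
    private static void flush(StringBuilder cur, List<String> tokens) {
        if (cur.length() > 0) {
            tokens.add(cur.toString());
            cur.setLength(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(tokenize("the 2009-2010 pandemic", true));  // [the, 2009, -, 2010, pandemic]
        System.out.println(tokenize("the 2009-2010 pandemic", false)); // [the, 2009-2010, pandemic]
    }
}
```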
Example 14 with StatefulTokenizer

Use of edu.illinois.cs.cogcomp.nlp.tokenizer.StatefulTokenizer in project cogcomp-nlp by CogComp.

From class TemporalNormalizerBenchmark, method testNormalizationWithTrueExtraction.

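/**
 * Normalizes the dataset using the true TIMEX3 extractions read from the input files
 * (te3inputText) instead of the chunker's predictions.
 * @param outputFolder folder where one .tml file per document is written
 * @param verbose if true, prints each document ID and passes the flag on to extractTimexFromFile
 * @throws Exception
 */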
public void testNormalizationWithTrueExtraction(String outputFolder, boolean verbose) throws Exception {
    TextAnnotationBuilder tab = new TokenizerTextAnnotationBuilder(new StatefulTokenizer(false, false));
    System.out.println("Working Directory = " + System.getProperty("user.dir"));
    ResourceManager nerRm = new TemporalChunkerConfigurator().getDefaultConfig();
    IOUtilities.existsInClasspath(TemporalChunkerAnnotator.class, nerRm.getString("modelDirPath"));
    java.util.logging.Logger.getLogger("HeidelTimeStandalone").setLevel(Level.OFF);
    long preprocessTime = System.currentTimeMillis();
    List<TextAnnotation> taList = new ArrayList<>();
    POSAnnotator annotator = new POSAnnotator();
    for (int j = 0; j < te3inputText.size(); j++) {
        String text = testText.get(j);
        text = text.replaceAll("\\n", " ");
        TextAnnotation ta = tab.createTextAnnotation("corpus", "id", text);
        try {
            annotator.getView(ta);
        } catch (AnnotatorException e) {
            fail("AnnotatorException thrown!\n" + e.getMessage());
        }
        taList.add(ta);
    }
    long startTime = System.currentTimeMillis();
    int numTimex = 0;
    File outDir = new File(outputFolder);
    if (!outDir.exists()) {
        // mkdirs() also creates missing parent folders
        outDir.mkdirs();
    }
    for (int j = 0; j < te3inputText.size(); j++) {
        TextAnnotation ta = taList.get(j);
        tca.addDocumentCreationTime(DCTs.get(j));
        if (verbose) {
            System.out.println(docIDs.get(j));
        }
        try {
            List<TimexChunk> timex = tca.extractTimexFromFile(te3inputText.get(j), testText.get(j), ta, verbose);
            tca.setTimex(timex);
            String outputFileName = outputFolder + "/" + docIDs.get(j) + ".tml";
            tca.write2Text(outputFileName, docIDs.get(j), testText.get(j));
            numTimex += timex.size();
            tca.deleteTimex();
        } catch (AnnotatorException e) {
            fail("Exception while adding TIMEX3 VIEW: " + e.getMessage());
        }
    }
    long endTime = System.currentTimeMillis();
    long totalTime = endTime - startTime;
    System.out.println("Process time: " + totalTime);
    System.out.println("Preprocess + process time: " + (endTime - preprocessTime));
    System.out.println("Total timex3: " + numTimex);
}
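Examples 12 and 14 originally report failures with `"..." + e.getStackTrace()`. `getStackTrace()` returns a `StackTraceElement[]`, and concatenating an array uses `Object.toString()`, so the message shows only the array type and an identity hash, not the error itself. A small demonstration (class and method names are illustrative):

```java
// Why "..." + e.getStackTrace() makes a poor failure message.
class FailureMessages {

    // Concatenates the array: prints its type and identity hash only.
    static String withStackTraceArray(Exception e) {
        return "Exception while adding TIMEX3 VIEW " + e.getStackTrace();
    }

    // Concatenates the exception's message, which describes the failure.
    static String withMessage(Exception e) {
        return "Exception while adding TIMEX3 VIEW: " + e.getMessage();
    }

    public static void main(String[] args) {
        Exception e = new IllegalStateException("missing POS view");
        // Prints something like "... [Ljava.lang.StackTraceElement;@<hash>"
        System.out.println(withStackTraceArray(e));
        // Prints "Exception while adding TIMEX3 VIEW: missing POS view"
        System.out.println(withMessage(e));
    }
}
```

In a JUnit test, passing the exception message (or rethrowing the exception) keeps the actual cause visible in the report.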

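Example 14 replaces newlines with spaces before building the TextAnnotation (`text.replaceAll("\\n", " ")`). One property of this substitution, which may or may not be the authors' reason, is that it swaps one character for one character, so character offsets into the cleaned text still point at the same characters in the original. A sketch of that check (names are illustrative):

```java
// Newline flattening that keeps the string length, and therefore offsets.
class OffsetSafeCleanup {

    static String flattenNewlines(String raw) {
        return raw.replace('\n', ' '); // same result as raw.replaceAll("\\n", " ")
    }

    public static void main(String[] args) {
        String raw = "Flu season\nstarted in December.";
        String clean = flattenNewlines(raw);
        int start = clean.indexOf("December");
        System.out.println(clean.length() == raw.length());                    // true
        System.out.println(raw.substring(start, start + "December".length())); // December
    }
}
```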
Example 15 with StatefulTokenizer

Use of edu.illinois.cs.cogcomp.nlp.tokenizer.StatefulTokenizer in project cogcomp-nlp by CogComp.

From class TestTemporalChunker, method testTemporalChunkerWithPlainText.

@Test
public void testTemporalChunkerWithPlainText() throws Exception {
    String text = "The flu season is winding down, and it has killed 105 children so far - about the average toll.\n\n"
            + "The season started about a month earlier than usual, sparking concerns it might turn into the worst in a decade. It ended up being very hard on the elderly, but was moderately severe overall, according to the Centers for Disease Control and Prevention.\n\n"
            + "Six of the pediatric deaths were reported in the last week, and it's possible there will be more, said the CDC's Dr. Michael Jhung said Friday.\n\n"
            + "Roughly 100 children die in an average flu season. One exception was the swine flu pandemic of 2009-2010, when 348 children died.\n\n"
            + "The CDC recommends that all children ages 6 months and older be vaccinated against flu each season, though only about half get a flu shot or nasal spray.\n\n"
            + "All but four of the children who died were old enough to be vaccinated, but 90 percent of them did not get vaccinated, CDC officials said.\n\n"
            + "This year's vaccine was considered effective in children, though it didn't work very well in older people. And the dominant flu strain early in the season was one that tends to cause more severe illness.\n\n"
            + "The government only does a national flu death count for children. But it does track hospitalization rates for people 65 and older, and those statistics have been grim.\n\n"
            + "In that group, 177 out of every 100,000 were hospitalized with flu-related illness in the past several months. That's more than 2 1/2 times higher than any other recent season.\n\n"
            + "This flu season started in early December, a month earlier than usual, and peaked by the end of year. Since then, flu reports have been dropping off throughout the country.\n\n"
            + "\"We appear to be getting close to the end of flu season,\" Jhung said.";
    TextAnnotationBuilder tab = new TokenizerTextAnnotationBuilder(new StatefulTokenizer());
    TextAnnotation ta = tab.createTextAnnotation("corpus", "id", text);
    POSAnnotator annotator = new POSAnnotator();
    try {
        annotator.getView(ta);
    } catch (AnnotatorException e) {
        fail("AnnotatorException thrown!\n" + e.getMessage());
    }
    tca.addView(ta);
    View temporalViews = ta.getView(ViewNames.TIMEX3);
    List<Constituent> constituents = temporalViews.getConstituents();
    assertEquals("<TIMEX3 type=\"DURATION\" value=\"P1M\">", constituents.get(0).getLabel());
}

Also used : TextAnnotationBuilder(edu.illinois.cs.cogcomp.annotation.TextAnnotationBuilder) TokenizerTextAnnotationBuilder(edu.illinois.cs.cogcomp.nlp.utility.TokenizerTextAnnotationBuilder) POSAnnotator(edu.illinois.cs.cogcomp.pos.POSAnnotator) StatefulTokenizer(edu.illinois.cs.cogcomp.nlp.tokenizer.StatefulTokenizer) AnnotatorException(edu.illinois.cs.cogcomp.annotation.AnnotatorException) TextAnnotation(edu.illinois.cs.cogcomp.core.datastructures.textannotation.TextAnnotation) View(edu.illinois.cs.cogcomp.core.datastructures.textannotation.View) Constituent(edu.illinois.cs.cogcomp.core.datastructures.textannotation.Constituent) Test(org.junit.Test)
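The assertion in Example 15 compares the whole TIMEX3 label string. If a test needs to check the `type` and `value` attributes separately, a small regex helper is enough; this is a hypothetical helper, not a cogcomp-nlp API. (In TIMEX3, `P1M` is an ISO 8601 duration of one month.)

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Extracts name="value" attributes from a TIMEX3 label string.
class Timex3Label {
    private static final Pattern ATTR = Pattern.compile("(\\w+)=\"([^\"]*)\"");

    static Map<String, String> attributes(String label) {
        Map<String, String> attrs = new LinkedHashMap<>(); // keeps attribute order
        Matcher m = ATTR.matcher(label);
        while (m.find()) {
            attrs.put(m.group(1), m.group(2));
        }
        return attrs;
    }

    public static void main(String[] args) {
        Map<String, String> a = attributes("<TIMEX3 type=\"DURATION\" value=\"P1M\">");
        System.out.println(a); // {type=DURATION, value=P1M}
    }
}
```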

Aggregations

StatefulTokenizer (edu.illinois.cs.cogcomp.nlp.tokenizer.StatefulTokenizer): 30
TokenizerTextAnnotationBuilder (edu.illinois.cs.cogcomp.nlp.utility.TokenizerTextAnnotationBuilder): 29
TextAnnotation (edu.illinois.cs.cogcomp.core.datastructures.textannotation.TextAnnotation): 19
TextAnnotationBuilder (edu.illinois.cs.cogcomp.annotation.TextAnnotationBuilder): 16
ResourceManager (edu.illinois.cs.cogcomp.core.utilities.configuration.ResourceManager): 12
AnnotatorException (edu.illinois.cs.cogcomp.annotation.AnnotatorException): 9
POSAnnotator (edu.illinois.cs.cogcomp.pos.POSAnnotator): 9
Constituent (edu.illinois.cs.cogcomp.core.datastructures.textannotation.Constituent): 7
Properties (java.util.Properties): 7
ChunkerAnnotator (edu.illinois.cs.cogcomp.chunker.main.ChunkerAnnotator): 5
View (edu.illinois.cs.cogcomp.core.datastructures.textannotation.View): 5
POSTaggerAnnotator (edu.stanford.nlp.pipeline.POSTaggerAnnotator): 5
ParserAnnotator (edu.stanford.nlp.pipeline.ParserAnnotator): 5
Test (org.junit.Test): 5
XmlTextAnnotationMaker (edu.illinois.cs.cogcomp.annotation.XmlTextAnnotationMaker): 4
IntPair (edu.illinois.cs.cogcomp.core.datastructures.IntPair): 4
XmlDocumentProcessor (edu.illinois.cs.cogcomp.core.utilities.XmlDocumentProcessor): 4
ChunkerConfigurator (edu.illinois.cs.cogcomp.chunker.main.ChunkerConfigurator): 3
XmlTextAnnotation (edu.illinois.cs.cogcomp.core.datastructures.textannotation.XmlTextAnnotation): 3
SpanLabelView (edu.illinois.cs.cogcomp.core.datastructures.textannotation.SpanLabelView): 2