
Example 11 with WeightedOutcome

Use of com.joliciel.talismane.utils.WeightedOutcome in project talismane by joliciel-informatique.

Class TokeniserPatternsFeature, method checkInternal:

@Override
public FeatureResult<List<WeightedOutcome<String>>> checkInternal(TokenWrapper tokenWrapper, RuntimeEnvironment env) throws TalismaneException {
    Token token = tokenWrapper.getToken();
    List<WeightedOutcome<String>> resultList = new ArrayList<WeightedOutcome<String>>();
    for (TokenPatternMatch tokenMatch : token.getMatches()) {
        if (tokenMatch.getIndex() == tokenMatch.getPattern().getIndexesToTest().get(0)) {
            resultList.add(new WeightedOutcome<String>(tokenMatch.getPattern().getName(), 1.0));
        }
    }
    return this.generateResult(resultList);
}
Also used: ArrayList (java.util.ArrayList), WeightedOutcome (com.joliciel.talismane.utils.WeightedOutcome), Token (com.joliciel.talismane.tokeniser.Token), TokenPatternMatch (com.joliciel.talismane.tokeniser.patterns.TokenPatternMatch)
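The filtering rule in checkInternal can be sketched in isolation. This is a minimal illustration, not Talismane code: `PatternFeatureSketch`, `Match` and `Outcome` are hypothetical stand-ins for TokenPatternMatch, its pattern, and WeightedOutcome, and it assumes each token covered by a pattern occurrence carries its own match object recording its position in that occurrence. Under that assumption, a pattern name is emitted with weight 1.0 only on the token that sits at the first index the pattern asks to test.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the TokeniserPatternsFeature filtering rule; not Talismane code.
final class PatternFeatureSketch {
    // Hypothetical stand-in for WeightedOutcome<String>.
    static final class Outcome {
        final String outcome;
        final double weight;

        Outcome(String outcome, double weight) {
            this.outcome = outcome;
            this.weight = weight;
        }
    }

    // Hypothetical stand-in for a TokenPatternMatch: this token's position within
    // the pattern occurrence, the pattern's name, and the indexes it asks to test.
    static final class Match {
        final int index;
        final String patternName;
        final List<Integer> indexesToTest;

        Match(int index, String patternName, List<Integer> indexesToTest) {
            this.index = index;
            this.patternName = patternName;
            this.indexesToTest = indexesToTest;
        }
    }

    // Keeps only matches where this token sits at the pattern's first tested index.
    // The isEmpty() guard is an addition of this sketch.
    static List<Outcome> patternOutcomes(List<Match> matches) {
        List<Outcome> result = new ArrayList<>();
        for (Match match : matches) {
            if (!match.indexesToTest.isEmpty()
                    && match.index == match.indexesToTest.get(0).intValue()) {
                result.add(new Outcome(match.patternName, 1.0));
            }
        }
        return result;
    }
}
```

For example, a token at position 1 of a pattern whose first tested index is 0 contributes nothing for that pattern.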

Example 12 with WeightedOutcome


Class AbstractLexicalAttributeFeature, method checkInternal:

@Override
public FeatureResult<List<WeightedOutcome<String>>> checkInternal(T context, RuntimeEnvironment env) throws TalismaneException {
    PosTaggedTokenWrapper innerWrapper = this.getToken(context, env);
    if (innerWrapper == null)
        return null;
    PosTaggedToken posTaggedToken = innerWrapper.getPosTaggedToken();
    if (posTaggedToken == null)
        return null;
    FeatureResult<List<WeightedOutcome<String>>> featureResult = null;
    List<String> attributes = this.getAttributes(innerWrapper, env);
    Set<String> results = new HashSet<>();
    for (LexicalEntry lexicalEntry : posTaggedToken.getLexicalEntries()) {
        boolean haveAtLeastOne = false;
        Set<String> previousAttributeStrings = new HashSet<>();
        previousAttributeStrings.add("");
        for (String attribute : attributes) {
            List<String> values = lexicalEntry.getAttributeAsList(attribute);
            if (values.size() > 0) {
                Set<String> currentAttributeStrings = new HashSet<>();
                haveAtLeastOne = true;
                for (String value : values) {
                    for (String prevString : previousAttributeStrings) {
                        if (prevString.length() > 0)
                            currentAttributeStrings.add(prevString + "|" + value);
                        else
                            currentAttributeStrings.add(value);
                    }
                }
                previousAttributeStrings = currentAttributeStrings;
            }
        }
        if (haveAtLeastOne) {
            results.addAll(previousAttributeStrings);
        }
    }
    if (results.size() > 0) {
        List<WeightedOutcome<String>> outcomes = new ArrayList<>(results.size());
        for (String result : results) {
            outcomes.add(new WeightedOutcome<String>(result, 1.0));
        }
        featureResult = this.generateResult(outcomes);
    }
    return featureResult;
}
Also used: PosTaggedToken (com.joliciel.talismane.posTagger.PosTaggedToken), ArrayList (java.util.ArrayList), WeightedOutcome (com.joliciel.talismane.utils.WeightedOutcome), List (java.util.List), LexicalEntry (com.joliciel.talismane.lexicon.LexicalEntry), HashSet (java.util.HashSet)
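The nested loops in checkInternal build, for each lexical entry, every string that picks one value per requested attribute, joined with "|", skipping attributes the entry has no value for; the strings from all entries are then unioned and each becomes a weight-1.0 WeightedOutcome. A minimal standalone sketch of the combination step, assuming a lexical entry can be modelled as a hypothetical `Map<String, List<String>>` from attribute name to values. Unlike the original, it uses TreeSet so results come out in a deterministic order.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of the attribute-combination step; a lexical entry is
// modelled as a map from attribute name to its values (an assumption).
final class AttributeComboSketch {
    // All "v1|v2|..." strings taking one value per attribute the entry has;
    // attributes with no values are skipped. Empty if no attribute had values.
    static Set<String> combine(Map<String, List<String>> entry, List<String> attributes) {
        Set<String> previous = new TreeSet<>();
        previous.add("");
        boolean haveAtLeastOne = false;
        for (String attribute : attributes) {
            List<String> values = entry.getOrDefault(attribute, Collections.<String>emptyList());
            if (values.isEmpty()) {
                continue;
            }
            haveAtLeastOne = true;
            Set<String> current = new TreeSet<>();
            for (String value : values) {
                for (String prev : previous) {
                    current.add(prev.isEmpty() ? value : prev + "|" + value);
                }
            }
            previous = current;
        }
        return haveAtLeastOne ? previous : Collections.<String>emptySet();
    }

    // Union of the combinations over all lexical entries of one token.
    static Set<String> combineAll(List<Map<String, List<String>>> entries, List<String> attributes) {
        Set<String> results = new TreeSet<>();
        for (Map<String, List<String>> entry : entries) {
            results.addAll(combine(entry, attributes));
        }
        return results;
    }
}
```

For an entry with gender values m and f and number value s, and the attribute list gender, number, person, the sketch yields f|s and m|s; person is skipped because the entry has no value for it.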

Example 13 with WeightedOutcome


Class DependencyLabelSetFeature, method checkInternal:

@Override
public FeatureResult<List<WeightedOutcome<String>>> checkInternal(ParseConfigurationWrapper context, RuntimeEnvironment env) {
    TransitionSystem transitionSystem = TalismaneSession.get(sessionId).getTransitionSystem();
    List<WeightedOutcome<String>> resultList = new ArrayList<WeightedOutcome<String>>();
    for (String label : transitionSystem.getDependencyLabelSet().getDependencyLabels()) {
        resultList.add(new WeightedOutcome<String>(label, 1.0));
    }
    return this.generateResult(resultList);
}
Also used: TransitionSystem (com.joliciel.talismane.parser.TransitionSystem), ArrayList (java.util.ArrayList), WeightedOutcome (com.joliciel.talismane.utils.WeightedOutcome)
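Here every dependency label in the transition system's label set is returned with the same weight, 1.0, so the feature acts as a set of binary indicators, one per label. A generic sketch of that wrapping step; `LabelOutcomeSketch` and `Weighted` are hypothetical stand-ins for the loop and for WeightedOutcome, and the labels in any usage are illustrative only.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical sketch: wrap each label with one shared weight.
final class LabelOutcomeSketch {
    // Hypothetical stand-in for WeightedOutcome<T>.
    static final class Weighted<T> {
        final T outcome;
        final double weight;

        Weighted(T outcome, double weight) {
            this.outcome = outcome;
            this.weight = weight;
        }
    }

    // Preserves the collection's iteration order; every entry gets the same weight.
    static <T> List<Weighted<T>> uniform(Collection<T> labels, double weight) {
        List<Weighted<T>> result = new ArrayList<>(labels.size());
        for (T label : labels) {
            result.add(new Weighted<>(label, weight));
        }
        return result;
    }
}
```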

Example 14 with WeightedOutcome


Class LinearSVMModelTrainer, method getFeatureMatrix:

private Feature[][] getFeatureMatrix(ClassificationEventStream corpusEventStream, TObjectIntMap<String> featureIndexMap, TObjectIntMap<String> outcomeIndexMap, TIntList outcomeList, TIntIntMap featureCountMap, CountingInfo countingInfo) {
    try {
        int maxFeatureCount = 0;
        List<Feature[]> fullFeatureList = new ArrayList<Feature[]>();
        while (corpusEventStream.hasNext()) {
            ClassificationEvent corpusEvent = corpusEventStream.next();
            int outcomeIndex = outcomeIndexMap.get(corpusEvent.getClassification());
            if (outcomeIndex < 0) {
                outcomeIndex = countingInfo.currentOutcomeIndex++;
                outcomeIndexMap.put(corpusEvent.getClassification(), outcomeIndex);
            }
            outcomeList.add(outcomeIndex);
            Map<Integer, Feature> featureList = new TreeMap<Integer, Feature>();
            for (FeatureResult<?> featureResult : corpusEvent.getFeatureResults()) {
                if (featureResult.getOutcome() instanceof List) {
                    @SuppressWarnings("unchecked") FeatureResult<List<WeightedOutcome<String>>> stringCollectionResult = (FeatureResult<List<WeightedOutcome<String>>>) featureResult;
                    for (WeightedOutcome<String> stringOutcome : stringCollectionResult.getOutcome()) {
                        String featureName = featureResult.getTrainingName() + "|" + featureResult.getTrainingOutcome(stringOutcome.getOutcome());
                        double value = stringOutcome.getWeight();
                        this.addFeatureResult(featureName, value, featureList, featureIndexMap, featureCountMap, countingInfo);
                    }
                } else {
                    double value = 1.0;
                    if (featureResult.getOutcome() instanceof Double) {
                        @SuppressWarnings("unchecked") FeatureResult<Double> doubleResult = (FeatureResult<Double>) featureResult;
                        value = doubleResult.getOutcome().doubleValue();
                    }
                    this.addFeatureResult(featureResult.getTrainingName(), value, featureList, featureIndexMap, featureCountMap, countingInfo);
                }
            }
            if (featureList.size() > maxFeatureCount)
                maxFeatureCount = featureList.size();
            // convert to array immediately, to avoid double storage
            int j = 0;
            Feature[] featureArray = new Feature[featureList.size()];
            for (Feature feature : featureList.values()) {
                featureArray[j] = feature;
                j++;
            }
            fullFeatureList.add(featureArray);
            countingInfo.numEvents++;
            if (countingInfo.numEvents % 1000 == 0) {
                LOG.debug("Processed " + countingInfo.numEvents + " events.");
            }
        }
        Feature[][] featureMatrix = new Feature[countingInfo.numEvents][];
        int i = 0;
        for (Feature[] featureArray : fullFeatureList) {
            featureMatrix[i] = featureArray;
            i++;
        }
        fullFeatureList = null;
        LOG.debug("Event count: " + countingInfo.numEvents);
        LOG.debug("Feature count: " + featureIndexMap.size());
        return featureMatrix;
    } catch (TalismaneException e) {
        LOG.error(e.getMessage(), e);
        throw new RuntimeException(e);
    } catch (IOException e) {
        LOG.error(e.getMessage(), e);
        throw new RuntimeException(e);
    }
}
Also used: TalismaneException (com.joliciel.talismane.TalismaneException), TIntArrayList (gnu.trove.list.array.TIntArrayList), ArrayList (java.util.ArrayList), WeightedOutcome (com.joliciel.talismane.utils.WeightedOutcome), IOException (java.io.IOException), TreeMap (java.util.TreeMap), Feature (de.bwaldvogel.liblinear.Feature), TIntList (gnu.trove.list.TIntList), List (java.util.List), ClassificationEvent (com.joliciel.talismane.machineLearning.ClassificationEvent), FeatureResult (com.joliciel.talismane.machineLearning.features.FeatureResult)
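In getFeatureMatrix, a list-valued FeatureResult is expanded into one feature per weighted outcome, named from the feature's training name and the outcome's training form joined by "|", and carrying the outcome's weight; scalar results use 1.0 unless the outcome is a Double. The per-event TreeMap keeps features ordered by index before they are copied into an array. The sketch below mirrors that flow with hypothetical types (`SparseRowSketch`, `NamedValue`, `Row`) in place of liblinear's `Feature`, and uses the raw outcome string rather than getTrainingOutcome. Because addFeatureResult is not shown in the source, two details are assumptions: new names receive 1-based indexes in order of first appearance (LIBLINEAR feature indexes start at 1), and a name repeated within one event has its values summed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of turning feature results into sorted sparse rows.
final class SparseRowSketch {
    // One named feature value for one event.
    static final class NamedValue {
        final String name;
        final double value;

        NamedValue(String name, double value) {
            this.name = name;
            this.value = value;
        }
    }

    // One event as parallel arrays, sorted by feature index.
    static final class Row {
        final int[] indexes;
        final double[] values;

        Row(int[] indexes, double[] values) {
            this.indexes = indexes;
            this.values = values;
        }
    }

    // Feature name -> 1-based index, assigned on first sight (assumption).
    private final Map<String, Integer> featureIndexMap = new HashMap<>();

    // Expands a list-valued feature into "featureName|outcome" entries keeping each weight.
    static List<NamedValue> expand(String featureName, List<NamedValue> weightedOutcomes) {
        List<NamedValue> expanded = new ArrayList<>();
        for (NamedValue outcome : weightedOutcomes) {
            expanded.add(new NamedValue(featureName + "|" + outcome.name, outcome.value));
        }
        return expanded;
    }

    // TreeMap keeps indexes sorted; duplicate names in one event are summed (assumption).
    Row toRow(List<NamedValue> entries) {
        TreeMap<Integer, Double> sorted = new TreeMap<>();
        for (NamedValue entry : entries) {
            Integer index = featureIndexMap.get(entry.name);
            if (index == null) {
                index = featureIndexMap.size() + 1;
                featureIndexMap.put(entry.name, index);
            }
            sorted.merge(index, entry.value, Double::sum);
        }
        int[] indexes = new int[sorted.size()];
        double[] values = new double[sorted.size()];
        int j = 0;
        for (Map.Entry<Integer, Double> kv : sorted.entrySet()) {
            indexes[j] = kv.getKey();
            values[j] = kv.getValue();
            j++;
        }
        return new Row(indexes, values);
    }

    int featureCount() {
        return featureIndexMap.size();
    }
}
```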

Example 15 with WeightedOutcome


Class PerceptronDetailedAnalysisWriter, method onAnalyse:

/*
   * (non-Javadoc)
   * 
   * @see com.joliciel.talismane.maxent.MaxentObserver#onAnalyse(java.util.List,
   * java.util.Collection)
   */
@Override
public void onAnalyse(Object event, List<FeatureResult<?>> featureResults, Collection<Decision> decisions) throws IOException {
    Map<String, Double> outcomeTotals = new TreeMap<String, Double>();
    for (String outcome : modelParams.getOutcomes()) outcomeTotals.put(outcome, 0.0);
    writer.append("####### Event: " + event.toString() + "\n");
    writer.append("### Feature results:\n");
    for (FeatureResult<?> featureResult : featureResults) {
        if (featureResult.getOutcome() instanceof List) {
            @SuppressWarnings("unchecked") FeatureResult<List<WeightedOutcome<String>>> stringCollectionResult = (FeatureResult<List<WeightedOutcome<String>>>) featureResult;
            for (WeightedOutcome<String> stringOutcome : stringCollectionResult.getOutcome()) {
                String featureName = featureResult.getTrainingName() + "|" + featureResult.getTrainingOutcome(stringOutcome.getOutcome());
                String featureOutcome = stringOutcome.getOutcome();
                double value = stringOutcome.getWeight();
                this.writeFeatureResult(featureName, featureOutcome, value, outcomeTotals);
            }
        } else {
            double value = 1.0;
            if (featureResult.getFeature() instanceof DoubleFeature) {
                value = (Double) featureResult.getOutcome();
            }
            this.writeFeatureResult(featureResult.getTrainingName(), featureResult.getOutcome().toString(), value, outcomeTotals);
        }
    }
    List<Integer> featureIndexList = new ArrayList<Integer>();
    List<Double> featureValueList = new ArrayList<Double>();
    modelParams.prepareData(featureResults, featureIndexList, featureValueList);
    double[] results = decisionMaker.predict(featureIndexList, featureValueList);
    writer.append("### Outcome totals:\n");
    // Columns: outcome left-aligned in 30 characters, numbers right-aligned in 15.
    writer.append(String.format("%1$-30s", "outcome") + String.format("%1$15s", "total") + String.format("%1$15s", "normalised") + "\n");
    int j = 0;
    for (String outcome : modelParams.getOutcomes()) {
        double total = outcomeTotals.get(outcome);
        double normalised = results[j++];
        writer.append(String.format("%1$-30s", outcome) + String.format("%1$15s", decFormat.format(total)) + String.format("%1$15s", decFormat.format(normalised)) + "\n");
    }
    writer.append("\n");
    Map<String, Double> outcomeWeights = new TreeMap<String, Double>();
    for (Decision decision : decisions) {
        outcomeWeights.put(decision.getOutcome(), decision.getProbability());
    }
    writer.append("### Outcome list:\n");
    Set<WeightedOutcome<String>> weightedOutcomes = new TreeSet<WeightedOutcome<String>>();
    for (String outcome : modelParams.getOutcomes()) {
        Double weightObj = outcomeWeights.get(outcome);
        double weight = (weightObj == null ? 0.0 : weightObj.doubleValue());
        WeightedOutcome<String> weightedOutcome = new WeightedOutcome<String>(outcome, weight);
        weightedOutcomes.add(weightedOutcome);
    }
    for (WeightedOutcome<String> weightedOutcome : weightedOutcomes) {
        writer.append(String.format("%1$-30s", weightedOutcome.getOutcome()) + String.format("%1$15s", decFormat.format(weightedOutcome.getWeight())) + "\n");
    }
    writer.append("\n");
    writer.flush();
}
Also used: ArrayList (java.util.ArrayList), WeightedOutcome (com.joliciel.talismane.utils.WeightedOutcome), TreeMap (java.util.TreeMap), DoubleFeature (com.joliciel.talismane.machineLearning.features.DoubleFeature), Decision (com.joliciel.talismane.machineLearning.Decision), TreeSet (java.util.TreeSet), List (java.util.List), FeatureResult (com.joliciel.talismane.machineLearning.features.FeatureResult)
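The outcome-list step gives every known outcome a row, defaulting to 0.0 when no Decision exists for it, and relies on the TreeSet's ordering of WeightedOutcome, whose compareTo is not shown in this example. A sketch of the same step with an explicit comparator, assuming descending weight with ties broken by outcome name; `OutcomeListSketch` and `Scored` are hypothetical. Rows are formatted with `%-30s` and `%15.4f` (the four decimals are an assumption standing in for decFormat). No `#` flag is used with `%s`, since java.util.Formatter throws FormatFlagsConversionMismatchException for it unless the argument implements Formattable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of ranking and printing outcome weights.
final class OutcomeListSketch {
    // Hypothetical stand-in for WeightedOutcome<String>.
    static final class Scored {
        final String outcome;
        final double weight;

        Scored(String outcome, double weight) {
            this.outcome = outcome;
            this.weight = weight;
        }
    }

    // One row per known outcome; missing probabilities default to 0.0.
    // Ordering (descending weight, then name) is an assumption of this sketch.
    static List<Scored> rank(List<String> outcomes, Map<String, Double> probabilities) {
        List<Scored> rows = new ArrayList<>();
        for (String outcome : outcomes) {
            Double p = probabilities.get(outcome);
            rows.add(new Scored(outcome, p == null ? 0.0 : p));
        }
        rows.sort((a, b) -> {
            int byWeight = Double.compare(b.weight, a.weight); // descending
            return byWeight != 0 ? byWeight : a.outcome.compareTo(b.outcome);
        });
        return rows;
    }

    // Outcome left-aligned in 30 columns, weight right-aligned in 15.
    static String format(Scored row) {
        return String.format(Locale.ROOT, "%-30s%15.4f", row.outcome, row.weight);
    }
}
```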

Aggregations

WeightedOutcome (com.joliciel.talismane.utils.WeightedOutcome): 18
ArrayList (java.util.ArrayList): 15
List (java.util.List): 11
Decision (com.joliciel.talismane.machineLearning.Decision): 7
Token (com.joliciel.talismane.tokeniser.Token): 6
FeatureResult (com.joliciel.talismane.machineLearning.features.FeatureResult): 5
RuntimeEnvironment (com.joliciel.talismane.machineLearning.features.RuntimeEnvironment): 4
PosTaggedToken (com.joliciel.talismane.posTagger.PosTaggedToken): 4
TreeMap (java.util.TreeMap): 4
TreeSet (java.util.TreeSet): 4
TalismaneTest (com.joliciel.talismane.TalismaneTest): 3
StringLiteralFeature (com.joliciel.talismane.machineLearning.features.StringLiteralFeature): 3
PosTag (com.joliciel.talismane.posTagger.PosTag): 3
PosTagSequence (com.joliciel.talismane.posTagger.PosTagSequence): 3
PosTaggerContext (com.joliciel.talismane.posTagger.PosTaggerContext): 3
PosTaggerContextImpl (com.joliciel.talismane.posTagger.PosTaggerContextImpl): 3
Sentence (com.joliciel.talismane.rawText.Sentence): 3
TokenSequence (com.joliciel.talismane.tokeniser.TokenSequence): 3
Config (com.typesafe.config.Config): 3
Test (org.junit.Test): 3