Example 1 with ClassificationEventStream

Use of com.joliciel.talismane.machineLearning.ClassificationEventStream in project talismane by joliciel-informatique.

Class MaxentModelTrainer, method trainModel.

@Override
public ClassificationModel trainModel(ClassificationEventStream corpusEventStream, Map<String, List<String>> descriptors) throws IOException {
    MaxentModel maxentModel = null;
    // Wrap the Talismane event stream so that the OpenNLP classes can consume it.
    EventStream eventStream = new OpenNLPEventStream(corpusEventStream);
    DataIndexer dataIndexer = new TwoPassRealValueDataIndexer(eventStream, cutoff);
    GISTrainer trainer = new GISTrainer(true);
    // Smoothing takes precedence: the Gaussian sigma is only applied when smoothing is not set.
    if (this.getSmoothing() > 0) {
        trainer.setSmoothing(true);
        trainer.setSmoothingObservation(this.getSmoothing());
    } else if (this.getSigma() > 0) {
        trainer.setGaussianSigma(this.getSigma());
    }
    maxentModel = trainer.trainModel(iterations, dataIndexer, cutoff);
    MaximumEntropyModel model = new MaximumEntropyModel(maxentModel, config, descriptors);
    // Record the training parameters and the corpus attributes alongside the model.
    model.addModelAttribute("cutoff", this.getCutoff());
    model.addModelAttribute("iterations", this.getIterations());
    model.addModelAttribute("sigma", this.getSigma());
    model.addModelAttribute("smoothing", this.getSmoothing());
    model.getModelAttributes().putAll(corpusEventStream.getAttributes());
    return model;
}
Also used : TwoPassRealValueDataIndexer(com.joliciel.talismane.machineLearning.maxent.custom.TwoPassRealValueDataIndexer) DataIndexer(opennlp.model.DataIndexer) ClassificationEventStream(com.joliciel.talismane.machineLearning.ClassificationEventStream) EventStream(opennlp.model.EventStream) MaxentModel(opennlp.model.MaxentModel) GISTrainer(com.joliciel.talismane.machineLearning.maxent.custom.GISTrainer)
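
In the method above, new OpenNLPEventStream(corpusEventStream) adapts one event-stream type to another before training. The following is a minimal sketch of that adapter idea only. SourceEvent, TargetEvent and EventStreamAdapter are hypothetical stand-ins invented for illustration; they are not the Talismane or OpenNLP classes, and the real OpenNLPEventStream may work differently.

```java
import java.util.Iterator;
import java.util.Map;

// Hypothetical stand-in for an event produced by the corpus-side stream.
final class SourceEvent {
    final String outcome;
    final Map<String, Double> features; // feature name -> real value

    SourceEvent(String outcome, Map<String, Double> features) {
        this.outcome = outcome;
        this.features = features;
    }
}

// Hypothetical stand-in for the event shape a trainer might consume:
// an outcome plus parallel arrays of feature names and values.
final class TargetEvent {
    final String outcome;
    final String[] context;
    final float[] values;

    TargetEvent(String outcome, String[] context, float[] values) {
        this.outcome = outcome;
        this.context = context;
        this.values = values;
    }
}

// Adapter: pulls events from the source lazily and converts them one at a time,
// so the whole corpus never has to be held in memory by the adapter itself.
final class EventStreamAdapter {
    private final Iterator<SourceEvent> source;

    EventStreamAdapter(Iterator<SourceEvent> source) {
        this.source = source;
    }

    boolean hasNext() {
        return source.hasNext();
    }

    TargetEvent next() {
        SourceEvent event = source.next();
        int n = event.features.size();
        String[] context = new String[n];
        float[] values = new float[n];
        int i = 0;
        for (Map.Entry<String, Double> feature : event.features.entrySet()) {
            context[i] = feature.getKey();
            values[i] = feature.getValue().floatValue();
            i++;
        }
        return new TargetEvent(event.outcome, context, values);
    }
}
```

Converting lazily in next() keeps the adapter a thin wrapper: the trainer drives iteration, and the source stream decides how events are read.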

Example 2 with ClassificationEventStream

Use of com.joliciel.talismane.machineLearning.ClassificationEventStream in project jochre by urieli.

Class Jochre, method doCommandTrain.

/**
 * Train a letter guessing model.
 *
 * @param featureDescriptors
 *          the feature descriptors for training
 * @param criteria
 *          criteria for selecting images to include when training
 * @param reconstructLetters
 *          whether or not complete letters should be reconstructed for
 *          training, from merged/split letters
 */
public void doCommandTrain(List<String> featureDescriptors, CorpusSelectionCriteria criteria, boolean reconstructLetters) {
    if (jochreSession.getLetterModelPath() == null)
        throw new RuntimeException("Missing argument: letterModel");
    if (featureDescriptors == null)
        throw new JochreException("features is required");
    LetterFeatureParser letterFeatureParser = new LetterFeatureParser();
    Set<LetterFeature<?>> features = letterFeatureParser.getLetterFeatureSet(featureDescriptors);
    BoundaryDetector boundaryDetector = null;
    if (reconstructLetters) {
        ShapeSplitter splitter = new TrainingCorpusShapeSplitter(jochreSession);
        ShapeMerger merger = new TrainingCorpusShapeMerger();
        boundaryDetector = new LetterByLetterBoundaryDetector(splitter, merger, jochreSession);
    } else {
        boundaryDetector = new OriginalBoundaryDetector();
    }
    LetterValidator letterValidator = new ComponentCharacterValidator(jochreSession);
    ClassificationEventStream corpusEventStream = new JochreLetterEventStream(features, boundaryDetector, letterValidator, criteria, jochreSession);
    File letterModelFile = new File(jochreSession.getLetterModelPath());
    letterModelFile.getParentFile().mkdirs();
    ModelTrainerFactory modelTrainerFactory = new ModelTrainerFactory();
    ClassificationModelTrainer trainer = modelTrainerFactory.constructTrainer(jochreSession.getConfig());
    ClassificationModel letterModel = trainer.trainModel(corpusEventStream, featureDescriptors);
    letterModel.persist(letterModelFile);
}
Also used : LetterByLetterBoundaryDetector(com.joliciel.jochre.boundaries.LetterByLetterBoundaryDetector) OriginalBoundaryDetector(com.joliciel.jochre.boundaries.OriginalBoundaryDetector) BoundaryDetector(com.joliciel.jochre.boundaries.BoundaryDetector) DeterministicBoundaryDetector(com.joliciel.jochre.boundaries.DeterministicBoundaryDetector) TrainingCorpusShapeMerger(com.joliciel.jochre.boundaries.TrainingCorpusShapeMerger) LetterValidator(com.joliciel.jochre.letterGuesser.LetterValidator) ClassificationEventStream(com.joliciel.talismane.machineLearning.ClassificationEventStream) JochreLetterEventStream(com.joliciel.jochre.letterGuesser.JochreLetterEventStream) ModelTrainerFactory(com.joliciel.talismane.machineLearning.ModelTrainerFactory) JochreException(com.joliciel.jochre.utils.JochreException) ClassificationModelTrainer(com.joliciel.talismane.machineLearning.ClassificationModelTrainer) LetterFeature(com.joliciel.jochre.letterGuesser.features.LetterFeature) ShapeMerger(com.joliciel.jochre.boundaries.ShapeMerger) LetterFeatureParser(com.joliciel.jochre.letterGuesser.features.LetterFeatureParser) TrainingCorpusShapeSplitter(com.joliciel.jochre.boundaries.TrainingCorpusShapeSplitter) RecursiveShapeSplitter(com.joliciel.jochre.boundaries.RecursiveShapeSplitter) ShapeSplitter(com.joliciel.jochre.boundaries.ShapeSplitter) ComponentCharacterValidator(com.joliciel.jochre.letterGuesser.ComponentCharacterValidator) File(java.io.File) ClassificationModel(com.joliciel.talismane.machineLearning.ClassificationModel)
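
The method above calls letterModelFile.getParentFile().mkdirs() directly. In java.io.File, getParentFile() returns null when the path has no parent component (for example a bare file name such as "model.zip"), so that call can throw a NullPointerException. A minimal sketch of a more defensive variant follows; the class ModelFiles and the method prepareModelFile are hypothetical names used for illustration and are not part of jochre.

```java
import java.io.File;

// Hypothetical helper, not part of jochre.
final class ModelFiles {
    private ModelFiles() {
    }

    /**
     * Returns a File for the given path, creating its parent directory
     * first when the path has one and it does not exist yet.
     */
    static File prepareModelFile(String path) {
        if (path == null) {
            throw new IllegalArgumentException("Missing argument: model path");
        }
        File file = new File(path);
        File parent = file.getParentFile(); // null for a bare file name
        if (parent != null && !parent.isDirectory() && !parent.mkdirs()) {
            throw new IllegalStateException("Could not create directory " + parent);
        }
        return file;
    }
}
```

Checking isDirectory() before mkdirs() matters because mkdirs() also returns false when the directory already exists.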

Example 3 with ClassificationEventStream

Use of com.joliciel.talismane.machineLearning.ClassificationEventStream in project jochre by urieli.

Class Jochre, method doCommandTrainMerge.

/**
 * Train the letter merging model.
 *
 * @param featureDescriptors
 *          feature descriptors for training
 * @param multiplier
 *          if &gt; 0, will be used to equalize the outcomes
 * @param criteria
 *          the criteria used to select the training corpus
 */
public void doCommandTrainMerge(List<String> featureDescriptors, int multiplier, CorpusSelectionCriteria criteria) {
    if (jochreSession.getMergeModelPath() == null)
        throw new RuntimeException("Missing argument: mergeModel");
    if (featureDescriptors == null)
        throw new JochreException("features is required");
    File mergeModelFile = new File(jochreSession.getMergeModelPath());
    mergeModelFile.getParentFile().mkdirs();
    MergeFeatureParser mergeFeatureParser = new MergeFeatureParser();
    Set<MergeFeature<?>> mergeFeatures = mergeFeatureParser.getMergeFeatureSet(featureDescriptors);
    ClassificationEventStream corpusEventStream = new JochreMergeEventStream(criteria, mergeFeatures, jochreSession);
    if (multiplier > 0) {
        corpusEventStream = new OutcomeEqualiserEventStream(corpusEventStream, multiplier);
    }
    ModelTrainerFactory modelTrainerFactory = new ModelTrainerFactory();
    ClassificationModelTrainer trainer = modelTrainerFactory.constructTrainer(jochreSession.getConfig());
    ClassificationModel mergeModel = trainer.trainModel(corpusEventStream, featureDescriptors);
    mergeModel.persist(mergeModelFile);
}
Also used : MergeFeatureParser(com.joliciel.jochre.boundaries.features.MergeFeatureParser) ClassificationEventStream(com.joliciel.talismane.machineLearning.ClassificationEventStream) ModelTrainerFactory(com.joliciel.talismane.machineLearning.ModelTrainerFactory) JochreException(com.joliciel.jochre.utils.JochreException) ClassificationModelTrainer(com.joliciel.talismane.machineLearning.ClassificationModelTrainer) MergeFeature(com.joliciel.jochre.boundaries.features.MergeFeature) File(java.io.File) JochreMergeEventStream(com.joliciel.jochre.boundaries.JochreMergeEventStream) OutcomeEqualiserEventStream(com.joliciel.talismane.machineLearning.OutcomeEqualiserEventStream) ClassificationModel(com.joliciel.talismane.machineLearning.ClassificationModel)
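
The Javadoc above says only that a positive multiplier "will be used to equalize the outcomes". One plausible reading, assumed here and not confirmed by the source, is that no outcome may occur more than multiplier times as often as the rarest outcome. The sketch below illustrates that reading on an in-memory list; OutcomeEqualiser and equalise are hypothetical names, and the actual OutcomeEqualiserEventStream may behave differently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration, not the Talismane implementation.
final class OutcomeEqualiser {
    private OutcomeEqualiser() {
    }

    /**
     * Keeps, in original order, at most (rarest outcome count * multiplier)
     * events per outcome. A multiplier <= 0 leaves the events unchanged.
     */
    static <E> List<E> equalise(List<E> events, Function<E, String> outcomeOf, int multiplier) {
        if (multiplier <= 0 || events.isEmpty()) {
            return new ArrayList<>(events);
        }
        // First pass: count how often each outcome occurs.
        Map<String, Integer> counts = new HashMap<>();
        for (E event : events) {
            counts.merge(outcomeOf.apply(event), 1, Integer::sum);
        }
        long cap = (long) Collections.min(counts.values()) * multiplier;

        // Second pass: drop events once their outcome has reached the cap.
        Map<String, Integer> kept = new HashMap<>();
        List<E> result = new ArrayList<>();
        for (E event : events) {
            int seen = kept.merge(outcomeOf.apply(event), 1, Integer::sum);
            if (seen <= cap) {
                result.add(event);
            }
        }
        return result;
    }
}
```

Under this assumption, a corpus with one "yes" event and five "no" events and a multiplier of 2 would keep the single "yes" and the first two "no" events.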

Example 4 with ClassificationEventStream

Use of com.joliciel.talismane.machineLearning.ClassificationEventStream in project jochre by urieli.

Class Jochre, method doCommandTrainSplits.

/**
 * Train the letter splitting model.
 *
 * @param featureDescriptors
 *          the feature descriptors for training this model
 * @param criteria
 *          the criteria used to select the training corpus
 */
public void doCommandTrainSplits(List<String> featureDescriptors, CorpusSelectionCriteria criteria) {
    if (jochreSession.getSplitModelPath() == null)
        throw new RuntimeException("Missing argument: splitModel");
    if (featureDescriptors == null)
        throw new JochreException("features is required");
    File splitModelFile = new File(jochreSession.getSplitModelPath());
    splitModelFile.getParentFile().mkdirs();
    SplitFeatureParser splitFeatureParser = new SplitFeatureParser();
    Set<SplitFeature<?>> splitFeatures = splitFeatureParser.getSplitFeatureSet(featureDescriptors);
    ClassificationEventStream corpusEventStream = new JochreSplitEventStream(criteria, splitFeatures, jochreSession);
    ModelTrainerFactory modelTrainerFactory = new ModelTrainerFactory();
    ClassificationModelTrainer trainer = modelTrainerFactory.constructTrainer(jochreSession.getConfig());
    ClassificationModel splitModel = trainer.trainModel(corpusEventStream, featureDescriptors);
    splitModel.persist(splitModelFile);
}
Also used : ClassificationEventStream(com.joliciel.talismane.machineLearning.ClassificationEventStream) ModelTrainerFactory(com.joliciel.talismane.machineLearning.ModelTrainerFactory) JochreException(com.joliciel.jochre.utils.JochreException) ClassificationModelTrainer(com.joliciel.talismane.machineLearning.ClassificationModelTrainer) SplitFeatureParser(com.joliciel.jochre.boundaries.features.SplitFeatureParser) SplitFeature(com.joliciel.jochre.boundaries.features.SplitFeature) JochreSplitEventStream(com.joliciel.jochre.boundaries.JochreSplitEventStream) File(java.io.File) ClassificationModel(com.joliciel.talismane.machineLearning.ClassificationModel)

Aggregations

ClassificationEventStream (com.joliciel.talismane.machineLearning.ClassificationEventStream): 4
JochreException (com.joliciel.jochre.utils.JochreException): 3
ClassificationModel (com.joliciel.talismane.machineLearning.ClassificationModel): 3
ClassificationModelTrainer (com.joliciel.talismane.machineLearning.ClassificationModelTrainer): 3
ModelTrainerFactory (com.joliciel.talismane.machineLearning.ModelTrainerFactory): 3
File (java.io.File): 3
BoundaryDetector (com.joliciel.jochre.boundaries.BoundaryDetector): 1
DeterministicBoundaryDetector (com.joliciel.jochre.boundaries.DeterministicBoundaryDetector): 1
JochreMergeEventStream (com.joliciel.jochre.boundaries.JochreMergeEventStream): 1
JochreSplitEventStream (com.joliciel.jochre.boundaries.JochreSplitEventStream): 1
LetterByLetterBoundaryDetector (com.joliciel.jochre.boundaries.LetterByLetterBoundaryDetector): 1
OriginalBoundaryDetector (com.joliciel.jochre.boundaries.OriginalBoundaryDetector): 1
RecursiveShapeSplitter (com.joliciel.jochre.boundaries.RecursiveShapeSplitter): 1
ShapeMerger (com.joliciel.jochre.boundaries.ShapeMerger): 1
ShapeSplitter (com.joliciel.jochre.boundaries.ShapeSplitter): 1
TrainingCorpusShapeMerger (com.joliciel.jochre.boundaries.TrainingCorpusShapeMerger): 1
TrainingCorpusShapeSplitter (com.joliciel.jochre.boundaries.TrainingCorpusShapeSplitter): 1
MergeFeature (com.joliciel.jochre.boundaries.features.MergeFeature): 1
MergeFeatureParser (com.joliciel.jochre.boundaries.features.MergeFeatureParser): 1
SplitFeature (com.joliciel.jochre.boundaries.features.SplitFeature): 1