Example 11 with DataPurifier

Use of ml.shifu.shifu.core.DataPurifier in project shifu by ShifuML.

From the class ShifuTestProcessor, method runFilterTest.

private int runFilterTest(ModelConfig modelConfig) throws IOException {
    ModelSourceDataConf dataset = modelConfig.getDataSet();
    // Nothing to test when no filter expression is configured for the training dataset.
    if (StringUtils.isBlank(dataset.getFilterExpressions())) {
        LOG.warn("No filter expression set in train dataset. Skip it!");
        return 0;
    }
    LOG.info("Start to test the filter against the training dataset.");
    // The boolean flag is false for the training dataset and true for the validation dataset below.
    DataPurifier dataPurifier = new DataPurifier(modelConfig, false);
    int status = doFilterTest(dataPurifier, dataset.getDataPath(), dataset.getSource());
    // A positive status is returned immediately; the validation filter is not tested in that case.
    if (status > 0) {
        return status;
    }
    // The validation filter is tested only when one is configured.
    if (StringUtils.isNotBlank(dataset.getValidationFilterExpressions())) {
        LOG.info("Start to test the filter against the validation dataset.");
        dataPurifier = new DataPurifier(modelConfig, true);
        status = doFilterTest(dataPurifier, dataset.getValidationDataPath(), dataset.getSource());
    }
    return status;
}
Also used: DataPurifier (ml.shifu.shifu.core.DataPurifier), ModelSourceDataConf (ml.shifu.shifu.container.obj.ModelSourceDataConf)
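
The control flow of runFilterTest (skip when no training filter is set, test the training filter, stop on a positive status, then test the validation filter if one is set) can be tried without Shifu on the classpath. The following is only a minimal sketch under stated assumptions: DatasetConf, isBlank, and this doFilterTest are hypothetical stand-ins written for illustration, not Shifu classes or methods, and the parenthesis check is an invented placeholder for whatever the real doFilterTest verifies.

```java
public class FilterTestSketch {

    /** Hypothetical stand-in for the two filter settings read from the dataset config. */
    static final class DatasetConf {
        final String filterExpressions;
        final String validationFilterExpressions;

        DatasetConf(String filterExpressions, String validationFilterExpressions) {
            this.filterExpressions = filterExpressions;
            this.validationFilterExpressions = validationFilterExpressions;
        }
    }

    /** Local replacement for StringUtils.isBlank so the sketch needs no dependencies. */
    static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }

    /**
     * Hypothetical stand-in for doFilterTest: returns 0 when the expression has
     * balanced parentheses and 1 otherwise. The real check is not shown in the source.
     */
    static int doFilterTest(String expression) {
        int depth = 0;
        for (char c : expression.toCharArray()) {
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
            }
            if (depth < 0) {
                return 1; // closing parenthesis without a matching opening one
            }
        }
        return depth == 0 ? 0 : 1;
    }

    /** Mirrors the branching of runFilterTest: training filter first, then validation. */
    static int runFilterTest(DatasetConf dataset) {
        if (isBlank(dataset.filterExpressions)) {
            return 0; // no training filter configured, nothing is tested
        }
        int status = doFilterTest(dataset.filterExpressions);
        if (status > 0) {
            return status; // training filter failed, validation filter is skipped
        }
        if (!isBlank(dataset.validationFilterExpressions)) {
            status = doFilterTest(dataset.validationFilterExpressions);
        }
        return status;
    }

    public static void main(String[] args) {
        // Training filter is well formed, validation filter is missing a parenthesis.
        System.out.println(runFilterTest(new DatasetConf("(age > 18)", "(age > 18")));
    }
}
```

One consequence of this ordering, visible in the original method as well, is that a blank training filter returns 0 before the validation filter is ever looked at.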

Aggregations

DataPurifier (ml.shifu.shifu.core.DataPurifier) 11
IntWritable (org.apache.hadoop.io.IntWritable) 5
ColumnConfig (ml.shifu.shifu.container.obj.ColumnConfig) 4
HashMap (java.util.HashMap) 3
RawSourceData (ml.shifu.shifu.container.obj.RawSourceData) 2
IOException (java.io.IOException) 1
InvocationTargetException (java.lang.reflect.InvocationTargetException) 1
ArrayList (java.util.ArrayList) 1
List (java.util.List) 1
Map (java.util.Map) 1
Properties (java.util.Properties) 1
ModelConfig (ml.shifu.shifu.container.obj.ModelConfig) 1
ModelSourceDataConf (ml.shifu.shifu.container.obj.ModelSourceDataConf) 1
SourceType (ml.shifu.shifu.container.obj.RawSourceData.SourceType) 1
ModelRunner (ml.shifu.shifu.core.ModelRunner) 1
CountAndFrequentItems (ml.shifu.shifu.core.autotype.AutoTypeDistinctCountMapper.CountAndFrequentItems) 1
TrainingDataSet (ml.shifu.shifu.core.dvarsel.dataset.TrainingDataSet) 1
ShifuException (ml.shifu.shifu.exception.ShifuException) 1
DoubleWritable (org.apache.hadoop.io.DoubleWritable) 1
NullWritable (org.apache.hadoop.io.NullWritable) 1