Search in sources :

Example 1 with Bucketizer

use of org.apache.spark.ml.feature.Bucketizer in project jpmml-sparkml by jpmml.

the class BucketizerConverter method encodeFeatures.

@Override
public List<Feature> encodeFeatures(SparkMLEncoder encoder) {
    Bucketizer transformer = getTransformer();
    Feature feature = encoder.getOnlyFeature(transformer.getInputCol());
    ContinuousFeature continuousFeature = feature.toContinuousFeature();
    Discretize discretize = new Discretize(continuousFeature.getName());
    List<String> categories = new ArrayList<>();
    double[] splits = transformer.getSplits();
    for (int i = 0; i < (splits.length - 1); i++) {
        String category = String.valueOf(i);
        categories.add(category);
        Interval interval = new Interval((i < (splits.length - 2)) ? Interval.Closure.CLOSED_OPEN : Interval.Closure.CLOSED_CLOSED).setLeftMargin(formatMargin(splits[i])).setRightMargin(formatMargin(splits[i + 1]));
        DiscretizeBin discretizeBin = new DiscretizeBin(category, interval);
        discretize.addDiscretizeBins(discretizeBin);
    }
    DerivedField derivedField = encoder.createDerivedField(formatName(transformer), OpType.CATEGORICAL, DataType.INTEGER, discretize);
    return Collections.<Feature>singletonList(new CategoricalFeature(encoder, derivedField, categories));
}
Also used : ArrayList(java.util.ArrayList) Feature(org.jpmml.converter.Feature) ContinuousFeature(org.jpmml.converter.ContinuousFeature) CategoricalFeature(org.jpmml.converter.CategoricalFeature) CategoricalFeature(org.jpmml.converter.CategoricalFeature) ContinuousFeature(org.jpmml.converter.ContinuousFeature) Discretize(org.dmg.pmml.Discretize) Bucketizer(org.apache.spark.ml.feature.Bucketizer) DiscretizeBin(org.dmg.pmml.DiscretizeBin) DerivedField(org.dmg.pmml.DerivedField) Interval(org.dmg.pmml.Interval)

Aggregations

ArrayList (java.util.ArrayList)1 Bucketizer (org.apache.spark.ml.feature.Bucketizer)1 DerivedField (org.dmg.pmml.DerivedField)1 Discretize (org.dmg.pmml.Discretize)1 DiscretizeBin (org.dmg.pmml.DiscretizeBin)1 Interval (org.dmg.pmml.Interval)1 CategoricalFeature (org.jpmml.converter.CategoricalFeature)1 ContinuousFeature (org.jpmml.converter.ContinuousFeature)1 Feature (org.jpmml.converter.Feature)1