Example 1 with Tokenizer

Use of org.apache.spark.ml.feature.Tokenizer in the project jpmml-sparkml by jpmml.

From the class TokenizerConverter, method encodeFeatures.

@Override
public List<Feature> encodeFeatures(SparkMLEncoder encoder) {
    Tokenizer transformer = getTransformer();
    // Resolve the transformer's input column to its single feature.
    Feature feature = encoder.getOnlyFeature(transformer.getInputCol());
    // Wrap the input in an application of the "lowercase" function.
    Apply apply = PMMLUtil.createApply("lowercase", feature.ref());
    DerivedField derivedField = encoder.createDerivedField(FeatureUtil.createName("lowercase", feature), OpType.CATEGORICAL, DataType.STRING, apply);
    // Return the lowercased field as a DocumentFeature, passing the whitespace pattern "\\s+" as its third argument.
    return Collections.<Feature>singletonList(new DocumentFeature(encoder, derivedField, "\\s+"));
}
Also used: Apply (org.dmg.pmml.Apply), DocumentFeature (org.jpmml.sparkml.DocumentFeature), Tokenizer (org.apache.spark.ml.feature.Tokenizer), Feature (org.jpmml.converter.Feature), DerivedField (org.dmg.pmml.DerivedField)
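
The method above combines a "lowercase" application with the pattern "\\s+". As a rough illustration of what that combination does to a string, here is a plain-Java sketch. The class name TokenizeSketch and the method tokenize are hypothetical and are not part of jpmml-sparkml or Spark; trimming the input and returning an empty list for blank input are assumptions of this sketch, not behavior confirmed by the source.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: lowercases a string, then splits it on runs of whitespace.
public class TokenizeSketch {

    static List<String> tokenize(String text) {
        // Step 1: lowercase, mirroring the "lowercase" function application.
        String lowered = text.toLowerCase(Locale.ROOT);
        // Assumption of this sketch: trim first so that leading whitespace
        // does not produce an empty first token.
        String trimmed = lowered.trim();
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        // Step 2: split on one or more whitespace characters ("\\s+").
        return Arrays.asList(trimmed.split("\\s+"));
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Hello  Spark ML"));
    }
}
```

Using "\\s+" rather than "\\s" means that consecutive spaces or tabs are treated as a single separator, so no empty tokens appear between words.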

Aggregations

Tokenizer (org.apache.spark.ml.feature.Tokenizer): 1
Apply (org.dmg.pmml.Apply): 1
DerivedField (org.dmg.pmml.DerivedField): 1
Feature (org.jpmml.converter.Feature): 1
DocumentFeature (org.jpmml.sparkml.DocumentFeature): 1