Example 6 with VectorAssembler

Use of com.alibaba.alink.pipeline.dataproc.vector.VectorAssembler in project Alink by alibaba.

From class PipelineSaveAndLoadTest, method testLocalPredictor.

@Test
public void testLocalPredictor() throws Exception {
    VectorAssembler va = new VectorAssembler()
        .setSelectedCols(Iris.getFeatureColNames())
        .setOutputCol("features");
    MultilayerPerceptronClassifier classifier = new MultilayerPerceptronClassifier()
        .setVectorCol("features")
        .setLabelCol(Iris.getLabelColName())
        .setLayers(new int[] { 4, 5, 3 })
        .setMaxIter(30)
        .setPredictionCol("pred_label")
        .setPredictionDetailCol("pred_detail")
        .setReservedCols(Iris.getLabelColName());
    Pipeline pipeline = new Pipeline().add(va).add(classifier);
    PipelineModel model = pipeline.fit(data);
    FilePath filePath = new FilePath(folder.newFile().getAbsolutePath());
    model.save(filePath, true);
    BatchOperator.execute();
    LocalPredictor localPredictor = new LocalPredictor(filePath, new TableSchema(ArrayUtils.add(data.getColNames(), "features"), ArrayUtils.add(data.getColTypes(), VectorTypes.DENSE_VECTOR)));
    Row result = localPredictor.map(Row.of(5.1, 3.5, 1.4, 0.2, "Iris-setosanew", new DenseVector(new double[] { 5.1, 3.5, 1.4, 0.2 })));
    System.out.println(JsonConverter.toJson(result));
}
Also used: FilePath (com.alibaba.alink.common.io.filesystem.FilePath), MultilayerPerceptronClassifier (com.alibaba.alink.pipeline.classification.MultilayerPerceptronClassifier), TableSchema (org.apache.flink.table.api.TableSchema), VectorAssembler (com.alibaba.alink.pipeline.dataproc.vector.VectorAssembler), Row (org.apache.flink.types.Row), DenseVector (com.alibaba.alink.common.linalg.DenseVector), Test (org.junit.Test)

Example 7 with VectorAssembler

Use of com.alibaba.alink.pipeline.dataproc.vector.VectorAssembler in project Alink by alibaba.

From class LogisticRegressionMixVecTest, method batchMixVecTest12.

@Test
public void batchMixVecTest12() {
    BatchOperator<?> trainData = (BatchOperator<?>) getData();
    Pipeline pipeline = new Pipeline()
        .add(new VectorAssembler()
            .setSelectedCols(new String[] { "svec", "vec", "f0", "f1", "f2", "f3" }).setOutputCol("allvec"))
        .add(new LogisticRegression()
            .setVectorCol("allvec").setWithIntercept(true).setReservedCols(new String[] { "labels", "allvec" })
            .setLabelCol("labels").setPredictionCol("pred"));
    PipelineModel model = pipeline.fit(trainData);
    model.transform(trainData).collect();
}
Also used: VectorAssembler (com.alibaba.alink.pipeline.dataproc.vector.VectorAssembler), LogisticRegression (com.alibaba.alink.pipeline.classification.LogisticRegression), BatchOperator (com.alibaba.alink.operator.batch.BatchOperator), Pipeline (com.alibaba.alink.pipeline.Pipeline), PipelineModel (com.alibaba.alink.pipeline.PipelineModel), Test (org.junit.Test)
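
The pipeline above hands VectorAssembler a mix of column names and asks for a single output vector column. As a rough illustration of that idea only (this is not Alink's implementation and uses no Alink classes), the plain-Java sketch below concatenates scalar values and array-valued "vector" columns into one dense array, in the order given. The class name AssembleSketch and the method assemble are hypothetical names chosen for this example, and representing a vector column as a double[] is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; not part of Alink.
class AssembleSketch {

    // Flattens scalars (any Number) and double[] "vectors" into one dense array,
    // preserving the order in which the column values are passed.
    static double[] assemble(Object... cols) {
        List<Double> out = new ArrayList<>();
        for (Object col : cols) {
            if (col instanceof Number) {
                out.add(((Number) col).doubleValue());
            } else if (col instanceof double[]) {
                for (double v : (double[]) col) {
                    out.add(v);
                }
            } else {
                throw new IllegalArgumentException("Unsupported column value: " + col);
            }
        }
        double[] result = new double[out.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = out.get(i);
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed row: one vector-valued column followed by three numeric columns.
        double[] vec = { 0.5, 1.5 };
        double[] all = assemble(vec, 1.0, 2, 3.0f);
        System.out.println(Arrays.toString(all));
    }
}
```

A real assembler also has to deal with sparse vectors, nulls, and invalid values, which this sketch deliberately leaves out.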

Example 8 with VectorAssembler

Use of com.alibaba.alink.pipeline.dataproc.vector.VectorAssembler in project Alink by alibaba.

From class LogisticRegressionMixVecTest, method batchMixVecTest13.

@Test
public void batchMixVecTest13() {
    BatchOperator<?> trainData = (BatchOperator<?>) getData();
    Pipeline pipeline = new Pipeline()
        .add(new VectorAssembler()
            .setSelectedCols(new String[] { "svec", "vec", "f0", "f1", "f2", "f3" }).setOutputCol("allvec"))
        .add(new LogisticRegression()
            .setVectorCol("allvec").setWithIntercept(false).setStandardization(false)
            .setLabelCol("labels").setReservedCols(new String[] { "labels" }).setPredictionCol("pred"));
    PipelineModel model = pipeline.fit(trainData);
    model.transform(trainData).collect();
}
Also used: VectorAssembler (com.alibaba.alink.pipeline.dataproc.vector.VectorAssembler), LogisticRegression (com.alibaba.alink.pipeline.classification.LogisticRegression), BatchOperator (com.alibaba.alink.operator.batch.BatchOperator), Pipeline (com.alibaba.alink.pipeline.Pipeline), PipelineModel (com.alibaba.alink.pipeline.PipelineModel), Test (org.junit.Test)

Example 9 with VectorAssembler

Use of com.alibaba.alink.pipeline.dataproc.vector.VectorAssembler in project Alink by alibaba.

From class LogisticRegressionMixVecTest, method batchMixVecTest11.

@Test
public void batchMixVecTest11() {
    BatchOperator<?> trainData = (BatchOperator<?>) getData();
    Pipeline pipeline = new Pipeline()
        .add(new VectorAssembler()
            .setSelectedCols(new String[] { "svec", "vec", "f0", "f1", "f2", "f3" }).setOutputCol("allvec"))
        .add(new LogisticRegression()
            .setVectorCol("allvec").setWithIntercept(true).setReservedCols(new String[] { "labels" })
            .setStandardization(false).setLabelCol("labels").setPredictionCol("pred"));
    PipelineModel model = pipeline.fit(trainData);
    model.transform(trainData).collect();
}
Also used: VectorAssembler (com.alibaba.alink.pipeline.dataproc.vector.VectorAssembler), LogisticRegression (com.alibaba.alink.pipeline.classification.LogisticRegression), BatchOperator (com.alibaba.alink.operator.batch.BatchOperator), Pipeline (com.alibaba.alink.pipeline.Pipeline), PipelineModel (com.alibaba.alink.pipeline.PipelineModel), Test (org.junit.Test)

Example 10 with VectorAssembler

Use of com.alibaba.alink.pipeline.dataproc.vector.VectorAssembler in project Alink by alibaba.

From class FmRecommTrainBatchOp, method createFeatureVectors.

private static BatchOperator<?> createFeatureVectors(BatchOperator<?> featureTable, String idCol, String[] featureCols, String[] categoricalCols) {
    TableUtil.assertSelectedColExist(featureCols, categoricalCols);
    String[] numericalCols = subtract(featureCols, categoricalCols);
    final Long envId = featureTable.getMLEnvironmentId();
    if (categoricalCols.length > 0) {
        OneHotEncoder onehot = new OneHotEncoder()
            .setMLEnvironmentId(envId)
            .setSelectedCols(categoricalCols)
            .setOutputCols("__fm_features__")
            .setDropLast(false);
        featureTable = onehot.fit(featureTable).transform(featureTable);
        numericalCols = (String[]) ArrayUtils.add(numericalCols, "__fm_features__");
    }
    VectorAssembler va = new VectorAssembler()
        .setMLEnvironmentId(envId)
        .setSelectedCols(numericalCols)
        .setOutputCol("__fm_features__")
        .setReservedCols(idCol);
    featureTable = va.transform(featureTable);
    featureTable = featureTable.udf("__fm_features__", "__fm_features__", new ConvertVec());
    return featureTable;
}
Also used: OneHotEncoder (com.alibaba.alink.pipeline.feature.OneHotEncoder), VectorAssembler (com.alibaba.alink.pipeline.dataproc.vector.VectorAssembler)

Aggregations

VectorAssembler (com.alibaba.alink.pipeline.dataproc.vector.VectorAssembler) 26
Test (org.junit.Test) 16
Pipeline (com.alibaba.alink.pipeline.Pipeline) 11
MultilayerPerceptronClassifier (com.alibaba.alink.pipeline.classification.MultilayerPerceptronClassifier) 9
LogisticRegression (com.alibaba.alink.pipeline.classification.LogisticRegression) 8
BatchOperator (com.alibaba.alink.operator.batch.BatchOperator) 7
PipelineModel (com.alibaba.alink.pipeline.PipelineModel) 7
Row (org.apache.flink.types.Row) 7
FilePath (com.alibaba.alink.common.io.filesystem.FilePath) 4
EvalBinaryClassBatchOp (com.alibaba.alink.operator.batch.evaluation.EvalBinaryClassBatchOp) 4
AkSourceBatchOp (com.alibaba.alink.operator.batch.source.AkSourceBatchOp) 4
OneHotEncoder (com.alibaba.alink.pipeline.feature.OneHotEncoder) 3
TableSchema (org.apache.flink.table.api.TableSchema) 3
DenseVector (com.alibaba.alink.common.linalg.DenseVector) 2
MemSourceBatchOp (com.alibaba.alink.operator.batch.source.MemSourceBatchOp) 2
MemSourceStreamOp (com.alibaba.alink.operator.stream.source.MemSourceStreamOp) 2
Lda (com.alibaba.alink.pipeline.clustering.Lda) 2
Binarizer (com.alibaba.alink.pipeline.feature.Binarizer) 2
QuantileDiscretizer (com.alibaba.alink.pipeline.feature.QuantileDiscretizer) 2
DocCountVectorizer (com.alibaba.alink.pipeline.nlp.DocCountVectorizer) 2