Example 1 with KMeansTrainBatchOp

Use of com.alibaba.alink.operator.batch.clustering.KMeansTrainBatchOp in project Alink by alibaba.

From class Chap18, method c_2.

static void c_2() throws Exception {
    AkSourceBatchOp batch_source = new AkSourceBatchOp().setFilePath(DATA_DIR + SPARSE_TRAIN_FILE);
    AkSourceStreamOp stream_source = new AkSourceStreamOp().setFilePath(DATA_DIR + SPARSE_TRAIN_FILE);
    // Train an initial model on a 100-row sample, but only if no saved model file exists yet.
    if (!new File(DATA_DIR + INIT_MODEL_FILE).exists()) {
        batch_source.sampleWithSize(100)
            .link(new KMeansTrainBatchOp().setVectorCol(VECTOR_COL_NAME).setK(10))
            .link(new AkSinkBatchOp().setFilePath(DATA_DIR + INIT_MODEL_FILE));
        BatchOperator.execute();
    }
    AkSourceBatchOp init_model = new AkSourceBatchOp().setFilePath(DATA_DIR + INIT_MODEL_FILE);
    // Batch prediction with the saved model, followed by cluster evaluation.
    new KMeansPredictBatchOp().setPredictionCol(PREDICTION_COL_NAME)
        .linkFrom(init_model, batch_source)
        .link(new EvalClusterBatchOp().setVectorCol(VECTOR_COL_NAME).setPredictionCol(PREDICTION_COL_NAME)
            .setLabelCol(LABEL_COL_NAME).lazyPrintMetrics("Batch Prediction"));
    BatchOperator.execute();
    // Stream prediction with the same model; the results are written to a temporary file.
    stream_source
        .link(new KMeansPredictStreamOp(init_model).setPredictionCol(PREDICTION_COL_NAME))
        .link(new AkSinkStreamOp().setFilePath(DATA_DIR + TEMP_STREAM_FILE).setOverwriteSink(true));
    StreamOperator.execute();
    // Evaluate the stream predictions by reading the temporary file back as a batch source.
    new AkSourceBatchOp().setFilePath(DATA_DIR + TEMP_STREAM_FILE)
        .link(new EvalClusterBatchOp().setVectorCol(VECTOR_COL_NAME).setPredictionCol(PREDICTION_COL_NAME)
            .setLabelCol(LABEL_COL_NAME).lazyPrintMetrics("Stream Prediction"));
    BatchOperator.execute();
}
Also used:
KMeansPredictBatchOp (com.alibaba.alink.operator.batch.clustering.KMeansPredictBatchOp)
AkSinkStreamOp (com.alibaba.alink.operator.stream.sink.AkSinkStreamOp)
AkSourceBatchOp (com.alibaba.alink.operator.batch.source.AkSourceBatchOp)
AkSourceStreamOp (com.alibaba.alink.operator.stream.source.AkSourceStreamOp)
KMeansPredictStreamOp (com.alibaba.alink.operator.stream.clustering.KMeansPredictStreamOp)
AkSinkBatchOp (com.alibaba.alink.operator.batch.sink.AkSinkBatchOp)
File (java.io.File)
KMeansTrainBatchOp (com.alibaba.alink.operator.batch.clustering.KMeansTrainBatchOp)
EvalClusterBatchOp (com.alibaba.alink.operator.batch.evaluation.EvalClusterBatchOp)

Example 2 with KMeansTrainBatchOp

Use of com.alibaba.alink.operator.batch.clustering.KMeansTrainBatchOp in project Alink by alibaba.

From class KMeansTest, method testInitializer.

@Test
public void testInitializer() {
    // Every freshly constructed model, estimator, and operator should start with no parameters.
    KMeansModel model = new KMeansModel();
    assertEquals(0, model.getParams().size());
    KMeans kMeans = new KMeans(new Params());
    assertEquals(0, kMeans.getParams().size());
    KMeansTrainBatchOp op = new KMeansTrainBatchOp();
    assertEquals(0, op.getParams().size());
    KMeansPredictBatchOp predict = new KMeansPredictBatchOp(new Params());
    assertEquals(0, predict.getParams().size());
    predict = new KMeansPredictBatchOp();
    assertEquals(0, predict.getParams().size());
    KMeansPredictStreamOp predictStream = new KMeansPredictStreamOp(op, new Params());
    assertEquals(0, predictStream.getParams().size());
    predictStream = new KMeansPredictStreamOp(predict);
    assertEquals(0, predictStream.getParams().size());
}
Also used:
KMeansPredictBatchOp (com.alibaba.alink.operator.batch.clustering.KMeansPredictBatchOp)
Params (org.apache.flink.ml.api.misc.param.Params)
KMeansPredictStreamOp (com.alibaba.alink.operator.stream.clustering.KMeansPredictStreamOp)
KMeansTrainBatchOp (com.alibaba.alink.operator.batch.clustering.KMeansTrainBatchOp)
Test (org.junit.Test)

Example 3 with KMeansTrainBatchOp

Use of com.alibaba.alink.operator.batch.clustering.KMeansTrainBatchOp in project Alink by alibaba.

From class StreamingKMeansStreamOpTest, method testStreamingKmeans.

@Test
public void testStreamingKmeans() throws Exception {
    // Train an initial batch model with k = 2 on the "vec" column.
    BatchOperator<?> model = new KMeansTrainBatchOp().setVectorCol("vec").setK(2).linkFrom(trainDataBatchOp);
    // Seed the streaming operator with the batch model; the same stream is passed as both inputs.
    StreamingKMeansStreamOp streamingKMeansStreamOp = new StreamingKMeansStreamOp(model)
        .setPredictionCol("pred")
        .setTimeInterval(1L)
        .setHalfLife(1)
        .setReservedCols("vec")
        .linkFrom(predictDataStreamOp, predictDataStreamOp);
    CollectSinkStreamOp predSinkData = streamingKMeansStreamOp.link(new CollectSinkStreamOp());
    StreamOperator.execute();
    verifyExecutionResult(predSinkData.getAndRemoveValues());
}
Also used:
CollectSinkStreamOp (com.alibaba.alink.operator.stream.sink.CollectSinkStreamOp)
KMeansTrainBatchOp (com.alibaba.alink.operator.batch.clustering.KMeansTrainBatchOp)
Test (org.junit.Test)
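
The setHalfLife(1) call in Example 3 suggests that older data is down-weighted over time. As a rough illustration of what a half-life can mean for a streaming centroid update, here is a minimal plain-Java sketch of one common formulation, in which the accumulated weight of a centroid is multiplied by 0.5^(1/halfLife) before each new batch is merged in. The class DecayedCentroid and its methods are hypothetical and are not part of Alink; whether StreamingKMeansStreamOp applies exactly this rule is an assumption that the listing does not confirm.

```java
// Hypothetical illustration only; this is not Alink code.
class DecayedCentroid {
    double[] center; // current centroid position
    double weight;   // effective number of points behind the centroid

    DecayedCentroid(double[] center, double weight) {
        this.center = center.clone();
        this.weight = weight;
    }

    // Per-batch decay factor: after halfLife batches, old weight has halved.
    static double decayFactor(double halfLife) {
        return Math.pow(0.5, 1.0 / halfLife);
    }

    // Merge one batch, summarized by its mean and point count, into the centroid.
    void update(double[] batchMean, long batchCount, double halfLife) {
        double oldWeight = weight * decayFactor(halfLife); // discount history
        double newWeight = oldWeight + batchCount;
        if (newWeight == 0.0) {
            return; // nothing to average
        }
        for (int i = 0; i < center.length; i++) {
            center[i] = (center[i] * oldWeight + batchMean[i] * batchCount) / newWeight;
        }
        weight = newWeight;
    }

    public static void main(String[] args) {
        DecayedCentroid c = new DecayedCentroid(new double[] {0.0, 0.0}, 10.0);
        // halfLife = 1: the old weight 10 becomes 5 before 10 new points are merged,
        // so each coordinate becomes (0 * 5 + 4 * 10) / 15.
        c.update(new double[] {4.0, 4.0}, 10, 1.0);
        System.out.println(c.center[0] + ", " + c.center[1] + " (weight " + c.weight + ")");
    }
}
```

In this sketch a small half-life lets recent batches dominate the centroid, while a large half-life makes the update behave more like a running average over all data seen so far.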

Example 4 with KMeansTrainBatchOp

Use of com.alibaba.alink.operator.batch.clustering.KMeansTrainBatchOp in project Alink by alibaba.

From class Chap17, method c_2_2.

static void c_2_2() throws Exception {
    // Assemble the feature columns into one vector column and cache the result, if not done already.
    if (!new File(DATA_DIR + VECTOR_FILE).exists()) {
        new CsvSourceBatchOp().setFilePath(DATA_DIR + ORIGIN_FILE).setSchemaStr(SCHEMA_STRING)
            .link(new VectorAssemblerBatchOp().setSelectedCols(FEATURE_COL_NAMES)
                .setOutputCol(VECTOR_COL_NAME).setReservedCols(LABEL_COL_NAME))
            .link(new AkSinkBatchOp().setFilePath(DATA_DIR + VECTOR_FILE));
        BatchOperator.execute();
    }
    AkSourceBatchOp source = new AkSourceBatchOp().setFilePath(DATA_DIR + VECTOR_FILE);
    source.lazyPrint(5);
    // Operator API with k = 2. No distance type is set; the metrics label below suggests Euclidean.
    KMeansTrainBatchOp kmeans_model = new KMeansTrainBatchOp().setK(2).setVectorCol(VECTOR_COL_NAME);
    KMeansPredictBatchOp kmeans_pred = new KMeansPredictBatchOp().setPredictionCol(PREDICTION_COL_NAME);
    source.link(kmeans_model);
    kmeans_pred.linkFrom(kmeans_model, source);
    kmeans_model.lazyPrintModelInfo();
    kmeans_pred.lazyPrint(5);
    kmeans_pred.link(new EvalClusterBatchOp().setVectorCol(VECTOR_COL_NAME).setLabelCol(LABEL_COL_NAME)
        .setPredictionCol(PREDICTION_COL_NAME).lazyPrintMetrics("KMeans EUCLIDEAN"));
    kmeans_pred.orderBy(PREDICTION_COL_NAME + ", " + LABEL_COL_NAME, 200, false).lazyPrint(-1, "all data");
    BatchOperator.execute();
    // The same clustering through the pipeline API, this time with cosine distance.
    new KMeans().setK(2).setDistanceType(DistanceType.COSINE)
        .setVectorCol(VECTOR_COL_NAME).setPredictionCol(PREDICTION_COL_NAME)
        .enableLazyPrintModelInfo()
        .fit(source).transform(source)
        .link(new EvalClusterBatchOp().setVectorCol(VECTOR_COL_NAME).setPredictionCol(PREDICTION_COL_NAME)
            .setLabelCol(LABEL_COL_NAME).lazyPrintMetrics("KMeans COSINE"));
    BatchOperator.execute();
}
Also used:
KMeansPredictBatchOp (com.alibaba.alink.operator.batch.clustering.KMeansPredictBatchOp)
AkSourceBatchOp (com.alibaba.alink.operator.batch.source.AkSourceBatchOp)
BisectingKMeans (com.alibaba.alink.pipeline.clustering.BisectingKMeans)
KMeans (com.alibaba.alink.pipeline.clustering.KMeans)
GeoKMeans (com.alibaba.alink.pipeline.clustering.GeoKMeans)
VectorAssemblerBatchOp (com.alibaba.alink.operator.batch.dataproc.vector.VectorAssemblerBatchOp)
AkSinkBatchOp (com.alibaba.alink.operator.batch.sink.AkSinkBatchOp)
File (java.io.File)
CsvSourceBatchOp (com.alibaba.alink.operator.batch.source.CsvSourceBatchOp)
KMeansTrainBatchOp (com.alibaba.alink.operator.batch.clustering.KMeansTrainBatchOp)
EvalClusterBatchOp (com.alibaba.alink.operator.batch.evaluation.EvalClusterBatchOp)
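
Example 4 clusters the same data twice: once without setting a distance type (labelled "KMeans EUCLIDEAN") and once with DistanceType.COSINE. The following plain-Java sketch shows why the two choices can assign the same point to different centroids. DistanceDemo and its methods are hypothetical helpers written for this illustration, cosine distance is taken here to be 1 minus cosine similarity, and none of it is Alink code.

```java
// Hypothetical illustration only; this is not Alink code.
class DistanceDemo {
    // Euclidean distance between two dense vectors of equal length.
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Cosine distance, defined here as 1 - cosine similarity (assumes non-zero vectors).
    static double cosine(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Index of the centroid closest to the point under the chosen distance.
    static int nearest(double[] point, double[][] centroids, boolean useCosine) {
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int i = 0; i < centroids.length; i++) {
            double d = useCosine ? cosine(point, centroids[i]) : euclidean(point, centroids[i]);
            if (d < bestDist) {
                bestDist = d;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] centroids = {{1.0, 1.0}, {10.0, 0.0}};
        double[] point = {4.0, 0.0};
        // Euclidean: the point is closer in space to (1, 1) than to (10, 0).
        System.out.println("euclidean -> centroid " + nearest(point, centroids, false));
        // Cosine: the point has the same direction as (10, 0), so that centroid wins.
        System.out.println("cosine    -> centroid " + nearest(point, centroids, true));
    }
}
```

Euclidean distance compares positions, while cosine distance compares directions only, which is one reason the two runs in Example 4 are evaluated separately.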

Aggregations

KMeansTrainBatchOp (com.alibaba.alink.operator.batch.clustering.KMeansTrainBatchOp): 4
KMeansPredictBatchOp (com.alibaba.alink.operator.batch.clustering.KMeansPredictBatchOp): 3
EvalClusterBatchOp (com.alibaba.alink.operator.batch.evaluation.EvalClusterBatchOp): 2
AkSinkBatchOp (com.alibaba.alink.operator.batch.sink.AkSinkBatchOp): 2
AkSourceBatchOp (com.alibaba.alink.operator.batch.source.AkSourceBatchOp): 2
KMeansPredictStreamOp (com.alibaba.alink.operator.stream.clustering.KMeansPredictStreamOp): 2
File (java.io.File): 2
Test (org.junit.Test): 2
VectorAssemblerBatchOp (com.alibaba.alink.operator.batch.dataproc.vector.VectorAssemblerBatchOp): 1
CsvSourceBatchOp (com.alibaba.alink.operator.batch.source.CsvSourceBatchOp): 1
AkSinkStreamOp (com.alibaba.alink.operator.stream.sink.AkSinkStreamOp): 1
CollectSinkStreamOp (com.alibaba.alink.operator.stream.sink.CollectSinkStreamOp): 1
AkSourceStreamOp (com.alibaba.alink.operator.stream.source.AkSourceStreamOp): 1
BisectingKMeans (com.alibaba.alink.pipeline.clustering.BisectingKMeans): 1
GeoKMeans (com.alibaba.alink.pipeline.clustering.GeoKMeans): 1
KMeans (com.alibaba.alink.pipeline.clustering.KMeans): 1
Params (org.apache.flink.ml.api.misc.param.Params): 1