
Example 26 with AkSinkBatchOp

Use of com.alibaba.alink.operator.batch.sink.AkSinkBatchOp in project Alink by alibaba.

From class Chap14, method c_4.

static void c_4() throws Exception {
    // load pipeline model
    PipelineModel feature_pipelineModel = PipelineModel.load(DATA_DIR + FEATURE_MODEL_FILE);
    // prepare stream train data (not used further in this excerpt)
    CsvSourceStreamOp data = new CsvSourceStreamOp()
            .setFilePath("http://alink-release.oss-cn-beijing.aliyuncs.com/data-files/avazu-ctr-train-8M.csv")
            .setSchemaStr(SCHEMA_STRING);
    // train and save the initial model only if no saved model file exists yet
    if (!new File(DATA_DIR + INIT_MODEL_FILE).exists()) {
        CsvSourceBatchOp trainBatchData = new CsvSourceBatchOp()
                .setFilePath("http://alink-release.oss-cn-beijing.aliyuncs.com/data-files/avazu-small.csv")
                .setSchemaStr(SCHEMA_STRING);
        // train initial batch model
        LogisticRegressionTrainBatchOp lr = new LogisticRegressionTrainBatchOp()
                .setVectorCol(VEC_COL_NAME)
                .setLabelCol(LABEL_COL_NAME)
                .setWithIntercept(true)
                .setMaxIter(10);
        feature_pipelineModel
                .transform(trainBatchData)
                .link(lr)
                .link(new AkSinkBatchOp().setFilePath(DATA_DIR + INIT_MODEL_FILE));
        BatchOperator.execute();
    }
}
Also used: LogisticRegressionTrainBatchOp (com.alibaba.alink.operator.batch.classification.LogisticRegressionTrainBatchOp), AkSinkBatchOp (com.alibaba.alink.operator.batch.sink.AkSinkBatchOp), CsvSourceStreamOp (com.alibaba.alink.operator.stream.source.CsvSourceStreamOp), File (java.io.File), CsvSourceBatchOp (com.alibaba.alink.operator.batch.source.CsvSourceBatchOp), PipelineModel (com.alibaba.alink.pipeline.PipelineModel)
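
The excerpt above guards an expensive batch job with a File.exists() check, so the model is trained and written through AkSinkBatchOp only on the first run. The guard itself can be sketched in plain Java without Alink. The class CachedArtifact and the method buildIfAbsent are hypothetical names chosen for illustration, and a string stands in for the trained model.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.Callable;

// Hypothetical helper, not part of Alink: it illustrates the exists-check guard only.
class CachedArtifact {

    /**
     * Runs the producer and writes its result to target only when target
     * does not exist yet. Returns true if the producer ran.
     */
    static boolean buildIfAbsent(Path target, Callable<String> producer) throws Exception {
        if (Files.exists(target)) {
            return false; // an earlier run already produced the artifact
        }
        Files.write(target, producer.call().getBytes(StandardCharsets.UTF_8));
        return true;
    }

    public static void main(String[] args) throws Exception {
        Path target = Files.createTempDirectory("cached-artifact").resolve("init_model.txt");
        System.out.println(buildIfAbsent(target, () -> "model-v1")); // producer runs
        System.out.println(buildIfAbsent(target, () -> "model-v2")); // skipped, file exists
        System.out.println(new String(Files.readAllBytes(target), StandardCharsets.UTF_8));
    }
}
```

A guard of this kind checks only that the file exists, not that it is current. If the training data or parameters change, the saved file has to be deleted for the job to run again.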

Example 27 with AkSinkBatchOp

Use of com.alibaba.alink.operator.batch.sink.AkSinkBatchOp in project Alink by alibaba.

From class Chap17, method c_2_2.

static void c_2_2() throws Exception {
    // assemble the feature columns into a vector and save the result, unless it was already saved
    if (!new File(DATA_DIR + VECTOR_FILE).exists()) {
        new CsvSourceBatchOp()
                .setFilePath(DATA_DIR + ORIGIN_FILE)
                .setSchemaStr(SCHEMA_STRING)
                .link(new VectorAssemblerBatchOp().setSelectedCols(FEATURE_COL_NAMES).setOutputCol(VECTOR_COL_NAME).setReservedCols(LABEL_COL_NAME))
                .link(new AkSinkBatchOp().setFilePath(DATA_DIR + VECTOR_FILE));
        BatchOperator.execute();
    }
    AkSourceBatchOp source = new AkSourceBatchOp().setFilePath(DATA_DIR + VECTOR_FILE);
    source.lazyPrint(5);
    KMeansTrainBatchOp kmeans_model = new KMeansTrainBatchOp().setK(2).setVectorCol(VECTOR_COL_NAME);
    KMeansPredictBatchOp kmeans_pred = new KMeansPredictBatchOp().setPredictionCol(PREDICTION_COL_NAME);
    source.link(kmeans_model);
    kmeans_pred.linkFrom(kmeans_model, source);
    kmeans_model.lazyPrintModelInfo();
    kmeans_pred.lazyPrint(5);
    kmeans_pred.link(new EvalClusterBatchOp().setVectorCol(VECTOR_COL_NAME).setLabelCol(LABEL_COL_NAME).setPredictionCol(PREDICTION_COL_NAME).lazyPrintMetrics("KMeans EUCLIDEAN"));
    kmeans_pred.orderBy(PREDICTION_COL_NAME + ", " + LABEL_COL_NAME, 200, false).lazyPrint(-1, "all data");
    BatchOperator.execute();
    // same clustering with cosine distance, using the pipeline API
    new KMeans()
            .setK(2)
            .setDistanceType(DistanceType.COSINE)
            .setVectorCol(VECTOR_COL_NAME)
            .setPredictionCol(PREDICTION_COL_NAME)
            .enableLazyPrintModelInfo()
            .fit(source)
            .transform(source)
            .link(new EvalClusterBatchOp().setVectorCol(VECTOR_COL_NAME).setPredictionCol(PREDICTION_COL_NAME).setLabelCol(LABEL_COL_NAME).lazyPrintMetrics("KMeans COSINE"));
    BatchOperator.execute();
}
Also used: KMeansPredictBatchOp (com.alibaba.alink.operator.batch.clustering.KMeansPredictBatchOp), AkSourceBatchOp (com.alibaba.alink.operator.batch.source.AkSourceBatchOp), BisectingKMeans (com.alibaba.alink.pipeline.clustering.BisectingKMeans), KMeans (com.alibaba.alink.pipeline.clustering.KMeans), GeoKMeans (com.alibaba.alink.pipeline.clustering.GeoKMeans), VectorAssemblerBatchOp (com.alibaba.alink.operator.batch.dataproc.vector.VectorAssemblerBatchOp), AkSinkBatchOp (com.alibaba.alink.operator.batch.sink.AkSinkBatchOp), File (java.io.File), CsvSourceBatchOp (com.alibaba.alink.operator.batch.source.CsvSourceBatchOp), KMeansTrainBatchOp (com.alibaba.alink.operator.batch.clustering.KMeansTrainBatchOp), EvalClusterBatchOp (com.alibaba.alink.operator.batch.evaluation.EvalClusterBatchOp)
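
The excerpt above runs k-means twice, once labelled "KMeans EUCLIDEAN" and once with DistanceType.COSINE. The difference between the two measures can be sketched in plain Java. This assumes the common definition of cosine distance as 1 minus cosine similarity; Alink's own implementation may differ in detail. DistanceSketch and its methods are hypothetical names, not Alink API.

```java
// Hypothetical illustration, not Alink code: two distance measures on dense vectors.
class DistanceSketch {

    /** Euclidean distance: square root of the summed squared differences. */
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Cosine distance, assumed here to be 1 - (a . b) / (|a| * |b|). */
    static double cosineDistance(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        double[] a = {1.0, 0.0};
        double[] b = {3.0, 0.0}; // same direction as a, larger magnitude
        double[] c = {0.0, 2.0}; // orthogonal to a
        System.out.println(euclidean(a, b) + " vs " + cosineDistance(a, b));
        System.out.println(euclidean(a, c) + " vs " + cosineDistance(a, c));
    }
}
```

Vectors a and b point the same way, so their cosine distance is 0 even though their Euclidean distance is 2. Under this definition cosine-based clustering groups rows by direction and ignores magnitude, which is why the two runs can report different evaluation metrics on the same data.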

Aggregations

AkSinkBatchOp (com.alibaba.alink.operator.batch.sink.AkSinkBatchOp): 27
AkSourceBatchOp (com.alibaba.alink.operator.batch.source.AkSourceBatchOp): 20
File (java.io.File): 15
CsvSourceBatchOp (com.alibaba.alink.operator.batch.source.CsvSourceBatchOp): 6
MemSourceBatchOp (com.alibaba.alink.operator.batch.source.MemSourceBatchOp): 5
BatchOperator (com.alibaba.alink.operator.batch.BatchOperator): 4
EvalClusterBatchOp (com.alibaba.alink.operator.batch.evaluation.EvalClusterBatchOp): 4
PipelineModel (com.alibaba.alink.pipeline.PipelineModel): 3
LogisticRegression (com.alibaba.alink.pipeline.classification.LogisticRegression): 3
FilePath (com.alibaba.alink.common.io.filesystem.FilePath): 2
HadoopFileSystem (com.alibaba.alink.common.io.filesystem.HadoopFileSystem): 2
OssFileSystem (com.alibaba.alink.common.io.filesystem.OssFileSystem): 2
Stopwatch (com.alibaba.alink.common.utils.Stopwatch): 2
LogisticRegressionTrainBatchOp (com.alibaba.alink.operator.batch.classification.LogisticRegressionTrainBatchOp): 2
KMeansPredictBatchOp (com.alibaba.alink.operator.batch.clustering.KMeansPredictBatchOp): 2
KMeansTrainBatchOp (com.alibaba.alink.operator.batch.clustering.KMeansTrainBatchOp): 2
SplitBatchOp (com.alibaba.alink.operator.batch.dataproc.SplitBatchOp): 2
EvalBinaryClassBatchOp (com.alibaba.alink.operator.batch.evaluation.EvalBinaryClassBatchOp): 2
SegmentBatchOp (com.alibaba.alink.operator.batch.nlp.SegmentBatchOp): 2
StopWordsRemoverBatchOp (com.alibaba.alink.operator.batch.nlp.StopWordsRemoverBatchOp): 2