
Example 1 with AkSourceStreamOp

Use of com.alibaba.alink.operator.stream.source.AkSourceStreamOp in project Alink by alibaba, in the class Chap03, method c_2_3_1.

static void c_2_3_1() throws Exception {
    HadoopFileSystem hdfs = new HadoopFileSystem(HADOOP_VERSION, HDFS_URI);
    OssFileSystem oss = new OssFileSystem(OSS_VERSION, OSS_END_POINT, OSS_BUCKET_NAME, OSS_ACCESS_ID, OSS_ACCESS_KEY);
    // The same .ak file on three targets: local disk, HDFS, and OSS.
    FilePath[] filePaths = new FilePath[] {
        new FilePath(LOCAL_DIR + "iris.ak"),
        new FilePath(HDFS_URI + "user/yangxu/alink/data/temp/iris.ak", hdfs),
        new FilePath(OSS_PREFIX_URI + "alink/data/temp/iris.ak", oss)
    };
    // Batch: write the iris CSV as .ak, then read it back and print the row count.
    for (FilePath filePath : filePaths) {
        new CsvSourceBatchOp().setFilePath(IRIS_HTTP_URL).setSchemaStr(IRIS_SCHEMA_STR)
            .link(new AkSinkBatchOp().setFilePath(filePath).setOverwriteSink(true));
        BatchOperator.execute();
        System.out.println(new AkSourceBatchOp().setFilePath(filePath).count());
    }
    // Stream: write the same CSV with the stream sink, then read it back and print the filtered rows.
    for (FilePath filePath : filePaths) {
        new CsvSourceStreamOp().setFilePath(IRIS_HTTP_URL).setSchemaStr(IRIS_SCHEMA_STR)
            .link(new AkSinkStreamOp().setFilePath(filePath).setOverwriteSink(true));
        StreamOperator.execute();
        new AkSourceStreamOp().setFilePath(filePath).filter("sepal_length < 4.5").print();
        StreamOperator.execute();
    }
}
Also used: FilePath (com.alibaba.alink.common.io.filesystem.FilePath), AkSinkStreamOp (com.alibaba.alink.operator.stream.sink.AkSinkStreamOp), AkSourceBatchOp (com.alibaba.alink.operator.batch.source.AkSourceBatchOp), AkSourceStreamOp (com.alibaba.alink.operator.stream.source.AkSourceStreamOp), AkSinkBatchOp (com.alibaba.alink.operator.batch.sink.AkSinkBatchOp), HadoopFileSystem (com.alibaba.alink.common.io.filesystem.HadoopFileSystem), CsvSourceStreamOp (com.alibaba.alink.operator.stream.source.CsvSourceStreamOp), OssFileSystem (com.alibaba.alink.common.io.filesystem.OssFileSystem), CsvSourceBatchOp (com.alibaba.alink.operator.batch.source.CsvSourceBatchOp)

Example 2 with AkSourceStreamOp

Use of com.alibaba.alink.operator.stream.source.AkSourceStreamOp in project Alink by alibaba, in the class Chap18, method c_2.

static void c_2() throws Exception {
    AkSourceBatchOp batch_source = new AkSourceBatchOp().setFilePath(DATA_DIR + SPARSE_TRAIN_FILE);
    AkSourceStreamOp stream_source = new AkSourceStreamOp().setFilePath(DATA_DIR + SPARSE_TRAIN_FILE);
    // Train an initial KMeans model on a 100-row sample, unless one was already saved.
    if (!new File(DATA_DIR + INIT_MODEL_FILE).exists()) {
        batch_source.sampleWithSize(100)
            .link(new KMeansTrainBatchOp().setVectorCol(VECTOR_COL_NAME).setK(10))
            .link(new AkSinkBatchOp().setFilePath(DATA_DIR + INIT_MODEL_FILE));
        BatchOperator.execute();
    }
    AkSourceBatchOp init_model = new AkSourceBatchOp().setFilePath(DATA_DIR + INIT_MODEL_FILE);
    // Batch prediction with the initial model, followed by cluster evaluation.
    new KMeansPredictBatchOp().setPredictionCol(PREDICTION_COL_NAME).linkFrom(init_model, batch_source)
        .link(new EvalClusterBatchOp().setVectorCol(VECTOR_COL_NAME).setPredictionCol(PREDICTION_COL_NAME).setLabelCol(LABEL_COL_NAME).lazyPrintMetrics("Batch Prediction"));
    BatchOperator.execute();
    // Stream prediction with the same model; the results are written to a temporary .ak file.
    stream_source
        .link(new KMeansPredictStreamOp(init_model).setPredictionCol(PREDICTION_COL_NAME))
        .link(new AkSinkStreamOp().setFilePath(DATA_DIR + TEMP_STREAM_FILE).setOverwriteSink(true));
    StreamOperator.execute();
    // Evaluate the stream predictions by reading the temporary file back in batch mode.
    new AkSourceBatchOp().setFilePath(DATA_DIR + TEMP_STREAM_FILE)
        .link(new EvalClusterBatchOp().setVectorCol(VECTOR_COL_NAME).setPredictionCol(PREDICTION_COL_NAME).setLabelCol(LABEL_COL_NAME).lazyPrintMetrics("Stream Prediction"));
    BatchOperator.execute();
}
Also used: KMeansPredictBatchOp (com.alibaba.alink.operator.batch.clustering.KMeansPredictBatchOp), AkSinkStreamOp (com.alibaba.alink.operator.stream.sink.AkSinkStreamOp), AkSourceBatchOp (com.alibaba.alink.operator.batch.source.AkSourceBatchOp), AkSourceStreamOp (com.alibaba.alink.operator.stream.source.AkSourceStreamOp), KMeansPredictStreamOp (com.alibaba.alink.operator.stream.clustering.KMeansPredictStreamOp), AkSinkBatchOp (com.alibaba.alink.operator.batch.sink.AkSinkBatchOp), File (java.io.File), KMeansTrainBatchOp (com.alibaba.alink.operator.batch.clustering.KMeansTrainBatchOp), EvalClusterBatchOp (com.alibaba.alink.operator.batch.evaluation.EvalClusterBatchOp)

Example 3 with AkSourceStreamOp

Use of com.alibaba.alink.operator.stream.source.AkSourceStreamOp in project Alink by alibaba, in the class Chap18, method c_3.

static void c_3() throws Exception {
    AkSourceStreamOp stream_source = new AkSourceStreamOp().setFilePath(DATA_DIR + SPARSE_TRAIN_FILE);
    AkSourceBatchOp init_model = new AkSourceBatchOp().setFilePath(DATA_DIR + INIT_MODEL_FILE);
    // Streaming KMeans, started from the saved initial model; keep only the columns needed for evaluation.
    StreamOperator<?> stream_pred = stream_source
        .link(new StreamingKMeansStreamOp(init_model).setTimeInterval(1L).setHalfLife(1).setPredictionCol(PREDICTION_COL_NAME))
        .select(PREDICTION_COL_NAME + ", " + LABEL_COL_NAME + ", " + VECTOR_COL_NAME);
    // Print a 0.1% sample and write the full prediction stream to a temporary .ak file.
    stream_pred.sample(0.001).print();
    stream_pred.link(new AkSinkStreamOp().setFilePath(DATA_DIR + TEMP_STREAM_FILE).setOverwriteSink(true));
    StreamOperator.execute();
    // Evaluate the stream output by reading the temporary file back in batch mode.
    new AkSourceBatchOp().setFilePath(DATA_DIR + TEMP_STREAM_FILE)
        .link(new EvalClusterBatchOp().setVectorCol(VECTOR_COL_NAME).setPredictionCol(PREDICTION_COL_NAME).setLabelCol(LABEL_COL_NAME).lazyPrintMetrics("StreamingKMeans"));
    BatchOperator.execute();
}
Also used: AkSinkStreamOp (com.alibaba.alink.operator.stream.sink.AkSinkStreamOp), AkSourceBatchOp (com.alibaba.alink.operator.batch.source.AkSourceBatchOp), AkSourceStreamOp (com.alibaba.alink.operator.stream.source.AkSourceStreamOp), StreamingKMeansStreamOp (com.alibaba.alink.operator.stream.clustering.StreamingKMeansStreamOp), EvalClusterBatchOp (com.alibaba.alink.operator.batch.evaluation.EvalClusterBatchOp)

Example 4 with AkSourceStreamOp

Use of com.alibaba.alink.operator.stream.source.AkSourceStreamOp in project Alink by alibaba, in the class AkSourceSinkTest, method testStreamSource.

@Test
public void testStreamSource() throws Exception {
    // Read two .ak files as streams and print a 10% sample of each.
    StreamOperator<?> data1 = new AkSourceStreamOp().setFilePath(new File(path, "af1s").getAbsolutePath());
    StreamOperator<?> data3 = new AkSourceStreamOp().setFilePath(new File(path, "ad2s").getAbsolutePath());
    data1.sample(0.1).print();
    data3.sample(0.1).print();
    StreamOperator.execute();
}
Also used: AkSourceStreamOp (com.alibaba.alink.operator.stream.source.AkSourceStreamOp), StreamOperator (com.alibaba.alink.operator.stream.StreamOperator), File (java.io.File), Test (org.junit.Test)

Example 5 with AkSourceStreamOp

Use of com.alibaba.alink.operator.stream.source.AkSourceStreamOp in project Alink by alibaba, in the class Chap23, method c_4.

static void c_4() throws Exception {
    AkSourceBatchOp train_set = new AkSourceBatchOp().setFilePath(DATA_DIR + TRAIN_FILE);
    // Train and save the pipeline model only if it has not been saved before.
    if (!new File(DATA_DIR + PIPELINE_MODEL).exists()) {
        new Pipeline()
            .add(new RegexTokenizer().setPattern("\\W+").setSelectedCol(TXT_COL_NAME))
            .add(new DocCountVectorizer().setFeatureType("WORD_COUNT").setSelectedCol(TXT_COL_NAME).setOutputCol(VECTOR_COL_NAME))
            .add(new NGram().setN(2).setSelectedCol(TXT_COL_NAME).setOutputCol("v_2"))
            .add(new DocCountVectorizer().setFeatureType("WORD_COUNT").setVocabSize(50000).setSelectedCol("v_2").setOutputCol("v_2"))
            .add(new NGram().setN(3).setSelectedCol(TXT_COL_NAME).setOutputCol("v_3"))
            .add(new DocCountVectorizer().setFeatureType("WORD_COUNT").setVocabSize(10000).setSelectedCol("v_3").setOutputCol("v_3"))
            .add(new VectorAssembler().setSelectedCols(VECTOR_COL_NAME, "v_2", "v_3").setOutputCol(VECTOR_COL_NAME))
            .add(new LogisticRegression().setMaxIter(30).setVectorCol(VECTOR_COL_NAME).setLabelCol(LABEL_COL_NAME).setPredictionCol(PREDICTION_COL_NAME).setPredictionDetailCol(PRED_DETAIL_COL_NAME))
            .fit(train_set)
            .save(DATA_DIR + PIPELINE_MODEL);
        BatchOperator.execute();
    }
    PipelineModel pipeline_model = PipelineModel.load(DATA_DIR + PIPELINE_MODEL);
    // Batch evaluation on the test set.
    AkSourceBatchOp test_set = new AkSourceBatchOp().setFilePath(DATA_DIR + TEST_FILE);
    pipeline_model.transform(test_set)
        .link(new EvalBinaryClassBatchOp().setPositiveLabelValueString("pos").setLabelCol(LABEL_COL_NAME).setPredictionDetailCol(PRED_DETAIL_COL_NAME).lazyPrintMetrics("NGram 2 and 3"));
    BatchOperator.execute();
    // Stream prediction on the same test file; a 0.1% sample is printed.
    AkSourceStreamOp test_stream = new AkSourceStreamOp().setFilePath(DATA_DIR + TEST_FILE);
    pipeline_model.transform(test_stream).sample(0.001).select(PREDICTION_COL_NAME + ", " + LABEL_COL_NAME + ", " + TXT_COL_NAME).print();
    StreamOperator.execute();
    String str = "Oh dear. good cast, but to write and direct is an art and to write wit and direct wit is a bit of a " + "task. Even doing good comedy you have to get the timing and moment right. Im not putting it all down " + "there were parts where i laughed loud but that was at very few times. The main focus to me was on the " + "fast free flowing dialogue, that made some people in the film annoying. It may sound great while " + "reading the script in your head but getting that out and to the camera is a different task. And the " + "hand held camera work does give energy to few parts of the film. Overall direction was good but the " + "script was not all that to me, but I'm sure you was reading the script in your head it would sound good" + ". Sorry.";
    // Local prediction on a single review string, using a predictor collected from the loaded model.
    Row pred_row;
    LocalPredictor local_predictor = pipeline_model.collectLocalPredictor("review string");
    System.out.println(local_predictor.getOutputSchema());
    pred_row = local_predictor.map(Row.of(str));
    System.out.println(pred_row.getField(4));
    // The same prediction, with the predictor built directly from the saved model file.
    LocalPredictor local_predictor_2 = new LocalPredictor(DATA_DIR + PIPELINE_MODEL, "review string");
    System.out.println(local_predictor_2.getOutputSchema());
    pred_row = local_predictor_2.map(Row.of(str));
    System.out.println(pred_row.getField(4));
}
Also used: LocalPredictor (com.alibaba.alink.pipeline.LocalPredictor), VectorAssembler (com.alibaba.alink.pipeline.dataproc.vector.VectorAssembler), NGram (com.alibaba.alink.pipeline.nlp.NGram), DocCountVectorizer (com.alibaba.alink.pipeline.nlp.DocCountVectorizer), Pipeline (com.alibaba.alink.pipeline.Pipeline), PipelineModel (com.alibaba.alink.pipeline.PipelineModel), EvalBinaryClassBatchOp (com.alibaba.alink.operator.batch.evaluation.EvalBinaryClassBatchOp), AkSourceBatchOp (com.alibaba.alink.operator.batch.source.AkSourceBatchOp), RegexTokenizer (com.alibaba.alink.pipeline.nlp.RegexTokenizer), AkSourceStreamOp (com.alibaba.alink.operator.stream.source.AkSourceStreamOp), Row (org.apache.flink.types.Row), LogisticRegression (com.alibaba.alink.pipeline.classification.LogisticRegression), File (java.io.File)

Aggregations

AkSourceStreamOp (com.alibaba.alink.operator.stream.source.AkSourceStreamOp): 5
AkSourceBatchOp (com.alibaba.alink.operator.batch.source.AkSourceBatchOp): 4
AkSinkStreamOp (com.alibaba.alink.operator.stream.sink.AkSinkStreamOp): 3
File (java.io.File): 3
EvalClusterBatchOp (com.alibaba.alink.operator.batch.evaluation.EvalClusterBatchOp): 2
AkSinkBatchOp (com.alibaba.alink.operator.batch.sink.AkSinkBatchOp): 2
FilePath (com.alibaba.alink.common.io.filesystem.FilePath): 1
HadoopFileSystem (com.alibaba.alink.common.io.filesystem.HadoopFileSystem): 1
OssFileSystem (com.alibaba.alink.common.io.filesystem.OssFileSystem): 1
KMeansPredictBatchOp (com.alibaba.alink.operator.batch.clustering.KMeansPredictBatchOp): 1
KMeansTrainBatchOp (com.alibaba.alink.operator.batch.clustering.KMeansTrainBatchOp): 1
EvalBinaryClassBatchOp (com.alibaba.alink.operator.batch.evaluation.EvalBinaryClassBatchOp): 1
CsvSourceBatchOp (com.alibaba.alink.operator.batch.source.CsvSourceBatchOp): 1
StreamOperator (com.alibaba.alink.operator.stream.StreamOperator): 1
KMeansPredictStreamOp (com.alibaba.alink.operator.stream.clustering.KMeansPredictStreamOp): 1
StreamingKMeansStreamOp (com.alibaba.alink.operator.stream.clustering.StreamingKMeansStreamOp): 1
CsvSourceStreamOp (com.alibaba.alink.operator.stream.source.CsvSourceStreamOp): 1
LocalPredictor (com.alibaba.alink.pipeline.LocalPredictor): 1
Pipeline (com.alibaba.alink.pipeline.Pipeline): 1
PipelineModel (com.alibaba.alink.pipeline.PipelineModel): 1