Example 1 with RandomTableSourceBatchOp

Use of com.alibaba.alink.operator.batch.source.RandomTableSourceBatchOp in project Alink by alibaba.

In class OneHotTrainBatchOpTest, method testLazyPrint.

@Test
public void testLazyPrint() throws Exception {
    RandomTableSourceBatchOp op = new RandomTableSourceBatchOp().setIdCol("id").setNumCols(40).setNumRows(200L);
    OneHotTrainBatchOp qop = new OneHotTrainBatchOp().setSelectedCols(ArrayUtils.removeElements(op.getColNames(), "id")).linkFrom(op);
    qop.lazyCollectModelInfo(new Consumer<OneHotModelInfo>() {

        @Override
        public void accept(OneHotModelInfo oneHotModelInfo) {
            for (String s : oneHotModelInfo.getSelectedColsInModel()) {
                // JUnit expects (expected, actual); each random column should yield 200 distinct tokens.
                Assert.assertEquals(200, oneHotModelInfo.getDistinctTokenNumber(s).intValue());
            }
        }
    });
    qop.lazyPrintModelInfo();
    BatchOperator.execute();
}
Also used: OneHotModelInfo (com.alibaba.alink.operator.common.feature.OneHotModelInfo), RandomTableSourceBatchOp (com.alibaba.alink.operator.batch.source.RandomTableSourceBatchOp), Test (org.junit.Test)
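
The test above registers its callback with lazyCollectModelInfo and only afterwards calls BatchOperator.execute(). The sketch below illustrates that register-then-execute pattern in plain Java, assuming callbacks are queued and invoked once the job runs. LazyResult and LazyCallbackSketch are hypothetical names, not Alink classes, and this is not a description of Alink's internals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical stand-in for a lazily evaluated operator; not part of Alink. */
class LazyResult<T> {
    private final List<Consumer<T>> callbacks = new ArrayList<>();

    /** Registers a callback; nothing is invoked at this point. */
    void lazyCollect(Consumer<T> callback) {
        callbacks.add(callback);
    }

    /** Delivers the value to every registered callback, in registration order. */
    void execute(T value) {
        for (Consumer<T> callback : callbacks) {
            callback.accept(value);
        }
    }
}

public class LazyCallbackSketch {
    public static void main(String[] args) {
        LazyResult<Integer> distinctTokens = new LazyResult<>();
        List<String> log = new ArrayList<>();

        // A lambda can replace the anonymous Consumer used in the test.
        distinctTokens.lazyCollect(n -> log.add("collected " + n));
        log.add("registered");

        // The callback fires only here, mirroring the role of execute().
        distinctTokens.execute(200);
        System.out.println(log); // prints [registered, collected 200]
    }
}
```

Under this assumption, an assertion placed inside the callback is checked only if execute() is actually reached, which is why the test ends with that call.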

Example 2 with RandomTableSourceBatchOp

Use of com.alibaba.alink.operator.batch.source.RandomTableSourceBatchOp in project Alink by alibaba.

In class QuantileDiscretizerTrainBatchOpTest, method testLazyPrint.

@Test
public void testLazyPrint() throws Exception {
    RandomTableSourceBatchOp op = new RandomTableSourceBatchOp().setIdCol("id").setNumCols(40).setNumRows(200L);
    QuantileDiscretizerTrainBatchOp qop = new QuantileDiscretizerTrainBatchOp().setNumBuckets(20).setSelectedCols(ArrayUtils.removeElements(op.getColNames(), "id")).linkFrom(op);
    qop.lazyCollectModelInfo(new Consumer<QuantileDiscretizerModelInfo>() {

        @Override
        public void accept(QuantileDiscretizerModelInfo quantileDiscretizerModelInfo) {
            for (String s : quantileDiscretizerModelInfo.getSelectedColsInModel()) {
                System.out.println(s + ":" + JsonConverter.toJson(quantileDiscretizerModelInfo.getCutsArray(s)));
            }
        }
    });
    qop.lazyPrintModelInfo();
    BatchOperator.execute();
}
Also used: RandomTableSourceBatchOp (com.alibaba.alink.operator.batch.source.RandomTableSourceBatchOp), QuantileDiscretizerModelInfo (com.alibaba.alink.operator.common.feature.QuantileDiscretizerModelInfo), Test (org.junit.Test)

Example 3 with RandomTableSourceBatchOp

Use of com.alibaba.alink.operator.batch.source.RandomTableSourceBatchOp in project Alink by alibaba.

In class LSTNetTrainBatchOpTest, method testStreamMultiVar.

@Test
public void testStreamMultiVar() throws Exception {
    BatchOperator.setParallelism(1);
    final int numCols = 10;
    final String timeColName = "ts";
    final String vecColName = "vec";
    final String selectClause = "TO_TIMESTAMP(" + timeColName + ") as " + timeColName + ", " + vecColName;
    BatchOperator<?> source = new RandomTableSourceBatchOp().setNumRows(1000L).setNumCols(numCols);
    String[] selectedColNames = source.getColNames();
    AppendIdBatchOp appendIdBatchOp = new AppendIdBatchOp().setIdCol(timeColName).linkFrom(source);
    ColumnsToVectorBatchOp columnsToVectorBatchOp = new ColumnsToVectorBatchOp().setSelectedCols(selectedColNames).setVectorCol(vecColName).linkFrom(appendIdBatchOp);
    BatchOperator<?> timeBatchOp = new SelectBatchOp().setClause(selectClause).linkFrom(columnsToVectorBatchOp);
    LSTNetTrainBatchOp trainOp = new LSTNetTrainBatchOp().setVectorCol(vecColName).setTimeCol(timeColName).setWindow(24 * 7).setHorizon(12).setNumEpochs(1).linkFrom(timeBatchOp);
    StreamOperator<?> sourceStreamOp = new RandomTableSourceStreamOp().setNumCols(numCols).setMaxRows(1000L);
    ColumnsToVectorStreamOp columnsToVectorStreamOp = new ColumnsToVectorStreamOp().setSelectedCols(selectedColNames).setVectorCol(vecColName).linkFrom(sourceStreamOp);
    AppendIdStreamOp appendIdStreamOp = new AppendIdStreamOp().setIdCol(timeColName).linkFrom(columnsToVectorStreamOp);
    StreamOperator<?> timestampStreamOp = new SelectStreamOp().setClause(selectClause).linkFrom(appendIdStreamOp);
    OverCountWindowStreamOp overCountWindowStreamOp = new OverCountWindowStreamOp().setClause("MTABLE_AGG_PRECEDING(" + timeColName + ", " + vecColName + ") as col_agg").setTimeCol(timeColName).setPrecedingRows(24 * 7).linkFrom(timestampStreamOp);
    LSTNetPredictStreamOp predictStreamOp = new LSTNetPredictStreamOp(trainOp).setValueCol("col_agg").setPredictionCol("pred").setReservedCols(timeColName).linkFrom(overCountWindowStreamOp);
    FilePath tmpAkFile = new FilePath(new Path(folder.getRoot().getPath(), "lstnet_test_stream_multi_var_result.ak"));
    predictStreamOp.link(new AkSinkStreamOp().setOverwriteSink(true).setFilePath(tmpAkFile));
    StreamOperator.execute();
}
Also used: FilePath (com.alibaba.alink.common.io.filesystem.FilePath), Path (org.apache.flink.core.fs.Path), AkSinkStreamOp (com.alibaba.alink.operator.stream.sink.AkSinkStreamOp), AppendIdStreamOp (com.alibaba.alink.operator.stream.dataproc.AppendIdStreamOp), SelectBatchOp (com.alibaba.alink.operator.batch.sql.SelectBatchOp), RandomTableSourceBatchOp (com.alibaba.alink.operator.batch.source.RandomTableSourceBatchOp), ColumnsToVectorStreamOp (com.alibaba.alink.operator.stream.dataproc.format.ColumnsToVectorStreamOp), AppendIdBatchOp (com.alibaba.alink.operator.batch.dataproc.AppendIdBatchOp), RandomTableSourceStreamOp (com.alibaba.alink.operator.stream.source.RandomTableSourceStreamOp), SelectStreamOp (com.alibaba.alink.operator.stream.sql.SelectStreamOp), OverCountWindowStreamOp (com.alibaba.alink.operator.stream.feature.OverCountWindowStreamOp), ColumnsToVectorBatchOp (com.alibaba.alink.operator.batch.dataproc.format.ColumnsToVectorBatchOp), LSTNetPredictStreamOp (com.alibaba.alink.operator.stream.timeseries.LSTNetPredictStreamOp), Test (org.junit.Test)

Example 4 with RandomTableSourceBatchOp

Use of com.alibaba.alink.operator.batch.source.RandomTableSourceBatchOp in project Alink by alibaba.

In class TFTableModelPredictStreamOpTest, method test.

@Category(DLTest.class)
@Test
public void test() throws Exception {
    int savedStreamParallelism = MLEnvironmentFactory.getDefault().getStreamExecutionEnvironment().getParallelism();
    BatchOperator.setParallelism(3);
    BatchOperator<?> source = new RandomTableSourceBatchOp().setNumRows(100L).setNumCols(10);
    String[] colNames = source.getColNames();
    source = source.select("*, case when RAND() > 0.5 then 1. else 0. end as label");
    String label = "label";
    StreamOperator<?> streamSource = new RandomTableSourceStreamOp().setNumCols(10).setMaxRows(100L);
    Map<String, Object> userParams = new HashMap<>();
    userParams.put("featureCols", JsonConverter.toJson(colNames));
    userParams.put("labelCol", label);
    userParams.put("batch_size", 16);
    userParams.put("num_epochs", 1);
    TFTableModelTrainBatchOp tfTableModelTrainBatchOp = new TFTableModelTrainBatchOp().setUserFiles(new String[] { "res:///tf_dnn_train.py" }).setMainScriptFile("res:///tf_dnn_train.py").setUserParams(JsonConverter.toJson(userParams)).setNumWorkers(2).setNumPSs(1).linkFrom(source);
    TFTableModelPredictStreamOp tfTableModelPredictStreamOp = new TFTableModelPredictStreamOp(tfTableModelTrainBatchOp).setOutputSchemaStr("logits double").setOutputSignatureDefs(new String[] { "logits" }).setSignatureDefKey("predict").setSelectedCols(colNames).linkFrom(streamSource);
    tfTableModelPredictStreamOp.print();
    StreamOperator.execute();
    StreamOperator.setParallelism(savedStreamParallelism);
}
Also used: HashMap (java.util.HashMap), TFTableModelTrainBatchOp (com.alibaba.alink.operator.batch.tensorflow.TFTableModelTrainBatchOp), RandomTableSourceBatchOp (com.alibaba.alink.operator.batch.source.RandomTableSourceBatchOp), RandomTableSourceStreamOp (com.alibaba.alink.operator.stream.source.RandomTableSourceStreamOp), Category (org.junit.experimental.categories.Category), DLTest (com.alibaba.alink.testutil.categories.DLTest), Test (org.junit.Test)

Example 5 with RandomTableSourceBatchOp

Use of com.alibaba.alink.operator.batch.source.RandomTableSourceBatchOp in project Alink by alibaba.

In class DeepARTrainBatchOpTest, method testSingleVar.

@Test
public void testSingleVar() throws Exception {
    BatchOperator.setParallelism(1);
    final String timeColName = "ts";
    BatchOperator<?> source = new RandomTableSourceBatchOp().setNumRows(1000L).setNumCols(1);
    String colName = source.getColNames()[0];
    AppendIdBatchOp appendIdBatchOp = new AppendIdBatchOp().setIdCol(timeColName).linkFrom(source);
    BatchOperator<?> timeBatchOp = new SelectBatchOp().setClause(String.format("%s, FLOOR(TO_TIMESTAMP(%s * 3600000) TO HOUR) as %s", colName, timeColName, timeColName)).linkFrom(appendIdBatchOp);
    StringBuilder groupByPredicate = new StringBuilder();
    String selectClause = timeColName + String.format(", SUM(%s) as %s", colName, colName);
    groupByPredicate.append(timeColName);
    BatchOperator<?> groupedTimeBatchOp = new GroupByBatchOp().setSelectClause(selectClause).setGroupByPredicate(groupByPredicate.toString()).linkFrom(timeBatchOp);
    BatchOperator<?> deepArTrainBatchOp = new DeepARTrainBatchOp().setSelectedCol(colName).setTimeCol(timeColName).setWindow(24 * 7).setStride(24).setNumEpochs(1).linkFrom(groupedTimeBatchOp);
    StreamOperator<?> sourceStreamOp = new RandomTableSourceStreamOp().setNumCols(1).setMaxRows(1000L);
    AppendIdStreamOp appendIdStreamOp = new AppendIdStreamOp().setIdCol(timeColName).linkFrom(sourceStreamOp);
    StreamOperator<?> timeStreamOp = new SelectStreamOp().setClause(String.format("%s, FLOOR(TO_TIMESTAMP(%s * 3600000) TO HOUR) as %s", colName, timeColName, timeColName)).linkFrom(appendIdStreamOp);
    String selectClausePred = String.format("TUMBLE_START() as %s", timeColName) + String.format(", SUM(%s) as %s", colName, colName);
    TumbleTimeWindowStreamOp timeWindowStreamOp = new TumbleTimeWindowStreamOp().setWindowTime(3600).setTimeCol(timeColName).setClause(selectClausePred).linkFrom(timeStreamOp);
    HopTimeWindowStreamOp hopTimeWindowStreamOp = new HopTimeWindowStreamOp().setTimeCol(timeColName).setClause(String.format("MTABLE_AGG(%s, %s) as %s", timeColName, colName, "mt")).setHopTime(24 * 3600).setWindowTime((24 * 7 - 24) * 3600).linkFrom(timeWindowStreamOp);
    DeepARPredictStreamOp deepARPredictStreamOp = new DeepARPredictStreamOp(deepArTrainBatchOp).setValueCol("mt").setPredictionCol("pred").setPredictNum(24).linkFrom(hopTimeWindowStreamOp);
    FilePath tmpAkFile = new FilePath(new Path(folder.getRoot().getPath(), "deepar_test_stream_single_var_result.ak"));
    deepARPredictStreamOp.link(new AkSinkStreamOp().setOverwriteSink(true).setFilePath(tmpAkFile));
    StreamOperator.execute();
}
Also used: FilePath (com.alibaba.alink.common.io.filesystem.FilePath), Path (org.apache.flink.core.fs.Path), DeepARPredictStreamOp (com.alibaba.alink.operator.stream.timeseries.DeepARPredictStreamOp), AkSinkStreamOp (com.alibaba.alink.operator.stream.sink.AkSinkStreamOp), AppendIdStreamOp (com.alibaba.alink.operator.stream.dataproc.AppendIdStreamOp), TumbleTimeWindowStreamOp (com.alibaba.alink.operator.stream.feature.TumbleTimeWindowStreamOp), SelectBatchOp (com.alibaba.alink.operator.batch.sql.SelectBatchOp), GroupByBatchOp (com.alibaba.alink.operator.batch.sql.GroupByBatchOp), RandomTableSourceBatchOp (com.alibaba.alink.operator.batch.source.RandomTableSourceBatchOp), AppendIdBatchOp (com.alibaba.alink.operator.batch.dataproc.AppendIdBatchOp), RandomTableSourceStreamOp (com.alibaba.alink.operator.stream.source.RandomTableSourceStreamOp), SelectStreamOp (com.alibaba.alink.operator.stream.sql.SelectStreamOp), HopTimeWindowStreamOp (com.alibaba.alink.operator.stream.feature.HopTimeWindowStreamOp), Test (org.junit.Test)

Aggregations

RandomTableSourceBatchOp (com.alibaba.alink.operator.batch.source.RandomTableSourceBatchOp): 7
Test (org.junit.Test): 7
RandomTableSourceStreamOp (com.alibaba.alink.operator.stream.source.RandomTableSourceStreamOp): 5
FilePath (com.alibaba.alink.common.io.filesystem.FilePath): 4
AppendIdBatchOp (com.alibaba.alink.operator.batch.dataproc.AppendIdBatchOp): 4
SelectBatchOp (com.alibaba.alink.operator.batch.sql.SelectBatchOp): 4
AppendIdStreamOp (com.alibaba.alink.operator.stream.dataproc.AppendIdStreamOp): 4
AkSinkStreamOp (com.alibaba.alink.operator.stream.sink.AkSinkStreamOp): 4
SelectStreamOp (com.alibaba.alink.operator.stream.sql.SelectStreamOp): 4
Path (org.apache.flink.core.fs.Path): 4
ColumnsToVectorBatchOp (com.alibaba.alink.operator.batch.dataproc.format.ColumnsToVectorBatchOp): 2
GroupByBatchOp (com.alibaba.alink.operator.batch.sql.GroupByBatchOp): 2
ColumnsToVectorStreamOp (com.alibaba.alink.operator.stream.dataproc.format.ColumnsToVectorStreamOp): 2
HopTimeWindowStreamOp (com.alibaba.alink.operator.stream.feature.HopTimeWindowStreamOp): 2
OverCountWindowStreamOp (com.alibaba.alink.operator.stream.feature.OverCountWindowStreamOp): 2
TumbleTimeWindowStreamOp (com.alibaba.alink.operator.stream.feature.TumbleTimeWindowStreamOp): 2
DeepARPredictStreamOp (com.alibaba.alink.operator.stream.timeseries.DeepARPredictStreamOp): 2
LSTNetPredictStreamOp (com.alibaba.alink.operator.stream.timeseries.LSTNetPredictStreamOp): 2
TFTableModelTrainBatchOp (com.alibaba.alink.operator.batch.tensorflow.TFTableModelTrainBatchOp): 1
OneHotModelInfo (com.alibaba.alink.operator.common.feature.OneHotModelInfo): 1