Example 1 with CsvSourceStreamOp

Use of com.alibaba.alink.operator.stream.source.CsvSourceStreamOp in project Alink by alibaba.

From the class Chap03, method c_2_3_1.

static void c_2_3_1() throws Exception {
    HadoopFileSystem hdfs = new HadoopFileSystem(HADOOP_VERSION, HDFS_URI);
    OssFileSystem oss = new OssFileSystem(OSS_VERSION, OSS_END_POINT, OSS_BUCKET_NAME, OSS_ACCESS_ID, OSS_ACCESS_KEY);
    // The same .ak file is written to the local disk, to HDFS, and to OSS.
    FilePath[] filePaths = new FilePath[] {
        new FilePath(LOCAL_DIR + "iris.ak"),
        new FilePath(HDFS_URI + "user/yangxu/alink/data/temp/iris.ak", hdfs),
        new FilePath(OSS_PREFIX_URI + "alink/data/temp/iris.ak", oss)
    };
    // Batch: write the CSV data to each path, then read it back and print the row count.
    for (FilePath filePath : filePaths) {
        new CsvSourceBatchOp()
            .setFilePath(IRIS_HTTP_URL)
            .setSchemaStr(IRIS_SCHEMA_STR)
            .link(new AkSinkBatchOp().setFilePath(filePath).setOverwriteSink(true));
        BatchOperator.execute();
        System.out.println(new AkSourceBatchOp().setFilePath(filePath).count());
    }
    // Stream: write the same data, then read it back and print the rows with sepal_length < 4.5.
    for (FilePath filePath : filePaths) {
        new CsvSourceStreamOp()
            .setFilePath(IRIS_HTTP_URL)
            .setSchemaStr(IRIS_SCHEMA_STR)
            .link(new AkSinkStreamOp().setFilePath(filePath).setOverwriteSink(true));
        StreamOperator.execute();
        new AkSourceStreamOp().setFilePath(filePath).filter("sepal_length < 4.5").print();
        StreamOperator.execute();
    }
}
Also used: FilePath (com.alibaba.alink.common.io.filesystem.FilePath), AkSinkStreamOp (com.alibaba.alink.operator.stream.sink.AkSinkStreamOp), AkSourceBatchOp (com.alibaba.alink.operator.batch.source.AkSourceBatchOp), AkSourceStreamOp (com.alibaba.alink.operator.stream.source.AkSourceStreamOp), AkSinkBatchOp (com.alibaba.alink.operator.batch.sink.AkSinkBatchOp), HadoopFileSystem (com.alibaba.alink.common.io.filesystem.HadoopFileSystem), CsvSourceStreamOp (com.alibaba.alink.operator.stream.source.CsvSourceStreamOp), OssFileSystem (com.alibaba.alink.common.io.filesystem.OssFileSystem), CsvSourceBatchOp (com.alibaba.alink.operator.batch.source.CsvSourceBatchOp)

Example 2 with CsvSourceStreamOp

Use of com.alibaba.alink.operator.stream.source.CsvSourceStreamOp in project Alink by alibaba.

From the class Chap04, method c_3.

static void c_3() throws Exception {
    DerbyCatalog derby = new DerbyCatalog("derby_catalog", null, DERBY_VERSION, DATA_DIR + DERBY_DIR);
    derby.open();
    // Create the database if needed and drop both tables if they already exist.
    derby.createDatabase(DB_NAME, new CatalogDatabaseImpl(new HashMap<>(), ""), true);
    derby.dropTable(new ObjectPath(DB_NAME, BATCH_TABLE_NAME), true);
    derby.dropTable(new ObjectPath(DB_NAME, STREAM_TABLE_NAME), true);
    // Write the CSV data into the catalog, once as a batch job and once as a stream job.
    new CsvSourceBatchOp()
        .setFilePath(IRIS_URL)
        .setSchemaStr(IRIS_SCHEMA_STR)
        .lazyPrintStatistics("< origin data >")
        .link(new CatalogSinkBatchOp().setCatalogObject(new CatalogObject(derby, new ObjectPath(DB_NAME, BATCH_TABLE_NAME))));
    BatchOperator.execute();
    new CsvSourceStreamOp()
        .setFilePath(IRIS_URL)
        .setSchemaStr(IRIS_SCHEMA_STR)
        .link(new CatalogSinkStreamOp().setCatalogObject(new CatalogObject(derby, new ObjectPath(DB_NAME, STREAM_TABLE_NAME))));
    StreamOperator.execute();
    // Read both tables back from the catalog.
    new CatalogSourceBatchOp()
        .setCatalogObject(new CatalogObject(derby, new ObjectPath(DB_NAME, BATCH_TABLE_NAME)))
        .lazyPrintStatistics("< batch catalog source >");
    BatchOperator.execute();
    new CatalogSourceStreamOp()
        .setCatalogObject(new CatalogObject(derby, new ObjectPath(DB_NAME, STREAM_TABLE_NAME)))
        .sample(0.02)
        .print();
    StreamOperator.execute();
    // List the tables, drop them, and list again to confirm the cleanup.
    System.out.println("< tables before drop >");
    System.out.println(JsonConverter.toJson(derby.listTables(DB_NAME)));
    if (derby.tableExists(new ObjectPath(DB_NAME, BATCH_TABLE_NAME))) {
        derby.dropTable(new ObjectPath(DB_NAME, BATCH_TABLE_NAME), false);
    }
    derby.dropTable(new ObjectPath(DB_NAME, STREAM_TABLE_NAME), true);
    System.out.println("< tables after drop >");
    System.out.println(JsonConverter.toJson(derby.listTables(DB_NAME)));
    derby.dropDatabase(DB_NAME, true);
    derby.close();
}
Also used: CatalogSinkBatchOp (com.alibaba.alink.operator.batch.sink.CatalogSinkBatchOp), CatalogSourceBatchOp (com.alibaba.alink.operator.batch.source.CatalogSourceBatchOp), ObjectPath (org.apache.flink.table.catalog.ObjectPath), CatalogSourceStreamOp (com.alibaba.alink.operator.stream.source.CatalogSourceStreamOp), HashMap (java.util.HashMap), DerbyCatalog (com.alibaba.alink.common.io.catalog.DerbyCatalog), CatalogSinkStreamOp (com.alibaba.alink.operator.stream.sink.CatalogSinkStreamOp), CsvSourceStreamOp (com.alibaba.alink.operator.stream.source.CsvSourceStreamOp), CatalogDatabaseImpl (org.apache.flink.table.catalog.CatalogDatabaseImpl), CsvSourceBatchOp (com.alibaba.alink.operator.batch.source.CsvSourceBatchOp), CatalogObject (com.alibaba.alink.params.io.HasCatalogObject.CatalogObject)

Example 3 with CsvSourceStreamOp

Use of com.alibaba.alink.operator.stream.source.CsvSourceStreamOp in project Alink by alibaba.

From the class Chap04, method c_4.

static void c_4() throws Exception {
    // The example runs only when a MySQL URL is configured.
    if (null != MYSQL_URL) {
        MySqlCatalog mySql = new MySqlCatalog("mysql_catalog", null, MYSQL_VERSION, MYSQL_URL, MYSQL_PORT, MYSQL_USER_NAME, MYSQL_PASSWORD);
        mySql.open();
        mySql.createDatabase(DB_NAME, new CatalogDatabaseImpl(new HashMap<>(), ""), true);
        // Write the CSV data into the catalog, once as a batch job and once as a stream job.
        new CsvSourceBatchOp()
            .setFilePath(IRIS_URL)
            .setSchemaStr(IRIS_SCHEMA_STR)
            .lazyPrintStatistics("< origin data >")
            .link(new CatalogSinkBatchOp().setCatalogObject(new CatalogObject(mySql, new ObjectPath(DB_NAME, BATCH_TABLE_NAME))));
        BatchOperator.execute();
        new CsvSourceStreamOp()
            .setFilePath(IRIS_URL)
            .setSchemaStr(IRIS_SCHEMA_STR)
            .link(new CatalogSinkStreamOp().setCatalogObject(new CatalogObject(mySql, new ObjectPath(DB_NAME, STREAM_TABLE_NAME))));
        StreamOperator.execute();
        // Read both tables back from the catalog.
        new CatalogSourceBatchOp()
            .setCatalogObject(new CatalogObject(mySql, new ObjectPath(DB_NAME, BATCH_TABLE_NAME)))
            .lazyPrintStatistics("< batch catalog source >");
        BatchOperator.execute();
        new CatalogSourceStreamOp()
            .setCatalogObject(new CatalogObject(mySql, new ObjectPath(DB_NAME, STREAM_TABLE_NAME)))
            .sample(0.02)
            .print();
        StreamOperator.execute();
        // List the tables, drop them, and list again to confirm the cleanup.
        System.out.println("< tables before drop >");
        System.out.println(JsonConverter.toJson(mySql.listTables(DB_NAME)));
        if (mySql.tableExists(new ObjectPath(DB_NAME, BATCH_TABLE_NAME))) {
            mySql.dropTable(new ObjectPath(DB_NAME, BATCH_TABLE_NAME), false);
        }
        mySql.dropTable(new ObjectPath(DB_NAME, STREAM_TABLE_NAME), true);
        System.out.println("< tables after drop >");
        System.out.println(JsonConverter.toJson(mySql.listTables(DB_NAME)));
        mySql.dropDatabase(DB_NAME, true);
        mySql.close();
    }
}
Also used: CatalogSinkBatchOp (com.alibaba.alink.operator.batch.sink.CatalogSinkBatchOp), CatalogSourceBatchOp (com.alibaba.alink.operator.batch.source.CatalogSourceBatchOp), ObjectPath (org.apache.flink.table.catalog.ObjectPath), CatalogSourceStreamOp (com.alibaba.alink.operator.stream.source.CatalogSourceStreamOp), MySqlCatalog (com.alibaba.alink.common.io.catalog.MySqlCatalog), HashMap (java.util.HashMap), CatalogSinkStreamOp (com.alibaba.alink.operator.stream.sink.CatalogSinkStreamOp), CsvSourceStreamOp (com.alibaba.alink.operator.stream.source.CsvSourceStreamOp), CatalogDatabaseImpl (org.apache.flink.table.catalog.CatalogDatabaseImpl), CsvSourceBatchOp (com.alibaba.alink.operator.batch.source.CsvSourceBatchOp), CatalogObject (com.alibaba.alink.params.io.HasCatalogObject.CatalogObject)

Example 4 with CsvSourceStreamOp

Use of com.alibaba.alink.operator.stream.source.CsvSourceStreamOp in project Alink by alibaba.

From the class Chap07, method c_1_4.

static void c_1_4() throws Exception {
    CsvSourceBatchOp source = new CsvSourceBatchOp().setFilePath(DATA_DIR + ORIGIN_FILE).setSchemaStr(SCHEMA_STRING);
    // Batch: sample each category at its own ratio, then count the sampled rows per category.
    source
        .link(new StratifiedSampleBatchOp()
            .setStrataCol("category")
            .setStrataRatios("Iris-versicolor:0.2,Iris-setosa:0.4,Iris-virginica:0.8"))
        .groupBy("category", "category, COUNT(*) AS cnt")
        .print();
    CsvSourceStreamOp source_stream = new CsvSourceStreamOp().setFilePath(DATA_DIR + ORIGIN_FILE).setSchemaStr(SCHEMA_STRING);
    // Stream: apply the same per-category ratios and print the sampled rows.
    source_stream.link(new StratifiedSampleStreamOp().setStrataCol("category").setStrataRatios("Iris-versicolor:0.2,Iris-setosa:0.4,Iris-virginica:0.8")).print();
    StreamOperator.execute();
}
Also used: StratifiedSampleStreamOp (com.alibaba.alink.operator.stream.dataproc.StratifiedSampleStreamOp), StratifiedSampleBatchOp (com.alibaba.alink.operator.batch.dataproc.StratifiedSampleBatchOp), CsvSourceStreamOp (com.alibaba.alink.operator.stream.source.CsvSourceStreamOp), CsvSourceBatchOp (com.alibaba.alink.operator.batch.source.CsvSourceBatchOp)
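
The strata-ratio string passed to setStrataRatios in Example 4 can be read as one sampling ratio per category. The plain-Java sketch below shows one possible interpretation of it: each row is kept independently with the ratio of its category (Bernoulli sampling). It is not Alink's implementation; the class StratifiedSampleSketch and its methods parseRatios and sample are hypothetical names introduced for illustration, and dropping rows whose category has no ratio is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical illustration of per-stratum sampling; not taken from Alink.
class StratifiedSampleSketch {

    // Parses a string such as "a:0.2,b:0.4" into a map from stratum to ratio.
    static Map<String, Double> parseRatios(String spec) {
        Map<String, Double> ratios = new HashMap<>();
        for (String entry : spec.split(",")) {
            int idx = entry.lastIndexOf(':');
            String key = entry.substring(0, idx).trim();
            double ratio = Double.parseDouble(entry.substring(idx + 1).trim());
            ratios.put(key, ratio);
        }
        return ratios;
    }

    // Keeps each row independently with the ratio of its stratum.
    // Assumption of this sketch: rows whose stratum has no ratio are dropped.
    static List<String[]> sample(List<String[]> rows, int strataIndex,
                                 Map<String, Double> ratios, Random rnd) {
        List<String[]> kept = new ArrayList<>();
        for (String[] row : rows) {
            Double ratio = ratios.get(row[strataIndex]);
            if (ratio != null && rnd.nextDouble() < ratio) {
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Double> ratios =
            parseRatios("Iris-versicolor:0.2,Iris-setosa:0.4,Iris-virginica:0.8");
        // Build 50 synthetic rows per category: {id, category}.
        List<String[]> rows = new ArrayList<>();
        for (String category : ratios.keySet()) {
            for (int i = 0; i < 50; i++) {
                rows.add(new String[] {String.valueOf(i), category});
            }
        }
        List<String[]> kept = sample(rows, 1, ratios, new Random(42));
        // Count the sampled rows per category, like the groupBy in the batch example.
        Map<String, Integer> counts = new HashMap<>();
        for (String[] row : kept) {
            counts.merge(row[1], 1, Integer::sum);
        }
        System.out.println(counts);
    }
}
```

With independent per-row decisions the number of rows kept per category varies from run to run around ratio times the category size, which is why the batch example prints a per-category count after sampling.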

Example 5 with CsvSourceStreamOp

Use of com.alibaba.alink.operator.stream.source.CsvSourceStreamOp in project Alink by alibaba.

From the class Chap07, method c_1_2.

static void c_1_2() throws Exception {
    CsvSourceBatchOp source = new CsvSourceBatchOp().setFilePath(DATA_DIR + ORIGIN_FILE).setSchemaStr(SCHEMA_STRING);
    // Batch: sample 50 rows, then sample the result with ratio 0.1.
    source.sampleWithSize(50).lazyPrintStatistics("< after sample with size 50 >")
        .sample(0.1).print();
    // Batch: the same two steps, with true passed as the second argument.
    source.lazyPrintStatistics("< origin data >")
        .sampleWithSize(150, true).lazyPrintStatistics("< after sample with size 150 >")
        .sample(0.03, true).print();
    CsvSourceStreamOp source_stream = new CsvSourceStreamOp().setFilePath(DATA_DIR + ORIGIN_FILE).setSchemaStr(SCHEMA_STRING);
    // Stream: sample with ratio 0.1 and print the sampled rows.
    source_stream.sample(0.1).print();
    StreamOperator.execute();
}
Also used: CsvSourceStreamOp (com.alibaba.alink.operator.stream.source.CsvSourceStreamOp), CsvSourceBatchOp (com.alibaba.alink.operator.batch.source.CsvSourceBatchOp)
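
Example 5 mixes two kinds of sampling: by ratio (sample) and by fixed size (sampleWithSize), each optionally with a boolean second argument. The plain-Java sketch below illustrates the difference under stated assumptions: ratio sampling is modelled as an independent keep/drop decision per row, fixed-size sampling as reservoir sampling (Algorithm R), and the boolean flag is assumed to mean sampling with replacement. None of this is taken from Alink's source; SamplingSketch and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical illustration of ratio vs. fixed-size sampling; not taken from Alink.
class SamplingSketch {

    // Ratio sampling: each row is kept independently with probability `ratio`,
    // so the size of the result varies from run to run.
    static <T> List<T> sampleByRatio(List<T> rows, double ratio, Random rnd) {
        List<T> kept = new ArrayList<>();
        for (T row : rows) {
            if (rnd.nextDouble() < ratio) {
                kept.add(row);
            }
        }
        return kept;
    }

    // Fixed-size sampling without replacement via reservoir sampling (Algorithm R):
    // returns exactly min(size, rows.size()) rows after a single pass.
    static <T> List<T> sampleWithSize(List<T> rows, int size, Random rnd) {
        List<T> reservoir = new ArrayList<>();
        int seen = 0;
        for (T row : rows) {
            seen++;
            if (reservoir.size() < size) {
                reservoir.add(row);
            } else {
                int j = rnd.nextInt(seen); // uniform in [0, seen)
                if (j < size) {
                    reservoir.set(j, row);
                }
            }
        }
        return reservoir;
    }

    // Fixed-size sampling with replacement: `size` independent uniform draws,
    // so the same row can appear more than once.
    static <T> List<T> sampleWithSizeAndReplacement(List<T> rows, int size, Random rnd) {
        List<T> drawn = new ArrayList<>();
        if (rows.isEmpty()) {
            return drawn;
        }
        for (int i = 0; i < size; i++) {
            drawn.add(rows.get(rnd.nextInt(rows.size())));
        }
        return drawn;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 150; i++) {
            rows.add(i);
        }
        Random rnd = new Random(7);
        System.out.println("fixed size: " + sampleWithSize(rows, 50, rnd).size());
        System.out.println("by ratio: " + sampleByRatio(rows, 0.1, rnd).size());
        System.out.println("with replacement: " + sampleWithSizeAndReplacement(rows, 150, rnd).size());
    }
}
```

Under these assumptions, drawing 150 rows with replacement from a 150-row input still returns 150 rows, but some of them are repeats, whereas drawing 150 without replacement would simply return every row once.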

Aggregations

CsvSourceStreamOp (com.alibaba.alink.operator.stream.source.CsvSourceStreamOp): 19
CsvSourceBatchOp (com.alibaba.alink.operator.batch.source.CsvSourceBatchOp): 15
Test (org.junit.Test): 7
PipelineModel (com.alibaba.alink.pipeline.PipelineModel): 4
CatalogSinkBatchOp (com.alibaba.alink.operator.batch.sink.CatalogSinkBatchOp): 3
AkSourceBatchOp (com.alibaba.alink.operator.batch.source.AkSourceBatchOp): 3
CatalogSourceBatchOp (com.alibaba.alink.operator.batch.source.CatalogSourceBatchOp): 3
JsonValueStreamOp (com.alibaba.alink.operator.stream.dataproc.JsonValueStreamOp): 3
SplitStreamOp (com.alibaba.alink.operator.stream.dataproc.SplitStreamOp): 3
EvalBinaryClassStreamOp (com.alibaba.alink.operator.stream.evaluation.EvalBinaryClassStreamOp): 3
FtrlPredictStreamOp (com.alibaba.alink.operator.stream.onlinelearning.FtrlPredictStreamOp): 3
FtrlTrainStreamOp (com.alibaba.alink.operator.stream.onlinelearning.FtrlTrainStreamOp): 3
CatalogSinkStreamOp (com.alibaba.alink.operator.stream.sink.CatalogSinkStreamOp): 3
CatalogSourceStreamOp (com.alibaba.alink.operator.stream.source.CatalogSourceStreamOp): 3
CatalogObject (com.alibaba.alink.params.io.HasCatalogObject.CatalogObject): 3
HashMap (java.util.HashMap): 3
CatalogDatabaseImpl (org.apache.flink.table.catalog.CatalogDatabaseImpl): 3
ObjectPath (org.apache.flink.table.catalog.ObjectPath): 3
FilePath (com.alibaba.alink.common.io.filesystem.FilePath): 2
HadoopFileSystem (com.alibaba.alink.common.io.filesystem.HadoopFileSystem): 2