
Example 1 with ChainedTransformer

Use of org.apache.hudi.sink.transform.ChainedTransformer in the Apache Hudi project.

From the class ITTestDataStreamWrite, method testChainedTransformersBeforeWriting.

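import java.util.Arrays;

import org.apache.flink.table.data.GenericRowData;
import org.apache.hudi.sink.transform.ChainedTransformer;
import org.apache.hudi.sink.transform.Transformer;
import org.junit.jupiter.api.Test;

// Excerpt from ITTestDataStreamWrite; testWriteToHoodie and
// EXPECTED_CHAINED_TRANSFORMER are defined elsewhere in that class.
@Test
public void testChainedTransformersBeforeWriting() throws Exception {
    // A transformer that increments the age field (index 2) of every row.
    Transformer t1 = (ds) -> ds.map((rowdata) -> {
        if (rowdata instanceof GenericRowData) {
            GenericRowData genericRD = (GenericRowData) rowdata;
            // update age field to age + 1
            genericRD.setField(2, genericRD.getInt(2) + 1);
            return genericRD;
        } else {
            throw new RuntimeException("Unrecognized row type : " + rowdata.getClass().getSimpleName());
        }
    });
    // The same transformer is chained twice, so each row passes through t1 two times before the write.
    ChainedTransformer chainedTransformer = new ChainedTransformer(Arrays.asList(t1, t1));
    testWriteToHoodie(chainedTransformer, EXPECTED_CHAINED_TRANSFORMER);
}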
Also used: FilePathFilter(org.apache.flink.api.common.io.FilePathFilter), Arrays(java.util.Arrays), FileProcessingMode(org.apache.flink.streaming.api.functions.source.FileProcessingMode), TestConfigurations(org.apache.hudi.utils.TestConfigurations), CheckpointingMode(org.apache.flink.streaming.api.CheckpointingMode), HashMap(java.util.HashMap), JobStatus(org.apache.flink.api.common.JobStatus), RowType(org.apache.flink.table.types.logical.RowType), ChainedTransformer(org.apache.hudi.sink.transform.ChainedTransformer), BasicTypeInfo(org.apache.flink.api.common.typeinfo.BasicTypeInfo), HoodieTableType(org.apache.hudi.common.model.HoodieTableType), GenericRowData(org.apache.flink.table.data.GenericRowData), Path(org.apache.flink.core.fs.Path), Map(java.util.Map), TestLogger(org.apache.flink.util.TestLogger), StreamerUtil(org.apache.hudi.util.StreamerUtil), TypeInformation(org.apache.flink.api.common.typeinfo.TypeInformation), Pipelines(org.apache.hudi.sink.utils.Pipelines), ValueSource(org.junit.jupiter.params.provider.ValueSource), HoodieRecord(org.apache.hudi.common.model.HoodieRecord), RowData(org.apache.flink.table.data.RowData), AvroSchemaConverter(org.apache.hudi.util.AvroSchemaConverter), Configuration(org.apache.flink.configuration.Configuration), TestData(org.apache.hudi.utils.TestData), TimestampFormat(org.apache.flink.formats.common.TimestampFormat), JobClient(org.apache.flink.core.execution.JobClient), File(java.io.File), StandardCharsets(java.nio.charset.StandardCharsets), DataStream(org.apache.flink.streaming.api.datastream.DataStream), Test(org.junit.jupiter.api.Test), Objects(java.util.Objects), TimeUnit(java.util.concurrent.TimeUnit), ContinuousFileSource(org.apache.hudi.utils.source.ContinuousFileSource), TextInputFormat(org.apache.flink.api.java.io.TextInputFormat), ParameterizedTest(org.junit.jupiter.params.ParameterizedTest), List(java.util.List), InternalTypeInfo(org.apache.flink.table.runtime.typeutils.InternalTypeInfo), TempDir(org.junit.jupiter.api.io.TempDir), JsonRowDataDeserializationSchema(org.apache.flink.formats.json.JsonRowDataDeserializationSchema), FlinkOptions(org.apache.hudi.configuration.FlinkOptions), Transformer(org.apache.hudi.sink.transform.Transformer), StreamExecutionEnvironment(org.apache.flink.streaming.api.environment.StreamExecutionEnvironment)
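
To make the effect of chaining the same transformer twice easier to see, here is a minimal sketch that models the idea with plain Java lists instead of a Flink DataStream. It is not Hudi's implementation: ChainedTransformerSketch, ListTransformer, chain and incrementAge are hypothetical names invented for this illustration, the sample row is made up, and the sketch assumes that transformers are applied in list order, each one receiving the output of the previous one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

public class ChainedTransformerSketch {

    /** Hypothetical stand-in for a stream transformer: maps a whole list of rows to a new list. */
    public interface ListTransformer<T> extends UnaryOperator<List<T>> {
    }

    /** Builds one transformer that applies the given transformers in list order. */
    public static <T> ListTransformer<T> chain(List<ListTransformer<T>> transformers) {
        return input -> {
            List<T> current = input;
            for (ListTransformer<T> t : transformers) {
                // Feed the output of the previous transformer into the next one.
                current = t.apply(current);
            }
            return current;
        };
    }

    /** Returns a transformer that copies each row and adds 1 to the Integer at ageIndex. */
    public static ListTransformer<Object[]> incrementAge(int ageIndex) {
        return rows -> rows.stream().map(row -> {
            Object[] copy = row.clone(); // leave the input row untouched
            copy[ageIndex] = (Integer) copy[ageIndex] + 1;
            return copy;
        }).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Object[]> rows = new ArrayList<>();
        rows.add(new Object[] {"id1", "some-name", 23}); // made-up sample row: id, name, age

        ListTransformer<Object[]> t1 = incrementAge(2);
        // Same transformer twice, mirroring Arrays.asList(t1, t1) in the test above.
        List<Object[]> out = chain(Arrays.asList(t1, t1)).apply(rows);

        System.out.println(out.get(0)[2]); // prints 25
    }
}
```

Unlike the test above, which mutates the GenericRowData in place, this sketch copies each row so the input list stays unchanged; that is a choice made for the illustration only.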
