Flink stream processing API collection: read this one article to master all of Flink's stream processing APIs
2022-06-28 12:38:00 【InfoQ】
Preface

1. Building the stream execution environment (Environment)
// Batch execution environment
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
// Stream execution environment (automatically local or cluster, depending on how the job is run)
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// Local stream environment with parallelism 1
LocalStreamEnvironment env = StreamExecutionEnvironment.createLocalEnvironment(1);
// Remote stream environment: submit to the JobManager at the given host/port, shipping the job jar
StreamExecutionEnvironment env =
        StreamExecutionEnvironment.createRemoteEnvironment("jobmanage-hostname", 6123, "YOURPATH//xxx.jar");
2. Loading data sources (Source)
public class SensorReading {
    private String id;
    private Long timestamp;
    private Double temperature;

    public SensorReading() {
    }

    public SensorReading(String id, Long timestamp, Double temperature) {
        this.id = id;
        this.timestamp = timestamp;
        this.temperature = temperature;
    }

    public String getId() {
        return id;
    }

    public void setId(String id) {
        this.id = id;
    }

    public Long getTimestamp() {
        return timestamp;
    }

    public void setTimestamp(Long timestamp) {
        this.timestamp = timestamp;
    }

    public Double getTemperature() {
        return temperature;
    }

    public void setTemperature(Double temperature) {
        this.temperature = temperature;
    }

    @Override
    public String toString() {
        return "SensorReading{" +
                "id='" + id + '\'' +
                ", timestamp=" + timestamp +
                ", temperature=" + temperature +
                '}';
    }
}
public class SourceTest1_Collection {
    public static void main(String[] args) throws Exception {
        // Create the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Set the parallelism to 1
        env.setParallelism(1);
        // Read data from a collection
        DataStream<SensorReading> dataStream = env.fromCollection(Arrays.asList(
                new SensorReading("sensor_1", 1547718199L, 35.8),
                new SensorReading("sensor_2", 1547718199L, 35.0),
                new SensorReading("sensor_3", 1547718199L, 38.8),
                new SensorReading("sensor_4", 1547718199L, 39.8)
        ));
        DataStream<Integer> integerDataStream = env.fromElements(1, 2, 3, 4, 5, 789);
        // Print the output
        dataStream.print("data");
        integerDataStream.print("int");
        // Execute the job
        env.execute();
    }
}
DataStream<String> dataStream = env.readTextFile("xxx ");
public class SourceTest2_File {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);
        // Read data from file
        DataStream<String> dataStream = env.readTextFile("sensor.txt");
        dataStream.print();
        env.execute();
    }
}
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-kafka-0.11_2.12</artifactId>
    <version>1.10.1</version>
</dependency>
public class SourceTest3_Kafka {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);
        // Kafka consumer configuration
        Properties properties = new Properties();
        properties.setProperty("bootstrap.servers", "localhost:9092");
        properties.setProperty("group.id", "consumer-group");
        properties.setProperty("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        properties.setProperty("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        properties.setProperty("auto.offset.reset", "latest");
        // Read data from the "sensor" Kafka topic
        DataStream<String> dataStream = env.addSource(new FlinkKafkaConsumer011<String>("sensor", new SimpleStringSchema(), properties));
        dataStream.print();
        env.execute();
    }
}
DataStream<SensorReading> dataStream = env.addSource( new MySensor());
public class SourceTest4_UDF {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);
        // Read data from a custom source
        DataStream<SensorReading> dataStream = env.addSource(new MySensorSource());
        dataStream.print();
        env.execute();
    }

    // Implement a custom data source
    public static class MySensorSource implements SourceFunction<SensorReading> {
        // Flag used to control data generation
        private boolean running = true;

        @Override
        public void run(SourceContext<SensorReading> ctx) throws Exception {
            // Random number generator
            Random random = new Random();
            // Set initial temperatures for 10 sensors
            HashMap<String, Double> sensorTempMap = new HashMap<>();
            for (int i = 0; i < 10; i++) {
                sensorTempMap.put("sensor_" + (i + 1), 60 + random.nextGaussian() * 20); // Normal distribution
            }
            while (running) {
                for (String sensorId : sensorTempMap.keySet()) {
                    // Random-walk the temperature and emit a new reading
                    Double newTemp = sensorTempMap.get(sensorId) + random.nextGaussian();
                    sensorTempMap.put(sensorId, newTemp);
                    ctx.collect(new SensorReading(sensorId, System.currentTimeMillis(), newTemp));
                }
                Thread.sleep(1000);
            }
        }

        @Override
        public void cancel() {
            running = false;
        }
    }
}
3. Transformation operators (Transform)

public class TransformTest1_Base {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);
        // Read data from file
        DataStream<String> inputStream = env.readTextFile("sensor.txt");
        // 1. map: convert each String to its length
        DataStream<Integer> mapStream = inputStream.map(new MapFunction<String, Integer>() {
            @Override
            public Integer map(String value) throws Exception {
                return value.length();
            }
        });
        // 2. flatMap: split each line into fields by commas
        DataStream<String> flatMapStream = inputStream.flatMap(new FlatMapFunction<String, String>() {
            @Override
            public void flatMap(String value, Collector<String> out) throws Exception {
                String[] fields = value.split(",");
                for (String field : fields) {
                    out.collect(field);
                }
            }
        });
        // 3. filter: keep only the records whose id starts with sensor_1
        DataStream<String> filterStream = inputStream.filter(new FilterFunction<String>() {
            @Override
            public boolean filter(String value) throws Exception {
                return value.startsWith("sensor_1");
            }
        });
        // Print the output
        mapStream.print("map");
        flatMapStream.print("flatMap");
        filterStream.print("filter");
        // Execute the job
        env.execute();
    }
}
- KeyBy: DataStream → KeyedStream: logically splits a stream into disjoint partitions; each partition contains the elements with the same key, and the partitioning is implemented internally by hashing.
- Rolling aggregation operators such as sum(), min(), max(), minBy() and maxBy() can then be applied per key, aggregating each partition of the KeyedStream.
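For quick reference, here is a minimal sketch (not from the original article) of the typed keyBy form together with the rolling aggregation calls, assuming the SensorReading dataStream built in the example below:
// Sketch only: keyBy with a KeySelector (typed) instead of the field-name string
KeyedStream<SensorReading, String> keyed = dataStream.keyBy(SensorReading::getId);

// max() updates only the aggregated field; maxBy() keeps the whole record that holds the maximum
DataStream<SensorReading> maxTemp   = keyed.max("temperature");
DataStream<SensorReading> maxByTemp = keyed.maxBy("temperature");
// min(), minBy() and sum() work the same way; on tuple streams a positional index can be used, e.g. sum(1)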

public class TransformTest2_RollingAggregation {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);
        // Read data from file
        DataStream<String> inputStream = env.readTextFile("sensor.txt");
        // Convert to the SensorReading type
        DataStream<SensorReading> dataStream = inputStream.map(new MapFunction<String, SensorReading>() {
            @Override
            public SensorReading map(String s) throws Exception {
                String[] fields = s.split(",");
                return new SensorReading(fields[0], new Long(fields[1]), new Double(fields[2]));
            }
        });
        // Equivalent lambda form:
        // DataStream<SensorReading> dataStream = inputStream.map(line -> {
        //     String[] fields = line.split(",");
        //     return new SensorReading(fields[0], new Long(fields[1]), new Double(fields[2]));
        // });

        // Group by sensor id
        KeyedStream<SensorReading, Tuple> keyedStream = dataStream.keyBy("id");
        // KeyedStream<SensorReading, String> keyedStream1 = dataStream.keyBy(SensorReading::getId);

        // Rolling aggregation: keep the current maximum temperature per sensor
        DataStream<SensorReading> resultStream = keyedStream.maxBy("temperature");
        resultStream.print();
        env.execute();
    }
}
public class TransformTest3_Reduce {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);
        // Read data from file
        DataStream<String> inputStream = env.readTextFile("sensor.txt");
        // Convert to the SensorReading type
        DataStream<SensorReading> dataStream = inputStream.map(new MapFunction<String, SensorReading>() {
            @Override
            public SensorReading map(String s) throws Exception {
                String[] fields = s.split(",");
                return new SensorReading(fields[0], new Long(fields[1]), new Double(fields[2]));
            }
        });
        // Group by sensor id
        KeyedStream<SensorReading, Tuple> keyedStream = dataStream.keyBy("id");
        // reduce aggregation: keep the maximum temperature seen so far, together with the latest timestamp
        DataStream<SensorReading> resultStream = keyedStream.reduce(new ReduceFunction<SensorReading>() {
            @Override
            public SensorReading reduce(SensorReading value1, SensorReading value2) throws Exception {
                return new SensorReading(value1.getId(), value2.getTimestamp(), Math.max(value1.getTemperature(), value2.getTemperature()));
            }
        });
        resultStream.print();
        env.execute();
    }
}





DataStream<SensorReading> unionStream = xxxstream.union(xxx);
- Union requires that the streams being merged have the same type; Connect allows two streams of different types, which are then mapped to a common type in the subsequent coMap/coFlatMap.
- Connect can only combine two streams, while Union can combine more than two.
public class TransformTest4_MultipleStreams {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);
        // Read data from file
        DataStream<String> inputStream = env.readTextFile("sensor.txt");
        // Convert to the SensorReading type
        DataStream<SensorReading> dataStream = inputStream.map(new MapFunction<String, SensorReading>() {
            @Override
            public SensorReading map(String s) throws Exception {
                String[] fields = s.split(",");
                return new SensorReading(fields[0], new Long(fields[1]), new Double(fields[2]));
            }
        });

        // 1. Split: divide the stream by whether the temperature is above 30 degrees
        SplitStream<SensorReading> splitStream = dataStream.split(new OutputSelector<SensorReading>() {
            @Override
            public Iterable<String> select(SensorReading value) {
                return (value.getTemperature() > 30) ? Collections.singletonList("high") : Collections.singletonList("low");
            }
        });
        // Select the sub-streams by tag
        DataStream<SensorReading> highTempStream = splitStream.select("high");
        DataStream<SensorReading> lowTempStream = splitStream.select("low");
        DataStream<SensorReading> allTempStream = splitStream.select("high", "low");
        highTempStream.print("high");
        lowTempStream.print("low");
        allTempStream.print("all");

        // 2. Connect: first convert the high-temperature stream into 2-tuples, then connect it with the low-temperature stream and output status information
        DataStream<Tuple2<String, Double>> warningStream = highTempStream.map(new MapFunction<SensorReading, Tuple2<String, Double>>() {
            @Override
            public Tuple2<String, Double> map(SensorReading value) throws Exception {
                return new Tuple2<>(value.getId(), value.getTemperature());
            }
        });
        // connect can only merge two streams, but their data types may differ
        ConnectedStreams<Tuple2<String, Double>, SensorReading> connectStream = warningStream.connect(lowTempStream);
        DataStream<Object> resultStream = connectStream.map(new CoMapFunction<Tuple2<String, Double>, SensorReading, Object>() {
            @Override
            public Object map1(Tuple2<String, Double> value) throws Exception {
                return new Tuple3<>(value.f0, value.f1, "high temp warning");
            }

            @Override
            public Object map2(SensorReading value) throws Exception {
                return new Tuple2<>(value.getId(), "normal");
            }
        });
        resultStream.print();

        // 3. union: join multiple streams; all of them must have the same data type
        DataStream<SensorReading> union = highTempStream.union(lowTempStream, allTempStream);
        union.print("union stream");
        env.execute();
    }
}
4. Data output (Sink)
stream.addSink(new MySink(xxxx))

<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-kafka-0.11_2.12</artifactId>
    <version>1.10.1</version>
</dependency>
public class SinkTest1_Kafka {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);
        // Read data from file
        DataStream<String> inputStream = env.readTextFile("/Volumes/Update/flink/flink_test/src/main/resources/sensor.txt");
        // Convert to SensorReading, then serialize back to String for Kafka
        DataStream<String> dataStream = inputStream.map(new MapFunction<String, String>() {
            @Override
            public String map(String s) throws Exception {
                String[] fields = s.split(",");
                return new SensorReading(fields[0], new Long(fields[1]), new Double(fields[2])).toString();
            }
        });
        // Output to the external system (Kafka topic "sinktest")
        dataStream.addSink(new FlinkKafkaProducer011<String>("localhost:9092", "sinktest", new SimpleStringSchema()));
        env.execute();
    }
}
<dependency>
    <groupId>org.apache.bahir</groupId>
    <artifactId>flink-connector-redis_2.11</artifactId>
    <version>1.0</version>
</dependency>
public class SinkTest2_Redis {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);
        // Read data from file
        DataStream<String> inputStream = env.readTextFile("/Volumes/Update/flink/flink_test/src/main/resources/sensor.txt");
        // Convert to the SensorReading type
        DataStream<SensorReading> dataStream = inputStream.map(new MapFunction<String, SensorReading>() {
            @Override
            public SensorReading map(String s) throws Exception {
                String[] fields = s.split(",");
                return new SensorReading(fields[0], new Long(fields[1]), new Double(fields[2]));
            }
        });
        // Jedis connection configuration
        FlinkJedisPoolConfig config = new FlinkJedisPoolConfig.Builder()
                .setHost("localhost")
                .setPort(6379)
                .build();
        dataStream.addSink(new RedisSink<>(config, new MyRedisMapper()));
        env.execute();
    }

    // Custom RedisMapper
    public static class MyRedisMapper implements RedisMapper<SensorReading> {
        // Define the Redis command used to save the data: store into the hash table "sensor_temp" with HSET
        @Override
        public RedisCommandDescription getCommandDescription() {
            return new RedisCommandDescription(RedisCommand.HSET, "sensor_temp");
        }

        @Override
        public String getKeyFromData(SensorReading data) {
            return data.getId();
        }

        @Override
        public String getValueFromData(SensorReading data) {
            return data.getTemperature().toString();
        }
    }
}
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-elasticsearch6_2.12</artifactId>
    <version>1.10.1</version>
</dependency>
public class SinkTest3_ES {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);
        // Read data from file
        DataStream<String> inputStream = env.readTextFile("/Volumes/Update/flink/flink_test/src/main/resources/sensor.txt");
        // Convert to the SensorReading type
        DataStream<SensorReading> dataStream = inputStream.map(new MapFunction<String, SensorReading>() {
            @Override
            public SensorReading map(String s) throws Exception {
                String[] fields = s.split(",");
                return new SensorReading(fields[0], new Long(fields[1]), new Double(fields[2]));
            }
        });
        // Define the Elasticsearch connection configuration
        ArrayList<HttpHost> httpHosts = new ArrayList<>();
        httpHosts.add(new HttpHost("localhost", 9200));
        dataStream.addSink(new ElasticsearchSink.Builder<SensorReading>(httpHosts, new MyEsSinkFunction()).build());
        env.execute();
    }

    // Implement a custom ES write operation
    public static class MyEsSinkFunction implements ElasticsearchSinkFunction<SensorReading> {
        @Override
        public void process(SensorReading element, RuntimeContext ctx, RequestIndexer indexer) {
            // Define the document source to be written
            HashMap<String, String> dataSource = new HashMap<>();
            dataSource.put("id", element.getId());
            dataSource.put("temp", element.getTemperature().toString());
            dataSource.put("ts", element.getTimestamp().toString());
            // Create the index request to send to ES
            IndexRequest indexRequest = Requests.indexRequest()
                    .index("sensor")
                    .type("readingdata")
                    .source(dataSource);
            // Send the request via the indexer
            indexer.add(indexRequest);
        }
    }
}
<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>5.1.44</version>
</dependency>
public class SinkTest4_JDBC {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);
        // Read data from file
        DataStream<String> inputStream = env.readTextFile("sensor.txt");
        // Convert to the SensorReading type
        DataStream<SensorReading> dataStream = inputStream.map(new MapFunction<String, SensorReading>() {
            @Override
            public SensorReading map(String s) throws Exception {
                String[] fields = s.split(",");
                return new SensorReading(fields[0], new Long(fields[1]), new Double(fields[2]));
            }
        });
        dataStream.addSink(new MyJDBCSink());
        env.execute();
    }

    // Implement a custom SinkFunction
    public static class MyJDBCSink extends RichSinkFunction<SensorReading> {
        // Declare the connection and the prepared statements
        Connection connection = null;
        PreparedStatement insert = null;
        PreparedStatement update = null;

        @Override
        public void open(Configuration parameters) throws Exception {
            connection = DriverManager.getConnection("jdbc:mysql://localhost:3306/test", "root", "123456");
            insert = connection.prepareStatement("insert into sensor_temp (id, temp) values (?, ?)");
            update = connection.prepareStatement("update sensor_temp set temp = ? where id = ?");
        }

        // Called for every record: execute the SQL over the open connection
        @Override
        public void invoke(SensorReading value, Context context) throws Exception {
            // Try the update first
            update.setDouble(1, value.getTemperature());
            update.setString(2, value.getId());
            update.execute();
            // If no row was updated, insert a new one
            if (update.getUpdateCount() == 0) {
                insert.setString(1, value.getId());
                insert.setDouble(2, value.getTemperature());
                insert.execute();
            }
        }

        // Close the statements and the connection
        @Override
        public void close() throws Exception {
            insert.close();
            update.close();
            connection.close();
        }
    }
}
5. Data types, UDF functions, and rich functions
- Flink supports all Java and Scala basic data types: Int, Double, Long, String, etc.
DataStream<Integer> numberStream = env.fromElements(1, 2, 3, 4);
- Java and Scala tuples (Tuples)
DataStream<Tuple2<String, Integer>> personStream = env.fromElements(
        new Tuple2("Adam", 17),
        new Tuple2("Sarah", 23));
personStream.filter(p -> p.f1 > 18);
- Flink also supports some special-purpose types in Java and Scala, such as Java's ArrayList, HashMap, Enum, and so on.
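As an illustrative sketch only (the stream and its elements are made up here), a stream of plain Java ArrayLists can be created and processed like any other stream; such generic types fall back to Flink's Kryo-based serialization. It assumes an env created as in the earlier examples:
// Sketch only: composite Java types (here ArrayList) used directly as stream elements
DataStream<ArrayList<Integer>> listStream = env.fromElements(
        new ArrayList<>(Arrays.asList(1, 2, 3)),
        new ArrayList<>(Arrays.asList(4, 5)));
listStream.map(new MapFunction<ArrayList<Integer>, Integer>() {
    @Override
    public Integer map(ArrayList<Integer> list) {
        return list.size(); // work with the list like any other element
    }
}).print();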
- open() is the initialization method of a rich function; it is called before an operator such as map or filter starts processing data.
- close() is the last method called in the lifecycle and is used for cleanup work.
- getRuntimeContext() gives access to the function's RuntimeContext, which carries information such as the parallelism of the function, the task name, and state (a keyed-state sketch follows the rich-function example below).
public class TransformTest5_RichFunction {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(4);
        // Read data from file
        DataStream<String> inputStream = env.readTextFile("sensor.txt");
        // Convert to the SensorReading type
        DataStream<SensorReading> dataStream = inputStream.map(new MapFunction<String, SensorReading>() {
            @Override
            public SensorReading map(String s) throws Exception {
                String[] fields = s.split(",");
                return new SensorReading(fields[0], new Long(fields[1]), new Double(fields[2]));
            }
        });
        DataStream<Tuple2<String, Integer>> resultStream = dataStream.map(new MyMapper());
        resultStream.print();
        env.execute();
    }

    // Plain MapFunction for comparison: no lifecycle methods, no runtime context
    public static class MyMapper0 implements MapFunction<SensorReading, Tuple2<String, Integer>> {
        @Override
        public Tuple2<String, Integer> map(SensorReading value) throws Exception {
            return new Tuple2<>(value.getId(), value.getId().length());
        }
    }

    // Rich function: extends RichMapFunction to get the lifecycle methods and the runtime context
    public static class MyMapper extends RichMapFunction<SensorReading, Tuple2<String, Integer>> {
        @Override
        public Tuple2<String, Integer> map(SensorReading value) throws Exception {
            // getRuntimeContext().getState()
            return new Tuple2<String, Integer>(value.getId(), getRuntimeContext().getIndexOfThisSubtask());
        }

        @Override
        public void open(Configuration parameters) throws Exception {
            // Initialization work: typically define state or create a database connection
            System.out.println("open");
            // super.open(parameters);
        }

        @Override
        public void close() throws Exception {
            // Cleanup work: close connections, release state
            System.out.println("close");
            // super.close();
        }
    }
}
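The getRuntimeContext().getState() call commented out above hints at keyed state. Below is only a rough sketch of how it might be used (the TempRiseWarning class name and its "temperature rose" logic are invented for illustration); it has to run on a keyed stream, for example dataStream.keyBy("id").flatMap(new TempRiseWarning()):
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

// Sketch only: keyed ValueState inside a rich function, emitting a message when the temperature rises
public class TempRiseWarning extends RichFlatMapFunction<SensorReading, String> {
    private transient ValueState<Double> lastTempState;

    @Override
    public void open(Configuration parameters) throws Exception {
        // State is obtained from the runtime context during initialization
        lastTempState = getRuntimeContext().getState(
                new ValueStateDescriptor<>("last-temp", Double.class));
    }

    @Override
    public void flatMap(SensorReading value, Collector<String> out) throws Exception {
        Double lastTemp = lastTempState.value();
        if (lastTemp != null && value.getTemperature() > lastTemp) {
            out.collect(value.getId() + ": temperature rose from " + lastTemp + " to " + value.getTemperature());
        }
        lastTempState.update(value.getTemperature());
    }
}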