
11.1.2 Overview of Flink: WordCount Case

2022-06-26 00:26:00 Loves_ dccBigData

WordCount Case Study

Flink operators can be stateful operators.
- Records with the same key are assigned to the same task.
- Note the implicit conversions: the code needs import org.apache.flink.streaming.api.scala._
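The "same key goes to the same task" routing can be sketched in plain Scala. This is a hypothetical illustration of hash partitioning, not Flink's actual key-group hashing; the object name and taskFor function are made up for this example:

```scala
object KeyPartitionSketch {
  // Route each key to one of `parallelism` task slots by hash.
  // Because the mapping depends only on the key, the same key
  // always lands on the same task index.
  def taskFor(key: String, parallelism: Int): Int =
    math.abs(key.hashCode % parallelism)

  def main(args: Array[String]): Unit = {
    val keys = Seq("flink", "spark", "flink", "kafka", "flink")
    // Every occurrence of "flink" prints the same task index
    keys.foreach(k => println(s"$k -> task ${taskFor(k, 4)}"))
  }
}
```

This determinism is what makes a stateful keyed operator correct: all updates for one key are seen by a single task, so its local state holds the complete count for that key.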

import org.apache.flink.streaming.api.scala._

object Demo01WordCount {
    

  def main(args: Array[String]): Unit = {
    
    // Create the Flink execution environment
    val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment

    /**
      * Use nc to simulate streaming data:
      * yum install nc
      * nc -lk 8888
      */
    // Set the parallelism. If it is not set, it defaults to the number of CPU cores
    // (e.g. 12 on a 12-core machine), and the data is split across tasks/cores.
    env.setParallelism(1)
    // Read data from the socket
    val lineDS: DataStream[String] = env.socketTextStream("master", 8888)

    // Split each line into words
    val wordsDS: DataStream[String] = lineDS.flatMap(_.split(","))

    // Map each word to a (word, 1) pair
    val kvDS: DataStream[(String, Int)] = wordsDS.map((_, 1))

    // Group with keyBy so records with the same key go to the same reduce task
    val keyByDS: KeyedStream[(String, Int), String] = kvDS.keyBy(_._1)

    // Sum the second tuple field by its index (field indices start at 0)
    // sum is a stateful operator
    val countDS: DataStream[(String, Int)] = keyByDS.sum(1)

    // Print the result
    countDS.print()

    // Start the job: the stream executes continuously, 24/7
    env.execute()
  }

}
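The stateful behavior of sum(1) can be sketched without Flink: per key, the operator keeps a running total and emits the updated count on every incoming record. This is a minimal plain-Scala illustration of that semantics (the object and method names are made up for this sketch, and it ignores parallelism and fault tolerance):

```scala
import scala.collection.mutable

object WordCountStateSketch {
  // For each incoming word, increment its per-key state and
  // emit the updated (word, count) pair, like a keyed sum(1).
  def runningCounts(words: Seq[String]): Seq[(String, Int)] = {
    val state = mutable.Map.empty[String, Int]
    words.map { w =>
      val n = state.getOrElse(w, 0) + 1
      state(w) = n
      (w, n)
    }
  }

  def main(args: Array[String]): Unit = {
    // Simulates typing "a,b,a" into nc: the real job would print
    // an updated count for each record, not just the final totals.
    println(runningCounts(Seq("a", "b", "a")))
  }
}
```

So typing a,b,a into the nc session makes the job print (a,1), (b,1), (a,2): the second "a" reuses the state stored from the first one.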

Copyright notice
This article was written by [Loves_ dccBigData]; please include the original link when reposting. Thanks.
https://yzsam.com/2022/177/202206252121244507.html