Spark Structured Streaming HelloWorld
2022-07-26 04:29:00 【Actually, I have a true disposition】
Preface
This is the entry point to a Scala tutorial series on Spark Structured Streaming + Kafka + HBase.
Text
1. Choosing a Spark version
Use the version that matches your server. The documentation is at:
https://spark.apache.org/docs/
The page lists the documentation by version number; open the one that matches the Spark in your own environment.
I use 2.4.5 here; the latest documentation version is 3.3.3.
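To pin a project to the version chosen above, an sbt build might look like the following. This is only a sketch and is not taken from the original post: the project name and the Scala version are assumptions (Spark 2.4.5 artifacts are published for Scala 2.11 and 2.12), so match them to your own cluster.

```scala
// build.sbt (hypothetical project; adjust names and versions to your environment)
name := "structured-streaming-helloworld"

// Assumption: pick the Scala version your Spark build was compiled against
scalaVersion := "2.11.12"

// spark-sql contains the Structured Streaming API
// (add % "provided" when the cluster supplies Spark at runtime)
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.5"
```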
2. The official example
After opening the documentation for your version, you can see Spark's main components (shown in a screenshot in the original post).
Spark Streaming is clearly marked as the old API. The new API is Structured Streaming (circled in red in that screenshot), so that is what I use here.
HelloWorld code
The official example is a simple word count:
// Assumes an existing SparkSession named spark; .as[String] needs its implicits
import spark.implicits._

// Create DataFrame representing the stream of input lines from connection to localhost:9999
val lines = spark.readStream
  .format("socket")
  .option("host", "localhost")
  .option("port", 9999)
  .load()

// Split the lines into words
val words = lines.as[String].flatMap(_.split(" "))

// Generate running word count
val wordCounts = words.groupBy("value").count()
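The snippet above only defines the transformation; nothing runs until a sink is attached and the query is started. A minimal end-to-end sketch follows, assuming a local test run. The object name, the `countWords` helper, the app name, and the `local[2]` master are illustrative choices, not part of the original post.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object StructuredWordCount {
  // Same transformation as the official snippet: split each line on spaces, count each word.
  def countWords(lines: DataFrame): DataFrame = {
    import lines.sparkSession.implicits._ // encoders needed by .as[String]
    lines.as[String].flatMap(_.split(" ")).groupBy("value").count()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("StructuredWordCount") // hypothetical app name
      .master("local[2]")             // assumption: local test run, not a cluster
      .getOrCreate()

    // Streaming DataFrame of text lines read from localhost:9999
    val lines = spark.readStream
      .format("socket")
      .option("host", "localhost")
      .option("port", 9999)
      .load()

    // Attach a console sink and start the query
    val query = countWords(lines).writeStream
      .outputMode("complete") // print the full counts table on every trigger
      .format("console")
      .start()

    query.awaitTermination()
  }
}
```

To feed it input, start a socket server such as `nc -lk 9999` in another terminal before launching the program, then type lines of text there.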
Batch code example
This is the official example, followed by my understanding of it. streamingDF is the streaming DataFrame; foreachBatch calls the given function once for each micro-batch; batchDF holds the data of that batch. If you print batchId, you can see that the batch number is an auto-incrementing integer.
streamingDF.writeStream.foreachBatch { (batchDF: DataFrame, batchId: Long) =>
  // Cache the batch so the writes below do not recompute the upstream transformations
  batchDF.persist()
  // Process the batch; what goes here depends on your use case
  batchDF.write.format(...).save(...)  // location 1
  batchDF.write.format(...).save(...)  // location 2
  // Release the cache once the writes are done
  batchDF.unpersist()
}
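The official snippet leaves the formats and paths elided. A runnable sketch of the same pattern is shown below, under stated assumptions: the built-in `rate` source stands in for a real input, the two outputs are Parquet and JSON directories under a hypothetical `/tmp/foreach-batch-demo` root, and the object and method names are made up for illustration. The batch handler is a named method with an explicit `Unit` return type, which should also sidestep the overload ambiguity that some Scala 2.12 setups report for inline `foreachBatch` lambdas.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ForeachBatchSketch {
  // Writes one micro-batch to two locations under outputRoot.
  def writeBatch(outputRoot: String)(batchDF: DataFrame, batchId: Long): Unit = {
    println(s"processing batch $batchId") // batchId increases by one per batch
    batchDF.persist()                     // cache so both writes reuse the same data
    batchDF.write.mode("append").parquet(s"$outputRoot/parquet") // location 1
    batchDF.write.mode("append").json(s"$outputRoot/json")       // location 2
    batchDF.unpersist()                   // release the cache after both writes
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("ForeachBatchSketch") // hypothetical app name
      .master("local[2]")            // assumption: local test run
      .getOrCreate()

    // Assumption: the rate source generates (timestamp, value) rows for testing
    val streamingDF = spark.readStream
      .format("rate")
      .option("rowsPerSecond", 5)
      .load()

    val query = streamingDF.writeStream
      .foreachBatch(writeBatch("/tmp/foreach-batch-demo") _) // hypothetical output root
      .start()

    query.awaitTermination()
  }
}
```

Note that `foreachBatch` is available from Spark 2.4.0 onward, so it works with the 2.4.5 version used here but not with earlier 2.x releases.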