当前位置:网站首页>Spark Structured Streaming HelloWorld
Spark Structured Streaming HelloWorld
2022-07-26 04:23:00 【其实我是真性情】
Spark Structured Streaming HelloWorld
前言
Spark Structured Streaming+Kafka+Hbase Scala版教程,整体入口。
正文
1.Spark版本选择
选择你自己服务器对应的版本;文档地址:
https://spark.apache.org/docs/
这个地址打开都是版本号,选择自己环境里的Spark就可以了;
这里我用的是2.4.5;文档发布时间最新版是3.3.3
2.官方例子
进入对应版本之后可以在下边找到Spark的主要功能,如下图
Spark Streaming已经明确标明是老API了,新的API就是Structured Streaming,图里用红圈圈出来了,所以我当前用的就是新API。Structured Streaming
HelloWorld代码
官方的一个简单的word count例子
// Create DataFrame representing the stream of input lines from connection to localhost:9999
val lines = spark.readStream
.format("socket")
.option("host", "localhost")
.option("port", 9999)
.load()
// Split the lines into words
val words = lines.as[String].flatMap(_.split(" "))
// Generate running word count
val wordCounts = words.groupBy("value").count()
批处理代码例子
官方例子,这里说一下我的理解,streamingDF是一个批次的数据;foreachBatch就是循环每个批次;批次里的数据就在batchDF,打印批次号batchId就能看到这个批次号是个自增的数字;
streamingDF.writeStream.foreachBatch {
(batchDF: DataFrame, batchId: Long) =>
//这行是缓存一下,这样后续的操作不会重复的执行前边transform操作了
batchDF.persist()
//对一个批次里的数据进行操作,具体根据是什么操作写法不一样
batchDF.write.format(...).save(...) // location 1
batchDF.write.format(...).save(...) // location 2
//完事必须把缓存释放了
batchDF.unpersist()
}
边栏推荐
- Use of anonymous functions
- Credit card fraud detection based on machine learning
- AcWing. 102 best cattle fence
- How is the launch of ros2 different?
- Apisex's exploration in the field of API and microservices
- egg-sequelize TS编写
- Integrated architecture of performance and cost: modular architecture
- Keil V5 installation and use
- How does win11 22h2 skip networking and Microsoft account login?
- 吴恩达机器学习课后习题——线性回归
猜你喜欢

Wu Enda's machine learning after class exercises - linear regression

华为高层谈 35 岁危机,程序员如何破年龄之忧?

Threadpooltaskexecutor and ThreadPoolExecutor

文献|关系研究能得出因果性结论吗

Acwing_12. 背包问题求具体方案_dp

Retail chain store cashier system source code management commodity classification function logic sharing

理性认知教育机器人寓教于乐的辅助作用
![[project chapter - how to write and map the business model? (3000 word graphic summary suggestions)] project plan of innovation and entrepreneurship competition and application form of national Entrep](/img/e8/b115b85e2e0547545e85b2058a9bb0.png)
[project chapter - how to write and map the business model? (3000 word graphic summary suggestions)] project plan of innovation and entrepreneurship competition and application form of national Entrep

makefile知识再整理(超详细)

Keil V5 installation and use
随机推荐
Compiled by egg serialize TS
I.MX6U-ALPHA开发板(主频和时钟配置实验)
TIA botu WinCC Pro controls the display and hiding of layers through scripts
PathMatchingResourcePatternResolver解析配置文件 资源文件
Use of rule engine drools
雅迪高端之后开始变慢
Getting started with mongodb Basics
Support proxy direct connection to Oracle database, jumpserver fortress v2.24.0 release
Matlab drawing
Life related - ten years of career experience (turn)
How does win11 22h2 skip networking and Microsoft account login?
Comprehensive evaluation and decision-making method
1. Excel的IF函数
Which websites can I visit to check the latest medical literature?
第三篇如何使用SourceTree提交代码
Solution: runtimeerror: expected object of scalar type int but got scalar type double
MySQL日志分类:错误日志、二进制日志、查询日志、慢查询日志
【学习笔记】AGC041
解决:RuntimeError: Expected object of scalar type Int but got scalar type Double
UE4 键盘控制开关灯