Spark calculation operators and some small details in Linux
2022-07-06 17:39:00 【Bald Second Senior brother】
Spark operators
map operator:
import org.apache.spark.{SparkConf, SparkContext}

object Spark01_Oper {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("Value")
    val cs = new SparkContext(conf)
    val make = cs.makeRDD(1 to 10)
    // map operator: double every element
    val mapRdd = make.map(x => x * 2)
    mapRdd.collect().foreach(println)
  }
}
The map operator applies the given function to every element of the RDD, one element at a time, across all partitions.
mapPartitions operator
import org.apache.spark.{SparkConf, SparkContext}

object Spark02_OPer {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("mapPart")
    val sc = new SparkContext(conf)
    val list = sc.makeRDD(1 to 10)
    // mapPartitions operator: transform one whole partition (an iterator) at a time
    val mapPartRdd = list.mapPartitions(datas => datas.map(data => data * 2))
    mapPartRdd.collect().foreach(println)
  }
}

mapPartitions is similar to map, but it processes the data one partition at a time: the function receives an iterator over a whole partition and must return an iterator as its result.
mapPartitionsWithIndex operator
import org.apache.spark.{SparkConf, SparkContext}

object Spark03_OPer {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("With")
    val sc = new SparkContext(conf)
    val list = sc.makeRDD(1 to 10, 2)
    // mapPartitionsWithIndex operator: the function also receives the partition index
    val indexRDD = list.mapPartitionsWithIndex {
      case (num, datas) =>
        datas.map((_, "partition index: " + num))
    }
    indexRDD.collect().foreach(println)
  }
}

mapPartitionsWithIndex is similar to mapPartitions, but the function is also passed the index of the partition it is processing, so the function takes one extra Int parameter.
A potential problem in Spark:
Every computation produces new data instead of modifying the old data in place, and the old data is not automatically deleted. If these intermediate results keep accumulating, the job can run out of memory (OOM).
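One common way this accumulation shows up is caching intermediate RDDs and never releasing them. As a minimal sketch (not from the original post; the object and variable names here are made up), cache() keeps an RDD's data in executor memory and unpersist() releases it once it is no longer needed:

import org.apache.spark.{SparkConf, SparkContext}

object Spark04_Unpersist {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("Unpersist")
    val sc = new SparkContext(conf)
    val raw = sc.makeRDD(1 to 10)
    // cache() keeps this intermediate result in executor memory
    val doubled = raw.map(_ * 2).cache()
    doubled.collect().foreach(println)
    // release the cached data once it is no longer needed,
    // so it does not keep accumulating in memory
    doubled.unpersist()
    sc.stop()
  }
}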
The difference between Driver and Executor
Driver:
The program that creates the SparkContext object can be regarded as the Driver. The Driver is the program that submits the Spark application, that is, the main program of the Spark code we write. It is also responsible for assigning tasks to the Executors. An application has only one Driver.
Executor:
The Executor is the part of Spark that actually performs the computation; there can be many Executors.
Difference:
The Driver is like the boss and the Executors are its workers: the Driver assigns tasks to the Executors, and the Executors carry them out.
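As a minimal sketch of where code runs (not from the original post; the object name is made up): the body of main executes on the Driver, while the closure passed to an operator such as foreach is shipped to and executed on the Executors. One well-known consequence is that a plain local variable mutated inside the closure is updated only in the Executors' copies, not on the Driver:

import org.apache.spark.{SparkConf, SparkContext}

object Spark05_DriverExecutor {
  def main(args: Array[String]): Unit = {
    // everything in main runs on the Driver
    val conf = new SparkConf().setMaster("local[*]").setAppName("DriverExecutor")
    val sc = new SparkContext(conf)

    var counter = 0
    val rdd = sc.makeRDD(1 to 10)

    // the closure below is serialized and executed on the Executors;
    // each Executor mutates its own copy of counter
    rdd.foreach(x => counter += x)

    // usually still 0 here on the Driver (in local mode the result can vary,
    // which is exactly why this pattern should be avoided)
    println("counter on the Driver: " + counter)
    sc.stop()
  }
}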
Linux odds and ends
Ways to shut down and restart Linux:
1. Shut down: shutdown -h    Restart: shutdown -r
2. Shut down: init 0         Restart: init 6
3. Shut down: poweroff       Restart: reboot
The difference between service and systemctl
service:
service can start, stop, restart and shut down system services, and can also show the current status of all system services. The service command works by looking up the corresponding script under the /etc/init.d directory and running it to start or stop the service.
systemctl:
systemctl is a systemd utility. It is mainly responsible for controlling systemd, the system and service manager, and it combines the functionality of the service and chkconfig commands.
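For a concrete comparison (sshd is only an example service name, not from the original post):
Restart a service: service sshd restart or systemctl restart sshd
Enable a service at boot: chkconfig sshd on or systemctl enable sshd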
Operation of network devices:
(figure in the original post)

Environment variable loading order:
(figure in the original post)