Spark calculation operator and some small details in liunx
2022-07-06 17:39:00 【Bald Second Senior brother】
Spark map operators
map operator:
import org.apache.spark.{SparkConf, SparkContext}

object Spark01_Oper {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("Value")
    val sc = new SparkContext(conf)
    val make = sc.makeRDD(1 to 10)
    // map operator: apply the function to every element
    val mapRdd = make.map(x => x * 2)
    mapRdd.collect().foreach(println)
    sc.stop()
  }
}
The map operator applies the given function to every element of the RDD, one element at a time, regardless of which partition the element lives in.
mapPartitions operator
import org.apache.spark.{SparkConf, SparkContext}

object Spark02_OPer {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("mapPart")
    val sc = new SparkContext(conf)
    val list = sc.makeRDD(1 to 10)
    // mapPartitions operator: the function receives a whole partition as an iterator
    val mapPartRdd = list.mapPartitions(datas => datas.map(data => data * 2))
    mapPartRdd.collect().foreach(println)
    sc.stop()
  }
}
mapPartitions is similar to map, but it processes the data one partition at a time: the function receives an iterator over an entire partition and returns an iterator of results.
mapPartitionsWithIndex operator
import org.apache.spark.{SparkConf, SparkContext}

object Spark03_OPer {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("With")
    val sc = new SparkContext(conf)
    val list = sc.makeRDD(1 to 10, 2)
    // mapPartitionsWithIndex: the function also receives the partition's index
    val indexRDD = list.mapPartitionsWithIndex {
      case (num, datas) =>
        datas.map((_, "partition number: " + num))
    }
    indexRDD.collect().foreach(println)
    sc.stop()
  }
}
mapPartitionsWithIndex is similar to mapPartitions, but the function is also given the index of the partition being processed, so it takes one extra parameter of type Int.
A possible problem in Spark:
Because mapPartitions processes an entire partition at once, references to that partition's data cannot be released until the whole partition has been handled. On a large partition this accumulation can exhaust executor memory and cause an OutOfMemoryError (OOM).
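The OOM risk above comes down to whether the partition is consumed lazily or materialized all at once. A minimal plain-Scala sketch (no Spark cluster needed; the `Iterator` here stands in for one partition's data) contrasting the two:

```scala
object MapPartitionsMemorySketch {
  def main(args: Array[String]): Unit = {
    // Stand-in for one partition's data, delivered as an iterator
    val partition: Iterator[Int] = (1 to 10).iterator

    // Lazy: map on an Iterator computes elements one at a time,
    // only as the downstream consumer pulls them.
    val lazyDoubled: Iterator[Int] = partition.map(_ * 2)

    // Eager: toList forces the whole partition into memory at once.
    // On a large partition, this is the step that risks OOM.
    val materialized: List[Int] = lazyDoubled.toList

    println(materialized.sum) // 110
  }
}
```

Returning an iterator from the mapPartitions function (as the example code above does with `datas.map(...)`) keeps the lazy behavior; collecting a partition into a local `List` inside the function does not.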
The difference between Driver and Executor
Driver:
The Driver is the process that creates the Spark context object (SparkContext). It is the main program of the Application, that is, the Spark code we write and submit, and it is responsible for assigning tasks to the Executors. An application has exactly one Driver.
Executor:
An Executor is the Spark process that performs the actual computation; an application can have many Executors.
Difference:
The Driver is like a boss and the Executors are its workers: the Driver assigns the tasks, and the Executors carry them out.
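The Driver/Executor split shows up directly in how an application is submitted: one set of flags sizes the single Driver, another sizes the (possibly many) Executors. A hypothetical spark-submit invocation as a sketch; the class name, jar path, and resource sizes are placeholders, not values from this post:

```shell
# --driver-memory sizes the single Driver process.
# --num-executors / --executor-memory / --executor-cores size the Executors
# that do the actual computation.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.Spark01_Oper \
  --driver-memory 2g \
  --num-executors 4 \
  --executor-memory 4g \
  --executor-cores 2 \
  app.jar
```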
Linux miscellany
Ways to shut down and restart Linux:
1. Shut down: shutdown -h now    Restart: shutdown -r now
2. Shut down: init 0             Restart: init 6
3. Shut down: poweroff           Restart: reboot
service And systemctl The difference between
service:
service can start, stop, restart, and shut down system services, and can also display the current status of all system services. The service command works by looking up the corresponding init script under the /etc/init.d directory and running it to start or stop the service.
systemctl:
systemctl is a systemd tool, mainly responsible for controlling systemd, the system and service manager. It combines the functionality of the service and chkconfig commands.
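The overlap between the two can be seen side by side. A sketch using sshd as an example service (substitute any service name); the chkconfig equivalences in the comments are the usual mapping on systemd distributions:

```shell
# SysV style: service looks up the init script under /etc/init.d
service sshd status
service sshd restart

# systemd style: systemctl <verb> <unit>
systemctl status sshd
systemctl restart sshd

# systemctl also covers what chkconfig did (boot-time enablement):
systemctl enable sshd       # roughly: chkconfig sshd on
systemctl is-enabled sshd   # roughly: chkconfig --list sshd
```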
Operation of network equipment :
Environment variable loading order
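As a hedged sketch for bash specifically (other shells and distributions differ in the details): a login shell reads the system-wide profile first, then the first user profile file it finds; an interactive non-login shell reads the bashrc files instead.

```shell
# bash login shell (e.g. an ssh session) load order:
#   1. /etc/profile                    (system-wide)
#   2. first one found of: ~/.bash_profile, ~/.bash_login, ~/.profile
#
# bash interactive non-login shell (e.g. a new terminal tab):
#   1. /etc/bash.bashrc or /etc/bashrc (distribution dependent)
#   2. ~/.bashrc
#
# Apply changes in the current session without logging out and back in:
source /etc/profile
source ~/.bashrc
```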