Spark computation operators and a few small Linux details
2022-07-06 17:39:00 【Bald Second Senior brother】
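Spark map operator
map operator:
import org.apache.spark.{SparkConf, SparkContext}

object Spark01_Oper {
  def main(args: Array[String]): Unit = {
    // Run locally, with one worker thread per available core
    val conf = new SparkConf().setMaster("local[*]").setAppName("Value")
    val sc = new SparkContext(conf)
    val make = sc.makeRDD(1 to 10)
    // map operator: the function is applied to each element, one at a time
    val mapRdd = make.map(x => x * 2)
    mapRdd.collect().foreach(println)
    sc.stop()
  }
}
The map operator applies the given function to every element of the RDD, one element at a time, across all partitions.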
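mapPartitions operator
import org.apache.spark.{SparkConf, SparkContext}

object Spark02_OPer {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("mapPart")
    val sc = new SparkContext(conf)
    val list = sc.makeRDD(1 to 10)
    // mapPartitions operator: the function is called once per partition
    // and receives an iterator over that partition's elements
    val mapPartRdd = list.mapPartitions(datas => datas.map(data => data * 2))
    mapPartRdd.collect().foreach(println)
    sc.stop()
  }
}
The mapPartitions operator is similar to map, but it processes the data one partition at a time. The function receives an Iterator over all the elements of a partition and must return an Iterator (not a single value) holding the results for that partition. Note that the inner datas.map is the ordinary Scala Iterator.map, not an RDD operation.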
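To make the "once per partition" behaviour easier to see, here is a minimal sketch that returns a single sum for each partition. The object name MapPartitionsSketch and the helper partitionSums are hypothetical names chosen for this illustration; they are not part of Spark or of the original code.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object MapPartitionsSketch {
  // Hypothetical helper: builds an RDD of 1 to 10 and returns one sum per partition.
  def partitionSums(sc: SparkContext, numPartitions: Int): Array[Int] = {
    val rdd = sc.makeRDD(1 to 10, numPartitions)
    // The function is called once per partition; it consumes the partition's
    // iterator and returns a new iterator holding a single value.
    rdd.mapPartitions(datas => Iterator(datas.sum)).collect()
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("mapPartSketch")
    val sc = new SparkContext(conf)
    try {
      partitionSums(sc, 2).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

With two partitions the result has two values, one per partition, instead of ten. With map the function would have been called ten times, once per element.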
mapPartitionsWithIndex operator
import org.apache.spark.{SparkConf, SparkContext}

object Spark03_OPer {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("With")
    val sc = new SparkContext(conf)
    // Split the data into 2 partitions
    val list = sc.makeRDD(1 to 10, 2)
    val indexRDD = list.mapPartitionsWithIndex {
      // num is the partition index, datas iterates over that partition
      case (num, datas) => {
        datas.map((_, "partition number: " + num))
      }
    }
    indexRDD.collect().foreach(println)
    sc.stop()
  }
}
The mapPartitionsWithIndex operator is similar to mapPartitions, but func also receives the index of the partition it is processing, so func takes one extra parameter of type Int.
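One practical use of the partition index is checking how the data is spread across partitions. The sketch below counts the elements in each partition. The names PartitionSizesSketch and partitionSizes are hypothetical and exist only for this illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object PartitionSizesSketch {
  // Hypothetical helper: returns (partition index, element count) pairs.
  def partitionSizes(sc: SparkContext, numPartitions: Int): Array[(Int, Int)] = {
    val rdd = sc.makeRDD(1 to 10, numPartitions)
    // idx is supplied by Spark; datas.size consumes the partition's iterator.
    rdd.mapPartitionsWithIndex((idx, datas) => Iterator((idx, datas.size))).collect()
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("partSizesSketch")
    val sc = new SparkContext(conf)
    try {
      partitionSizes(sc, 2).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Here the function is written as an ordinary two-parameter lambda instead of a case pattern; both forms are accepted by mapPartitionsWithIndex.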
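Possible problems with Spark:
Every computation step produces new data, and the data that was already produced is not released right away. If it keeps accumulating, the process can run out of memory (OOM). This is most likely with mapPartitions when the function materializes a whole partition at once (for example by calling toList on the iterator) instead of streaming through it.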
Difference between Driver and Executor
Driver:
Any class that creates the Spark context object can be called the Driver. The Driver is the part of a Spark Application that launches the code; it can be understood as the main program of the Spark code we write. It is also responsible for assigning tasks to the Executors. An application has only one Driver.
Executor:
An Executor is the part of Spark responsible for carrying out the computation. There can be many Executors.
Difference:
The Driver is like a boss and the Executors are the Driver's hands: the Driver assigns tasks, and the Executors execute them.
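The split between the two roles can be seen in code. In the minimal sketch below, the comments mark which parts run on the Driver and which run on the Executors. The names DriverExecutorSketch and countEven are hypothetical; longAccumulator is the accumulator API available since Spark 2.0. Note that with a local[*] master the Driver and the Executor threads share one JVM, so this only illustrates the division of work, not a real cluster deployment.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object DriverExecutorSketch {
  // Hypothetical helper: counts the even numbers in 1 to 10.
  def countEven(sc: SparkContext): Long = {
    // Driver side: the accumulator and the RDD definition are created here.
    val evens = sc.longAccumulator("evens")
    val rdd = sc.makeRDD(1 to 10, 2)
    // Executor side: this closure is shipped to the executors and runs
    // there, once per element of each partition.
    rdd.foreach(x => if (x % 2 == 0) evens.add(1))
    // Driver side again: read the merged result of all tasks.
    evens.sum
  }

  def main(args: Array[String]): Unit = {
    // The Driver is the program that creates the SparkContext.
    val conf = new SparkConf().setMaster("local[*]").setAppName("driverExecutorSketch")
    val sc = new SparkContext(conf)
    try {
      println(countEven(sc))
    } finally {
      sc.stop()
    }
  }
}
```

An accumulator is used instead of a plain local variable because a variable captured by the closure is copied to each Executor, so changes made there are not guaranteed to be visible on the Driver.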
Linux odds and ends
Ways to shut down and restart Linux
1. Shut down: shutdown -h now    Restart: shutdown -r now
2. Shut down: init 0    Restart: init 6
3. Shut down: poweroff    Restart: reboot
Difference between service and systemctl
service:
It can start, stop, and restart system services, and it can also show the current status of all system services. The service command works by finding the corresponding script under the /etc/init.d directory and using it to start or stop the service.
systemctl:
It is a systemd tool, mainly used to control the systemd system and service manager. It combines the functions of the service and chkconfig commands.
Network device operations:


Environment variable loading order
