RDD partition rules in Spark
2022-07-06 02:04:00 【Diligent ls】
1. Creating an RDD from a collection
a. Without specifying the number of partitions
When an RDD is created from a collection and the number of partitions is not set explicitly, the default number of partitions depends on the number of CPU cores configured for local mode:
local: 1 partition; local[*]: as many partitions as the machine has cores; local[K]: K partitions
b. Specifying the number of partitions
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object fenqu {
  def main(args: Array[String]): Unit = {
    val conf: SparkConf = new SparkConf().setMaster("local[*]").setAppName("SparkCoreTest")
    val sc: SparkContext = new SparkContext(conf)
    // 1) 4 elements, 4 partitions. Output: partition 0 -> 1, partition 1 -> 2, partition 2 -> 3, partition 3 -> 4
    val rdd: RDD[Int] = sc.makeRDD(Array(1, 2, 3, 4), 4)
    // 2) 4 elements, 3 partitions. Output: partition 0 -> 1, partition 1 -> 2, partition 2 -> 3,4
    //val rdd: RDD[Int] = sc.makeRDD(Array(1, 2, 3, 4), 3)
    // 3) 5 elements, 3 partitions. Output: partition 0 -> 1, partition 1 -> 2,3, partition 2 -> 4,5
    //val rdd: RDD[Int] = sc.makeRDD(Array(1, 2, 3, 4, 5), 3)
    // Writes one part file per partition; the "output" directory must not already exist
    rdd.saveAsTextFile("output")
    sc.stop()
  }
}
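The rules
Start position of a partition = (partition index * total data length) / number of partitions
End position of a partition = ((partition index + 1) * total data length) / number of partitions
Both divisions are integer divisions. A partition holds the elements from its start position (inclusive) to its end position (exclusive).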
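The rule can be checked without Spark. Below is a minimal sketch in plain Scala that applies the two formulas with integer division and treats the end position as exclusive. `SliceSketch`, `positions` and `slice` are hypothetical names chosen for illustration, not Spark API.

```scala
object SliceSketch {
  // Start (inclusive) and end (exclusive) position of each partition,
  // computed with the integer-division rule stated above.
  def positions(length: Int, numSlices: Int): Seq[(Int, Int)] =
    (0 until numSlices).map { i =>
      val start = ((i.toLong * length) / numSlices).toInt
      val end = (((i + 1).toLong * length) / numSlices).toInt
      (start, end)
    }

  // Cut a local collection into partitions according to those positions.
  def slice[T](data: Seq[T], numSlices: Int): Seq[Seq[T]] =
    positions(data.length, numSlices).map { case (start, end) => data.slice(start, end) }

  def main(args: Array[String]): Unit = {
    slice(Seq(1, 2, 3, 4, 5), 3).zipWithIndex.foreach { case (part, i) =>
      println(s"partition $i -> ${part.mkString(",")}")
    }
  }
}
```

For `Seq(1, 2, 3, 4, 5)` and 3 partitions the positions are (0, 1), (1, 3) and (3, 5), so the partitions hold 1 / 2,3 / 4,5, which matches case 3) in the code above.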
2. Creating an RDD by reading a file
a. Default
The default number of partitions is the minimum of the current number of cores and 2, so it is usually 2.
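As a small illustration, the sketch below combines this rule with the local-mode core counts from section 1. It is plain Scala with hypothetical names (`DefaultPartitionsSketch`, `defaultParallelism`, `defaultMinPartitions`, `machineCores`), and it assumes `spark.default.parallelism` has not been set explicitly.

```scala
object DefaultPartitionsSketch {
  private val LocalK = """local\[(\d+)\]""".r

  // Default parallelism in local mode. `machineCores` stands in for the
  // number of cores of the machine and is passed in to keep the sketch pure.
  def defaultParallelism(master: String, machineCores: Int): Int = master match {
    case "local"    => 1
    case "local[*]" => machineCores
    case LocalK(k)  => k.toInt
    case other =>
      throw new IllegalArgumentException(s"master not covered by this sketch: $other")
  }

  // Default partition count when reading a file: min(number of cores, 2).
  def defaultMinPartitions(master: String, machineCores: Int): Int =
    math.min(defaultParallelism(master, machineCores), 2)
}
```

With `local[*]` on an 8-core machine the default parallelism is 8 but the default for file reads is 2. Only with plain `local` does it drop to 1.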
b. Specified
1). How the number of partitions is calculated (example: a 10-byte file with 3 partitions requested):
totalSize = 10
goalSize = 10 / 3 = 3 (bytes), meaning each partition should hold 3 bytes of data
Number of partitions = totalSize / goalSize = 10 / 3 => 3 partitions of 3, 3 and 4 bytes
The 4-byte remainder is more than 1.1 times the 3-byte goal size, so under Hadoop's 1.1x split rule an additional partition is created. The result is 4 partitions of 3, 3, 3 and 1 bytes.
2). Spark reads files the way Hadoop does, so data is read line by line, not by number of bytes.
3). Read positions are calculated as offsets.
4). Offset range of each partition and the data it reads:
4). Calculation of offset range of data partition
0 => [0,3] 1 012 0 => 1,2
1 => [3,6] 2 345 1 => 3
2 => [6,9] 3 678 2 => 4
3 => [9,9] 4 9 3 => nothing
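Steps 1) to 4) can be reproduced with a small simulation. The sketch below is plain Scala, not Hadoop's or Spark's actual code, and all names in it (`FileSplitSketch`, `splits`, `linesWithOffsets`, `read`) are hypothetical. It assumes the file content is `1`, `2`, `3`, `4` on separate lines with two-byte line endings (10 bytes in total), that the block size is larger than the file, and that a partition reads every line whose first byte falls inside its offset range unless the previous partition has already read it.

```scala
object FileSplitSketch {
  // Hadoop lets the last split be up to 10% larger than the split size.
  val SplitSlop = 1.1

  // Byte ranges (start inclusive, end exclusive) for a file of totalSize bytes.
  // Assumption: the split size is simply totalSize / minPartitions (at least 1).
  def splits(totalSize: Int, minPartitions: Int): List[(Int, Int)] = {
    val splitSize = math.max(1, totalSize / math.max(1, minPartitions))
    val ranges = scala.collection.mutable.ListBuffer.empty[(Int, Int)]
    var remaining = totalSize
    while (remaining.toDouble / splitSize > SplitSlop) {
      val start = totalSize - remaining
      ranges += ((start, start + splitSize))
      remaining -= splitSize
    }
    if (remaining > 0) ranges += ((totalSize - remaining, totalSize))
    ranges.toList
  }

  // Start offset and text of every line (ASCII only, so one char is one byte).
  def linesWithOffsets(content: String): List[(Int, String)] = {
    val result = scala.collection.mutable.ListBuffer.empty[(Int, String)]
    var pos = 0
    while (pos < content.length) {
      val newline = content.indexOf('\n', pos)
      val next = if (newline < 0) content.length else newline + 1
      result += ((pos, content.substring(pos, next).stripLineEnd))
      pos = next
    }
    result.toList
  }

  // Simplified reading model: a partition takes each line whose first byte lies
  // in (start, end], or in [0, end] for the first partition. A line starting
  // exactly at `start` was already taken by the previous partition.
  def read(content: String, ranges: List[(Int, Int)]): List[List[String]] = {
    val lines = linesWithOffsets(content)
    ranges.map { case (start, end) =>
      lines.collect {
        case (offset, text) if offset <= end && (start == 0 || offset > start) => text
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val content = "1\r\n2\r\n3\r\n4" // assumed 10-byte file
    val ranges = splits(content.length, 3)
    read(content, ranges).zip(ranges).zipWithIndex.foreach { case ((lines, (start, end)), i) =>
      println(s"partition $i bytes [$start, $end) -> ${lines.mkString(",")}")
    }
  }
}
```

For the assumed content, `splits(10, 3)` gives four ranges of 3, 3, 3 and 1 bytes, and the partitions read 1,2 / 3 / 4 / nothing. The sketch uses exclusive end offsets, so its last range is [9, 10) rather than [9, 9]. The lines read are the same either way.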