RDD partition rules in Spark
2022-07-06 02:04:00 【Diligent ls】
1. RDD created from a collection
a. Partition count not specified
When an RDD is created from a collection and the number of partitions is not given explicitly, the default number of partitions depends on the number of CPU cores configured by the local-mode master:
local: 1 partition; local[*]: one partition per CPU core of the machine; local[K]: K partitions
b. Partition count specified
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object fenqu {
  def main(args: Array[String]): Unit = {
    val conf: SparkConf = new SparkConf().setMaster("local[*]").setAppName("SparkCoreTest")
    val sc: SparkContext = new SparkContext(conf)
    // 1) 4 elements, 4 partitions. Output: partition 0 -> 1, partition 1 -> 2, partition 2 -> 3, partition 3 -> 4
    val rdd: RDD[Int] = sc.makeRDD(Array(1, 2, 3, 4), 4)
    // 2) 4 elements, 3 partitions. Output: partition 0 -> 1, partition 1 -> 2, partition 2 -> 3, 4
    //val rdd: RDD[Int] = sc.makeRDD(Array(1, 2, 3, 4), 3)
    // 3) 5 elements, 3 partitions. Output: partition 0 -> 1, partition 1 -> 2, 3, partition 2 -> 4, 5
    //val rdd: RDD[Int] = sc.makeRDD(Array(1, 2, 3, 4, 5), 3)
    // Writes one part file per partition into the "output" directory
    rdd.saveAsTextFile("output")
    sc.stop()
  }
}
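The rules
Start position of a partition = (partition index * total data length) / total number of partitions
End position of a partition = ((partition index + 1) * total data length) / total number of partitions
Both divisions are integer divisions, and each partition holds the elements in the half-open range [start, end).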
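The two formulas can be checked without starting Spark. The following is a minimal sketch in plain Scala; the object name `SlicePositions` and its helpers are made up for this illustration and are not Spark API, although they follow the same arithmetic as the rule above.

```scala
object SlicePositions {
  // Half-open index range [start, end) of every partition,
  // computed with the integer-division rule described above.
  def positions(length: Long, numSlices: Int): Seq[(Int, Int)] =
    (0 until numSlices).map { i =>
      val start = ((i * length) / numSlices).toInt
      val end = (((i + 1) * length) / numSlices).toInt
      (start, end)
    }

  // Cuts a collection into numSlices partitions using those ranges.
  def slice[T](data: Seq[T], numSlices: Int): Seq[Seq[T]] =
    positions(data.length, numSlices).map { case (start, end) =>
      data.slice(start, end)
    }

  def main(args: Array[String]): Unit = {
    // The three cases from the example program
    println(slice(Seq(1, 2, 3, 4), 4))    // partitions: [1] [2] [3] [4]
    println(slice(Seq(1, 2, 3, 4), 3))    // partitions: [1] [2] [3, 4]
    println(slice(Seq(1, 2, 3, 4, 5), 3)) // partitions: [1] [2, 3] [4, 5]
  }
}
```

Because the division rounds down, any leftover elements end up in the later partitions, which is why the last partition is the larger one in cases 2 and 3.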
2. RDD created by reading a file
a. Default
The default minimum number of partitions is the smaller of the default parallelism (the current number of cores) and 2, so it is usually 2.
b. Specified
1). How the number of partitions is calculated (example: a 10-byte file with 3 partitions requested):
totalSize = 10
goalSize = 10 / 3 = 3 (bytes), meaning each partition should hold 3 bytes of data
Number of partitions = totalSize / goalSize = 10 / 3, which at first suggests 3 partitions of 3, 3 and 4 bytes
A 4-byte remainder is more than 1.1 times the 3-byte goal size, so under Hadoop's 1.1x split rule it is not kept as a single split and an extra partition is created. The result is 4 partitions of 3, 3, 3 and 1 bytes.
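The calculation above can be sketched as a small loop. This is a simplified model of the split logic in Hadoop's FileInputFormat for a single file, and it assumes the block size and the minimum split size play no role (true for a file this small). The names `SplitSizes`, `splits` and `SplitSlop` are chosen for this illustration only.

```scala
import scala.collection.mutable.ArrayBuffer

object SplitSizes {
  // The remainder is kept as one split only if it is at most 1.1x the split size.
  val SplitSlop = 1.1

  // Returns (startOffset, length) for each split of a single file.
  // Simplification: block size and minimum split size are ignored.
  def splits(totalSize: Long, minPartitions: Int): Seq[(Long, Long)] = {
    val goalSize = totalSize / math.max(minPartitions, 1)
    val splitSize = math.max(goalSize, 1L)
    val result = ArrayBuffer.empty[(Long, Long)]
    var remaining = totalSize
    // Cut full-size splits while the remainder is more than 1.1x the split size.
    while (remaining.toDouble / splitSize > SplitSlop) {
      result += ((totalSize - remaining, splitSize))
      remaining -= splitSize
    }
    // Whatever is left becomes the final split.
    if (remaining != 0) result += ((totalSize - remaining, remaining))
    result.toSeq
  }

  def main(args: Array[String]): Unit = {
    // 10 bytes, 3 partitions requested: four splits of 3, 3, 3 and 1 bytes
    splits(10, 3).foreach { case (start, len) =>
      println(s"start=$start length=$len")
    }
  }
}
```

With 10 bytes and 2 partitions requested, the same loop yields exactly two 5-byte splits, because the 5-byte remainder does not exceed 1.1 times the goal size.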
2). Spark reads files through Hadoop's input format, which reads line by line, so a record is always a whole line regardless of how many bytes it occupies.
3). Read positions are calculated as byte offsets.
4). Calculating the offset range of each partition
Partition => offset range    Line    Byte offsets of the line    Partition => lines read
0 => [0,3]                   1       012                         0 => 1,2
1 => [3,6]                   2       345                         1 => 3
2 => [6,9]                   3       678                         2 => 4
3 => [9,9]                   4       9                           3 => nothing
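The table can be reproduced with a small simulation. The sketch below is a simplified model of how a line-based reader handles a byte range, not Hadoop's actual LineRecordReader. It assumes the input file holds the lines 1, 2, 3, 4 with a two-byte `\r\n` terminator after the first three, which matches the 10-byte total and the offsets in the table. `SplitLineReader` and `linesFor` are names made up for this illustration.

```scala
import scala.collection.mutable.ArrayBuffer

object SplitLineReader {
  private val Newline: Byte = '\n'.toByte

  // Lines emitted by a reader that owns the byte range [start, start + length).
  def linesFor(bytes: Array[Byte], start: Int, length: Int): Seq[String] = {
    val end = start + length
    var pos = start

    // Reads one line beginning at pos and moves pos past its terminator.
    def readLine(): String = {
      val lineStart = pos
      while (pos < bytes.length && bytes(pos) != Newline) pos += 1
      val line = new String(bytes, lineStart, pos - lineStart, "UTF-8").stripSuffix("\r")
      if (pos < bytes.length) pos += 1 // consume the '\n'
      line
    }

    // Every split except the first discards its first line,
    // because the previous split has already read it.
    if (start != 0) readLine()

    val out = ArrayBuffer.empty[String]
    // A line belongs to this split if it starts at or before the split's end.
    while (pos <= end && pos < bytes.length) out += readLine()
    out.toSeq
  }

  def main(args: Array[String]): Unit = {
    val bytes = "1\r\n2\r\n3\r\n4".getBytes("UTF-8") // assumed file content, 10 bytes
    // Splits of 3, 3, 3 and 1 bytes, as computed in step 1)
    Seq((0, 3), (3, 3), (6, 3), (9, 1)).zipWithIndex.foreach { case ((start, len), i) =>
      println(s"partition $i => ${linesFor(bytes, start, len).mkString(",")}")
    }
  }
}
```

Under this model partition 0 reads the line that starts exactly at its end offset (line 2 at offset 3), so partition 1 has to skip it. This shifting is what leaves the last partition empty.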