RDD partition rules in Spark
2022-07-06 02:04:00 【Diligent ls】
1. Creating an RDD from a collection
a. Without specifying the number of partitions
When an RDD is created from a collection and the number of partitions is not given explicitly, the default number of partitions depends on the number of CPU cores configured in local mode:
local: 1 partition; local[*]: one partition per core of the machine; local[K]: K partitions
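The mapping above can be sketched as a small pure function. This is only an illustration of the rule for local master strings, not Spark's own code: the names `LocalCores` and `defaultSlices` are hypothetical, and the sketch assumes `spark.default.parallelism` has not been set.

```scala
object LocalCores {
  // Matches "local[4]" or "local[*]" (the whole string must match).
  private val LocalN = """local\[([0-9]+|\*)\]""".r

  /** Default partition count for a local master string (hypothetical helper).
    * `availableCores` defaults to the core count of the current machine. */
  def defaultSlices(
      master: String,
      availableCores: Int = Runtime.getRuntime.availableProcessors()): Int =
    master match {
      case "local"     => 1               // a single thread
      case LocalN("*") => availableCores  // one partition per core
      case LocalN(k)   => k.toInt         // exactly K partitions
      case other =>
        throw new IllegalArgumentException(s"Not a local master: $other")
    }
}
```

For example, `LocalCores.defaultSlices("local[3]")` gives 3, while `LocalCores.defaultSlices("local[*]")` depends on the machine it runs on.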
b. Specifying the number of partitions
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object fenqu {
  def main(args: Array[String]): Unit = {
    val conf: SparkConf = new SparkConf().setMaster("local[*]").setAppName("SparkCoreTest")
    val sc: SparkContext = new SparkContext(conf)
    // 1) 4 elements, 4 partitions. Output: partition 0 -> 1, partition 1 -> 2, partition 2 -> 3, partition 3 -> 4
    val rdd: RDD[Int] = sc.makeRDD(Array(1, 2, 3, 4), 4)
    // 2) 4 elements, 3 partitions. Output: partition 0 -> 1, partition 1 -> 2, partition 2 -> 3, 4
    //val rdd: RDD[Int] = sc.makeRDD(Array(1, 2, 3, 4), 3)
    // 3) 5 elements, 3 partitions. Output: partition 0 -> 1, partition 1 -> 2, 3, partition 2 -> 4, 5
    //val rdd: RDD[Int] = sc.makeRDD(Array(1, 2, 3, 4, 5), 3)
    // Writes one part file per partition; the "output" directory must not already exist
    rdd.saveAsTextFile("output")
    sc.stop()
  }
}
The rules
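Start position of a partition = (partition index * total data length) / number of partitions
End position of a partition = ((partition index + 1) * total data length) / number of partitions
Both divisions are integer divisions, and a partition holds the elements from its start position (inclusive) up to its end position (exclusive).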
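The rule above can be checked without starting Spark. The following is a minimal sketch that applies the same two formulas to an ordinary Scala collection; the names `SliceSketch`, `positions` and `slice` are chosen for illustration and are not meant as Spark's API.

```scala
object SliceSketch {
  /** (start, end) index pair of every partition, end exclusive. */
  def positions(length: Long, numSlices: Int): Seq[(Int, Int)] =
    (0 until numSlices).map { i =>
      val start = ((i * length) / numSlices).toInt        // integer division
      val end   = (((i + 1) * length) / numSlices).toInt  // integer division
      (start, end)
    }

  /** Cuts `data` into `numSlices` partitions using the positions above. */
  def slice[T](data: Seq[T], numSlices: Int): Seq[Seq[T]] =
    positions(data.length, numSlices).map { case (start, end) =>
      data.slice(start, end)
    }
}
```

With 5 elements and 3 partitions, `SliceSketch.positions(5, 3)` yields (0,1), (1,3), (3,5), so `SliceSketch.slice(Seq(1, 2, 3, 4, 5), 3)` gives the partitions 1 / 2, 3 / 4, 5, matching case 3) in the code comments.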

2. Creating an RDD by reading a file
a. Default
If no value is given, the minimum number of partitions is the smaller of the default parallelism (the current number of cores in local mode) and 2, so it is usually 2.
b. Specifying the number of partitions
1). How the number of partitions is calculated (here for a 10-byte file and a requested minimum of 3 partitions):
totalSize = 10
goalSize = 10 / 3 = 3 (bytes), meaning each partition should hold 3 bytes of data
Number of partitions = totalSize / goalSize = 10 / 3, which would give splits of 3, 3 and 4 bytes
The 4-byte remainder is larger than 1.1 times the 3-byte goal size, so under Hadoop's 1.1x split rule it is cut again and an extra partition is created. The result is 4 partitions of 3, 3, 3 and 1 bytes.
2). Spark reads files the way Hadoop does, so data is read one whole line at a time, regardless of how many bytes the line contains.
3). Read positions are calculated in units of byte offsets.
4). Calculating the offset range of each partition:
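File contents and byte offsets (each of the first three lines takes 3 bytes, that is, one character plus a two-byte line terminator):
Line    Offsets
1       0 1 2
2       3 4 5
3       6 7 8
4       9

Partition    Offset range    Data
0            [0,3]           1, 2
1            [3,6]           3
2            [6,9]           4
3            [9,9]           (empty)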
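The steps above can be reproduced with a small simulation that needs neither Spark nor Hadoop. This is a simplified sketch, not Hadoop's implementation: `TextFileSplitSketch` and its methods are hypothetical names, the `blockSize` and `minSize` defaults are assumptions, and the two-byte line terminator is assumed to be `\r\n`. It applies two rules: a remainder is cut again only while it exceeds 1.1 times the split size, and a split that does not start at offset 0 skips its first line, then keeps reading whole lines while the read position is less than or equal to the end of the split.

```scala
import scala.collection.mutable.ArrayBuffer

object TextFileSplitSketch {
  val SplitSlop = 1.1 // a remainder up to 1.1 x splitSize stays in one split

  /** Byte ranges (start, end) of the splits, end exclusive. */
  def splits(totalSize: Long, minPartitions: Int,
             blockSize: Long = 128L * 1024 * 1024, // assumed default
             minSize: Long = 1L): Seq[(Long, Long)] = {
    val goalSize  = totalSize / math.max(minPartitions, 1)
    val splitSize = math.max(minSize, math.min(goalSize, blockSize))
    val out = ArrayBuffer.empty[(Long, Long)]
    var remaining = totalSize
    while (remaining.toDouble / splitSize > SplitSlop) {
      val start = totalSize - remaining
      out += ((start, start + splitSize))
      remaining -= splitSize
    }
    if (remaining != 0) out += ((totalSize - remaining, totalSize))
    out.toSeq
  }

  // Reads one line at `pos`; returns the text and the offset of the next line.
  private def readLine(bytes: Array[Byte], pos: Int): (String, Int) = {
    val nl   = bytes.indexOf('\n'.toByte, pos)
    val next = if (nl < 0) bytes.length else nl + 1
    (new String(bytes, pos, next - pos, "UTF-8").stripLineEnd, next)
  }

  /** Lines that belong to the split [start, end). */
  def linesOfSplit(bytes: Array[Byte], start: Int, end: Int): Seq[String] = {
    // A split that does not begin the file skips its first line:
    // the previous split has already read that line in full.
    var pos = if (start == 0) 0 else readLine(bytes, start)._2
    val out = ArrayBuffer.empty[String]
    while (pos <= end && pos < bytes.length) {
      val (text, next) = readLine(bytes, pos)
      out += text
      pos = next
    }
    out.toSeq
  }

  /** Lines per partition for an in-memory "file". */
  def partitions(bytes: Array[Byte], minPartitions: Int): Seq[Seq[String]] =
    splits(bytes.length, minPartitions).map { case (s, e) =>
      linesOfSplit(bytes, s.toInt, e.toInt)
    }
}
```

Calling `TextFileSplitSketch.partitions("1\r\n2\r\n3\r\n4".getBytes("UTF-8"), 3)` produces four partitions holding 1, 2 / 3 / 4 / nothing. The `pos <= end` check is the reason partition 0 also picks up line 2, which starts exactly at offset 3.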