Principle analysis of Spark
2022-07-02 07:07:00 【Boring n day】
Preface
Today's main topic is the principles behind the Spark framework: Spark's execution flow, the execution process of an RDD, and an introduction to RDD dependencies.
I. Introduction to Spark
Spark is written in Scala. Scala runs on the Java platform (the JVM) and is compatible with existing Java programs, so a program written in Scala can run on the JVM that ships with the Java JDK; it does not need a separate Scala virtual machine.
Spark compared with MapReduce

As the figure above shows, iterative computation with Hadoop MapReduce is very resource intensive.
Once Spark has loaded the data into memory, subsequent iterations operate directly on the intermediate results held in memory, which avoids frequently reading data from disk.
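The difference can be illustrated with a small toy model. This is plain Python, not Spark or Hadoop code; `CountingStore`, `iterate_from_disk` and `iterate_in_memory` are hypothetical names invented for this sketch, and the "disk" is just a list wrapped in a read counter.

```python
# Toy model only: plain Python, no Spark or Hadoop involved.
class CountingStore:
    """Stands in for disk storage and counts how often it is read."""

    def __init__(self, data):
        self._data = list(data)
        self.reads = 0

    def load(self):
        self.reads += 1
        return list(self._data)


def iterate_from_disk(store, iterations):
    """MapReduce-like pattern: every iteration reloads its input from the store."""
    total = 0
    for _ in range(iterations):
        total += sum(store.load())
    return total


def iterate_in_memory(store, iterations):
    """Spark-like pattern: load once, then reuse the in-memory copy."""
    cached = store.load()
    total = 0
    for _ in range(iterations):
        total += sum(cached)
    return total


disk_store = CountingStore([1, 2, 3])
memory_store = CountingStore([1, 2, 3])
print(iterate_from_disk(disk_store, 5), disk_store.reads)      # 30 5
print(iterate_in_memory(memory_store, 5), memory_store.reads)  # 30 1
```

Both loops compute the same total, but the in-memory version touches the store once instead of once per iteration.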
II. Basic concepts and architecture design

The basic Spark execution flow (using YARN as an example)

- When the client submits an application, a basic runtime environment is first built for it: the Driver creates a SparkContext (SC), which registers with the RM (ResourceManager) and applies for resources. The SC is responsible for requesting resources and for assigning and monitoring tasks.
  Here the Driver can be understood as the application written by the user, and the SparkContext (SC) plays a role similar to that of the AM (ApplicationMaster) in YARN.
- After receiving the request, the RM starts Executors and allocates resources to them. The Executors register with the SC, request Tasks from it, and stay in contact with the SC so that they are not disconnected.
- After the job finishes, the SC asks the RM to deregister it and then shuts itself down.
A basic overview of RDD operation
The typical execution process of an RDD is as follows:

1. An RDD is created by reading an external data source. If the data source is large, multiple partitions are created, and different partitions go to different data nodes. This characteristic is why it is called an RDD: a Resilient Distributed Dataset.
2. The RDD goes through a series of transformations. Each transformation produces a new RDD that feeds the next transformation, and together they form a DAG (directed acyclic graph).
3. The last RDD is written to an external data source through an action.
During this process, transforming an RDD does not produce any concrete result. The corresponding results are computed only when an action is encountered.
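The lazy behaviour described above can be sketched with a toy class. This is plain Python and not Spark's implementation: `ToyRDD` and `read_source` are hypothetical names made up for the example, and only the method names `map`, `filter` and `collect` are borrowed from real RDD operations.

```python
class ToyRDD:
    """Toy stand-in for an RDD: records transformations, runs them on an action."""

    def __init__(self, source, lineage=()):
        self._source = source            # callable that returns the input records
        self._lineage = tuple(lineage)   # recorded (kind, function) steps

    def map(self, fn):
        # Transformation: returns a new ToyRDD and computes nothing yet.
        return ToyRDD(self._source, self._lineage + (("map", fn),))

    def filter(self, pred):
        # Transformation: also only extends the recorded lineage.
        return ToyRDD(self._source, self._lineage + (("filter", pred),))

    def collect(self):
        # Action: read the source and replay every recorded step in order.
        records = list(self._source())
        for kind, fn in self._lineage:
            if kind == "map":
                records = [fn(r) for r in records]
            else:
                records = [r for r in records if fn(r)]
        return records


calls = []  # log of source reads, to show when work actually happens


def read_source():
    calls.append("read")
    return [1, 2, 3, 4, 5]


base = ToyRDD(read_source)
evens_squared = base.filter(lambda x: x % 2 == 0).map(lambda x: x * x)
print(calls)                  # [] : nothing has been read yet
result = evens_squared.collect()
print(result, calls)          # [4, 16] ['read']
```

Each transformation returns a new object that carries a longer lineage, which mirrors how a chain of RDDs forms a graph; the source is read only when `collect` is called.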

RDD dependencies
As shown in the figure above, RDD dependencies are either wide or narrow. Leaving narrow dependencies aside for now, let me describe a wide dependency first: it means that one partition of a parent RDD corresponds to multiple partitions of the child RDD.
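As a rough illustration, the sketch below tracks which child partitions each parent partition feeds. It is a toy model in plain Python, not Spark code: `map_partitions` and `shuffle_by_key` are hypothetical helpers, partitions are ordinary lists, and placing records by `key % num_children` is a simplified stand-in for hash partitioning. For contrast it also includes a narrow step, in which each parent partition feeds exactly one child partition.

```python
def map_partitions(parent, fn):
    """Narrow: child partition i is built only from parent partition i."""
    children = [[fn(x) for x in part] for part in parent]
    deps = {i: {i} for i in range(len(parent))}  # parent index -> child indexes
    return children, deps


def shuffle_by_key(parent, num_children):
    """Wide: records of one parent partition may land in several child partitions."""
    children = [[] for _ in range(num_children)]
    deps = {i: set() for i in range(len(parent))}
    for i, part in enumerate(parent):
        for key, value in part:
            target = key % num_children  # integer keys keep placement deterministic
            children[target].append((key, value))
            deps[i].add(target)
    return children, deps


# Two parent partitions holding (key, value) records.
parent = [[(0, "a"), (1, "b")], [(2, "c"), (3, "d")]]

_, narrow = map_partitions(parent, lambda kv: (kv[0], kv[1].upper()))
shuffled, wide = shuffle_by_key(parent, 2)

print("narrow:", narrow)
print("wide:", wide)
print("after shuffle:", shuffled)
```

In the narrow case every parent partition maps to a single child partition, while in the wide case both parent partitions contribute records to both child partitions.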
Summary
Today's post is rather thin; it is mainly written to consolidate what I learned today. I'll go into more detail later when I have time.