Principle analysis of spark
2022-07-02 07:07:00 [Boring n day]
Preface
Today I mainly studied the principles behind the Spark framework: how a Spark application runs, the execution process of an RDD, and an introduction to RDD dependencies.
One. A brief introduction to Spark
Spark is written in Scala. Scala runs on the Java platform (the JVM) and is compatible with existing Java programs, so a Spark program written in Scala runs on an ordinary Java JDK. No separate Scala runtime has to be installed, because Spark ships the Scala standard library as a JAR on its classpath.
Spark vs. MapReduce

As the figure above shows, iterative computation is very resource-intensive in Hadoop MapReduce: each iteration runs as a separate job that writes its output to disk (HDFS), and the next iteration has to read it back.
Spark, by contrast, loads the data into memory once; subsequent iterations work directly on the intermediate results held in memory, which avoids repeatedly reading data from disk.
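To make the contrast concrete, here is a toy, single-process Python model (not Spark or Hadoop code) that only counts how often the data has to be read from "disk" over several iterations. The `Disk` class and both functions are hypothetical stand-ins; the sketch does not model the writes back to HDFS that real MapReduce also performs.

```python
"""Toy model: count simulated disk reads in an iterative job (illustration only)."""


class Disk:
    """Hypothetical disk that counts how often data is read from it."""

    def __init__(self):
        self.reads = 0

    def read(self, data):
        self.reads += 1
        return list(data)


def mapreduce_style(data, iterations):
    disk = Disk()
    result = data
    for _ in range(iterations):
        # Each MapReduce job re-reads its input (the previous job's output) from disk
        result = [x * 2 for x in disk.read(result)]
    return result, disk.reads


def spark_style(data, iterations):
    disk = Disk()
    result = disk.read(data)  # load once; later iterations reuse the in-memory list
    for _ in range(iterations):
        result = [x * 2 for x in result]
    return result, disk.reads


if __name__ == "__main__":
    print(mapreduce_style([1, 2, 3], 5))  # ([32, 64, 96], 5)
    print(spark_style([1, 2, 3], 5))      # ([32, 64, 96], 1)
```

Both versions compute the same result; only the number of disk reads differs, and that gap grows with the number of iterations.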
Two. Basic concepts and architecture design

The basic flow of a Spark application (using YARN as an example)

- When a client submits an application, it first builds the application's basic runtime environment: the Driver creates a SparkContext (SC), which registers with the ResourceManager (RM), requests resources, and assigns and monitors tasks.
  Here the Driver can be understood as the application the user wrote, and the SparkContext plays a role similar to the ApplicationMaster (AM) in YARN.
- After receiving the request, the RM starts Executors and allocates resources to them. Each Executor registers with the SC, requests Tasks, and stays in constant contact with the SC (via heartbeats) so the connection is not lost.
- When the job finishes, the SC unregisters from the RM and shuts itself down.
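The lifecycle above can be traced with a toy event log. This is a hypothetical, single-process sketch: `ToyResourceManager` and `ToySparkContext` are invented names, YARN's ApplicationMaster and container negotiation are collapsed into one call, and tasks are handed out round-robin purely for simplicity (real Spark schedules them through its DAGScheduler and TaskScheduler).

```python
"""Toy event log of the Spark-on-YARN lifecycle (illustration only, not Spark code)."""


class ToyResourceManager:
    """Hypothetical stand-in for the YARN ResourceManager."""

    def __init__(self, log):
        self.log = log
        self.apps = set()

    def register(self, app):
        self.apps.add(app)
        self.log.append(f"RM: registered {app}")

    def launch_executors(self, app, n, sc):
        # Allocate resources and start executors; each one registers with the SC
        for i in range(n):
            executor = f"executor-{i}"
            self.log.append(f"RM: started {executor} for {app}")
            sc.register_executor(executor)

    def unregister(self, app):
        self.apps.discard(app)
        self.log.append(f"RM: unregistered {app}")


class ToySparkContext:
    """Hypothetical stand-in for the SparkContext created by the Driver."""

    def __init__(self, app, rm, log):
        self.app, self.rm, self.log = app, rm, log
        self.executors = []

    def register_executor(self, executor):
        self.executors.append(executor)
        self.log.append(f"SC: {executor} registered")

    def run(self, tasks, n_executors=2):
        self.rm.register(self.app)                             # 1. register, request resources
        self.rm.launch_executors(self.app, n_executors, self)  # 2. executors start, register back
        for i, task in enumerate(tasks):                       # 3. SC hands out tasks
            executor = self.executors[i % len(self.executors)]
            self.log.append(f"SC: task {task} -> {executor}")
        self.rm.unregister(self.app)                           # 4. unregister and shut down
        self.log.append("SC: stopped")


if __name__ == "__main__":
    log = []
    rm = ToyResourceManager(log)
    ToySparkContext("app-1", rm, log).run(["t0", "t1", "t2"])
    print("\n".join(log))
```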
A basic overview of how RDDs run
The typical execution process of an RDD is as follows:

1. An RDD is created by reading an external data source. If the data source is large, the RDD is split into multiple partitions, and different partitions are placed on different nodes. This partitioning is the "distributed" in the name RDD: Resilient Distributed Dataset.
2. The RDD then goes through a series of transformations. Each transformation produces a new RDD that feeds the next transformation, and together they form a DAG (directed acyclic graph).
3. Finally, an action writes the last RDD out to an external data source.
Throughout this process the RDDs are transformed, but no concrete result is produced; the corresponding result is computed only when an action is encountered (lazy evaluation).
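The three steps can be sketched with a toy, single-process Python model. `ToyRDD` and its methods are hypothetical names that only echo Spark's `map`, `filter` and `collect`; the partitions are plain lists rather than blocks on different nodes, the action returns the result to the caller instead of writing to storage, and the lineage here is a simple chain, whereas a real DAG can branch (for example at a join).

```python
"""A toy RDD: partitioned data plus a chain of transformations (not Spark's API)."""


class ToyRDD:
    def __init__(self, partitions=None, parent=None, op_name="source", fn=None):
        self.partitions = partitions  # list of lists; only the source RDD stores data
        self.parent = parent          # the RDD this one was derived from
        self.op_name = op_name        # label used when printing the lineage
        self.fn = fn                  # function applied to each parent partition

    @classmethod
    def from_source(cls, data, num_partitions):
        # Step 1: split "external" data into contiguous partitions of near-equal size
        size = max(1, -(-len(data) // num_partitions))  # ceiling division
        return cls([data[i:i + size] for i in range(0, len(data), size)])

    # Step 2: each transformation returns a NEW ToyRDD pointing at its parent
    def map(self, f):
        return ToyRDD(parent=self, op_name="map",
                      fn=lambda part: [f(x) for x in part])

    def filter(self, pred):
        return ToyRDD(parent=self, op_name="filter",
                      fn=lambda part: [x for x in part if pred(x)])

    def lineage(self):
        # Follow parent pointers back to the source RDD
        node, names = self, []
        while node is not None:
            names.append(node.op_name)
            node = node.parent
        return names[::-1]

    def compute(self):
        # Rebuild this RDD partition by partition from the source
        if self.parent is None:
            return self.partitions
        return [self.fn(part) for part in self.parent.compute()]

    # Step 3: an action materialises the result and hands it back to the caller
    def collect(self):
        return [x for part in self.compute() for x in part]


if __name__ == "__main__":
    source = ToyRDD.from_source(list(range(1, 11)), num_partitions=3)
    result = source.map(lambda x: x * 10).filter(lambda x: x % 20 == 0)
    print(source.partitions)  # [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10]]
    print(result.lineage())   # ['source', 'map', 'filter']
    print(result.collect())   # [20, 40, 60, 80, 100]
```

Because each RDD only remembers its parent and the function to apply, a lost partition can in principle be rebuilt by replaying the lineage; that is where the "resilient" part of the name comes from.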
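Lazy evaluation can be illustrated by analogy with Python's built-in `map` and `filter`, which also return lazy iterators. This is only an analogy, not Spark: the "transformations" below record nothing but a pipeline, and the `calls` list shows that no element is processed until `list()` (playing the role of an action) forces the chain.

```python
"""Lazy evaluation analogy using Python's built-in lazy iterators (not Spark)."""

calls = []  # records every time the "transformation" function actually runs


def double(x):
    calls.append(x)
    return x * 2


def build_pipeline(data):
    # "Transformations": map() and filter() return lazy iterators; nothing runs yet
    doubled = map(double, data)
    return filter(lambda x: x > 4, doubled)


if __name__ == "__main__":
    pipeline = build_pipeline([1, 2, 3, 4])
    print(calls)             # []  (no element has been processed yet)
    result = list(pipeline)  # the "action": forces the whole chain to run
    print(calls)             # [1, 2, 3, 4]
    print(result)            # [6, 8]
```

One difference: a Python iterator can be consumed only once, whereas Spark re-runs an RDD's lineage for every action unless the RDD is cached.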

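RDD dependencies
As shown in the figure above, RDD dependencies come in two kinds: narrow and wide. In a narrow dependency, each partition of the parent RDD is used by at most one partition of the child RDD (for example, map and filter). In a wide dependency, one partition of the parent RDD is used by multiple partitions of the child RDD (for example, groupByKey), so the data has to be shuffled across the cluster.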
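The definition can be checked mechanically. The sketch below is a hypothetical helper, not part of Spark: it takes a mapping from each child partition to the parent partitions it reads and applies the rule "narrow if every parent partition is read by at most one child partition". The three mapping builders are simplified models of map-like, partition-merging (as a shuffle-free coalesce does) and shuffle-based operations.

```python
"""Classify a dependency as narrow or wide from a partition mapping (illustration only)."""

from collections import Counter


def classify(deps):
    """deps maps each child partition index to the parent partition indices it reads."""
    # Count how many child partitions read each parent partition
    uses = Counter(p for parents in deps.values() for p in parents)
    return "wide" if any(n > 1 for n in uses.values()) else "narrow"


def map_like(num_partitions):
    # map/filter: child partition i reads only parent partition i
    return {i: [i] for i in range(num_partitions)}


def coalesce_like(num_parent, num_child):
    # Merging partitions: each child reads a disjoint group of parent partitions
    return {c: list(range(c, num_parent, num_child)) for c in range(num_child)}


def shuffle_like(num_parent, num_child):
    # groupByKey-style: every child partition may need data from every parent partition
    return {c: list(range(num_parent)) for c in range(num_child)}


if __name__ == "__main__":
    print(classify(map_like(3)))          # narrow
    print(classify(coalesce_like(4, 2)))  # narrow
    print(classify(shuffle_like(3, 2)))   # wide
```

Note that a narrow dependency is not necessarily one-to-one: merging several parent partitions into one child partition is still narrow, because no parent partition is split across children.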
Summary
Today's write-up is rather thin; it was mainly written to consolidate what I learned today. I will go into more detail later when I have time.