Principle Analysis of Spark
2022-07-02 07:07:00 【Boring n day】
Preface
Today's main topic is an analysis of how the Spark framework works: Spark's execution flow, the execution process of an RDD, and an introduction to dependencies.
I. Spark overview
Spark is written in Scala. Scala runs on the Java platform (the JVM) and is compatible with existing Java programs, so a program written in Scala can run on a standard Java JDK; no separate Scala JDK is needed.
Spark compared with MapReduce
As the figure above shows, iterative computation with Hadoop MapReduce is very resource intensive.
Once Spark has loaded the data into memory, subsequent iterations can operate directly on the intermediate results held in memory, which avoids frequently reading data from disk.
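The difference can be illustrated with a toy model in plain Python. This is only a sketch under stated assumptions: `FakeDisk`, `step`, `mapreduce_style` and `spark_style` are hypothetical names invented for illustration, and nothing here uses the real Hadoop or Spark APIs. It simply counts how often each style touches "disk" during an iterative computation.

```python
# Toy model only: counts simulated disk accesses. It does not use Hadoop or Spark.

class FakeDisk:
    """Hypothetical stand-in for a distributed file system that counts I/O."""

    def __init__(self, data):
        self.data = list(data)
        self.reads = 0
        self.writes = 0

    def read(self):
        self.reads += 1
        return list(self.data)

    def write(self, data):
        self.writes += 1
        self.data = list(data)


def step(values):
    """One iteration of some iterative algorithm (here: halve every value)."""
    return [v / 2 for v in values]


def mapreduce_style(disk, iterations):
    # Each iteration is a separate job: read the input from disk, write the output back.
    for _ in range(iterations):
        disk.write(step(disk.read()))
    return disk.read()


def spark_style(disk, iterations):
    # Load once, then keep the intermediate results in memory between iterations.
    in_memory = disk.read()
    for _ in range(iterations):
        in_memory = step(in_memory)
    return in_memory


mr_disk = FakeDisk([8.0, 16.0])
mr_result = mapreduce_style(mr_disk, iterations=3)

spark_disk = FakeDisk([8.0, 16.0])
spark_result = spark_style(spark_disk, iterations=3)

print("MapReduce style:", mr_result, "reads:", mr_disk.reads, "writes:", mr_disk.writes)
print("Spark style:    ", spark_result, "reads:", spark_disk.reads, "writes:", spark_disk.writes)
```

Both styles compute the same result, but in this toy model the disk-based loop performs I/O on every iteration while the in-memory loop reads the source only once.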
II. Basic concepts and architecture design
The basic Spark execution flow (using YARN as an example)
- When the client submits an application, a basic runtime environment is first built for it: the Driver creates a SparkContext (SC), which registers with the ResourceManager (RM), applies for resources, and is responsible for assigning and monitoring tasks.
Here the Driver can be understood as the application written by the user, and the SparkContext (SC) plays a role similar to the ApplicationMaster (AM) in YARN.
- After receiving the request, the RM starts Executors and allocates resources to them. The Executors register with the SC and apply for Tasks, and keep communicating with the SC the whole time so that the connection is not lost.
- After the job finishes, the SC asks the RM to deregister it and then shuts itself down.
An overview of basic RDD operations
A typical RDD execution process is as follows:
1. An RDD is created by reading an external data source. If the data source is large, multiple partitions are created, and different partitions go to different data nodes. This property is where the name comes from: RDD, Resilient Distributed Dataset.
2. The RDD goes through a series of transformations. Each transformation produces a new RDD that feeds the next transformation, and in this way a DAG (directed acyclic graph) is formed.
3. The last RDD is written to an external data source by an action.
During this process, transformations on an RDD are only recorded and do not produce concrete results; the results are computed only when an action is encountered.
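The three steps and the lazy behaviour can be sketched with a small toy class in plain Python. This is an illustration under assumptions, not the Spark API: `ToyRDD`, `from_source` and the round-robin partitioning are hypothetical, and the recorded lineage here is a simple chain rather than a general DAG.

```python
# Toy model of lazy evaluation. ToyRDD is a hypothetical class, not the Spark API.

class ToyRDD:
    def __init__(self, partitions, lineage=()):
        self.partitions = partitions  # list of lists, one inner list per partition
        self.lineage = lineage        # recorded transformations, not yet executed

    @classmethod
    def from_source(cls, data, num_partitions):
        # Step 1: split the external data into partitions (round-robin here).
        parts = [data[i::num_partitions] for i in range(num_partitions)]
        return cls(parts)

    def map(self, fn):
        # Step 2: a transformation only records the operation and returns a new RDD.
        return ToyRDD(self.partitions, self.lineage + (("map", fn),))

    def filter(self, pred):
        return ToyRDD(self.partitions, self.lineage + (("filter", pred),))

    def collect(self):
        # Step 3: an action replays the recorded lineage on every partition.
        out = []
        for part in self.partitions:
            for kind, fn in self.lineage:
                if kind == "map":
                    part = [fn(x) for x in part]
                else:
                    part = [x for x in part if fn(x)]
            out.extend(part)
        return out


calls = []  # records every element the map function actually processes


def double(x):
    calls.append(x)
    return x * 2


source = ToyRDD.from_source([1, 2, 3, 4, 5, 6], num_partitions=2)
transformed = source.map(double).filter(lambda x: x > 4)
calls_before_action = len(calls)  # nothing has been computed yet

result = sorted(transformed.collect())  # the action triggers the computation
print(calls_before_action, result, len(calls))
```

In this sketch `double` is not called at all until `collect()` runs, which mirrors the point that transformations alone produce no results.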
RDD dependencies
As the figure above shows, RDDs have wide dependencies and narrow dependencies. Leaving narrow dependencies aside for now, a wide dependency means that one partition of a parent RDD corresponds to multiple partitions of the child RDD.
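The definition above can be turned into a small check in plain Python. This is a minimal sketch, assuming a dependency is described as a mapping from each child partition to the parent partitions it reads; `classify_dependency` and the three example mappings are hypothetical and are not part of Spark.

```python
# Toy check of the wide/narrow definition. Not part of the Spark API.

def classify_dependency(child_to_parents):
    """Return "wide" if any parent partition feeds more than one child partition.

    child_to_parents maps a child partition index to the set of parent
    partition indexes that the child partition reads from.
    """
    children_of_parent = {}
    for child, parents in child_to_parents.items():
        for parent in parents:
            children_of_parent.setdefault(parent, set()).add(child)
    is_wide = any(len(children) > 1 for children in children_of_parent.values())
    return "wide" if is_wide else "narrow"


# One-to-one: each child partition reads exactly one parent partition.
one_to_one = {0: {0}, 1: {1}, 2: {2}}

# Many-to-one: several parent partitions merge into one child partition,
# but every parent partition is still used by only one child.
many_to_one = {0: {0, 1}, 1: {2}}

# Shuffle-like: every child partition reads from every parent partition.
shuffle_like = {0: {0, 1}, 1: {0, 1}}

print(classify_dependency(one_to_one))    # narrow
print(classify_dependency(many_to_one))   # narrow
print(classify_dependency(shuffle_like))  # wide
```

The shuffle-like mapping is the only wide case, because each parent partition there is read by more than one child partition.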
Summary
Today's write-up is rather thin; it was mainly written to consolidate what I learned today. I'll go into more detail later when I have time.