Spark overview
2022-07-03 09:25:00 【Did Xiao Hu get stronger today】
What is Spark?
Spark is a fast, general-purpose, scalable, memory-based engine for big data analysis and computation.
Spark and Hadoop
Hadoop is an open-source framework written in Java. It stores massive amounts of data on a distributed server cluster and runs distributed analytics applications on that cluster.
Spark is a fast, general-purpose, scalable big data analysis engine developed in Scala. It is used mainly for data computation.
Spark or Hadoop
Hadoop MapReduce was not originally designed for iterative, cyclic data-flow processing, so it runs into problems such as poor computational efficiency in many parallel workloads that reuse data (for example machine learning, graph mining algorithms, and interactive data mining algorithms). Spark emerged to address this: it builds on the traditional MapReduce computing framework, optimizes the computation process, and computes in memory to reduce I/O overhead.
The fundamental difference between Spark and Hadoop lies in how data is passed between multiple jobs: Spark passes data between jobs through memory, whereas Hadoop passes it through disk.
Spark writes data to disk only during a shuffle, whereas data exchange between multiple MR jobs in Hadoop relies on disk. Spark's caching mechanism is also more efficient than HDFS's caching mechanism.
In most data computing scenarios, Spark does have the advantage over MapReduce. However, Spark is memory-based, so in a real production environment a job may fail because memory resources are insufficient. In that case MapReduce is actually the better choice, so Spark is not a complete replacement for MR.
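To make the difference in data exchange between jobs more concrete, here is a minimal toy sketch in plain Python. It does not use the Spark or Hadoop APIs; the function names `run_disk_based` and `run_memory_based`, the `STEPS` list, and the JSON file layout are all hypothetical and exist only for illustration. It chains three small "jobs" and counts how many file reads and writes each style needs. The in-memory version deliberately ignores shuffles and the memory limits mentioned above.

```python
import json
import os
import tempfile

# Three hypothetical "jobs", each a simple per-record transformation.
STEPS = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]


def run_disk_based(data, steps, workdir):
    """MapReduce-style chaining: each job reads its input from a file
    and writes its output to a new file for the next job."""
    io_ops = 0
    path = os.path.join(workdir, "input.json")
    with open(path, "w") as f:
        json.dump(data, f)
    io_ops += 1  # initial write of the input data

    for i, step in enumerate(steps):
        with open(path) as f:
            records = json.load(f)
        io_ops += 1  # the job reads the previous output from disk
        path = os.path.join(workdir, f"job_{i}.json")
        with open(path, "w") as f:
            json.dump([step(x) for x in records], f)
        io_ops += 1  # the job writes its own output to disk

    with open(path) as f:
        result = json.load(f)
    io_ops += 1  # final read of the result
    return result, io_ops


def run_memory_based(data, steps):
    """Spark-style chaining (simplified): intermediate results stay in
    memory, so no file I/O happens between jobs."""
    records = list(data)
    for step in steps:
        records = [step(x) for x in records]
    return records, 0


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        print(run_disk_based([1, 2, 3], STEPS, workdir))  # ([1, 3, 5], 8)
    print(run_memory_based([1, 2, 3], STEPS))             # ([1, 3, 5], 0)
```

Both functions return the same result; only the number of file operations differs, and in the disk-based version that number grows with every additional job. The sketch also hints at the trade-off: the in-memory version has to hold each intermediate result in memory.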
Spark core modules

- Spark Core: Spark Core provides Spark's most basic, core functionality. Spark's other components, such as Spark SQL, Spark Streaming, GraphX, and MLlib, are all built on top of Spark Core.
- Spark SQL: Spark SQL is the Spark component for working with structured data. With Spark SQL, users can query data using SQL or the Apache Hive SQL dialect (HQL).
- Spark Streaming: Spark Streaming is the Spark component for computing on real-time data streams. It provides a rich API for processing streaming data.
- Spark MLlib: MLlib is the machine learning algorithm library provided by Spark. In addition to extra features such as model evaluation and data import, it also provides some lower-level machine learning primitives.
- Spark GraphX: GraphX is the framework and algorithm library that Spark provides for graph computation.