Spark overview
2022-07-03 09:25:00 【Did Xiao Hu get stronger today】
What is Spark?
Spark is a fast, general-purpose, scalable, memory-based engine for big data analysis and computation.
Spark and Hadoop
Hadoop is an open-source framework written in Java. It stores massive amounts of data on a distributed server cluster and runs distributed analytics applications on that cluster.
Spark is a fast, general-purpose, scalable big data analysis engine developed in Scala. Its main job is data computation.
Spark or Hadoop?
Hadoop MapReduce was not originally designed for iterative, cyclic data-flow processing, so it suffers from problems such as poor computational efficiency in many parallel workloads that reuse data (for example machine learning, graph mining algorithms, and interactive data mining algorithms). Spark emerged in response: it builds on the traditional MapReduce computing framework, optimizes the computation process, and computes in memory to reduce I/O overhead.
The fundamental difference between Spark and Hadoop lies in how data is passed between multiple jobs: Spark passes data between jobs through memory, while Hadoop passes it through disk.
Spark writes data to disk only during a shuffle, whereas data exchange between multiple MR jobs in Hadoop always goes through disk. Spark's caching mechanism is also more efficient than the HDFS caching mechanism.
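The difference in how data is handed from one job to the next can be sketched with a toy model in plain Python. This is not Spark or Hadoop code: the function names `run_disk_based` and `run_memory_based`, the JSON file format, and the "add 1 to every value" job are all assumptions made only for illustration. The sketch chains several identical jobs and counts how many times intermediate data touches the disk.

```python
import json
import os
import tempfile


def run_disk_based(data, iterations):
    """Toy MapReduce-style chain: each job reads its input from disk
    and writes its output back to disk for the next job."""
    disk_ops = 0
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "stage.json")  # hypothetical intermediate file
        with open(path, "w") as f:
            json.dump(list(data), f)  # initial write of the input
        disk_ops += 1
        for _ in range(iterations):
            with open(path) as f:
                values = json.load(f)  # job reads previous output
            disk_ops += 1
            values = [v + 1 for v in values]  # the "job" itself
            with open(path, "w") as f:
                json.dump(values, f)  # job writes output for the next job
            disk_ops += 1
        with open(path) as f:
            result = json.load(f)  # final read of the result
        disk_ops += 1
    return result, disk_ops


def run_memory_based(data, iterations):
    """Toy Spark-style chain: intermediate results stay in memory
    and are handed directly to the next job."""
    disk_ops = 0
    values = list(data)
    for _ in range(iterations):
        values = [v + 1 for v in values]  # same "job", no disk round trip
    return values, disk_ops


if __name__ == "__main__":
    print(run_disk_based([1, 2, 3], 3))
    print(run_memory_based([1, 2, 3], 3))
```

Both functions compute the same result, but in the disk-based version the number of disk operations grows with the number of chained jobs, which is the cost that iterative workloads pay under MapReduce. A real Spark job still reads its input from storage and may write to disk during a shuffle, so the zero in the in-memory version is a simplification.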
In most data computing scenarios, Spark does have an advantage over MapReduce. However, Spark is memory-based, so in a real production environment a job may fail because memory resources are insufficient. In that case MapReduce is actually the better choice, so Spark is not a complete replacement for MR.
Spark core modules

- Spark Core: provides Spark's most basic, core functionality. Spark's other components, such as Spark SQL, Spark Streaming, GraphX, and MLlib, are all built on top of Spark Core.
- Spark SQL: the Spark component for working with structured data. With Spark SQL, users can query data using SQL or the Apache Hive SQL dialect (HQL).
- Spark Streaming: the Spark component for real-time data stream computation. It provides a rich API for processing data streams.
- Spark MLlib: Spark's library of machine learning algorithms. Besides additional features such as model evaluation and data import, MLlib also provides some lower-level machine learning primitives.
- Spark GraphX: Spark's framework and algorithm library for graph computation.