Spark overview
2022-07-03 09:25:00 【Did Xiao Hu get stronger today】
What is Spark?
Spark is a fast, general-purpose, scalable, memory-based engine for big data analysis and computation.
Spark and Hadoop
Hadoop is an open-source framework written in Java. It stores massive amounts of data on a distributed server cluster and runs distributed analytical applications on that cluster.
Spark is a fast, general-purpose, scalable big data analysis engine developed in Scala. It is used mainly for data computation.
Spark or Hadoop
Hadoop MapReduce was not originally designed for iterative (cyclic) data-flow processing. As a result, it runs into problems such as poor computational efficiency in many parallel workloads that reuse data (for example machine learning, graph mining algorithms, and interactive data mining algorithms). Spark emerged in response: it builds on the traditional MapReduce computing framework, optimizes the computation process, and computes in memory to reduce I/O cost.
The fundamental difference between Spark and Hadoop is how data is passed between multiple jobs: Spark passes data between jobs through memory, whereas Hadoop passes it through disk.
Spark writes data to disk only during a shuffle, whereas in Hadoop the data exchange between multiple MR jobs goes through disk. Spark's caching mechanism is also more efficient than the HDFS caching mechanism.
In most data computing scenarios, Spark does have advantages over MapReduce. However, because Spark is memory-based, a job in a real production environment may fail when memory resources are insufficient. In that case MapReduce is actually the better choice, so Spark is not a complete replacement for MR.
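The difference in how intermediate results move between jobs can be illustrated with a toy model. The sketch below is plain Python, not Spark or Hadoop code: `run_chain_via_disk` and `run_chain_in_memory` are hypothetical names invented for this example, and JSON files in a temporary directory stand in for a distributed file system. It only shows why a chain of jobs that hands data over through disk pays one write and one read per job, while an in-memory chain does not.

```python
import json
import os
import tempfile


def run_chain_via_disk(data, stages, workdir):
    """Toy model of MapReduce-style chaining (hypothetical helper).

    Each job writes its full output to a file, and the next job
    reads that file back before it can start.
    """
    records = list(data)
    disk_round_trips = 0
    for i, stage in enumerate(stages):
        result = [stage(x) for x in records]
        path = os.path.join(workdir, f"job_{i}.json")
        with open(path, "w") as f:
            json.dump(result, f)        # job output is persisted to disk
        with open(path) as f:
            records = json.load(f)      # the next job reads it back
        disk_round_trips += 1
    return records, disk_round_trips


def run_chain_in_memory(data, stages):
    """Toy model of in-memory chaining (hypothetical helper).

    Intermediate results stay in memory between jobs.
    """
    records = list(data)
    for stage in stages:
        records = [stage(x) for x in records]
    return records, 0


if __name__ == "__main__":
    stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
    data = [1, 2, 3, 4]

    with tempfile.TemporaryDirectory() as workdir:
        disk_result, disk_trips = run_chain_via_disk(data, stages, workdir)
    mem_result, mem_trips = run_chain_in_memory(data, stages)

    print(disk_result, disk_trips)
    print(mem_result, mem_trips)
```

Both variants compute the same result; only the number of disk round trips differs. The in-memory variant also shows the trade-off mentioned above: every intermediate list has to fit in memory.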
Spark core modules
- Spark Core: Spark Core provides Spark's most basic and central functionality. Spark's other components, such as Spark SQL, Spark Streaming, GraphX, and MLlib, are all built on top of Spark Core.
- Spark SQL: Spark SQL is the Spark component for working with structured data. With Spark SQL, users can query data using SQL or the Apache Hive dialect of SQL (HQL).
- Spark Streaming: Spark Streaming is the Spark component for real-time stream computing. It provides a rich API for processing data streams.
- Spark MLlib: MLlib is the machine learning algorithm library provided by Spark. Besides the algorithms themselves, MLlib offers extra features such as model evaluation and data import, as well as some lower-level machine learning primitives.
- Spark GraphX: GraphX is the framework and algorithm library that Spark provides for graph computation.