Spark overview
2022-07-03 09:25:00 【Did Xiao Hu get stronger today】
What is Spark?
Spark is a fast, general-purpose, scalable, memory-based engine for big data analytics and computation.
Spark and Hadoop
Hadoop is an open-source framework written in Java. It stores massive amounts of data across a cluster of distributed servers and runs distributed analytical applications on that data.
Spark is a fast, general-purpose, scalable big data analytics engine developed in Scala. Its main role is data computation.
Spark or Hadoop
Hadoop MapReduce was not originally designed for cyclic, iterative data-flow processing. As a result, it is computationally inefficient in many parallel scenarios that reuse data, such as machine learning, graph mining algorithms, and interactive data mining. Spark emerged to address this. It builds on the traditional MapReduce computing model but optimizes the computation process, performing computation in memory to reduce I/O cost.
The fundamental difference between Spark and Hadoop is how data is passed between multiple jobs: Spark passes data between jobs in memory, whereas Hadoop passes it through disk.
Spark writes intermediate data to disk only during a shuffle, whereas data exchange between multiple MapReduce jobs in Hadoop relies on disk. Spark's caching mechanism is also more efficient than that of HDFS.
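The toy Python simulation below makes this difference concrete. It is not Spark or Hadoop code. It runs an iterative algorithm two ways, and the update rule `step` and the helpers `run_disk_based` and `run_in_memory` are hypothetical names chosen for illustration.

- The disk-based version mimics a chain of MapReduce jobs: each job reads its input from a file and writes its output back to disk.
- The in-memory version mimics Spark keeping the working set in memory between iterations.

Both produce the same result; only the number of disk round trips differs.

```python
import json
import os
import tempfile


def step(values):
    """One iteration of a toy iterative algorithm (hypothetical update rule)."""
    return [0.5 * v + 1.0 for v in values]


def run_disk_based(data, iterations, workdir):
    """MapReduce-style chain: every job reads its input from disk and writes its output back."""
    io_ops = 0
    path = os.path.join(workdir, "input.json")
    with open(path, "w") as f:  # initial input stored on disk
        json.dump(data, f)
    io_ops += 1
    for i in range(iterations):
        with open(path) as f:  # job i reads the previous job's output from disk
            values = json.load(f)
        io_ops += 1
        values = step(values)
        path = os.path.join(workdir, f"iter_{i + 1}.json")
        with open(path, "w") as f:  # ...and materializes its own output to disk
            json.dump(values, f)
        io_ops += 1
    with open(path) as f:  # final result read back from disk
        result = json.load(f)
    io_ops += 1
    return result, io_ops


def run_in_memory(data, iterations):
    """Spark-style: keep the working set in memory between iterations."""
    values = list(data)  # stands in for a cached dataset
    for _ in range(iterations):
        values = step(values)
    return values, 0  # no disk round trips for intermediate data


data = [0.0, 4.0, 10.0]
with tempfile.TemporaryDirectory() as workdir:
    disk_result, disk_io = run_disk_based(data, iterations=5, workdir=workdir)
mem_result, mem_io = run_in_memory(data, iterations=5)
print("same result:", disk_result == mem_result)
print("disk I/O operations:", disk_io, "vs", mem_io)
```

With 5 iterations, the disk-based run performs 12 file reads and writes, and that count grows linearly with the number of iterations. The in-memory run performs none. In real Spark code, the comparable step is calling `cache()` or `persist()` on a dataset that is reused across iterations.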
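A shuffle is the point where records must be regrouped by key across partitions, for example for an aggregation by key. The toy Python word count below is again not real Spark code.

- The narrow steps (splitting lines and emitting `(word, 1)` pairs) are pipelined in memory.
- Disk is touched only at the shuffle. There, map output is hash-partitioned into per-reducer bucket files, which the reduce side then reads back.

The file layout and the helper names (`map_stage`, `shuffle_write`, `shuffle_read_and_reduce`) are assumptions made for illustration; Spark's actual shuffle files are organized differently.

```python
import json
import os
import tempfile
from collections import defaultdict
from zlib import crc32


def partition_for(key, num_partitions):
    """Hash partitioner; crc32 is used because Python's str hash varies per process."""
    return crc32(key.encode("utf-8")) % num_partitions


def map_stage(lines):
    """Narrow transformations (split + map to pairs) pipelined in memory, no disk I/O."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1


def shuffle_write(input_partitions, num_reducers, workdir):
    """Shuffle write: bucket each map task's output by reducer and write it to disk."""
    files_written = 0
    for map_id, lines in enumerate(input_partitions):
        buckets = defaultdict(list)
        for key, value in map_stage(lines):
            buckets[partition_for(key, num_reducers)].append([key, value])
        for reduce_id, records in buckets.items():
            path = os.path.join(workdir, f"shuffle_{map_id}_{reduce_id}.json")
            with open(path, "w") as f:
                json.dump(records, f)
            files_written += 1
    return files_written


def shuffle_read_and_reduce(reduce_id, num_maps, workdir):
    """Shuffle read: fetch this reducer's bucket from every map task, then sum per key."""
    counts = defaultdict(int)
    for map_id in range(num_maps):
        path = os.path.join(workdir, f"shuffle_{map_id}_{reduce_id}.json")
        if not os.path.exists(path):  # this map task produced no keys for this reducer
            continue
        with open(path) as f:
            for key, value in json.load(f):
                counts[key] += value
    return dict(counts)


# Two input partitions, as if the input text had been split into two blocks.
partitions = [["Spark is fast", "Spark uses memory"], ["Hadoop uses disk"]]
num_reducers = 2
with tempfile.TemporaryDirectory() as workdir:
    files_written = shuffle_write(partitions, num_reducers, workdir)
    word_counts = {}
    for reduce_id in range(num_reducers):
        # each key lands in exactly one reducer, so merging the results is safe
        word_counts.update(shuffle_read_and_reduce(reduce_id, len(partitions), workdir))
print("shuffle files written:", files_written)
print(sorted(word_counts.items()))
```

In a chain of MapReduce jobs, every job boundary also goes through disk (HDFS). In this sketch, the only files are the shuffle buckets.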
In most data computing scenarios, Spark does have advantages over MapReduce. However, because Spark is memory-based, a job may fail in real production environments due to insufficient memory resources. In such cases MapReduce is actually the better choice, so Spark cannot completely replace MapReduce.
Spark core modules

- Spark Core: Spark Core provides Spark's most basic, core functionality. Spark's other components, such as Spark SQL, Spark Streaming, GraphX, and MLlib, are all built on top of Spark Core.
- Spark SQL: Spark SQL is Spark's component for working with structured data. With Spark SQL, users can query data using SQL or the Apache Hive dialect of SQL (HQL).
- Spark Streaming: Spark Streaming is Spark's component for real-time stream processing. It provides a rich API for processing data streams.
- Spark MLlib: MLlib is Spark's machine learning library. Besides additional features such as model evaluation and data import, it also provides some lower-level machine learning primitives.
- Spark GraphX: GraphX is Spark's framework and algorithm library for graph computation.