当前位置:网站首页>Spark overview
Spark overview
2022-07-03 09:25:00 【Did Xiao Hu get stronger today】
List of articles
Spark What is it?
Spark It is a memory based fast 、 Universal 、 Scalable big data analysis and calculation engine .
Spark and Hadoop
Hadoop By java language-written , Store massive data on the distributed server cluster and run the distributed server Open source framework for analytical applications .
Spark It's a kind of Scala Fast language development 、 Universal 、 Scalable big data analysis engine . The main function is mainly used for data calculation .
Spark or Hadoop
Hadoop MapReduce Because its original design is not to meet the circular iterative data flow processing , So in many Data reusability scenarios running in parallel ( Such as : machine learning 、 Graph mining algorithm 、 Interactive data mining algorithm ) Zhongcun In many problems such as computational efficiency . therefore Spark emerge as the times require ,Spark It's in the traditional MapReduce Calculation box On the basis of the frame , Using the optimization of its calculation process , Computing based on memory , To reduce the IO The cost of .
Spark and Hadoop The fundamental difference is the problem of data communication between multiple jobs : Spark Data between multiple jobs Communication is based on memory , and Hadoop It's disk based .
Spark Only in shuffle Write data to disk when , and Hadoop More than one of them MR Data interaction between jobs depends on disk interaction ,Spark The caching mechanism is better than HDFS Efficient caching mechanism .
In most data computing scenarios ,Spark It does. MapReduce Have more advantages . however Spark It's memory based , So in the actual production environment , Due to memory limitations , May be Due to insufficient memory resources Job Execution failure , here ,MapReduce It's actually a better choice , therefore Spark It's not a complete replacement for MR.
Spark Core module

- Spark Core Spark Core Provided in Spark The most basic and core functions ,Spark Other functions such as :Spark SQL, Spark Streaming,GraphX, MLlib It's all in Spark Core On the basis of
- Spark SQL Spark SQL yes Spark Components used to manipulate structured data . adopt Spark SQL, Users can use SQL perhaps Apache Hive Version of SQL dialect (HQL) To query data .
- Spark Streaming Spark Streaming yes Spark Platform for real-time data stream computing components , Provides a wealth of processing Streaming API.
- Spark MLlib MLlib yes Spark Provides a library of machine learning algorithms .MLlib Not only does it provide model evaluation 、 Data import, etc Extra features , It also provides some lower level machine learning primitives .
- Spark GraphX GraphX yes Spark The framework and algorithm library provided by graph computing .
边栏推荐
- LeetCode 513. Find the value in the lower left corner of the tree
- NPM install installation dependency package error reporting solution
- Problems in the implementation of lenet
- 2022-2-13 learning xiangniuke project - version control
- Flink学习笔记(九)状态编程
- Liteide is easy to use
- LeetCode 535. Encryption and decryption of tinyurl
- Install third-party libraries such as Jieba under Anaconda pytorch
- 【点云处理之论文狂读前沿版12】—— Adaptive Graph Convolution for Point Cloud Analysis
- Spark 概述
猜你喜欢

Temper cattle ranking problem

The "booster" of traditional office mode, Building OA office system, was so simple!

AcWing 785. Quick sort (template)
![[set theory] order relation (chain | anti chain | chain and anti chain example | chain and anti chain theorem | chain and anti chain inference | good order relation)](/img/fd/c0f885cdd17f1d13fdbc71b2aea641.jpg)
[set theory] order relation (chain | anti chain | chain and anti chain example | chain and anti chain theorem | chain and anti chain inference | good order relation)
[graduation season | advanced technology Er] another graduation season, I change my career as soon as I graduate, from animal science to programmer. Programmers have something to say in 10 years

【点云处理之论文狂读经典版7】—— Dynamic Edge-Conditioned Filters in Convolutional Neural Networks on Graphs
![[kotlin learning] classes, objects and interfaces - define class inheritance structure](/img/66/34396e51c59504ebbc6b6eb9831209.png)
[kotlin learning] classes, objects and interfaces - define class inheritance structure
![[point cloud processing paper crazy reading frontier version 8] - pointview gcn: 3D shape classification with multi view point clouds](/img/ee/3286e76797a75c0f999c728fd2b555.png)
[point cloud processing paper crazy reading frontier version 8] - pointview gcn: 3D shape classification with multi view point clouds

CATIA automation object architecture - detailed explanation of application objects (I) document/settingcontrollers

Sword finger offer II 029 Sorted circular linked list
随机推荐
Crawler career from scratch (3): crawl the photos of my little sister ③ (the website has been disabled)
LeetCode每日一题(1024. Video Stitching)
【点云处理之论文狂读经典版13】—— Adaptive Graph Convolutional Neural Networks
CSDN markdown editor help document
Move anaconda, pycharm and jupyter notebook to mobile hard disk
[point cloud processing paper crazy reading frontier edition 13] - gapnet: graph attention based point neural network for exploring local feature
Go language - IO project
The idea of compiling VBA Encyclopedia
【点云处理之论文狂读经典版8】—— O-CNN: Octree-based Convolutional Neural Networks for 3D Shape Analysis
LeetCode 438. Find all letter ectopic words in the string
On February 14, 2022, learn the imitation Niuke project - develop the registration function
【点云处理之论文狂读前沿版12】—— Adaptive Graph Convolution for Point Cloud Analysis
Vscode编辑器右键没有Open In Default Browser选项
npm install安装依赖包报错解决方法
Logstash+jdbc data synchronization +head display problems
Go language - Reflection
AcWing 787. Merge sort (template)
Digital statistics DP acwing 338 Counting problem
Flink学习笔记(十)Flink容错机制
State compression DP acwing 291 Mondrian's dream