Spark final summary
2022-06-29 07:16:00 【Yushijuj】
Term summary
The semester flew by. This term I took the hands-on Spark course. The parts I learned, I cannot explain; the parts I did not learn, I only half know. After one semester, the knowledge I picked up feels like the teacher's rather than my own, because I was led by the nose. Judging myself honestly, I have lost most of my ability to learn independently and relied on the teacher's step-by-step instruction instead. I finished the semester in a daze, feeling that I knew nothing. Later I put together a set of questions for myself, worked through them, and could not answer many. That made it clear how little I actually know.
Hadoop MapReduce is a programming model for processing large data sets with a parallel, distributed algorithm. Developers can write highly parallelized operators without worrying about work distribution or fault tolerance. However, one challenge for MapReduce is that a job runs as a sequential multi-step process. In each step, MapReduce reads data from the cluster, performs the operation, and writes the results back to HDFS. Because every step requires a disk read and a disk write, disk I/O latency makes MapReduce jobs slow.
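The map, shuffle, and reduce steps described above can be sketched in plain Python. This is only a single-process illustration of the programming model, not Hadoop's API; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are made up for this example, and nothing here is distributed or written to HDFS.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)


def shuffle_phase(pairs):
    """Group values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    """Run the three phases in order (hypothetical helper for this sketch)."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    sample = ["spark is fast", "hadoop is reliable", "spark reuses data"]
    print(word_count(sample))
```

In a real MapReduce job, the output of each step would be written to disk before the next step reads it, which is where the latency discussed above comes from.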
Spark was created to overcome these limitations of MapReduce: it processes data in memory, reduces the number of steps in a job, and reuses data across multiple parallel operations. With Spark, reading data into memory, performing the operations, and writing back the results takes only one step, which makes execution much faster. Spark's in-memory caching also significantly speeds up machine learning algorithms that repeatedly call a function on the same data set. Data reuse is achieved through DataFrames, an abstraction built on top of the Resilient Distributed Dataset (RDD). An RDD is a collection of objects that is cached in memory and reused across multiple Spark operations. This greatly reduces latency and makes Spark several times faster than MapReduce, especially for machine learning and interactive analysis.
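The benefit of reusing data in memory can be illustrated with a toy simulation. This is not Spark code: `DiskDataset`, `mean_reloading`, and `mean_cached` are hypothetical names invented for this sketch, and a local temporary file stands in for data stored on the cluster. The sketch only counts how often the input is read when an iterative computation reloads it every step versus loading it once.

```python
import json
import os
import tempfile


class DiskDataset:
    """Hypothetical stand-in for a data set stored on disk."""

    def __init__(self, values):
        fd, self.path = tempfile.mkstemp(suffix=".json")
        with os.fdopen(fd, "w") as f:
            json.dump(values, f)
        self.reads = 0  # number of times the file has been read

    def load(self):
        self.reads += 1
        with open(self.path) as f:
            return json.load(f)

    def close(self):
        os.remove(self.path)


def refine(estimate, data):
    """One iteration: move the estimate halfway toward the data mean."""
    return estimate + 0.5 * (sum(data) / len(data) - estimate)


def mean_reloading(dataset, iterations):
    """Disk-bound style: every iteration reads the input again."""
    estimate = 0.0
    for _ in range(iterations):
        estimate = refine(estimate, dataset.load())
    return estimate


def mean_cached(dataset, iterations):
    """In-memory style: read once, then reuse the cached data."""
    data = dataset.load()
    estimate = 0.0
    for _ in range(iterations):
        estimate = refine(estimate, data)
    return estimate


if __name__ == "__main__":
    ds = DiskDataset([1, 2, 3])
    mean_reloading(ds, 5)
    print("reads when reloading:", ds.reads)
    before = ds.reads
    mean_cached(ds, 5)
    print("reads when cached:", ds.reads - before)
    ds.close()
```

Both functions compute the same result; only the number of reads differs, which is the point the paragraph above makes about iterative workloads.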
Learning Spark well takes long-term, in-depth study. In other words, you have to settle down and focus to get through it.
At the start I was unfamiliar with the code, and my statements often contained basic errors. Along the way I learned many ways of solving problems: when I do not understand something, I first try to work it out myself, and if I really cannot, I search Baidu. What I gained from this was not only the joy of finally getting a program to run after many attempts, but even more the experience of tracking down and fixing my mistakes. The following is a summary of my own problems:
- I ran into small problems when configuring environment variables.
- While configuring Spark, directories could not be opened and file uploads failed; Hadoop also entered safe mode, along with a series of other problems.
- I rely too much on the teacher's lecture notes and feel that I cannot do anything without them.
- When I hit a small bug, I do not want to solve it; I am afraid of difficulties.
- I still do not practice enough, so problems keep coming up one after another. To learn, I have to do more and practice more.
I will be a junior soon, and it is hard not to feel confused and afraid about tomorrow and the future. I will keep studying, make the most of today, and actively welcome tomorrow!