Final summary: Spark
2022-06-29 07:16:00 【Yushijuj】
Term summary
The semester flew by. This term I took a hands-on Spark course. If I say I learned it, I cannot explain the reasoning behind it; if I say I did not, I do know a few things. After one semester, the knowledge I picked up feels like the teacher's rather than my own, because I was led by the nose the whole way. Judging myself honestly, I have lost most of my ability to study independently and relied instead on the teacher's step-by-step instruction. I came out of the semester in a daze, feeling as if I knew nothing. Later I put together a set of practice questions for myself, worked through them, and found there were many I could not do. It felt as if I knew only a little and could do almost nothing.
Hadoop MapReduce is a programming model for processing large data sets with a parallel, distributed algorithm. Developers can write highly parallelized operators without worrying about work distribution or fault tolerance. However, one challenge with MapReduce is that a job runs as a sequential, multi-step process. In each step, MapReduce reads data from the cluster, performs its operations, and writes the results back to HDFS. Because every step requires a disk read and a disk write, disk I/O latency makes MapReduce jobs slow.
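To make the programming model concrete, here is a minimal sketch of a word count written as map, shuffle, and reduce phases. It is plain Python running in one process, not the Hadoop API; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are made up for illustration.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values of each key into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    """Run the three phases in order on an in-memory list of lines."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    print(word_count(["Spark is fast", "Hadoop is reliable"]))
```

In a real cluster the map and reduce functions run on many machines, and the framework handles the shuffle, job distribution, and retries; that is the part developers do not have to write themselves.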
Spark was created to overcome these limitations of MapReduce: it processes data in memory, reduces the number of steps in a job, and reuses data across multiple parallel operations. With Spark, reading the data into memory, performing the operations, and writing back the results takes only one step, which makes execution much faster. Spark also uses an in-memory cache to greatly speed up machine learning algorithms that repeatedly call a function on the same data set, so the data can be reused. Data reuse is achieved by creating DataFrames, an abstraction over the Resilient Distributed Dataset (RDD). An RDD is a collection of objects that is cached in memory and reused across multiple Spark operations. This greatly reduces latency and makes Spark several times faster than MapReduce, especially for machine learning and interactive analytics.
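The difference in disk traffic can be sketched with a toy simulation. This is plain Python, not Spark or Hadoop code: local JSON files stand in for HDFS, and the helper names (`run_with_disk`, `run_in_memory`, `compare`) and the two sample steps are assumptions made up for this example.

```python
import json
import os
import tempfile


def double(values):
    return [v * 2 for v in values]


def add_one(values):
    return [v + 1 for v in values]


def _write(path, values):
    with open(path, "w") as f:
        json.dump(values, f)


def _read(path):
    with open(path) as f:
        return json.load(f)


def run_with_disk(input_path, steps, workdir):
    """MapReduce-like: every step reads its input from disk and writes its output back."""
    io_ops = 0
    current = input_path
    for i, step in enumerate(steps):
        values = _read(current)
        io_ops += 1
        out_path = os.path.join(workdir, f"step_{i}.json")
        _write(out_path, step(values))
        io_ops += 1
        current = out_path
    return current, io_ops


def run_in_memory(input_path, steps, workdir):
    """Spark-like: read once, chain all steps in memory, write once."""
    values = _read(input_path)
    io_ops = 1
    for step in steps:
        values = step(values)
    out_path = os.path.join(workdir, "result.json")
    _write(out_path, values)
    io_ops += 1
    return out_path, io_ops


def compare(data, steps):
    """Return (disk result, disk I/O count, memory result, memory I/O count)."""
    with tempfile.TemporaryDirectory() as workdir:
        input_path = os.path.join(workdir, "input.json")
        _write(input_path, data)  # staging the input is not counted
        disk_path, disk_ops = run_with_disk(input_path, steps, workdir)
        mem_path, mem_ops = run_in_memory(input_path, steps, workdir)
        return _read(disk_path), disk_ops, _read(mem_path), mem_ops


if __name__ == "__main__":
    print(compare([1, 2, 3], [double, add_one, double]))
```

Both strategies produce the same result, but the disk-based run performs two I/O operations per step while the in-memory run performs two in total, however many steps the job has.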
Learning Spark takes sustained, in-depth study. In other words, doing it well requires settling down and concentrating.
At first I was unfamiliar with the code and often made basic mistakes in my statements. Along the way I also learned a lot about how to solve problems: when I did not understand something, I tried to work it out myself first, and if I still could not, I searched Baidu. What I gained from this was not only the joy of finally getting a program to run after many attempts, but also experience in fixing my own mistakes. The following is a summary of my own problems:
- Small problems came up when configuring environment variables.
- While configuring Spark, directories could not be opened, file uploads failed, and Hadoop entered safe mode, among a series of other problems.
- I relied too much on the teacher's lecture notes and felt I could not do anything without them.
- When I hit even a small bug I did not want to fix it; I was afraid of difficulty.
- I still do not practice enough, so the same problems keep recurring. I need to study more and practice more.
I will be a junior soon, and it is hard not to feel confused and anxious about the future. Keep studying, make the most of today, and face tomorrow with a positive attitude!