Final Summary: Spark

2022-06-29 07:16:00 【Yushijuj】

Term summary

The semester flew by. This term I took the hands-on Spark course. What I did learn I cannot really explain, and what I did not learn I only half know. After one semester, I feel that the knowledge belongs to the teacher rather than to me: I was led by the nose. Evaluating myself honestly, I have lost most of my ability to learn independently and relied instead on the teacher's step-by-step instruction. I finished the semester in a daze, feeling as if I knew nothing after all that study. Later I wrote a set of practice questions for myself, attempted them, and could not answer many of them. It felt as though I knew only a little and could do almost nothing.

Hadoop MapReduce is a programming model for processing large data sets with a parallel, distributed algorithm. Developers can write highly parallelized operators without worrying about work distribution or fault tolerance. However, one challenge for MapReduce is that a job runs as a sequence of steps. In each step, MapReduce reads data from the cluster, performs the operation, and writes the results back to HDFS. Because every step requires a disk read and a disk write, disk I/O latency makes MapReduce jobs slow.
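The step-by-step pattern described above can be sketched in plain Python. This is only a toy model, not Hadoop code: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `run_step`, `word_count`) and the JSON files standing in for HDFS are assumptions made for illustration.

```python
import json
import tempfile
from collections import defaultdict
from pathlib import Path


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every line."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(grouped):
    """Reduce: collapse each key's values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def run_step(step, in_path, out_path):
    """Run one step the MapReduce way: read from disk, compute, write to disk."""
    data = json.loads(in_path.read_text())
    out_path.write_text(json.dumps(step(data)))


def word_count(lines, workdir):
    """Chain three steps; every hand-off goes through a file (standing in for HDFS)."""
    paths = [workdir / f"stage{i}.json" for i in range(4)]
    paths[0].write_text(json.dumps(lines))
    for i, step in enumerate([map_phase, shuffle_phase, reduce_phase]):
        run_step(step, paths[i], paths[i + 1])
    return json.loads(paths[3].read_text())


with tempfile.TemporaryDirectory() as tmp:
    counts = word_count(["spark is fast", "hadoop is reliable"], Path(tmp))
    print(counts)
```

Each call to `run_step` pays for one read and one write, which is the cost the paragraph above attributes to multi-step MapReduce jobs.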

Spark was created to overcome these limitations of MapReduce. It processes data in memory, reduces the number of steps in a job, and reuses data across multiple parallel operations. With Spark, reading data into memory, performing the operations, and writing back the results takes only one step, which makes execution much faster. Spark also uses an in-memory cache to greatly speed up machine learning algorithms that repeatedly call a function on the same data set, since the data can be reused. Data reuse is achieved through DataFrames, an abstraction built on top of the Resilient Distributed Dataset (RDD). An RDD is a collection of objects that is cached in memory and reused across multiple Spark operations. This greatly lowers latency and makes Spark several times faster than MapReduce, especially for machine learning and interactive analytics.
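To see why caching helps an algorithm that reuses the same data, here is a minimal sketch in plain Python. It is not the Spark API: `ToyDataset`, `load_from_disk`, and `iterate` are hypothetical names, and the "disk" is just a counter that records how often the input is re-read.

```python
class ToyDataset:
    """A tiny stand-in for a lazily evaluated dataset (not the Spark API)."""

    def __init__(self, compute):
        self._compute = compute   # function that rebuilds the data from scratch
        self._cached = None       # in-memory copy, filled on first use after cache()
        self._use_cache = False

    def cache(self):
        """Ask for the data to be kept in memory once it has been computed."""
        self._use_cache = True
        return self

    def map(self, fn):
        """Return a new lazy dataset; nothing is computed until collect()."""
        return ToyDataset(lambda: [fn(x) for x in self.collect()])

    def collect(self):
        """Materialize the data, reusing the in-memory copy when caching is on."""
        if self._use_cache and self._cached is not None:
            return self._cached
        data = self._compute()
        if self._use_cache:
            self._cached = data
        return data


loads = {"count": 0}


def load_from_disk():
    """Pretend to read the input from disk and record each read."""
    loads["count"] += 1
    return [1.0, 2.0, 3.0, 4.0]


def iterate(dataset, rounds):
    """Reuse the same dataset in every round, as an iterative algorithm would."""
    total = 0.0
    for step in range(1, rounds + 1):
        total += sum(dataset.map(lambda x: x * step).collect())
    return total


loads["count"] = 0
uncached_total = iterate(ToyDataset(load_from_disk), rounds=5)
uncached_reads = loads["count"]

loads["count"] = 0
cached_total = iterate(ToyDataset(load_from_disk).cache(), rounds=5)
cached_reads = loads["count"]

print(uncached_reads, cached_reads)
```

Both runs compute the same total, but the uncached run reloads the input in every round while the cached run loads it once. In real Spark the analogous request is made with `cache()` or `persist()` on an RDD or DataFrame.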

Learning Spark well takes long-term, in-depth study. In other words, it can only be done by settling down and concentrating.

At the beginning I was unfamiliar with the code, and my statements often contained basic errors. Along the way I also learned several ways of solving problems: when I do not understand something, I first try to work it out myself, and if I still cannot, I search Baidu. What I gained from studying is not only the joy of getting a program to run after many attempts and difficulties, but even more the experience of fixing my own mistakes. The following is a summary of my own problems:

Small problems came up when configuring environment variables.
While configuring Spark, directories could not be opened and file uploads failed; Hadoop also entered safe mode, along with a series of other problems.
I relied too much on the teacher's lecture notes and felt I could not do anything without them.
When I met a small bug I did not want to solve it; I was afraid of difficulty.
I still do not practice enough, so the same problems and errors keep recurring. I need to study more and practice more.

I will soon be a junior, and it is hard not to feel confused and afraid about tomorrow and the future. Keep studying, make the most of today, and welcome tomorrow with energy!

Original site

Copyright notice
This article was written by [Yushijuj]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/180/202206290520000482.html
