Share an example of a simple MapReduce method using a virtual machine
2022-06-30 04:07:00 【Xiao Zhu, classmate Wu】
I am also a newcomer to big data, so what I share here will probably only be useful to readers who are even newer than I am. Please bear with me.
(1) First, here is a flow chart of the MapReduce wordcount process.
(Figure: wordcount flow chart showing the splitting, mapping and reducing phases)
This example counts how many times each word appears in a piece of text, and it reflects the divide-and-conquer philosophy of the Hadoop ecosystem very well. In the figure above, the splitting and mapping phases can run on different hosts or virtual machines, so in a distributed setup the job runs very fast. The reducing phase then gathers identical words together and merges the partial counts into the final result. The implementation on the virtual machine is described below.
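Before touching Hadoop at all, the counting logic can be mimicked on a single machine with plain shell tools. This is only my own local illustration of the map → shuffle → reduce idea, nothing distributed, and it assumes the wordfile.txt input file that is created in step (2) below:

# Single-machine sketch of the wordcount logic (illustration only, not Hadoop):
# tr puts one word per line (mapping), sort groups identical words (shuffling),
# and uniq -c counts each group (reducing).
tr -s ' ' '\n' < wordfile.txt | sort | uniq -c

The output format differs from Hadoop's (the count comes first), but the numbers are the same; the real job below simply does this work split across machines.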
(2) Create a .txt file on the virtual machine
If you do not have a ready-made data set, just use the text I wrote above and save it into a .txt file. Make sure the Hadoop environment variables are configured on the virtual machine and that the HDFS and YARN configuration files have already been edited, so that nothing errors out when you run the job.
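A quick sanity check before going further, assuming the HADOOP_HOME variable was exported when you set up the environment (both commands are standard and should run cleanly on a working installation):

echo $HADOOP_HOME   # should print the install directory, e.g. /opt/rh/hadoop-3.2.2
hadoop version      # should print the Hadoop release (3.2.2 here) without any error

Now create a directory for the input and write the text file: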
[[email protected] ~]# cd /opt/rh/hadoop-3.2.2/
[[email protected] hadoop-3.2.2]# ls
bin etc hdfs include lib libexec LICENSE.txt logs NOTICE.txt README.txt sbin share tmp wcinput wcoutput
[[email protected] hadoop-3.2.2]# mkdir wordcountfile
[[email protected] hadoop-3.2.2]# ls
bin etc hdfs include lib libexec LICENSE.txt logs NOTICE.txt README.txt sbin share tmp wcinput wcoutput wordcountfile
[[email protected] hadoop-3.2.2]# cd wordcountfile
[[email protected] wordcountfile]# vi wordfile.txt
[[email protected] wordcountfile]# cat wordfile.txt
i like python
python like spark
Spark link hadoop
[[email protected] wordcountfile]#
Now wordfile.txt has been created under the Hadoop installation directory. Next we type in the command and run the MapReduce job.
(3) Start the virtual machine cluster with start-all.sh, call the wordcount example in the Hadoop MapReduce examples jar, specify the input file and the directory where the results should be saved, and count how many times each word appears.
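Before submitting the job, it is worth confirming that the daemons actually came up; this is just the check I use, and the exact jps list depends on how your cluster is configured:

start-all.sh   # starts the HDFS and YARN daemons
jps            # typically shows NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager

With the daemons running, submit the wordcount job: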
[[email protected] hadoop-3.2.2]# bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.2.jar wordcount file:///opt/rh/hadoop-3.2.2/wordcountfile/wordfile.txt file:///opt/rh/hadoop-3.2.2/wordoutputfile
The virtual machine firewall needs to be turned off (systemctl stop firewalld.service), because a running firewall interferes with communication between the hosts in the cluster. If you do not give the full file:// paths of the input file and output directory, Hadoop will complain that the file has not been uploaded to HDFS; if you prefer to put the file into HDFS and use HDFS paths instead, a sketch is given at the end of this post. Run the command, and the end of the job log should look like this:
2021-04-03 21:54:23,391 INFO mapreduce.Job: map 0% reduce 0%
2021-04-03 21:54:29,850 INFO mapreduce.Job: map 100% reduce 0%
2021-04-03 21:54:38,973 INFO mapreduce.Job: map 100% reduce 100%
2021-04-03 21:54:40,002 INFO mapreduce.Job: Job job_1617457620069_0002 completed successfully
2021-04-03 21:54:40,169 INFO mapreduce.Job: Counters: 54
Seeing this output means the wordcount program ran successfully. Now let's look at the result files.
drwxr-xr-x. 2 root root 26 Apr 3 21:33 wordcountfile
drwxr-xr-x. 2 root root 88 Apr 3 21:58 wordoutputfile
[[email protected] hadoop-3.2.2]# cd wordoutputfile/
[[email protected] wordoutputfile]# ls
part-r-00000 _SUCCESS
Open the part-r-00000 file to see the results.
[[email protected] wordoutputfile]# cat part-r-00000
Spark 1
hadoop 1
i 1
like 2
link 1
python 2
spark 1
[[email protected] wordoutputfile]#
The result matches the figure above. Errors may well come up along the way, but do not give up: if you hit a problem, search for it online, or ask me, although I may not know the answer either.
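As mentioned above, here is a minimal sketch of the HDFS-based alternative: upload the input into HDFS first and pass HDFS paths to wordcount instead of file:// URLs. The /wcinput and /wcoutput paths are just example names I picked, and wordcount will refuse to run if the output directory already exists:

hdfs dfs -mkdir -p /wcinput
hdfs dfs -put /opt/rh/hadoop-3.2.2/wordcountfile/wordfile.txt /wcinput/
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.2.jar wordcount /wcinput /wcoutput
hdfs dfs -cat /wcoutput/part-r-00000   # the counts now live in HDFS rather than on the local disk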