5. Implement a MapReduce program on the Windows side to complete the wordcount function
2022-07-28 10:47:00 【Data analyst shrimp】
Test text data used by the program:
Dear River
Dear River Bear Spark
Car Dear Car Bear Car
Dear Car River Car
Spark Spark Dear Spark
1 Write the main classes
(1) Mapper class
First comes the custom Mapper class:
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMap extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // value holds one line of text, e.g. "Dear River Bear Spark";
        // split on whitespace (covers both spaces and tabs)
        String[] words = value.toString().split("\\s+");
        for (String word : words) {
            // Each occurrence of a word counts as 1; emit it as an intermediate result
            context.write(new Text(word), new IntWritable(1));
        }
    }
}
The Mapper class is generic, with four type parameters that specify the input key, input value, output key, and output value types of the map() function: LongWritable is the input key type, Text the input value type, Text the output key type, and IntWritable the output value type.
In String[] words = value.toString().split("\\s+"), a line such as Dear River Bear Spark is split into words = [Dear, River, Bear, Spark].
The input key is a long-integer byte offset, used to locate the current line and the next one within the file; the input value is the line of text itself, e.g. Dear River Bear Spark. For each word, the map emits the word as the output key (e.g. Bear) and the integer 1 as the output value.
Instead of using Java's built-in types, Hadoop provides a set of basic types optimized for network serialization. They all live in the org.apache.hadoop.io package. Here we use LongWritable (the counterpart of Java's Long), Text (the counterpart of Java's String), and IntWritable (the counterpart of Java's Integer).
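A minimal, self-contained illustration of these wrapper types (this demo class is not part of the word-count program itself):

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;

public class WritableDemo {
    public static void main(String[] args) {
        Text word = new Text("Dear");               // wraps a String
        IntWritable one = new IntWritable(1);       // wraps an int
        LongWritable offset = new LongWritable(0L); // wraps a long
        // get()/toString() unwrap the plain Java values again
        System.out.println(word.toString() + "\t" + one.get() + "\t" + offset.get());
    }
}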
The parameters of the map() method are the input key and the input value. In this program, the input key LongWritable key is the byte offset, and the input value Text value is a line such as Dear Car Bear Car. We first convert the Text value holding one line of input into a Java String, then split it into the individual words. The map() method also provides a Context instance for writing the output.
(2) Reducer class
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReduce extends Reducer<Text, IntWritable, Text, IntWritable> {
    /*
     After the shuffle, the intermediate pairs
         (River, 1) (River, 1) (River, 1)
         (Spark, 1) (Spark, 1) (Spark, 1) (Spark, 1)
     arrive grouped by key:
         key: River   values: (1, 1, 1)
         key: Spark   values: (1, 1, 1, 1)
     */
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable count : values) {
            sum += count.get();
        }
        context.write(key, new IntWritable(sum)); // emit the final (word, total) pair
    }
}
Each Reduce task fetches, by partition number, its share of the map-side output, for example:
(River, 1)
(River, 1)
(River, 1)
(Spark, 1)
(Spark, 1)
(Spark, 1)
(Spark, 1)
After grouping by key, this becomes:
key: River  value: List(1, 1, 1)
key: Spark  value: List(1, 1, 1, 1)
Therefore the reduce() parameter Iterable<IntWritable> values receives List(1, 1, 1) and List(1, 1, 1, 1) respectively.
(3) Main function
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import java.io.IOException;
public class WordCountMain {
    // To run the MR program locally from IDEA, set mapreduce.framework.name
    // to local in mapred-site.xml
    public static void main(String[] args) throws IOException,
            ClassNotFoundException, InterruptedException {
        if (args == null || args.length != 2) { // check for null before dereferencing
            System.out.println("Usage: WordCountMain <input path> <output path>");
            System.exit(0);
        }
        //System.setProperty("HADOOP_USER_NAME","hadoop2.7");
        Configuration configuration = new Configuration();
        //configuration.set("mapreduce.job.jar","/home/bruce/project/kkbhdp01/target/com.kaikeba.hadoop-1.0-SNAPSHOT.jar");
        // Call getInstance to create the job instance
        Job job = Job.getInstance(configuration, WordCountMain.class.getSimpleName());
        // Locate the jar by the class it contains
        job.setJarByClass(WordCountMain.class);
        // Set the input/output formats on the job.
        // TextInputFormat is MR's default input format, so the next two lines may be omitted
        // job.setInputFormatClass(TextInputFormat.class);
        // job.setOutputFormatClass(TextOutputFormat.class);
        // Set the input/output paths
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // Set the classes that handle the Map/Reduce stages
        job.setMapperClass(WordCountMap.class);
        // A map-side combiner reduces the amount of data sent over the network
        job.setCombinerClass(WordCountReduce.class);
        job.setReducerClass(WordCountReduce.class);
        // If the map and reduce output k/v types are the same, setting the reduce
        // output k/v types suffices; otherwise set the map output k/v types separately:
        // job.setMapOutputKeyClass(Text.class);
        // job.setMapOutputValueClass(IntWritable.class);
        // Set the final output key/value types of the reduce task
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Submit the job and wait for it to finish
        job.waitForCompletion(true);
    }
}
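One pitfall worth noting: FileOutputFormat refuses to run if the output path already exists. A minimal, optional guard (not in the original code) deletes a stale output directory before FileOutputFormat.setOutputPath(...) is called; it requires an extra import of org.apache.hadoop.fs.FileSystem:

// import org.apache.hadoop.fs.FileSystem;  -- additional import needed
// Optional guard: remove the output directory if it already exists,
// otherwise the job fails with FileAlreadyExistsException
FileSystem fs = FileSystem.get(configuration);
Path outputPath = new Path(args[1]);
if (fs.exists(outputPath)) {
    fs.delete(outputPath, true); // true = delete recursively
}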
2 Run locally
First change the mapred-site.xml configuration, setting mapreduce.framework.name to local.
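For reference, a sketch of the corresponding mapred-site.xml entry (the property name is standard Hadoop; the rest of the file varies with your installation):

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>local</value>
    </property>
</configuration>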
Then run WordCountMain from the IDE, passing the input file and an output directory as the two program arguments, and view the results in the output directory.
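For the test data above, the result file part-r-00000 in the output directory should contain the following counts (worked out by hand from the five input lines; MapReduce sorts the output keys):

Bear	2
Car	5
Dear	5
River	3
Spark	4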
3 Cluster operation
Mode one:
First package the project.
Then change the configuration to yarn mode, and point mapreduce.job.jar at the local jar:
Configuration configuration = new Configuration();
// mapreduce.job.jar should point at the built jar file itself, not just the target directory
configuration.set("mapreduce.job.jar",
        "C:\\Users\\tanglei1\\IdeaProjects\\Hadooptang\\target\\com.kaikeba.hadoop-1.0-SNAPSHOT.jar");

Allow cross-platform submission (required when submitting from Windows to a Linux cluster):
configuration.set("mapreduce.app-submission.cross-platform","true");

Change the program arguments to HDFS paths (an existing input file and a not-yet-existing output directory), then run again; this time the job is submitted to the YARN cluster and the results are written to the HDFS output directory.
Mode two:
Package the maven project and run the MR program on the server with the hadoop jar command:
hadoop jar com.kaikeba.hadoop-1.0-SNAPSHOT.jar \
    com.kaikeba.hadoop.wordcount.WordCountMain /tttt.txt /wordcount11
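A sketch of the end-to-end sequence under these assumptions (the scp host and destination are placeholders; the jar name and HDFS paths are taken from the command above):

# Build the jar on the development machine
mvn clean package
# Copy it to a cluster node (host and path are placeholders)
scp target/com.kaikeba.hadoop-1.0-SNAPSHOT.jar hadoop@node01:~/
# Run the job on the cluster
hadoop jar com.kaikeba.hadoop-1.0-SNAPSHOT.jar com.kaikeba.hadoop.wordcount.WordCountMain /tttt.txt /wordcount11
# Inspect the result file on HDFS
hdfs dfs -cat /wordcount11/part-r-00000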