MapReduce Example (2): Computing Averages
2022-07-27 14:42:00 【笑看风云路】
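Hi everyone, I'm Fengyun. You are welcome to follow my blog or my WeChat official account 【笑看风云路】. Let's learn big data technologies together in the days ahead, keep working hard, and become better versions of ourselves!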
Approach
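Computing an average is one of the more common MapReduce algorithms, and it is fairly simple. The map side reads the input records and emits key/value pairs. Before the data reaches the reducer, the shuffle phase gathers all values emitted under the same key into one collection (a value-list) and passes each <key, value-list> pair to the reduce side. The reducer sums the values, counts the records, and divides the sum by the count. The principle is illustrated in the figure below.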
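To make the three phases concrete outside Hadoop, here is a minimal plain-Java sketch that simulates them in memory, assuming the input has already been turned into (category, clicks) pairs. The class name `AverageFlowSketch` and the sample records are hypothetical, and a `TreeMap` merely stands in for the shuffle's group-by-key (and sort-by-key) step; this is an illustration of the data flow, not Hadoop code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class AverageFlowSketch {

    // "Shuffle": group every value emitted under the same key into one value-list.
    // A TreeMap also sorts the keys, roughly like Hadoop sorts by key before reduce.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> mapOutput) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> kv : mapOutput) {
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        return grouped;
    }

    // "Reduce": sum the value-list, count its records, then divide.
    // Integer division, mirroring the article's Reducer (the fraction is truncated).
    static int reduce(List<Integer> values) {
        long sum = 0;
        int count = 0;
        for (int v : values) {
            sum += v;
            count++;
        }
        return (int) (sum / count); // shuffle never produces an empty list
    }

    // Runs shuffle + reduce and collects <key, average> results.
    static Map<String, Integer> run(List<Map.Entry<String, Integer>> mapOutput) {
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(mapOutput).entrySet()) {
            result.put(e.getKey(), reduce(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical map output: (category, clicks) pairs.
        List<Map.Entry<String, Integer>> mapOutput = List.of(
                Map.entry("books", 10),
                Map.entry("toys", 4),
                Map.entry("books", 20),
                Map.entry("toys", 7));
        System.out.println(run(mapOutput)); // prints {books=15, toys=5}
    }
}
```

Note that `toys` comes out as 5 rather than 5.5 because of the integer division, which matches the behavior of the job below.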
Writing the Code
Mapper Code
public static class Map extends Mapper<Object, Text, Text, IntWritable> {
    // Output key object, reused for every record instead of allocating a new Text
    private final Text newKey = new Text();

    // Implements the map function
    @Override
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        // Convert the line read from the plain-text input file into a String
        String line = value.toString();
        // Each line has the form "category<TAB>clicks"
        String[] arr = line.split("\t");
        newKey.set(arr[0]);
        int click = Integer.parseInt(arr[1]);
        // Emit <category, clicks>
        context.write(newKey, new IntWritable(click));
    }
}
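The map side uses Hadoop's default input format (TextInputFormat, which hands each line of the file to map as the value). It splits that value with split(), converts the product click-count field to IntWritable and uses it as the value, uses the product category field as the key, and writes the key/value pair out directly.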
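`Integer.parseInt(arr[1])` throws if a line is blank, has no tab, or carries a non-numeric count, and one bad line then fails the whole task. One defensive option, sketched here in plain Java and assuming the same `category<TAB>clicks` layout, is to parse each line into a key/value pair and skip records that do not fit. The helper name `parseLine` is hypothetical; inside the Mapper you would call `context.write` only when it returns a value.

```java
import java.util.Map;
import java.util.Optional;

class LineParserSketch {

    // Parses "category<TAB>clicks" into (category, clicks).
    // Returns Optional.empty() for blank, short, or non-numeric lines instead of throwing.
    static Optional<Map.Entry<String, Integer>> parseLine(String line) {
        if (line == null) {
            return Optional.empty();
        }
        String[] fields = line.split("\t");
        if (fields.length < 2) {
            return Optional.empty(); // no tab, or nothing after it
        }
        String category = fields[0].trim();
        if (category.isEmpty()) {
            return Optional.empty();
        }
        try {
            int clicks = Integer.parseInt(fields[1].trim());
            return Optional.of(Map.entry(category, clicks));
        } catch (NumberFormatException e) {
            return Optional.empty(); // click count is not an integer
        }
    }

    public static void main(String[] args) {
        System.out.println(parseLine("books\t12").isPresent());  // prints true
        System.out.println(parseLine("books").isPresent());      // prints false
        System.out.println(parseLine("books\tabc").isPresent()); // prints false
    }
}
```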
Reducer Code
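public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
    // Implements the reduce function
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        long num = 0;   // sum of all values (long lowers the risk of overflow)
        int count = 0;  // number of values
        for (IntWritable val : values) {
            num += val.get(); // add each element to the sum num
            count++;          // count the elements
        }
        // Compute the average; integer division truncates any fractional part
        int avg = (int) (num / count);
        context.write(key, new IntWritable(avg));
    }
}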
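During the shuffle, the map output <key, value> pairs are merged into <key, values> pairs, which are then handed to reduce. After the reducer receives the values, it copies the input key directly to the output key, loops over values with a for loop to accumulate the sum num and the element count count, and divides num by count to obtain the average avg. It then sets avg as the value and writes out the <key, value> pair.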
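Because both operands are integers, `num / count` is integer division in Java and drops the fractional part, so the job reports a truncated average. The small plain-Java check below, with made-up click counts, shows the difference. If fractional averages matter, one option (not what this job does) is to have the reducer emit a `DoubleWritable`, set `job.setOutputValueClass(DoubleWritable.class)`, and declare the map output value separately with `job.setMapOutputValueClass(IntWritable.class)`.

```java
class AverageTruncationSketch {

    // Same arithmetic as the article's Reducer: integer sum / integer count.
    static int truncatedAverage(int[] clicks) {
        long sum = 0;
        for (int c : clicks) sum += c;
        return (int) (sum / clicks.length); // integer division: fraction discarded
    }

    // Alternative: convert to double before dividing to keep the fraction.
    static double exactAverage(int[] clicks) {
        long sum = 0;
        for (int c : clicks) sum += c;
        return (double) sum / clicks.length;
    }

    public static void main(String[] args) {
        int[] clicks = {3, 4}; // hypothetical click counts for one category
        System.out.println(truncatedAverage(clicks)); // prints 3
        System.out.println(exactAverage(clicks));     // prints 3.5
    }
}
```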
Complete Code
package mapreduce;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class MyAverage {

    // Mapper: "category<TAB>clicks" -> <category, clicks>
    public static class Map extends Mapper<Object, Text, Text, IntWritable> {
        private final Text newKey = new Text();

        @Override
        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            String line = value.toString();
            String[] arr = line.split("\t");
            newKey.set(arr[0]);
            int click = Integer.parseInt(arr[1]);
            context.write(newKey, new IntWritable(click));
        }
    }

    // Reducer: <category, [clicks...]> -> <category, average clicks>
    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            long num = 0;
            int count = 0;
            for (IntWritable val : values) {
                num += val.get();
                count++;
            }
            int avg = (int) (num / count); // integer average, fraction truncated
            context.write(key, new IntWritable(avg));
        }
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        System.out.println("start");
        // Job.getInstance replaces the deprecated new Job(conf, name) constructor
        Job job = Job.getInstance(conf, "MyAverage");
        job.setJarByClass(MyAverage.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        // Map and reduce emit the same types (Text, IntWritable), so these settings cover both
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        // HDFS input file and output directory (the output directory must not already exist)
        Path in = new Path("hdfs://localhost:9000/mymapreduce4/in/goods_click");
        Path out = new Path("hdfs://localhost:9000/mymapreduce4/out");
        FileInputFormat.addInputPath(job, in);
        FileOutputFormat.setOutputPath(job, out);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
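The job sets no combiner, and the `Reduce` class above should not simply be reused as one through `job.setCombinerClass`: a combiner runs on partial groups of values, and an average of partial averages is generally not the overall average. A common workaround is to pass (sum, count) pairs through the combiner and divide only in the final reducer. The plain-Java sketch below, with hypothetical numbers split across two map tasks, compares both approaches; the class `SumCount` and helper `partial` are illustrative names, and no Hadoop API is used.

```java
class CombinerAverageSketch {

    // Partial aggregate that a combiner could emit: running sum and record count.
    static final class SumCount {
        final long sum;
        final long count;

        SumCount(long sum, long count) {
            this.sum = sum;
            this.count = count;
        }

        // Merging partial aggregates is associative, so it is safe to do in any order.
        SumCount merge(SumCount other) {
            return new SumCount(sum + other.sum, count + other.count);
        }

        double average() {
            return (double) sum / count;
        }
    }

    // Builds the partial aggregate for the values one map task saw.
    static SumCount partial(int... values) {
        long sum = 0;
        for (int v : values) sum += v;
        return new SumCount(sum, values.length);
    }

    public static void main(String[] args) {
        // Hypothetical clicks for one category, split across two map tasks.
        SumCount taskA = partial(1, 2, 3); // sum 6, count 3, local average 2.0
        SumCount taskB = partial(10);      // sum 10, count 1, local average 10.0

        // Wrong: averaging the local averages.
        double averageOfAverages = (taskA.average() + taskB.average()) / 2;
        // Right: merge sums and counts first, divide once at the end.
        double overall = taskA.merge(taskB).average();

        System.out.println(averageOfAverages); // prints 6.0
        System.out.println(overall);           // prints 4.0
    }
}
```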