Writing Multiple Output Files in MapReduce with MultipleOutputs
2022-07-03 15:04:00 【星哥玩云】
By default, MapReduce names its output files part-*. MultipleOutputs lets a job write different key-value pairs to different files whose names the user chooses.
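This is done by calling output.write(key, new IntWritable(total), key.toString()).
The third parameter of public void write(KEYOUT key, VALUEOUT value, String baseOutputPath) sets the name prefix of the output file. Passing a different baseOutputPath for each key therefore sends the values of different keys to different files. For example, all records for one day can be written to a file named after that date.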
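As a rough illustration of how the prefix ends up in a file name: for file-based output formats, Hadoop appends the task type and a zero-padded partition number to baseOutputPath. The helper below is hypothetical plain Java (reduceOutputName is not part of the Hadoop API) and only mimics that naming rule for reduce-side output.

```java
import java.util.Locale;

class OutputNameSketch {
    // Hypothetical helper, not a Hadoop API: mimics the "<base>-r-<partition>"
    // pattern used for reduce-side output files.
    static String reduceOutputName(String baseOutputPath, int partition) {
        return String.format(Locale.ROOT, "%s-r-%05d", baseOutputPath, partition);
    }

    public static void main(String[] args) {
        System.out.println(reduceOutputName("2022-07-03", 0)); // 2022-07-03-r-00000
        System.out.println(reduceOutputName("China", 3));      // China-r-00003
    }
}
```

With a date or a country name as baseOutputPath, each distinct value yields its own file per reduce task.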
Test data: ip-to-hosts.txt
18.217.167.70	United States
206.96.54.107	United States
196.109.151.139	Mauritius
174.52.58.113	United States
142.111.216.8	Canada
162.100.49.185	United States
146.38.26.54	United States
36.35.107.36	China
95.214.95.13	Spain
2.96.191.111	United Kingdom
62.177.119.177	Czech Republic
21.165.189.3	United States
46.190.32.115	Greece
113.173.113.29	Vietnam
42.65.172.142	Taiwan
197.91.198.199	South Africa
68.165.71.27	United States
110.119.165.104	China
171.50.76.89	India
171.207.52.113	Singapore
40.174.30.170	United States
191.170.95.175	United States
17.81.129.101	United States
91.212.157.202	France
173.83.82.99	United States
129.75.56.220	United States
149.25.104.198	United States
103.110.22.19	Indonesia
204.188.117.122	United States
138.23.10.72	United States
172.50.15.32	United States
85.88.38.58	Belgium
49.15.14.6	India
19.84.175.5	United States
50.158.140.215	United States
161.114.120.34	United States
118.211.174.52	Australia
220.98.113.71	Japan
182.101.16.171	China
25.45.75.194	United Kingdom
168.16.162.99	United States
155.60.219.154	Australia
26.216.17.198	United States
68.34.157.157	United States
89.176.196.28	Czech Republic
173.11.51.134	United States
116.207.191.159	China
164.210.124.152	United States
168.17.158.38	United States
174.24.173.11	United States
143.64.173.176	United States
160.164.158.125	Italy
15.111.128.4	United States
22.71.176.163	United States
105.57.100.182	Morocco
111.147.83.42	China
137.157.65.89	Australia
Each line of this file has two fields, the IP address and the country that address belongs to, separated by a tab (\t).
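To make the record layout concrete, here is a small plain-Java sketch of pulling the country out of one line. The class name and the Optional-based guard for malformed lines are assumptions made for illustration; the job's own mapper does not include such a guard.

```java
import java.util.Optional;
import java.util.regex.Pattern;

class IpCountryLineSketch {
    private static final Pattern TAB = Pattern.compile("\t");
    private static final int COUNTRY_POS = 1;

    // Returns the country field, or empty when the line does not contain
    // two tab-separated fields (assumed guard, not in the original job).
    static Optional<String> country(String line) {
        String[] fields = TAB.split(line);
        if (fields.length <= COUNTRY_POS || fields[COUNTRY_POS].isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(fields[COUNTRY_POS]);
    }

    public static void main(String[] args) {
        System.out.println(country("62.177.119.177\tCzech Republic"));
        System.out.println(country("line without a tab"));
    }
}
```

Splitting on the tab rather than on whitespace keeps multi-word countries such as Czech Republic in one field.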
The code:
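// Nested in NamedCountryOutputJob. Needs java.io.IOException;
// org.apache.hadoop.io.IntWritable, NullWritable, Text;
// org.apache.hadoop.mapreduce.Reducer;
// org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.
@SuppressWarnings({"rawtypes", "unchecked"})
public static class IPCountryReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    // Raw type on purpose: the header line below is written with a NullWritable value,
    // which MultipleOutputs<Text, IntWritable> would reject at compile time.
    private MultipleOutputs output;

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        output = new MultipleOutputs(context);
    }

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int total = 0;
        for (IntWritable value : values) {
            total += value.get();
        }
        // baseOutputPath is the country name, so each country gets its own file.
        output.write(new Text("Output by MultipleOutputs"), NullWritable.get(), key.toString());
        output.write(key, new IntWritable(total), key.toString());
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        // Close the writers opened by MultipleOutputs so every file is flushed.
        output.close();
    }
}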
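The reducer's setup method creates the MultipleOutputs instance:
output = new MultipleOutputs(context);
The reduce method then uses this output object to write records to different files.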
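The routing can be pictured without a cluster. The sketch below is a hypothetical in-memory stand-in (RoutingSketch and its methods are illustration only, not Hadoop classes): it maps a file name to the lines written to it and assumes a single reduce task, so every file gets the suffix -r-00000. The counts passed in main are the totals for China and Japan in the test data above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RoutingSketch {
    // In-memory stand-in for the output directory: file name -> lines written.
    private final Map<String, List<String>> files = new LinkedHashMap<>();

    // Mimics write(key, value, baseOutputPath): the record lands in the file
    // selected by baseOutputPath. A null value stands in for NullWritable,
    // in which case only the key is written.
    void write(String key, String value, String baseOutputPath) {
        String line = (value == null) ? key : key + "\t" + value;
        files.computeIfAbsent(baseOutputPath + "-r-00000", k -> new ArrayList<>()).add(line);
    }

    // Mimics the reduce step: a header line, then the total for the country.
    void reduce(String country, int total) {
        write("Output by MultipleOutputs", null, country);
        write(country, Integer.toString(total), country);
    }

    Map<String, List<String>> files() {
        return files;
    }

    public static void main(String[] args) {
        RoutingSketch out = new RoutingSketch();
        out.reduce("China", 5);
        out.reduce("Japan", 1);
        out.files().forEach((name, lines) -> System.out.println(name + " -> " + lines));
    }
}
```

Because the country name is used both as the key and as the base path, each simulated file holds exactly one header line and one count line.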
// Members of NamedCountryOutputJob, which implements org.apache.hadoop.util.Tool.
private Configuration conf;
public static final String NAME = "named_output";

public static void main(String[] args) throws Exception {
    // Hard-coded input and output paths for the test cluster; they override the command line.
    args = new String[] {
        "hdfs://caozw:9100/user/hadoop/hadooprealword",
        "hdfs://caozw:9100/user/hadoop/hadooprealword/output"
    };
    System.exit(ToolRunner.run(new Configuration(), new NamedCountryOutputJob(), args));
}

public int run(String[] args) throws Exception {
    if (args.length != 2) {
        System.err.println("Usage: named_output <input> <output>");
        return 1;
    }

    // On Hadoop 2 and later this constructor is deprecated in favor of Job.getInstance(conf, name).
    Job job = new Job(conf, "IP count by country to named files");
    job.setJarByClass(NamedCountryOutputJob.class);
    job.setInputFormatClass(TextInputFormat.class);

    job.setMapperClass(IPCountryMapper.class);
    job.setReducerClass(IPCountryReducer.class);

    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(IntWritable.class);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    // Exit code 0 signals success; the original returned 1 on success.
    return job.waitForCompletion(true) ? 0 : 1;
}

public void setConf(Configuration conf) {
    this.conf = conf;
}

public Configuration getConf() {
    return conf;
}
public static class IPCountryMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final int country_pos = 1;
    private static final Pattern pattern = Pattern.compile("\\t");

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // The second tab-separated field is the country; emit (country, 1).
        String country = pattern.split(value.toString())[country_pos];
        context.write(new Text(country), new IntWritable(1));
    }
}
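The per-country totals the job should produce can be checked locally. The following is a minimal plain-Java sketch of the same map-then-sum logic; the class and method names are hypothetical, and lines with fewer than two fields are skipped, which is an assumption the mapper above does not make.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class LocalCountSketch {
    // Emits (country, 1) per line and sums per country, like the mapper and
    // reducer together. TreeMap keeps the countries sorted, as shuffle would.
    static Map<String, Integer> countByCountry(List<String> lines) {
        Map<String, Integer> totals = new TreeMap<>();
        for (String line : lines) {
            String[] fields = line.split("\t");
            if (fields.length < 2) {
                continue; // assumed handling of malformed lines
            }
            totals.merge(fields[1], 1, Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // A few records taken from ip-to-hosts.txt.
        List<String> sample = Arrays.asList(
            "36.35.107.36\tChina",
            "95.214.95.13\tSpain",
            "110.119.165.104\tChina",
            "220.98.113.71\tJapan");
        System.out.println(countByCountry(sample));
    }
}
```

Running it over the whole test file gives the values to compare against the second line of each per-country output file.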
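Test results: the output directory contains one file per country, named with the country as its prefix (for example China-r-00000), alongside the default part-r-* files.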