Using MultipleOutputs to Output Multiple Files in MapReduce
2022-07-03 15:11:00 【Brother Xing plays with the clouds】
By default, MapReduce names its output files part-*. With MultipleOutputs, you can write different key-value pairs to different, user-defined files.
This is done by calling output.write(key, new IntWritable(total), key.toString());
The third parameter of public void write(KEYOUT key, VALUEOUT value, String baseOutputPath) specifies the prefix of the output file name. By passing a different baseOutputPath for each key, the values of different keys are written to different files. For example, all records from the same day could be written to a file named after that date.
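As a rough illustration of how a baseOutputPath turns into a file name, the sketch below mimics the rule in plain Java. OutputNaming is a hypothetical helper, not part of Hadoop. It assumes that reduce-side files are named <baseOutputPath>-r-<5-digit partition> (the pattern FileOutputFormat uses, as far as I know) and that MultipleOutputs refuses the reserved name "part".

```java
import java.time.LocalDate;

// Hypothetical helper (not Hadoop API) that mimics how a baseOutputPath becomes
// a file name in the job output directory: <base>-<m|r>-<5-digit partition>.
class OutputNaming {

    // taskType is 'r' for reduce tasks and 'm' for map tasks.
    static String uniqueFileName(String baseOutputPath, char taskType, int partition) {
        if (baseOutputPath.equals("part")) {
            // Assumption: MultipleOutputs reserves "part" for the default output.
            throw new IllegalArgumentException("output name cannot be 'part'");
        }
        return String.format("%s-%c-%05d", baseOutputPath, taskType, partition);
    }

    // The date example from the text: the day itself is used as the prefix.
    static String baseOutputPathForDay(LocalDate day) {
        return day.toString(); // ISO-8601, e.g. 2014-06-01
    }

    public static void main(String[] args) {
        System.out.println(uniqueFileName("China", 'r', 0));
        System.out.println(uniqueFileName(baseOutputPathForDay(LocalDate.of(2014, 6, 1)), 'r', 3));
    }
}
```

With a single reducer (partition 0), the key "China" would therefore end up in a file named China-r-00000.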
Test data: ip-to-hosts.txt
18.217.167.70	United States
206.96.54.107	United States
196.109.151.139	Mauritius
174.52.58.113	United States
142.111.216.8	Canada
162.100.49.185	United States
146.38.26.54	United States
36.35.107.36	China
95.214.95.13	Spain
2.96.191.111	United Kingdom
62.177.119.177	Czech Republic
21.165.189.3	United States
46.190.32.115	Greece
113.173.113.29	Vietnam
42.65.172.142	Taiwan
197.91.198.199	South Africa
68.165.71.27	United States
110.119.165.104	China
171.50.76.89	India
171.207.52.113	Singapore
40.174.30.170	United States
191.170.95.175	United States
17.81.129.101	United States
91.212.157.202	France
173.83.82.99	United States
129.75.56.220	United States
149.25.104.198	United States
103.110.22.19	Indonesia
204.188.117.122	United States
138.23.10.72	United States
172.50.15.32	United States
85.88.38.58	Belgium
49.15.14.6	India
19.84.175.5	United States
50.158.140.215	United States
161.114.120.34	United States
118.211.174.52	Australia
220.98.113.71	Japan
182.101.16.171	China
25.45.75.194	United Kingdom
168.16.162.99	United States
155.60.219.154	Australia
26.216.17.198	United States
68.34.157.157	United States
89.176.196.28	Czech Republic
173.11.51.134	United States
116.207.191.159	China
164.210.124.152	United States
168.17.158.38	United States
174.24.173.11	United States
143.64.173.176	United States
160.164.158.125	Italy
15.111.128.4	United States
22.71.176.163	United States
105.57.100.182	Morocco
111.147.83.42	China
137.157.65.89	Australia
Each line of this file has two fields, an IP address and the country that address belongs to, separated by a tab (\t).
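Splitting on the tab rather than on whitespace matters here, because country names such as "United Kingdom" contain spaces. The sketch below isolates that parsing step; IpCountryLine is a hypothetical stand-alone class, and returning null for a line without a tab is my own choice, not something the original mapper does.

```java
import java.util.regex.Pattern;

// Hypothetical stand-alone version of the parsing done in the mapper:
// split "<ip>\t<country>" on the tab character and keep field 1.
class IpCountryLine {
    private static final Pattern TAB = Pattern.compile("\t");

    static String country(String line) {
        String[] fields = TAB.split(line);
        // A line without a tab has only one field; report it as null.
        return fields.length > 1 ? fields[1] : null;
    }

    public static void main(String[] args) {
        System.out.println(country("36.35.107.36\tChina"));
        System.out.println(country("2.96.191.111\tUnited Kingdom"));
    }
}
```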
Here is the code:
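// Static nested class of NamedCountryOutputJob (the driver class below).
// Raw MultipleOutputs on purpose: the first write() passes a NullWritable value,
// which MultipleOutputs<Text, IntWritable> would reject at compile time.
@SuppressWarnings({"rawtypes", "unchecked"})
public static class IPCountryReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    private MultipleOutputs output;

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        output = new MultipleOutputs(context);
    }

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int total = 0;
        for (IntWritable value : values) {
            total += value.get();
        }
        // baseOutputPath = country name, so each country gets its own file.
        // With TextOutputFormat a NullWritable value is omitted, so the first
        // call writes just the marker text as a line.
        output.write(new Text("Output by MultipleOutputs"), NullWritable.get(), key.toString());
        output.write(key, new IntWritable(total), key.toString());
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        // Close every record writer MultipleOutputs opened, or output may be lost.
        output.close();
    }
}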
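To see what each per-country file should end up containing, the reducer logic can be replayed in memory. ReduceSimulation is a hypothetical model, not Hadoop code: "files" are a map from file name to lines, every file is assumed to belong to reducer 0, and lines are formatted the way TextOutputFormat does as I understand it (key, tab, value; a missing value prints only the key). Australia appears three times in ip-to-hosts.txt and Canada once, which is what the usage in main replays.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// In-memory model of what IPCountryReducer writes (hypothetical, not Hadoop).
class ReduceSimulation {
    final Map<String, List<String>> files = new LinkedHashMap<>();

    // Stand-in for MultipleOutputs.write(key, value, baseOutputPath) in reducer 0.
    void write(String key, Integer value, String baseOutputPath) {
        String file = baseOutputPath + "-r-00000";
        String line = (value == null) ? key : key + "\t" + value;
        files.computeIfAbsent(file, f -> new ArrayList<>()).add(line);
    }

    // Same logic as IPCountryReducer.reduce(): sum, then two writes per key.
    void reduce(String key, List<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        write("Output by MultipleOutputs", null, key);
        write(key, total, key);
    }

    public static void main(String[] args) {
        ReduceSimulation sim = new ReduceSimulation();
        sim.reduce("Australia", Arrays.asList(1, 1, 1));
        sim.reduce("Canada", Arrays.asList(1));
        sim.files.forEach((name, lines) -> System.out.println(name + " -> " + lines));
    }
}
```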
In the reducer's setup method, create the MultipleOutputs instance:
output = new MultipleOutputs(context);
Then, in reduce, write to the different files through output, and close it in cleanup.
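The reason close() belongs in cleanup is that, as I understand MultipleOutputs, it opens one writer lazily per distinct baseOutputPath, reuses it for later writes, and only flushes and closes those writers in close(). The sketch below models that lifecycle with java.io types; LazyWriterCache is hypothetical, and a BufferedWriter over a StringWriter stands in for an HDFS file, so buffered data stays invisible until close() runs.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the setup/reduce/cleanup lifecycle: one buffered writer
// per baseOutputPath, created on first use and flushed only by close().
class LazyWriterCache {
    final Map<String, StringWriter> destinations = new HashMap<>();
    private final Map<String, Writer> open = new HashMap<>();

    void write(String line, String baseOutputPath) throws IOException {
        Writer w = open.get(baseOutputPath);
        if (w == null) { // first write to this path: open a new writer
            StringWriter dest = new StringWriter();
            destinations.put(baseOutputPath, dest);
            w = new BufferedWriter(dest);
            open.put(baseOutputPath, w);
        }
        w.write(line);
        w.write('\n');
    }

    // What cleanup() must trigger: flush and close every open writer.
    void close() throws IOException {
        for (Writer w : open.values()) {
            w.close();
        }
        open.clear();
    }

    public static void main(String[] args) throws IOException {
        LazyWriterCache cache = new LazyWriterCache();
        cache.write("China\t5", "China");
        System.out.println("before close: [" + cache.destinations.get("China") + "]");
        cache.close();
        System.out.println("after close: [" + cache.destinations.get("China") + "]");
    }
}
```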
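import java.io.IOException;
import java.util.regex.Pattern;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class NamedCountryOutputJob implements Tool {

    private Configuration conf;
    public static final String NAME = "named_output";

    // IPCountryReducer (shown above) is also a static nested class of this job.

    public static void main(String[] args) throws Exception {
        // Use the test paths when no arguments are given on the command line.
        if (args.length == 0) {
            args = new String[] {"hdfs://caozw:9100/user/hadoop/hadooprealword",
                                 "hdfs://caozw:9100/user/hadoop/hadooprealword/output"};
        }
        System.exit(ToolRunner.run(new Configuration(), new NamedCountryOutputJob(), args));
    }

    @Override
    public int run(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: " + NAME + " <input> <output>");
            return 1;
        }

        Job job = Job.getInstance(getConf(), "IP count by country to named files");
        job.setJarByClass(NamedCountryOutputJob.class);
        job.setInputFormatClass(TextInputFormat.class);

        job.setMapperClass(IPCountryMapper.class);
        job.setReducerClass(IPCountryReducer.class);

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Tool convention: 0 means success, non-zero means failure.
        return job.waitForCompletion(true) ? 0 : 1;
    }

    @Override
    public void setConf(Configuration conf) { this.conf = conf; }

    @Override
    public Configuration getConf() { return conf; }

    public static class IPCountryMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final int country_pos = 1;
        private static final Pattern pattern = Pattern.compile("\\t");

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Each line is "<ip>\t<country>"; emit (country, 1).
            String[] fields = pattern.split(value.toString());
            if (fields.length > country_pos) { // skip malformed lines
                context.write(new Text(fields[country_pos]), new IntWritable(1));
            }
        }
    }
}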
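Before submitting the job, the expected per-country totals can be cross-checked locally. LocalCountryCount is a hypothetical single-process sketch with no Hadoop dependency: it performs the "map" step (country, 1), uses a TreeMap to stand in for the sort-and-group of the shuffle, and sums each group the way the reducer does. The sample uses the first five lines of ip-to-hosts.txt.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

// Hypothetical local cross-check of the job: map, group by key, sum.
class LocalCountryCount {
    private static final Pattern TAB = Pattern.compile("\t");

    static Map<String, Integer> count(List<String> lines) {
        // "map" + "shuffle": emit a 1 per line and group the 1s by country;
        // TreeMap keeps keys sorted, as a reducer would receive them.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            String[] fields = TAB.split(line);
            if (fields.length > 1) {
                grouped.computeIfAbsent(fields[1], k -> new ArrayList<>()).add(1);
            }
        }
        // "reduce": sum each group, like IPCountryReducer.
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int total = 0;
            for (int v : e.getValue()) {
                total += v;
            }
            totals.put(e.getKey(), total);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "18.217.167.70\tUnited States",
                "206.96.54.107\tUnited States",
                "196.109.151.139\tMauritius",
                "174.52.58.113\tUnited States",
                "142.111.216.8\tCanada");
        // With one reducer, each entry corresponds to one <country>-r-00000 file.
        count(sample).forEach((country, total) -> System.out.println(country + "\t" + total));
    }
}
```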
Test results: [figure: job output]