Using MultipleOutputs to output multiple files in MapReduce
2022-07-03 15:11:00 【Brother Xing plays with the clouds】
By default, MapReduce names its output files part-*. MultipleOutputs lets you write different key-value pairs to different user-defined files.
This is done by calling output.write(key, new IntWritable(total), key.toString());
The method's signature is public void write(KEYOUT key, VALUEOUT value, String baseOutputPath). Its third parameter, baseOutputPath, specifies the naming prefix of the output file. By passing a different baseOutputPath for each key, the values belonging to different keys are written to different files. For example, all records for the same day can be written to a file named after that date.
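To make the effect of baseOutputPath concrete, here is a minimal sketch in plain Java (no Hadoop dependency) of how the final file name is usually derived for reducer output: the base path, then -r-, then the zero-padded partition number. The class and method names are hypothetical, and the pattern reflects the usual naming of Hadoop's file-based output formats, not code taken from this article.

```java
import java.time.LocalDate;
import java.util.Locale;

// Hypothetical helper, not part of Hadoop: it only mimics the usual file naming.
class OutputNameSketch {
    // Builds "<baseOutputPath>-r-<5-digit partition>", for example "China-r-00000".
    static String reduceFileName(String baseOutputPath, int partition) {
        return String.format(Locale.ROOT, "%s-r-%05d", baseOutputPath, partition);
    }

    // Uses the ISO date (yyyy-MM-dd) as the base path, so each day gets its own file.
    static String baseForDate(LocalDate date) {
        return date.toString();
    }

    public static void main(String[] args) {
        System.out.println(reduceFileName("China", 0));
        System.out.println(reduceFileName(baseForDate(LocalDate.of(2022, 7, 3)), 1));
    }
}
```

With one reducer, each distinct base path yields one file, so the key "China" ends up in a file whose name starts with China instead of part.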
Test data: ip-to-hosts.txt
18.217.167.70	United States
206.96.54.107	United States
196.109.151.139	Mauritius
174.52.58.113	United States
142.111.216.8	Canada
162.100.49.185	United States
146.38.26.54	United States
36.35.107.36	China
95.214.95.13	Spain
2.96.191.111	United Kingdom
62.177.119.177	Czech Republic
21.165.189.3	United States
46.190.32.115	Greece
113.173.113.29	Vietnam
42.65.172.142	Taiwan
197.91.198.199	South Africa
68.165.71.27	United States
110.119.165.104	China
171.50.76.89	India
171.207.52.113	Singapore
40.174.30.170	United States
191.170.95.175	United States
17.81.129.101	United States
91.212.157.202	France
173.83.82.99	United States
129.75.56.220	United States
149.25.104.198	United States
103.110.22.19	Indonesia
204.188.117.122	United States
138.23.10.72	United States
172.50.15.32	United States
85.88.38.58	Belgium
49.15.14.6	India
19.84.175.5	United States
50.158.140.215	United States
161.114.120.34	United States
118.211.174.52	Australia
220.98.113.71	Japan
182.101.16.171	China
25.45.75.194	United Kingdom
168.16.162.99	United States
155.60.219.154	Australia
26.216.17.198	United States
68.34.157.157	United States
89.176.196.28	Czech Republic
173.11.51.134	United States
116.207.191.159	China
164.210.124.152	United States
168.17.158.38	United States
174.24.173.11	United States
143.64.173.176	United States
160.164.158.125	Italy
15.111.128.4	United States
22.71.176.163	United States
105.57.100.182	Morocco
111.147.83.42	China
137.157.65.89	Australia
Each line of this file has two fields, the IP address and the country that the IP address belongs to, separated by a tab (\t).
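As a quick illustration of this record format, the sketch below splits one line on the tab character and picks out the country field. It is plain Java with hypothetical names (RecordParseSketch, countryOf). Returning null for a line without a second field is an assumption added here for safety, not something the article's mapper does.

```java
import java.util.regex.Pattern;

// Hypothetical helper illustrating the "<ip>\t<country>" record layout.
class RecordParseSketch {
    private static final Pattern TAB = Pattern.compile("\t");
    private static final int COUNTRY_POS = 1;

    // Returns the country field, or null when the line has no second field.
    static String countryOf(String line) {
        String[] fields = TAB.split(line);
        return fields.length > COUNTRY_POS ? fields[COUNTRY_POS].trim() : null;
    }

    public static void main(String[] args) {
        System.out.println(countryOf("18.217.167.70\tUnited States"));
        System.out.println(countryOf("a line without a tab"));
    }
}
```

Country names may contain spaces ("United States", "Czech Republic"), which is why the split has to be on the tab and not on whitespace in general.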
The code:
public static class IPCountryReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    // Raw type on purpose: reduce() also writes a NullWritable value,
    // which MultipleOutputs<Text, IntWritable> would reject at compile time.
    private MultipleOutputs output;

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        output = new MultipleOutputs(context);
    }

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Sum the counts emitted by the mapper for this country.
        int total = 0;
        for (IntWritable value : values) {
            total += value.get();
        }
        // Both records go to a file whose name starts with the country (the key).
        output.write(new Text("Output by MultipleOutputs"), NullWritable.get(), key.toString());
        output.write(key, new IntWritable(total), key.toString());
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        // Close the writers opened by MultipleOutputs so the output is flushed.
        output.close();
    }
}
In the reducer's setup method, create the MultipleOutputs instance:
output = new MultipleOutputs(context);
Then, in reduce, use output to write records to different files.
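The setup, write and close lifecycle can be pictured with a small in-memory stand-in. The sketch below is plain Java, not Hadoop code: MultiWriterSketch and its methods are hypothetical, and it only roughly mirrors the idea that a writer is created lazily for each base path on first use and that everything is released together on close.

```java
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for MultipleOutputs; not Hadoop code.
class MultiWriterSketch implements AutoCloseable {
    // One writer per base path, created the first time that path is used.
    private final Map<String, StringWriter> writers = new LinkedHashMap<>();
    private boolean closed = false;

    void write(String key, int value, String baseOutputPath) {
        if (closed) {
            throw new IllegalStateException("already closed");
        }
        writers.computeIfAbsent(baseOutputPath, p -> new StringWriter())
               .write(key + "\t" + value + "\n");
    }

    // Number of distinct base paths written so far.
    int openWriters() {
        return writers.size();
    }

    // Everything written under the given base path, or null if it was never used.
    String contentsOf(String baseOutputPath) {
        StringWriter w = writers.get(baseOutputPath);
        return w == null ? null : w.toString();
    }

    @Override
    public void close() {
        // A real implementation would flush and close each underlying file here.
        closed = true;
    }
}
```

Writing "China" twice and "India" once through this class leaves two writers, one per base path, which is the same grouping the reducer relies on when it passes key.toString() as the base path.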
// Driver. The listing omits the enclosing class declaration: these members belong to
// NamedCountryOutputJob, which is passed to ToolRunner and so must implement Tool.
private Configuration conf;
public static final String NAME = "named_output";

public static void main(String[] args) throws Exception {
    // Hard-coded input and output paths; these override any command-line arguments.
    args = new String[] {
        "hdfs://caozw:9100/user/hadoop/hadooprealword",
        "hdfs://caozw:9100/user/hadoop/hadooprealword/output"
    };
    // Pass the job's result on as the process exit code.
    System.exit(ToolRunner.run(new Configuration(), new NamedCountryOutputJob(), args));
}

public int run(String[] args) throws Exception {
    if (args.length != 2) {
        System.err.println("Usage: named_output <input> <output>");
        System.exit(1);
    }

    Job job = new Job(conf, "IP count by country to named files");
    job.setInputFormatClass(TextInputFormat.class);

    job.setMapperClass(IPCountryMapper.class);
    job.setReducerClass(IPCountryReducer.class);

    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(IntWritable.class);
    job.setJarByClass(NamedCountryOutputJob.class);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    // Exit code convention: 0 on success, 1 on failure.
    return job.waitForCompletion(true) ? 0 : 1;
}

public void setConf(Configuration conf) {
    this.conf = conf;
}

public Configuration getConf() {
    return conf;
}
public static class IPCountryMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    // Index of the country field within a tab-separated input line.
    private static final int country_pos = 1;
    private static final Pattern pattern = Pattern.compile("\\t");

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Emit (country, 1) for every input line; the reducer sums these per country.
        String country = pattern.split(value.toString())[country_pos];
        context.write(new Text(country), new IntWritable(1));
    }
}
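To see what the mapper and reducer compute together, the following sketch reproduces the counting logic locally in plain Java on a handful of lines from the test data. The class CountByCountrySketch is hypothetical, it runs without a Hadoop cluster, and skipping malformed lines is an assumption added here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical local re-creation of the job's logic: count input lines per country.
class CountByCountrySketch {
    static Map<String, Integer> countByCountry(List<String> lines) {
        Map<String, Integer> totals = new TreeMap<>();
        for (String line : lines) {
            String[] fields = line.split("\t");
            if (fields.length < 2) {
                continue; // skip malformed lines (assumption, not in the article's mapper)
            }
            // Map step emits (country, 1); reduce step sums the ones per country.
            totals.merge(fields[1], 1, Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "18.217.167.70\tUnited States",
            "206.96.54.107\tUnited States",
            "196.109.151.139\tMauritius",
            "36.35.107.36\tChina",
            "110.119.165.104\tChina");
        // In the real job each entry would land in a file whose name starts with the country.
        countByCountry(sample).forEach((country, total) ->
            System.out.println(country + "\t" + total));
    }
}
```

In the actual job, each of these totals is written through MultipleOutputs with the country as the base path, so every country ends up in its own file.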
Test result: