6. MapReduce custom partition implementation
2022-07-28 10:47:00 【Data analyst shrimp】
The built-in partitioner in MapReduce is HashPartitioner.
Principle: take the hash value of each map output key, compute it modulo the number of reduce tasks, and use the result to decide which reduce task the output <K,V> pair is sent to.
A custom partitioner must extend Partitioner and override the getPartition() method.
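The default behavior can be sketched in plain Java. This is a simplified stand-in for Hadoop's HashPartitioner (the class and method names here are illustrative, not Hadoop's own types):

```java
// Minimal sketch of HashPartitioner's logic: hash the key, mask off the
// sign bit so the result is non-negative, then take it modulo the number
// of reduce tasks.
public class HashPartitionSketch {
    public static int partitionFor(String key, int numReduceTasks) {
        // masking with Integer.MAX_VALUE clears the sign bit, so the
        // modulo result is always in [0, numReduceTasks)
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        // the same key always lands in the same partition
        int p1 = partitionFor("Dear", 4);
        int p2 = partitionFor("Dear", 4);
        System.out.println(p1 == p2);          // true
        System.out.println(p1 >= 0 && p1 < 4); // true
    }
}
```

Because the assignment depends only on the key's hash, all records with the same key reach the same reducer, but the key-to-reducer mapping is otherwise arbitrary; a custom partitioner replaces it with an explicit rule.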
Note: the map output is a <K,V> key-value pair. In `int partitionIndex = dict.get(text.toString())`, partitionIndex is the partition number looked up for the map output key K.
Sample input text:
Dear Dear Bear Bear River Car Dear Dear Bear River
Dear Dear Bear Bear River Car Dear Dear Bear River
The custom partitioner class must also be registered in the main function.
Custom partitioner class:
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;
import java.util.HashMap;
public class CustomPartitioner extends Partitioner<Text, IntWritable> {
    // Text is the map output key type, IntWritable the map output value type
    public static HashMap<String, Integer> dict = new HashMap<String, Integer>();

    // map each known word to a fixed partition number
    static {
        dict.put("Dear", 0);
        dict.put("Bear", 1);
        dict.put("River", 2);
        dict.put("Car", 3);
    }

    @Override
    public int getPartition(Text text, IntWritable intWritable, int numPartitions) {
        // look up the partition number for this key; note that dict.get()
        // returns null (and unboxing then throws NullPointerException)
        // for any word not in the dictionary
        int partitionIndex = dict.get(text.toString());
        return partitionIndex;
    }
}
Note: the map output is a key-value pair <K,V>. In `int partitionIndex = dict.get(text.toString());`, partitionIndex is the partition number that the dictionary assigns to the map output key K.
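To see what this lookup does, here is a plain-Java simulation (no Hadoop dependencies; class and method names are illustrative) of the dictionary-based assignment on the sample words:

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java simulation of the dictionary lookup in getPartition():
// each known word maps to a fixed partition number.
public class DictPartitionDemo {
    static final Map<String, Integer> DICT = new HashMap<>();
    static {
        DICT.put("Dear", 0);
        DICT.put("Bear", 1);
        DICT.put("River", 2);
        DICT.put("Car", 3);
    }

    public static int partitionFor(String word) {
        // like the Hadoop version, this throws a NullPointerException
        // (via unboxing) for a word that is not in the dictionary
        return DICT.get(word);
    }

    public static void main(String[] args) {
        for (String w : "Dear Bear River Car".split(" ")) {
            System.out.println(w + " -> partition " + partitionFor(w));
        }
    }
}
```

With four dictionary entries mapped to partitions 0-3, the job must run with at least four reduce tasks, which is why the main function below calls setNumReduceTasks(4).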
Mapper class:
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import java.io.IOException;
public class WordCountMap extends Mapper<LongWritable, Text, Text, IntWritable> {
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // the sample input is whitespace-separated, so split on runs of whitespace
        String[] words = value.toString().split("\\s+");
        for (String word : words) {
            // emit each word with count 1 as an intermediate result
            context.write(new Text(word), new IntWritable(1));
        }
    }
}
Reducer class:
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import java.io.IOException;
public class WordCountReduce extends Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // sum the counts for each word
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
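The end-to-end effect of the map and reduce steps can be simulated in plain Java (no Hadoop dependencies; the class name is illustrative) on the sample input above:

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java simulation of the map and reduce steps: the map step emits
// (word, 1) for every word, and the reduce step sums the counts per word.
public class WordCountSimulation {
    public static Map<String, Integer> countWords(String[] lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            for (String word : line.split("\\s+")) {
                // merge() performs the reduce step: sum the 1s per key
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] lines = {
            "Dear Dear Bear Bear River Car Dear Dear Bear River",
            "Dear Dear Bear Bear River Car Dear Dear Bear River"
        };
        // Dear appears 8 times, Bear 6, River 4, Car 2
        System.out.println(countWords(lines));
    }
}
```

In the real job, the custom partitioner routes all "Dear" pairs to reduce task 0, all "Bear" pairs to task 1, and so on, so each of the four output files contains exactly one word's total.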
main function :
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import java.io.IOException;
public class WordCountMain {
public static void main(String[] args) throws IOException,
ClassNotFoundException, InterruptedException {
if (args == null || args.length != 2) {
    System.out.println("please input Path!");
    System.exit(0);
}
Configuration configuration = new Configuration();
configuration.set("mapreduce.job.jar","/home/bruce/project/kkbhdp01/target/com.kaikeba.hadoop-1.0-SNAPSHOT.jar");
Job job = Job.getInstance(configuration, WordCountMain.class.getSimpleName());
// package into a jar
job.setJarByClass(WordCountMain.class);
// set the input/output formats via the job (TextInputFormat/TextOutputFormat are the defaults)
//job.setInputFormatClass(TextInputFormat.class);
//job.setOutputFormatClass(TextOutputFormat.class);
// set the input and output paths
FileInputFormat.setInputPaths(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
// set the classes that handle the map and reduce stages
job.setMapperClass(WordCountMap.class);
// optional map-side combiner
//job.setCombinerClass(WordCountReduce.class);
job.setReducerClass(WordCountReduce.class);
// if the map and reduce output KV types are the same, setting the reduce
// output KV types is enough; otherwise set the map output KV types separately
//job.setMapOutputKeyClass(.class)
// set the final output key/value types
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
// register the custom partitioner and run one reduce task per partition
job.setPartitionerClass(CustomPartitioner.class);
job.setNumReduceTasks(4);
// submit the job and wait for completion
job.waitForCompletion(true);
}
}
Main function parameter settings: pass the input path and output path as the two program arguments (args[0] and args[1]).