MapReduce Example (IX): Reduce-Side Join
2022-07-06 09:33:00 【Laugh at Fengyun Road】
Implementing a Reduce-Side Join in MapReduce
Hello everyone, I am Fengyun. Welcome to my blog and my WeChat official account 【Laugh at Fengyun Road】. In the days ahead, let's learn big-data technologies together, work hard, and meet a better self!
Implementation principle
The reduce-side join is the most common pattern for joining tables in the MapReduce framework.
(1) The main work on the map side: tag the key/value records coming from the different tables (files) so that records from different sources can be told apart. Then use the join field as the key and the remaining fields plus the new tag as the value, and emit the pair.
(2) The main work on the reduce side: by the time data reaches the reducer, grouping by the join key has already been done, so within each group we only need to separate the records that came from different files (they were tagged in the map stage) and take their Cartesian product.
The reduce-side join is more common than the map-side join, because the map stage often cannot obtain all the fields required for the join: values with the same key may live in different map tasks. The reduce-side join is, however, relatively inefficient, because all the data must go through the shuffle.
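To make the steps below concrete, the rest of this post will refer to a small running example (hypothetical rows invented for illustration; the original post does not show its data set). Assume the fields are tab-separated, orders1 holds one row per order, and order_items1 holds one row per item in an order:

orders1:        1001    u42    BUY    2015-09-05
order_items1:   7       1001   G100
order_items1:   8       1001   G200

Both files carry the order id 1001, which will serve as the join key.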
The difference between IN and EXISTS in SQL <= see the earlier article.
Choosing between IN and EXISTS <= see the earlier article.
In general, the query runs faster when the smaller table drives the outer loop: the outer loop contributes a factor of N, while an index on the inner table cuts each inner lookup down to log M.
A JOIN B also takes the Cartesian product and finally keeps the rows that agree on the specified field (A inner loop, B outer loop).
A IN B: B is computed first, then the Cartesian product is taken (A inner loop, B outer loop).
A EXISTS B: A is computed first, then the Cartesian product is taken (B inner loop, A outer loop).
NOT IN can use an index on neither the inner nor the outer table, whereas NOT EXISTS can still use one, so the latter beats the former in every case. A Java sketch of the two loop orders follows.
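Here is a minimal Java sketch of my own (not from the original article) showing the two loop orders described above; in-memory lists stand in for the two tables, and a real query planner is of course far more sophisticated:

import java.util.ArrayList;
import java.util.List;

public class InVsExistsSketch {
    // A EXISTS B: A drives the outer loop and B is probed in the inner loop,
    // so an index on B's join column is what speeds this form up.
    static List<Integer> existsStyle(List<Integer> a, List<Integer> b) {
        List<Integer> out = new ArrayList<Integer>();
        for (int x : a) {            // outer loop over A
            for (int y : b) {        // inner loop over B
                if (x == y) { out.add(x); break; }
            }
        }
        return out;
    }

    // A IN B: the subquery B is evaluated first and drives the outer loop,
    // with A in the inner loop, so an index on A is what helps here.
    static List<Integer> inStyle(List<Integer> a, List<Integer> b) {
        List<Integer> out = new ArrayList<Integer>();
        for (int y : b) {            // outer loop over the materialized B
            for (int x : a) {        // inner loop over A
                if (x == y && !out.contains(x)) { out.add(x); }  // skip rows already emitted
            }
        }
        return out;
    }
}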
Writing the code
The program consists of two main parts: the Map part and the Reduce part.
Map Code
public static class mymapper extends Mapper<Object, Text, Text, Text> {
    @Override
    protected void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        // find out which input file this split belongs to
        String filePath = ((FileSplit) context.getInputSplit()).getPath().toString();
        if (filePath.contains("orders1")) {
            // get the text content of the line
            String line = value.toString();
            // split the line into tab-separated fields
            String[] arr = line.split("\t");
            // join field as key, remaining fields tagged with "1+" as value
            context.write(new Text(arr[0]), new Text("1+" + arr[2] + "\t" + arr[3]));
            System.out.println(arr[0] + "_1+" + arr[2] + "\t" + arr[3]);
        } else if (filePath.contains("order_items1")) {
            String line = value.toString();
            String[] arr = line.split("\t");
            // join field as key, goods field tagged with "2+" as value
            context.write(new Text(arr[1]), new Text("2+" + arr[2]));
            System.out.println(arr[1] + "_2+" + arr[2]);
        }
    }
}
The Map stage processes plain text files. Before the data reaches the Mapper, InputFormat cuts the data set into InputSplits, and a RecordReader parses each split into <key, value> pairs for the map function to use. Inside the map function, getPath() is first called on the InputSplit to obtain the file path, which is assigned to filePath. If filePath contains the file name orders1, the input value is split with split("\t"), the order id field (arr[0]) shared by both files is used as the key, and the remaining fields prefixed with "1+" become the value. If filePath contains the file name order_items1, the steps are the same, except that the join field is arr[1] and the value field is prefixed with "2+". Finally, the <key, value> pair is emitted through the write method of Context.
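With the hypothetical rows from the running example above, the map stage would emit (tabs written as \t):

1001 → 1+BUY\t2015-09-05       from orders1: arr[0] as key, "1+" + arr[2] + "\t" + arr[3] as value
1001 → 2+G100                  from order_items1: arr[1] as key, "2+" + arr[2] as value
1001 → 2+G200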
Reduce Code
public static class myreducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        Vector<String> left = new Vector<String>();   // holds the rows of the left table
        Vector<String> right = new Vector<String>();  // holds the rows of the right table
        // iterate over the grouped values
        for (Text val : values) {
            String str = val.toString();
            // route each value to the left or right list according to its tag
            if (str.startsWith("1+")) {
                left.add(str.substring(2));
            } else if (str.startsWith("2+")) {
                right.add(str.substring(2));
            }
        }
        // sizes of the left and right lists
        int sizeL = left.size();
        int sizeR = right.size();
        //System.out.println(key + "left:" + left);
        //System.out.println(key + "right:" + right);
        // nested loops: write the Cartesian product of the two lists
        for (int i = 0; i < sizeL; i++) {
            for (int j = 0; j < sizeR; j++) {
                context.write(key, new Text(left.get(i) + "\t" + right.get(j)));
                //System.out.println(key + " \t" + left.get(i) + "\t" + right.get(j));
            }
        }
    }
}
After the map function outputs its <key, value> pairs, the shuffle gathers all values with the same key into one iterator, values, and the resulting <key, values> pair is passed to the reduce function. In the reduce function, two Vector collections are first created to store the values beginning with "1+" and "2+" respectively. An enhanced for loop then iterates over values with a nested if: if an element starts with "1+", substring(2) strips the tag and the remainder is stored in the left collection; if it starts with "2+", substring(2) is applied in the same way and the remainder is stored in the right collection. Finally, two nested for loops traverse the collections and output <key, value> pairs: the input key is assigned directly to the output key, and the output value is left + "\t" + right.
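Continuing the running example, the reducer that receives key 1001 sees values = [1+BUY\t2015-09-05, 2+G100, 2+G200]. The tag routing gives left = [BUY\t2015-09-05] and right = [G100, G200], and the nested loops write two joined rows:

1001    BUY    2015-09-05    G100
1001    BUY    2015-09-05    G200

Note that a key whose left or right list is empty produces no output, so this is an inner join.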
Complete code
package mapreduce;

import java.io.IOException;
import java.util.Vector;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ReduceJoin {
    public static class mymapper extends Mapper<Object, Text, Text, Text> {
        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String filePath = ((FileSplit) context.getInputSplit()).getPath().toString();
            if (filePath.contains("orders1")) {
                String line = value.toString();
                String[] arr = line.split("\t");
                context.write(new Text(arr[0]), new Text("1+" + arr[2] + "\t" + arr[3]));
                //System.out.println(arr[0] + "_1+" + arr[2] + "\t" + arr[3]);
            } else if (filePath.contains("order_items1")) {
                String line = value.toString();
                String[] arr = line.split("\t");
                context.write(new Text(arr[1]), new Text("2+" + arr[2]));
                //System.out.println(arr[1] + "_2+" + arr[2]);
            }
        }
    }

    public static class myreducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            Vector<String> left = new Vector<String>();
            Vector<String> right = new Vector<String>();
            for (Text val : values) {
                String str = val.toString();
                if (str.startsWith("1+")) {
                    left.add(str.substring(2));
                } else if (str.startsWith("2+")) {
                    right.add(str.substring(2));
                }
            }
            int sizeL = left.size();
            int sizeR = right.size();
            //System.out.println(key + "left:" + left);
            //System.out.println(key + "right:" + right);
            for (int i = 0; i < sizeL; i++) {
                for (int j = 0; j < sizeR; j++) {
                    context.write(key, new Text(left.get(i) + "\t" + right.get(j)));
                    //System.out.println(key + " \t" + left.get(i) + "\t" + right.get(j));
                }
            }
        }
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Job job = Job.getInstance();
        job.setJobName("reducejoin");
        job.setJarByClass(ReduceJoin.class);
        job.setMapperClass(mymapper.class);
        job.setReducerClass(myreducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        // both input directories feed the same job; the mapper tells them apart by path
        Path left = new Path("hdfs://localhost:9000/mymapreduce6/in/orders1");
        Path right = new Path("hdfs://localhost:9000/mymapreduce6/in/order_items1");
        Path out = new Path("hdfs://localhost:9000/mymapreduce6/out");
        FileInputFormat.addInputPath(job, left);
        FileInputFormat.addInputPath(job, right);
        FileOutputFormat.setOutputPath(job, out);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
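Once the class is packaged into a jar, the job could be run roughly as follows (a sketch under assumptions: the jar name reducejoin.jar is made up, and part-r-00000 is simply Hadoop's default name for the first reducer's output file):

hadoop jar reducejoin.jar mapreduce.ReduceJoin
hdfs dfs -cat hdfs://localhost:9000/mymapreduce6/out/part-r-00000

One thing to keep in mind: the output directory /mymapreduce6/out must not already exist, otherwise FileOutputFormat fails the job at submission time.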
-------------- end ----------------
WeChat official account: search for 「Laugh at Fengyun Road」 and follow.