MapReduce Example (VII): Single-Table Join
2022-07-06 09:33:00 【Laugh at Fengyun Road】
Implementing a Single-Table Join with MapReduce
Hello everyone, I am Fengyun. Welcome to my blog or my WeChat official account 【Laugh at Fengyun Road】. In the days to come, let's learn big-data technologies together, work hard together, and meet a better version of ourselves!
Knowledge review
Distinguishing the Cartesian product, natural join, equi-join, inner join, and outer join <= a review of database basics
Approach
We use the table buyer1(buyer_id, friends_id) as an example to explain how a single-table join works. In a single-table join, the join columns are buyer_id in the left table and friends_id in the right table, and the left and right tables are the same table.
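To pin down what the job should produce, here is a minimal in-memory sketch of the same self-join written as a plain nested loop, with no Hadoop involved. The class name SelfJoinReference, the method selfJoin, and the sample IDs are hypothetical. The output pairs follow the convention of the reducer in the code further down (the left-table friends_id as key, the right-table buyer_id as value), and pairs with the same ID on both sides are skipped, as that reducer intends.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SelfJoinReference {

    // Each row is {buyer_id, friends_id}. Joins the table with itself on
    // left.buyer_id = right.friends_id and returns "left.friends_id<TAB>right.buyer_id",
    // skipping pairs whose two IDs are equal.
    public static List<String> selfJoin(List<String[]> rows) {
        List<String> out = new ArrayList<>();
        for (String[] left : rows) {
            for (String[] right : rows) {
                boolean joinMatches = left[0].equals(right[1]);
                boolean sameId = left[1].equals(right[0]);
                if (joinMatches && !sameId) {
                    out.add(left[1] + "\t" + right[0]);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical rows: 10 and 20 list each other, 20 also lists 30, 30 lists 10
        List<String[]> rows = Arrays.asList(
                new String[]{"10", "20"},
                new String[]{"20", "10"},
                new String[]{"20", "30"},
                new String[]{"30", "10"});
        for (String pair : selfJoin(rows)) {
            System.out.println(pair);
        }
    }
}
```

With these rows the sketch yields the pairs 20/30, 30/10 and 10/20; the 20/20 and 10/10 combinations produced by the mutual friendship between 10 and 20 are filtered out. The nested loop compares every row with every other row, which is the cost the MapReduce version avoids by sending only rows that share a join value to the same reduce call.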
In the map stage, after splitting each input record into buyer_id and friends_id, the mapper sets buyer_id as the key and friends_id as the value and outputs the pair directly; this copy serves as the left table. For the same record, it then sets friends_id as the key and buyer_id as the value and outputs that pair as the right table.
To tell the two tables apart in the output, each value must carry a left/right marker: for example, the value string is prefixed with the character 1 for the left table and 2 for the right table. The map output then contains both the left table and the right table, and the shuffle lines up the join by delivering all values that share a key to the same reduce call.
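The map-and-shuffle step can be sketched locally: each input line yields one tagged left-table record and one tagged right-table record, and a TreeMap stands in for the shuffle by grouping the values of each key. This is a simplification; the class name MapShuffleSketch and the sample IDs are hypothetical. A real shuffle also sorts the keys, but the order of values within one key is not guaranteed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Local sketch of map output plus shuffle grouping (no Hadoop involved).
public class MapShuffleSketch {

    // For each "buyer_id<TAB>friends_id" line, emit the left-table record
    // (buyer_id, "1+friends_id") and the right-table record (friends_id, "2+buyer_id").
    // The TreeMap collects the values of each key, standing in for the shuffle.
    public static Map<String, List<String>> mapAndGroup(List<String> lines) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String line : lines) {
            String[] arr = line.split("\t");
            if (arr.length < 2) {
                continue; // skip blank or malformed lines
            }
            grouped.computeIfAbsent(arr[0], k -> new ArrayList<>()).add("1+" + arr[1]);
            grouped.computeIfAbsent(arr[1], k -> new ArrayList<>()).add("2+" + arr[0]);
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("10\t20", "20\t30", "30\t10");
        mapAndGroup(lines).forEach((key, values) -> System.out.println(key + " -> " + values));
        // 10 -> [1+20, 2+30]
        // 20 -> [2+10, 1+30]
        // 30 -> [2+20, 1+10]
    }
}
```

Key 10, for example, receives 1+20 (buyer 10 lists 20 as a friend) and 2+30 (buyer 30 lists 10 as a friend), which is exactly the pairing the reducer needs.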
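The reducer therefore receives the joined data: each key's value-list holds both sides of the relationship for that ID. Values tagged 1 are the friends_id entries from rows whose buyer_id is the key, and values tagged 2 are the buyer_id entries from rows whose friends_id is the key. For each key, take its value-list apart: put the left-table values into one array and the right-table values into another, then compute the Cartesian product of the two arrays. That product is the final result.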
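The reduce logic for a single key can be sketched as a plain function over that key's value-list. The class name ReduceSketch, the method joinValues, and the sample values are hypothetical; lists replace the fixed-size arrays so that any number of values fits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of the reduce logic for one key (no Hadoop involved).
public class ReduceSketch {

    // Splits one key's tagged value-list into left-table ("1+...") and
    // right-table ("2+...") IDs, then returns their Cartesian product as
    // "left<TAB>right" strings, skipping pairs with identical IDs.
    public static List<String> joinValues(List<String> values) {
        List<String> buyer = new ArrayList<>();
        List<String> friends = new ArrayList<>();
        for (String record : values) {
            if (record.length() < 2) {
                continue; // too short to carry a tag and the "+" separator
            }
            char relationtype = record.charAt(0);
            String id = record.substring(2); // drop the "1+" or "2+" prefix
            if (relationtype == '1') {
                buyer.add(id);
            } else if (relationtype == '2') {
                friends.add(id);
            }
        }
        List<String> out = new ArrayList<>();
        for (String b : buyer) {
            for (String f : friends) {
                if (!b.equals(f)) {
                    out.add(b + "\t" + f);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical value-list for one key
        System.out.println(joinValues(Arrays.asList("1+20", "2+30", "2+20", "1+40")));
    }
}
```

For this value-list the left side is {20, 40} and the right side is {30, 20}, so the function returns 20/30, 40/30 and 40/20; the 20/20 pair is dropped. A key that appears on only one side produces no output, because one of the two lists is empty.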
Writing the code
Map Code
public static class Map extends Mapper<Object, Text, Text, Text> {
    // Implement the map function
    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        String[] arr = line.split("\t"); // split the line on tabs
        if (arr.length < 2) {
            return; // skip blank or malformed lines instead of failing the task
        }
        String mapkey = arr[0];   // buyer_id
        String mapvalue = arr[1]; // friends_id
        String relationtype = "1"; // left-table tag
        context.write(new Text(mapkey), new Text(relationtype + "+" + mapvalue));
        relationtype = "2"; // right-table tag
        context.write(new Text(mapvalue), new Text(relationtype + "+" + mapkey));
    }
}
The map side processes a plain text file. The InputFormat cuts the data set into smaller pieces called InputSplits, and a RecordReader parses each split into <key, value> pairs for the map function. The map function splits each line with split("\t"), stores the fields in the array arr[], and assigns arr[0] to mapkey and arr[1] to mapvalue. It then calls context.write() twice to output two copies of the data, marking the value of each copy with the tag relationtype, 1 or 2.
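Since the job never sets an input format, Hadoop falls back to its default TextInputFormat, whose record reader hands map() the byte offset where each line starts as the key (a LongWritable, which is why the mapper can declare it as Object and ignore it) and the line text as the value. The sketch below imitates those pairs under simplifying assumptions: a single split, UTF-8 text, and '\n' line endings. The class name LineRecordsSketch and the sample data are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for the (offset, line) records a line-oriented reader produces.
public class LineRecordsSketch {

    // Returns "offset:line" strings, one per input line.
    public static List<String> records(String fileContent) {
        List<String> out = new ArrayList<>();
        long offset = 0;
        for (String line : fileContent.split("\n")) {
            out.add(offset + ":" + line);
            // Advance past the line's bytes plus its '\n' terminator
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return out;
    }

    public static void main(String[] args) {
        String file = "10\t20\n20\t30\n";
        for (String record : records(file)) {
            int colon = record.indexOf(':');
            String[] arr = record.substring(colon + 1).split("\t"); // what map() does with the value
            System.out.println("key=" + record.substring(0, colon)
                    + " mapkey=" + arr[0] + " mapvalue=" + arr[1]);
        }
        // key=0 mapkey=10 mapvalue=20
        // key=6 mapkey=20 mapvalue=30
    }
}
```

The offsets only tell the records apart; the join never uses them, which is why the mapper reads everything it needs from the value.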
Reduce Code
// Requires imports: java.util.ArrayList, java.util.Iterator, java.util.List
public static class Reduce extends Reducer<Text, Text, Text, Text> {
    // Implement the reduce function
    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // Lists grow as needed; fixed String[20] arrays overflow
        // as soon as one key has more than 20 values per table
        List<String> buyer = new ArrayList<>();
        List<String> friends = new ArrayList<>();
        Iterator<Text> ite = values.iterator();
        while (ite.hasNext()) {
            String record = ite.next().toString();
            // Skip records too short to hold a tag and the "+" separator
            if (record.length() < 2) {
                continue;
            }
            // Get the left/right table tag
            char relationtype = record.charAt(0);
            // Strip the "1+" or "2+" prefix
            String id = record.substring(2);
            if ('1' == relationtype) {
                buyer.add(id);   // left-table value
            } else if ('2' == relationtype) {
                friends.add(id); // right-table value
            }
        }
        // Cartesian product of buyer and friends
        for (String b : buyer) {
            for (String f : friends) {
                // equals() compares IDs by content; != would only compare references
                if (!b.equals(f)) {
                    context.write(new Text(b), new Text(f));
                }
            }
        }
    }
}
On the reduce side, the framework has already gathered all values that share a key into the Iterable values. The reduce function first creates two containers, buyer and friends, to hold the two copies of data emitted by the map side. It then walks over values with the iterator's hasNext() and next() methods in a while loop, assigning each value to record. charAt(0) reads the first character of record into relationtype: if relationtype is 1, substring(2) takes record from index 2 onward and stores it in buyer; if it is 2, the extracted string goes into friends. Finally, two nested for loops output every <key, value> pair, with the key taken from buyer and the value from friends, skipping pairs in which both sides are the same ID.
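The original reducer tests buyer[m]!=friends[n] to skip pairing an ID with itself, but for Java String objects != compares references, not characters, so two equal IDs read from different records still count as "different". The sketch below makes the difference visible; the class name StringCompareSketch, the method countPairs, and the sample IDs are hypothetical.

```java
// Why a != test does not filter equal IDs: for String objects,
// != compares references while equals() compares characters.
public class StringCompareSketch {

    // Counts buyer/friends pairs that pass the "different ID" filter.
    public static int countPairs(String[] buyer, String[] friends, boolean byContent) {
        int count = 0;
        for (String b : buyer) {
            for (String f : friends) {
                boolean different = byContent ? !b.equals(f) : b != f;
                if (different) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // new String(...) always yields a distinct object, much like the fresh
        // String the reducer builds from each record with toString()/substring()
        String[] buyer = {new String("20"), new String("40")};
        String[] friends = {new String("30"), new String("20")};
        System.out.println(countPairs(buyer, friends, false)); // 4: the 20/20 pair slips through
        System.out.println(countPairs(buyer, friends, true));  // 3: the 20/20 pair is skipped
    }
}
```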
Complete code
package mapreduce;

import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class DanJoin {

    public static class Map extends Mapper<Object, Text, Text, Text> {
        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] arr = value.toString().split("\t");
            if (arr.length < 2) {
                return; // skip blank or malformed lines
            }
            String mapkey = arr[0];   // buyer_id
            String mapvalue = arr[1]; // friends_id
            String relationtype = "1"; // left table: key = buyer_id
            context.write(new Text(mapkey), new Text(relationtype + "+" + mapvalue));
            relationtype = "2"; // right table: key = friends_id
            context.write(new Text(mapvalue), new Text(relationtype + "+" + mapkey));
        }
    }

    public static class Reduce extends Reducer<Text, Text, Text, Text> {
        @Override
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            List<String> buyer = new ArrayList<>();
            List<String> friends = new ArrayList<>();
            Iterator<Text> ite = values.iterator();
            while (ite.hasNext()) {
                String record = ite.next().toString();
                if (record.length() < 2) {
                    continue; // no room for a tag plus "+"
                }
                char relationtype = record.charAt(0);
                String id = record.substring(2);
                if ('1' == relationtype) {
                    buyer.add(id);
                } else if ('2' == relationtype) {
                    friends.add(id);
                }
            }
            // Cartesian product; equals() compares the IDs by content
            for (String b : buyer) {
                for (String f : friends) {
                    if (!b.equals(f)) {
                        context.write(new Text(b), new Text(f));
                    }
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new String[2];
        otherArgs[0] = "hdfs://localhost:9000/mymapreduce7/in/buyer1";
        otherArgs[1] = "hdfs://localhost:9000/mymapreduce7/out"; // must not exist yet, or the job fails
        // Job.getInstance replaces the deprecated new Job(conf, name) constructor
        Job job = Job.getInstance(conf, "Single table join");
        job.setJarByClass(DanJoin.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
-------------- end ----------------