MapReduce example (VII): single-table join
2022-07-06 09:33:00 【Laugh at Fengyun Road】
Implementing a single-table join with MapReduce
Hello everyone, I am Fengyun. Welcome to my blog or my WeChat official account 【 Laugh at Fengyun Road 】. In the days to come, let's learn big-data technologies together, work hard together, and meet a better self!
Knowledge review
Make sure you can tell apart the Cartesian product, natural join, equi-join, inner join, and outer join (review your database basics if you cannot).
Implementation idea
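As a quick refresher, three of these operations can be sketched in plain Java. This is only an illustration: the class name `JoinKindsSketch`, the row layout `{key, payload}`, and the sample rows are all invented here. A natural join would behave like the equi-join below, applied to the columns that share a name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical refresher: each row is String[]{key, payload}; all data is invented.
class JoinKindsSketch {

    // Cartesian product: every left row is paired with every right row.
    static List<String> cartesian(List<String[]> left, List<String[]> right) {
        List<String> out = new ArrayList<>();
        for (String[] l : left) {
            for (String[] r : right) {
                out.add(l[1] + "-" + r[1]);
            }
        }
        return out;
    }

    // Equi-join (an inner join on equality): keep only pairs whose keys match.
    static List<String> equiJoin(List<String[]> left, List<String[]> right) {
        List<String> out = new ArrayList<>();
        for (String[] l : left) {
            for (String[] r : right) {
                if (l[0].equals(r[0])) {
                    out.add(l[1] + "-" + r[1]);
                }
            }
        }
        return out;
    }

    // Left outer join: like the equi-join, but unmatched left rows are kept with "null".
    static List<String> leftOuterJoin(List<String[]> left, List<String[]> right) {
        List<String> out = new ArrayList<>();
        for (String[] l : left) {
            boolean matched = false;
            for (String[] r : right) {
                if (l[0].equals(r[0])) {
                    out.add(l[1] + "-" + r[1]);
                    matched = true;
                }
            }
            if (!matched) {
                out.add(l[1] + "-null");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> left = Arrays.asList(new String[]{"1", "a"}, new String[]{"2", "b"});
        List<String[]> right = Arrays.asList(new String[]{"1", "x"}, new String[]{"3", "y"});
        System.out.println(cartesian(left, right));     // [a-x, a-y, b-x, b-y]
        System.out.println(equiJoin(left, right));      // [a-x]
        System.out.println(leftOuterJoin(left, right)); // [a-x, b-null]
    }
}
```

The single-table join in this post is an equi-join of a table with itself, and the reduce step builds it from a per-key Cartesian product.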
We use the table buyer1(buyer_id, friends_id) to illustrate how a single-table join works. In a single-table join, the buyer_id column of the left table is joined with the friends_id column of the right table, and the left table and the right table are the same table.
Therefore, in the map stage, each input line is first split into buyer_id and friends_id. The mapper emits buyer_id as the key and friends_id as the value, and this copy serves as the left table. It then emits the same pair again with friends_id as the key and buyer_id as the value, and this copy serves as the right table.
To tell the left and right tables apart in the output, each output value must carry a table tag: for example, prefix the value string with the character 1 for the left table and the character 2 for the right table. The map output then contains both the left table and the right table, and the shuffle brings the records that share a key together, which completes the join.
reduce receives the joined result: the value-list of each key contains both the "buyer_id, friends_id" and the "friends_id, buyer_id" relationships. Parse each key's value-list, put the values that came from the left table into one group and the values that came from the right table into another, and then take the Cartesian product of the two groups to obtain the final result.
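The whole flow can be tried out without a cluster. The following is a minimal in-memory sketch, not Hadoop code: the class `SelfJoinSimulation`, its method names, and the two sample rows are made up for illustration, and a `TreeMap` stands in for the shuffle by grouping values under their key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory sketch of map -> shuffle -> reduce; no Hadoop classes are involved.
class SelfJoinSimulation {

    // "Map" phase: each (buyer_id, friends_id) row is emitted twice, once per table side.
    // The TreeMap plays the role of the shuffle by collecting values under their key.
    static Map<String, List<String>> mapAndShuffle(List<String[]> rows) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String[] row : rows) {
            String buyerId = row[0];
            String friendsId = row[1];
            grouped.computeIfAbsent(buyerId, k -> new ArrayList<>()).add("1+" + friendsId);
            grouped.computeIfAbsent(friendsId, k -> new ArrayList<>()).add("2+" + buyerId);
        }
        return grouped;
    }

    // "Reduce" phase: per key, split the values by tag, then pair every
    // tag-1 value with every tag-2 value (a per-key Cartesian product).
    static List<String> reduce(Map<String, List<String>> grouped) {
        List<String> out = new ArrayList<>();
        for (List<String> values : grouped.values()) {
            List<String> left = new ArrayList<>();
            List<String> right = new ArrayList<>();
            for (String v : values) {
                if (v.charAt(0) == '1') {
                    left.add(v.substring(2));
                } else {
                    right.add(v.substring(2));
                }
            }
            for (String l : left) {
                for (String r : right) {
                    if (!l.equals(r)) {
                        out.add(l + "," + r);
                    }
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Invented sample: A lists B as a friend, and B lists C as a friend.
        List<String[]> rows = Arrays.asList(new String[]{"A", "B"}, new String[]{"B", "C"});
        System.out.println(reduce(mapAndShuffle(rows))); // [C,A]
    }
}
```

Only key B receives values from both sides ("2+A" and "1+C"), so it is the only key that produces output: the pair links A to C through their shared contact B.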
Writing the code
Map code
public static class Map extends Mapper<Object,Text,Text,Text>{
    // Implement the map function
    @Override
    public void map(Object key,Text value,Context context)
            throws IOException,InterruptedException{
        String line = value.toString();
        String[] arr = line.split("\t"); // split the line on tabs
        String mapkey=arr[0];   // buyer_id
        String mapvalue=arr[1]; // friends_id
        String relationtype="1"; // tag for the left table
        // Left table: key = buyer_id, value = "1+" + friends_id
        context.write(new Text(mapkey),new Text(relationtype+"+"+mapvalue));
        relationtype="2"; // tag for the right table
        // Right table: key = friends_id, value = "2+" + buyer_id
        context.write(new Text(mapvalue),new Text(relationtype+"+"+mapkey));
    }
}
Map processes a plain text file. Before the data reaches the Mapper, InputFormat cuts the data set into small InputSplits, and a RecordReader parses each split into <key,value> pairs for the map function to use. The map function calls split("\t") to split each line, stores the fields in the array arr[], and assigns arr[0] to mapkey and arr[1] to mapvalue. It then calls context.write() twice to output two copies of the data, marking the value of each copy with the identifier relationtype, which is set to 1 or 2.
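The tagging step can be looked at in isolation. The sketch below is plain Java with invented names (`TagLineSketch`, `tag`). It also skips lines that do not contain two tab-separated fields, which is an assumption added here and not something the mapper above does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that mirrors the mapper's tagging for a single input line.
class TagLineSketch {

    // Returns the two tagged records as {key, value} pairs, or an empty list
    // when the line lacks two tab-separated fields.
    static List<String[]> tag(String line) {
        List<String[]> out = new ArrayList<>();
        String[] arr = line.split("\t");
        if (arr.length < 2) {
            return out; // assumed policy: silently skip malformed lines
        }
        out.add(new String[]{arr[0], "1+" + arr[1]}); // left-table copy
        out.add(new String[]{arr[1], "2+" + arr[0]}); // right-table copy
        return out;
    }

    public static void main(String[] args) {
        for (String[] pair : tag("10001\t10002")) {
            System.out.println(pair[0] + " => " + pair[1]);
        }
        System.out.println(tag("no-tab-here").size()); // 0
    }
}
```

Because the tag and the "+" separator always take up exactly two characters, the reduce side can recover the original ID with substring(2).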
Reduce code
public static class Reduce extends Reducer<Text, Text, Text, Text>{
    // Implement the reduce function
    @Override
    public void reduce(Text key,Iterable<Text> values,Context context)
            throws IOException,InterruptedException{
        // Lists (java.util.List/ArrayList) grow as needed, so a key with
        // many values cannot overflow a fixed-size array
        List<String> buyer=new ArrayList<>();   // values tagged 1 (left table)
        List<String> friends=new ArrayList<>(); // values tagged 2 (right table)
        for(Text val:values){
            String record=val.toString();
            if(record.isEmpty()){
                continue;
            }
            // Get the left/right table tag
            char relationtype=record.charAt(0);
            // Skip the two-character prefix ("1+" or "2+") and store the rest
            if('1'==relationtype){
                buyer.add(record.substring(2));
            }else if('2'==relationtype){
                friends.add(record.substring(2));
            }
        }
        // Cartesian product of buyer and friends; nothing is written
        // when either list is empty
        for(String b:buyer){
            for(String f:friends){
                // Compare contents with equals(); != would compare references
                if(!b.equals(f)){
                    // Output the result
                    context.write(new Text(b),new Text(f));
                }
            }
        }
    }
}
By the time the reduce side receives the data sent from the map side, all values that share a key have already been gathered into one iterable container, values. The reduce function first creates two containers, buyer and friends, to hold the two copies of data output by the map side. It then iterates over values and converts each element to the string record. charAt(0) reads the first character of record into relationtype: if relationtype is 1, substring(2) cuts record from index 2 onward and the result is stored in buyer; if relationtype is 2, the cut data is stored in friends. Finally, two nested for loops traverse both containers and output <key,value> pairs in which the key is an element of buyer and the value is an element of friends, skipping pairs whose two sides are the same.
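The filter inside the nested loop is meant to drop pairs whose two sides hold the same ID. In Java that check has to compare string contents, because two strings cut out of different records are normally distinct objects even when their characters are identical. A minimal sketch, with the class name `StringCompareSketch` and the method `keepPair` invented for illustration:

```java
// Hypothetical demo of reference comparison versus content comparison for strings.
class StringCompareSketch {

    // A pair is kept only when its two sides differ in content.
    static boolean keepPair(String left, String right) {
        return !left.equals(right);
    }

    public static void main(String[] args) {
        // new String(...) forces two distinct objects with equal content,
        // similar to what separate substring() calls typically produce.
        String a = new String("10001");
        String b = new String("10001");

        System.out.println(a != b);                   // true: different objects
        System.out.println(keepPair(a, b));           // false: same content, pair dropped
        System.out.println(keepPair("10001", "10002")); // true: different content, pair kept
    }
}
```

With a reference comparison the pair (10001, 10001) would slip through the filter, so equals() is the safer choice here.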
Complete code
package mapreduce;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class DanJoin {
    public static class Map extends Mapper<Object,Text,Text,Text>{
        @Override
        public void map(Object key,Text value,Context context)
                throws IOException,InterruptedException{
            String line = value.toString();
            String[] arr = line.split("\t"); // split the line on tabs
            String mapkey=arr[0];   // buyer_id
            String mapvalue=arr[1]; // friends_id
            String relationtype="1"; // tag for the left table
            context.write(new Text(mapkey),new Text(relationtype+"+"+mapvalue));
            relationtype="2"; // tag for the right table
            context.write(new Text(mapvalue),new Text(relationtype+"+"+mapkey));
        }
    }
    public static class Reduce extends Reducer<Text, Text, Text, Text>{
        @Override
        public void reduce(Text key,Iterable<Text> values,Context context)
                throws IOException,InterruptedException{
            // Lists grow as needed, so a key with many values cannot overflow
            List<String> buyer=new ArrayList<>();   // values tagged 1 (left table)
            List<String> friends=new ArrayList<>(); // values tagged 2 (right table)
            for(Text val:values){
                String record=val.toString();
                if(record.isEmpty()){
                    continue;
                }
                char relationtype=record.charAt(0);
                // Skip the two-character prefix ("1+" or "2+") and store the rest
                if('1'==relationtype){
                    buyer.add(record.substring(2));
                }else if('2'==relationtype){
                    friends.add(record.substring(2));
                }
            }
            // Cartesian product; nothing is written when either list is empty
            for(String b:buyer){
                for(String f:friends){
                    // Compare contents with equals(); != would compare references
                    if(!b.equals(f)){
                        context.write(new Text(b),new Text(f));
                    }
                }
            }
        }
    }
    public static void main(String[] args) throws Exception{
        Configuration conf=new Configuration();
        // Input and output paths are hard-coded for this experiment
        String[] otherArgs=new String[2];
        otherArgs[0]="hdfs://localhost:9000/mymapreduce7/in/buyer1";
        otherArgs[1]="hdfs://localhost:9000/mymapreduce7/out";
        // Job.getInstance replaces the deprecated Job constructor
        Job job=Job.getInstance(conf,"Table join");
        job.setJarByClass(DanJoin.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true)?0:1);
    }
}