MapReduce example (VII): single-table join
2022-07-06 09:33:00 【Laugh at Fengyun Road】
Implementing a single-table join with MapReduce
Hello everyone, I am Fengyun. Welcome to my blog or my WeChat official account 【Laugh at Fengyun Road】. In the days ahead, let's learn big data technologies together, work hard, and become better versions of ourselves!
Knowledge review
Be able to tell apart the Cartesian product, natural join, equi-join, inner join, and outer join (review the database basics if needed).
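As a quick refresher on two of these terms, the sketch below contrasts a Cartesian product with an equi-join on a tiny in-memory table. The class name JoinKindsSketch, its methods, and the sample rows are made up for illustration; each row is assumed to be a {buyer_id, friends_id} pair.

```java
import java.util.ArrayList;
import java.util.List;

class JoinKindsSketch {
    // Cartesian product: every left row is paired with every right row.
    static List<String> cartesian(List<String[]> left, List<String[]> right) {
        List<String> out = new ArrayList<>();
        for (String[] l : left) {
            for (String[] r : right) {
                out.add(String.join(",", l) + " | " + String.join(",", r));
            }
        }
        return out;
    }

    // Equi-join: keep only the pairs where left.buyer_id equals right.friends_id.
    static List<String> equiJoin(List<String[]> left, List<String[]> right) {
        List<String> out = new ArrayList<>();
        for (String[] l : left) {
            for (String[] r : right) {
                if (l[0].equals(r[1])) {
                    out.add(String.join(",", l) + " | " + String.join(",", r));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical sample rows: {buyer_id, friends_id}
        List<String[]> rows = List.of(new String[] {"A", "B"}, new String[] {"B", "C"});
        System.out.println(cartesian(rows, rows).size()); // prints 4
        System.out.println(equiJoin(rows, rows));
    }
}
```

The equi-join is simply the Cartesian product filtered by an equality condition, which is why it keeps far fewer rows.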
Implementation approach
We use the table buyer1(buyer_id, friends_id) to illustrate how a single-table join works. In a single-table join, the buyer_id column of the left table is joined with the friends_id column of the right table, and the left and right tables are the same table.
So, in the map stage, each record read is split into buyer_id and friends_id. First, buyer_id is set as the key and friends_id as the value, and the pair is output as the left table. Then, for the same record, friends_id is set as the key and buyer_id as the value, and the pair is output as the right table.
To tell the left and right tables apart in the output, each value must carry a table marker. For example, prefix the value String with the character 1 for the left table and with the character 2 for the right table. The map output then contains both the left and the right table, and the shuffle brings records with the same key together, which completes the matching step of the join.
reduce receives the grouped result: the value-list of each key holds the relationship "buyer_id friends_id -- friends_id buyer_id". Parse each key's value-list, put the values that came from the left table into one array and the values that came from the right table into another, then take the Cartesian product of the two arrays to get the final result.
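The whole idea can be tried out without a cluster. The following is a minimal sketch that simulates map, shuffle, and reduce in plain Java collections; it is not Hadoop code. The class name SelfJoinSketch, the method join, and the sample IDs are assumptions made for illustration, and a TreeMap stands in for the shuffle's grouping by key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class SelfJoinSketch {
    // Simulates the job on tab-separated "buyer_id\tfriends_id" lines.
    static List<String> join(List<String> lines) {
        // Map + shuffle: emit each record twice with a table marker, grouped by key.
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String line : lines) {
            String[] arr = line.split("\t");
            if (arr.length < 2) {
                continue; // skip malformed lines
            }
            // Left table: key = buyer_id, value = "1+" + friends_id
            grouped.computeIfAbsent(arr[0], k -> new ArrayList<>()).add("1+" + arr[1]);
            // Right table: key = friends_id, value = "2+" + buyer_id
            grouped.computeIfAbsent(arr[1], k -> new ArrayList<>()).add("2+" + arr[0]);
        }

        // Reduce: per key, separate the two tables and take their Cartesian product.
        List<String> out = new ArrayList<>();
        for (List<String> values : grouped.values()) {
            List<String> left = new ArrayList<>();
            List<String> right = new ArrayList<>();
            for (String v : values) {
                if (v.charAt(0) == '1') {
                    left.add(v.substring(2));
                } else {
                    right.add(v.substring(2));
                }
            }
            for (String l : left) {
                for (String r : right) {
                    if (!l.equals(r)) { // drop pairs that relate an ID to itself
                        out.add(l + "\t" + r);
                    }
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical input: A lists B as a friend, B lists C as a friend.
        for (String row : join(List.of("A\tB", "B\tC"))) {
            System.out.println(row);
        }
    }
}
```

With this input only the key B has values from both tables, so the single output row pairs C (from the left table) with A (from the right table): two IDs connected through B.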
Code writing
Map Code
public static class Map extends Mapper<Object,Text,Text,Text>{
    // Implements the map function
    @Override
    public void map(Object key,Text value,Context context)
            throws IOException,InterruptedException{
        String line = value.toString();
        String[] arr = line.split("\t"); // Split the line on tabs
        if(arr.length<2){
            return; // Skip malformed lines that lack a second field
        }
        String mapkey=arr[0];   // buyer_id
        String mapvalue=arr[1]; // friends_id
        String relationtype="1"; // Marker for the left table
        context.write(new Text(mapkey),new Text(relationtype+"+"+mapvalue));
        relationtype="2"; // Marker for the right table
        context.write(new Text(mapvalue),new Text(relationtype+"+"+mapkey));
    }
}
Map processes a plain text file. InputFormat cuts the data set into smaller InputSplits, and a RecordReader parses each split into <key, value> pairs for the map function to use. The map function calls split("\t") to break each line into fields and stores them in the array arr[], assigning arr[0] to mapkey and arr[1] to mapvalue. It then calls context's write() method twice to output two copies of the data, using the identifier relationtype (1 or 2) to mark the value of each copy.
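To see concretely what one input line turns into, here is a small sketch of the map step's output written in plain Java instead of the Hadoop API. The class name MapEmitSketch, the method emit, and the sample IDs 10001 and 10002 are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class MapEmitSketch {
    // Returns the {key, value} records that the map step writes for one input line.
    static List<String[]> emit(String line) {
        List<String[]> out = new ArrayList<>();
        String[] arr = line.split("\t");
        if (arr.length < 2) {
            return out; // nothing to emit for a malformed line
        }
        out.add(new String[] {arr[0], "1+" + arr[1]}); // left table: buyer_id -> 1+friends_id
        out.add(new String[] {arr[1], "2+" + arr[0]}); // right table: friends_id -> 2+buyer_id
        return out;
    }

    public static void main(String[] args) {
        for (String[] kv : emit("10001\t10002")) {
            System.out.println(kv[0] + " -> " + kv[1]);
        }
    }
}
```

Each line therefore produces exactly two records, one under each of its two IDs, and the marker is always the first character of the value.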
Reduce Code
public static class Reduce extends Reducer<Text, Text, Text, Text>{
    // Implements the reduce function
    @Override
    public void reduce(Text key,Iterable<Text> values,Context context)
            throws IOException,InterruptedException{
        // Note: each array holds at most 20 values per key; a larger group would overflow it
        int buyernum=0;
        String[] buyer=new String[20];
        int friendsnum=0;
        String[] friends=new String[20];
        Iterator<Text> ite=values.iterator();
        while(ite.hasNext()){
            String record=ite.next().toString();
            int len=record.length();
            int i=2; // The payload starts after the marker and the "+" separator
            if(0==len){
                continue;
            }
            // Read the left/right table marker
            char relationtype=record.charAt(0);
            // Left-table record: store the payload in buyer
            if('1'==relationtype){
                buyer[buyernum]=record.substring(i);
                buyernum++;
            }
            // Right-table record: store the payload in friends
            if('2'==relationtype){
                friends[friendsnum]=record.substring(i);
                friendsnum++;
            }
        }
        // Cartesian product of the buyer and friends arrays
        if(0!=buyernum&&0!=friendsnum){
            for(int m=0;m<buyernum;m++){
                for(int n=0;n<friendsnum;n++){
                    // Compare contents with equals(); != would only compare references
                    if(!buyer[m].equals(friends[n])){
                        // Output the result
                        context.write(new Text(buyer[m]),new Text(friends[n]));
                    }
                }
            }
        }
    }
}
By the time the reduce side receives the data from the map side, all values with the same key have been collected into the Iterable values. The reduce function first creates two arrays, buyer[] and friends[], to hold the two kinds of map output. It then walks through values in a while loop using the Iterator's hasNext() and next() methods, assigning each value to record. charAt(0) reads the first character of record into relationtype. If relationtype is 1, substring(2) takes the part of record starting at index 2 and stores it in buyer[]; if relationtype is 2, that part is stored in friends[]. Finally, two nested for loops traverse the arrays and output <key, value> pairs with key=buyer[m] and value=friends[n]; the inner if is meant to skip pairs whose two values are the same.
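The reduce logic for a single key can be exercised in isolation. The sketch below is a plain Java variant rather than the Reducer itself: it assumes growable lists instead of the fixed 20-slot arrays and compares strings with equals(). The class name ReduceSketch, the method reduce, and the sample values are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class ReduceSketch {
    // Joins the marked values that arrived for one key.
    static List<String> reduce(List<String> values) {
        List<String> buyer = new ArrayList<>();   // payloads marked 1 (left table)
        List<String> friends = new ArrayList<>(); // payloads marked 2 (right table)
        for (String record : values) {
            if (record.isEmpty()) {
                continue; // same guard as the length check in the reducer
            }
            char relationtype = record.charAt(0);
            if (relationtype == '1') {
                buyer.add(record.substring(2));
            } else if (relationtype == '2') {
                friends.add(record.substring(2));
            }
        }
        // Cartesian product of the two lists, skipping identical pairs.
        List<String> out = new ArrayList<>();
        for (String b : buyer) {
            for (String f : friends) {
                if (!b.equals(f)) {
                    out.add(b + "\t" + f);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical value-list for one key
        for (String row : reduce(List.of("2+A", "1+C", "1+D"))) {
            System.out.println(row);
        }
    }
}
```

Lists remove the per-key limit of 20 values, and equals() compares string contents, whereas the != operator on String objects compares references and so would not reliably filter out equal IDs.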
Complete code
package mapreduce;

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class DanJoin {
    public static class Map extends Mapper<Object,Text,Text,Text>{
        @Override
        public void map(Object key,Text value,Context context)
                throws IOException,InterruptedException{
            String line = value.toString();
            String[] arr = line.split("\t"); // Split the line on tabs
            if(arr.length<2){
                return; // Skip malformed lines that lack a second field
            }
            String mapkey=arr[0];   // buyer_id
            String mapvalue=arr[1]; // friends_id
            String relationtype="1"; // Marker for the left table
            context.write(new Text(mapkey),new Text(relationtype+"+"+mapvalue));
            relationtype="2"; // Marker for the right table
            context.write(new Text(mapvalue),new Text(relationtype+"+"+mapkey));
        }
    }

    public static class Reduce extends Reducer<Text, Text, Text, Text>{
        @Override
        public void reduce(Text key,Iterable<Text> values,Context context)
                throws IOException,InterruptedException{
            // Note: each array holds at most 20 values per key; a larger group would overflow it
            int buyernum=0;
            String[] buyer=new String[20];
            int friendsnum=0;
            String[] friends=new String[20];
            Iterator<Text> ite=values.iterator();
            while(ite.hasNext()){
                String record=ite.next().toString();
                int len=record.length();
                int i=2; // The payload starts after the marker and the "+" separator
                if(0==len){
                    continue;
                }
                char relationtype=record.charAt(0);
                if('1'==relationtype){
                    buyer[buyernum]=record.substring(i);
                    buyernum++;
                }
                if('2'==relationtype){
                    friends[friendsnum]=record.substring(i);
                    friendsnum++;
                }
            }
            // Cartesian product of the buyer and friends arrays
            if(0!=buyernum&&0!=friendsnum){
                for(int m=0;m<buyernum;m++){
                    for(int n=0;n<friendsnum;n++){
                        // Compare contents with equals(); != would only compare references
                        if(!buyer[m].equals(friends[n])){
                            context.write(new Text(buyer[m]),new Text(friends[n]));
                        }
                    }
                }
            }
        }
    }

    public static void main(String[] args) throws Exception{
        Configuration conf=new Configuration();
        String[] otherArgs=new String[2];
        otherArgs[0]="hdfs://localhost:9000/mymapreduce7/in/buyer1";
        otherArgs[1]="hdfs://localhost:9000/mymapreduce7/out";
        // Job.getInstance replaces the deprecated Job constructor
        Job job=Job.getInstance(conf,"Table join");
        job.setJarByClass(DanJoin.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true)?0:1);
    }
}