MapReduce Example (VII): Single-Table Join
2022-07-06 09:33:00 【Laugh at Fengyun Road】
Implementing a single-table join with MapReduce
Hello everyone, I am Fengyun. Welcome to my blog or my WeChat official account 【Laugh at Fengyun Road】. In the days to come, let's learn big data technologies together, work hard together, and become better versions of ourselves!
Knowledge review
Before starting, review the database basics: be able to tell apart the Cartesian product, natural join, equi-join, inner join, and outer join.
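As a quick refresher, the difference between a Cartesian product and an equi-join can be sketched in plain Java. This is only an illustration: the class name JoinReview, its methods, and the sample rows are made up for this sketch and are not part of the job below.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class JoinReview {
    /** Cartesian product: pairs every left row with every right row. */
    static List<String> cartesian(List<String[]> left, List<String[]> right) {
        List<String> out = new ArrayList<>();
        for (String[] l : left) {
            for (String[] r : right) {
                out.add(String.join(",", l) + " | " + String.join(",", r));
            }
        }
        return out;
    }

    /** Equi-join: keeps only the pairs whose join columns are equal. */
    static List<String> equiJoin(List<String[]> left, int leftCol,
                                 List<String[]> right, int rightCol) {
        List<String> out = new ArrayList<>();
        for (String[] l : left) {
            for (String[] r : right) {
                if (l[leftCol].equals(r[rightCol])) {
                    out.add(String.join(",", l) + " | " + String.join(",", r));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical two-row table with columns (buyer_id, friends_id)
        List<String[]> t = Arrays.asList(new String[]{"a", "b"}, new String[]{"b", "c"});
        System.out.println(cartesian(t, t).size()); // 4 pairs
        System.out.println(equiJoin(t, 1, t, 0));   // [a,b | b,c]
    }
}
```

An equi-join is therefore a Cartesian product followed by an equality filter, which is the view the reduce step of this post relies on.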
Implementation idea
Take the table buyer1(buyer_id, friends_id) as an example to illustrate how a single-table join works. A single-table join connects the buyer_id column of the left table with the friends_id column of the right table, where the left table and the right table are the same table.
Therefore, in the map stage, after each record is split into buyer_id and friends_id, buyer_id is set as the key and friends_id as the value, and the pair is output as a left-table record. The same pair is then output again with friends_id as the key and buyer_id as the value, as a right-table record.
To tell left-table and right-table records apart, each output value carries a tag: for example, the value string is prefixed with the character 1 for the left table and the character 2 for the right table. The map output thus contains both tables, and the shuffle phase completes the join by bringing records with the same key together.
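The tagging scheme can be tried without a cluster. The following sketch mimics what the map step emits for one input line. MapSketch and emit are hypothetical names, the sample IDs are invented, and the tab-separated layout and the "1+" / "2+" prefixes are taken from the Map code later in this post.

```java
import java.util.ArrayList;
import java.util.List;

final class MapSketch {
    /** Returns the two (key, value) pairs the map step would emit for one line. */
    static List<String[]> emit(String line) {
        List<String[]> out = new ArrayList<>();
        String[] arr = line.split("\t");
        if (arr.length < 2) {
            return out; // malformed line: emit nothing
        }
        String buyerId = arr[0];
        String friendsId = arr[1];
        out.add(new String[]{buyerId, "1+" + friendsId}); // left-table record
        out.add(new String[]{friendsId, "2+" + buyerId}); // right-table record
        return out;
    }

    public static void main(String[] args) {
        for (String[] kv : emit("10001\t10002")) {
            System.out.println(kv[0] + " -> " + kv[1]);
        }
        // prints:
        // 10001 -> 1+10002
        // 10002 -> 2+10001
    }
}
```

Every input line produces exactly two records, so the map output is twice the size of the input.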
reduce receives the joined result: the value-list of each key contains both the "buyer_id -> friends_id" and the "friends_id -> buyer_id" relationships. The value-list of each key is parsed: the left-table values are put into one collection and the right-table values into another, and the Cartesian product of the two collections is the final result.
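The whole idea, from tagging through grouping to the per-key Cartesian product, can be simulated in memory. This is a sketch under assumptions, not the Hadoop job itself: SelfJoinSketch and join are hypothetical names, a TreeMap stands in for the shuffle, and the sample lines are invented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class SelfJoinSketch {
    static List<String> join(List<String> lines) {
        // "Shuffle": group the tagged values by key, as the framework would.
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String line : lines) {
            String[] arr = line.split("\t");
            if (arr.length < 2) {
                continue; // skip malformed lines
            }
            grouped.computeIfAbsent(arr[0], k -> new ArrayList<>()).add("1+" + arr[1]);
            grouped.computeIfAbsent(arr[1], k -> new ArrayList<>()).add("2+" + arr[0]);
        }
        // "Reduce": per key, split the values by tag and take the Cartesian product.
        List<String> out = new ArrayList<>();
        for (List<String> values : grouped.values()) {
            List<String> left = new ArrayList<>();
            List<String> right = new ArrayList<>();
            for (String v : values) {
                if (v.charAt(0) == '1') {
                    left.add(v.substring(2));
                } else {
                    right.add(v.substring(2));
                }
            }
            for (String l : left) {
                for (String r : right) {
                    if (!l.equals(r)) {
                        out.add(l + "\t" + r); // skip pairs that relate an ID to itself
                    }
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // a lists b as a friend, and b lists c: key b links the two rows.
        System.out.println(join(Arrays.asList("a\tb", "b\tc")));
    }
}
```

For the two sample lines, only the key b has values from both tables, so the result is the single pair of c and a.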
Writing the code
Map code
public static class Map extends Mapper<Object, Text, Text, Text> {
    // Implements the map function
    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        String[] arr = line.split("\t"); // split the line on tabs
        if (arr.length < 2) {
            return; // skip malformed lines that lack a second column
        }
        String mapkey = arr[0];
        String mapvalue = arr[1];
        String relationtype = "1"; // tag for the left table
        context.write(new Text(mapkey), new Text(relationtype + "+" + mapvalue));
        relationtype = "2"; // tag for the right table
        context.write(new Text(mapvalue), new Text(relationtype + "+" + mapkey));
    }
}
Map processes a plain text file. Before the data reaches the Mapper, InputFormat cuts the data set into smaller InputSplits, and a RecordReader parses each split into <key, value> pairs for the map function to use. The map function calls split("\t") to split each line, stores the fields in the array arr[], and assigns arr[0] to mapkey and arr[1] to mapvalue. It then calls context.write() twice to output two copies of the data, tagging the value of each copy with the identifier relationtype, set to 1 or 2.
Reduce code
// Requires java.util.ArrayList, java.util.Iterator and java.util.List
public static class Reduce extends Reducer<Text, Text, Text, Text> {
    // Implements the reduce function
    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // Lists instead of fixed-size arrays, so a key may have any number of values
        List<String> buyer = new ArrayList<String>();
        List<String> friends = new ArrayList<String>();
        Iterator<Text> ite = values.iterator();
        while (ite.hasNext()) {
            String record = ite.next().toString();
            if (record.length() < 2) {
                continue; // too short to hold a tag and a separator
            }
            // Read the left/right table tag
            char relationtype = record.charAt(0);
            if ('1' == relationtype) {
                // Left-table record: store its value in buyer
                buyer.add(record.substring(2));
            } else if ('2' == relationtype) {
                // Right-table record: store its value in friends
                friends.add(record.substring(2));
            }
        }
        // Cartesian product of buyer and friends
        for (String b : buyer) {
            for (String f : friends) {
                // Compare contents with equals(); != would compare references
                if (!b.equals(f)) {
                    // Output the result
                    context.write(new Text(b), new Text(f));
                }
            }
        }
    }
}
By the time reduce receives the data from the map side, all values that share the same key have been gathered into a single iterable, values. The reduce function first creates two collections, buyer and friends, to hold the two copies of data output by the map side. It then walks through values with a while loop, using the Iterator methods hasNext() and next(), and assigns each value to record. charAt(0) reads the first character of record into relationtype. If relationtype is 1, substring(2) takes the part of record starting at index 2 and stores it in buyer; if relationtype is 2, the extracted data is stored in friends. Finally, two nested for loops traverse both collections and output <key, value> pairs, where key is an element of buyer and value is an element of friends.
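One detail of the nested loop deserves care when comparing the two IDs: in Java, == and != on String values compare object references, not characters. A minimal sketch of the difference follows; the class name StringCompareSketch, its helper methods, and the sample records are made up for illustration.

```java
final class StringCompareSketch {
    /** Extracts the ID from a tagged record such as "1+10002". */
    static String idOf(String record) {
        return record.substring(2);
    }

    /** Content comparison of the IDs carried by two tagged records. */
    static boolean sameId(String r1, String r2) {
        return idOf(r1).equals(idOf(r2));
    }

    public static void main(String[] args) {
        String a = idOf("1+10002");
        String b = idOf("2+10002");
        // Compares references; typically true here because substring() builds a new String
        System.out.println(a != b);
        // Compares contents: the two IDs are the same, so this prints false
        System.out.println(!a.equals(b));
    }
}
```

A filter meant to drop pairs that relate an ID to itself therefore only works with equals().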
Complete code
package mapreduce;

import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class DanJoin {
    public static class Map extends Mapper<Object, Text, Text, Text> {
        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            String[] arr = line.split("\t"); // split the line on tabs
            if (arr.length < 2) {
                return; // skip malformed lines that lack a second column
            }
            String mapkey = arr[0];
            String mapvalue = arr[1];
            String relationtype = "1"; // tag for the left table
            context.write(new Text(mapkey), new Text(relationtype + "+" + mapvalue));
            relationtype = "2"; // tag for the right table
            context.write(new Text(mapvalue), new Text(relationtype + "+" + mapkey));
        }
    }

    public static class Reduce extends Reducer<Text, Text, Text, Text> {
        @Override
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            // Lists instead of fixed-size arrays, so a key may have any number of values
            List<String> buyer = new ArrayList<String>();
            List<String> friends = new ArrayList<String>();
            Iterator<Text> ite = values.iterator();
            while (ite.hasNext()) {
                String record = ite.next().toString();
                if (record.length() < 2) {
                    continue; // too short to hold a tag and a separator
                }
                char relationtype = record.charAt(0); // left/right table tag
                if ('1' == relationtype) {
                    buyer.add(record.substring(2));
                } else if ('2' == relationtype) {
                    friends.add(record.substring(2));
                }
            }
            // Cartesian product of buyer and friends
            for (String b : buyer) {
                for (String f : friends) {
                    // Compare contents with equals(); != would compare references
                    if (!b.equals(f)) {
                        context.write(new Text(b), new Text(f));
                    }
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new String[2];
        otherArgs[0] = "hdfs://localhost:9000/mymapreduce7/in/buyer1"; // input path
        otherArgs[1] = "hdfs://localhost:9000/mymapreduce7/out"; // output path, must not exist yet
        // Job.getInstance replaces the deprecated Job constructor
        Job job = Job.getInstance(conf, "Table join");
        job.setJarByClass(DanJoin.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}