[Teacher Zhao Yuqiang] Computing aggregations with MapReduce in MongoDB
2022-07-03 05:45:00 【Teacher zhaoyuqiang】

MapReduce can express very complex aggregation logic and is highly flexible. However, it is slow and should not be used for real-time data analysis. A MapReduce job can run in parallel on multiple servers, each of which handles only part of the workload. The partial results are then sent to the master server, which merges them, computes the final result set, and returns it to the client.
The basic idea of MapReduce is shown in the figure below:

This example computes a sum. The Map stage runs first: one large task is split into several small tasks, each of which runs on a different node, which is what makes distributed computation possible (the blue box in the figure). The outputs of the small tasks are then combined in a second computation that yields the final result, 55. This stage is called Reduce (the red box in the figure).
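The summation example can be sketched in plain JavaScript, outside MongoDB. The input 1 to 10 and the chunk size of 3 are assumptions, chosen so that the total matches the 55 in the text. `splitIntoChunks` and `sum` are hypothetical helpers, and the "nodes" here are simply array slices processed one after another.

```javascript
// Hypothetical helper: split an array into chunks of at most `size` items.
function splitIntoChunks(items, size) {
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Hypothetical helper: sum an array of numbers.
function sum(numbers) {
  return numbers.reduce((acc, n) => acc + n, 0);
}

// Assumed input: the integers 1..10, whose total is 55.
const input = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];

// Map stage: each chunk stands in for a small task running on one node.
const partialSums = splitIntoChunks(input, 3).map(sum);

// Reduce stage: combine the partial results into the final answer.
const total = sum(partialSums);

console.log(partialSums); // [ 6, 15, 24, 10 ]
console.log(total);       // 55
```

Because addition gives the same answer however the numbers are grouped, the chunk size does not affect the final total.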
Computing an aggregation with MapReduce involves three steps: Map, Shuffle (grouping), and Reduce. Map and Reduce must be defined explicitly; Shuffle is carried out by MongoDB.
- Map: applied to each document; emits a key and a value
- Shuffle: groups the emitted pairs by key and collects the values that share a key into an array
- Reduce: reduces each array of values to a single value
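The three steps above can be imitated in plain JavaScript to make the data flow visible. This is a minimal sketch, not MongoDB's implementation: `runMapReduce` and the sample documents are hypothetical, and `emit` is passed to the map function as an argument here, whereas in MongoDB it is a global provided by the server.

```javascript
// Plain-JavaScript stand-in for the three stages; not MongoDB's implementation.
function runMapReduce(docs, mapFn, reduceFn) {
  // Map: run mapFn with `this` bound to each document and record what it emits.
  const pairs = [];
  const emit = (key, value) => pairs.push({ key, value });
  for (const doc of docs) {
    mapFn.call(doc, emit);
  }

  // Shuffle: gather the values that share a key into one array per key.
  const groups = new Map();
  for (const { key, value } of pairs) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }

  // Reduce: collapse each array to a single value. A key with only one value
  // is passed through unchanged, which is also how MongoDB treats it.
  const results = [];
  for (const [key, values] of groups) {
    const value = values.length === 1 ? values[0] : reduceFn(key, values);
    results.push({ _id: key, value });
  }
  return results;
}

// Hypothetical sample documents.
const docs = [
  { category: "fruit", qty: 2 },
  { category: "veg", qty: 5 },
  { category: "fruit", qty: 3 },
];

const results = runMapReduce(
  docs,
  function (emit) { emit(this.category, this.qty); },
  (key, values) => values.reduce((acc, v) => acc + v, 0)
);

console.log(results);
// [ { _id: 'fruit', value: 5 }, { _id: 'veg', value: 5 } ]
```

The `{ _id, value }` shape of each result mirrors the documents that `mapReduce` writes to its output collection.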
The examples below use the following test data (employee records).
db.emp.insert(
[
{_id:7369,ename:'SMITH' ,job:'CLERK' ,mgr:7902,hiredate:'17-12-80',sal:800,comm:0,deptno:20},
{_id:7499,ename:'ALLEN' ,job:'SALESMAN' ,mgr:7698,hiredate:'20-02-81',sal:1600,comm:300 ,deptno:30},
{_id:7521,ename:'WARD' ,job:'SALESMAN' ,mgr:7698,hiredate:'22-02-81',sal:1250,comm:500 ,deptno:30},
{_id:7566,ename:'JONES' ,job:'MANAGER' ,mgr:7839,hiredate:'02-04-81',sal:2975,comm:0,deptno:20},
{_id:7654,ename:'MARTIN',job:'SALESMAN' ,mgr:7698,hiredate:'28-09-81',sal:1250,comm:1400,deptno:30},
{_id:7698,ename:'BLAKE' ,job:'MANAGER' ,mgr:7839,hiredate:'01-05-81',sal:2850,comm:0,deptno:30},
{_id:7782,ename:'CLARK' ,job:'MANAGER' ,mgr:7839,hiredate:'09-06-81',sal:2450,comm:0,deptno:10},
{_id:7788,ename:'SCOTT' ,job:'ANALYST' ,mgr:7566,hiredate:'19-04-87',sal:3000,comm:0,deptno:20},
{_id:7839,ename:'KING' ,job:'PRESIDENT',mgr:0,hiredate:'17-11-81',sal:5000,comm:0,deptno:10},
{_id:7844,ename:'TURNER',job:'SALESMAN' ,mgr:7698,hiredate:'08-09-81',sal:1500,comm:0,deptno:30},
{_id:7876,ename:'ADAMS' ,job:'CLERK' ,mgr:7788,hiredate:'23-05-87',sal:1100,comm:0,deptno:20},
{_id:7900,ename:'JAMES' ,job:'CLERK' ,mgr:7698,hiredate:'03-12-81',sal:950,comm:0,deptno:30},
{_id:7902,ename:'FORD' ,job:'ANALYST' ,mgr:7566,hiredate:'03-12-81',sal:3000,comm:0,deptno:20},
{_id:7934,ename:'MILLER',job:'CLERK' ,mgr:7782,hiredate:'23-01-82',sal:1300,comm:0,deptno:10}
]
);
(Case 1) Count the employees in each job
// Emit (job, 1) for every employee document
var map1=function(){emit(this.job,1)}
// Add up the 1s emitted for each job
var reduce1=function(job,counts){return Array.sum(counts)}
// Write the result to the mrdemo1 collection
db.emp.mapReduce(map1,reduce1,{out:"mrdemo1"})
(Case 2) Compute the total salary of each department
// Emit (deptno, sal) for every employee document
var map2=function(){emit(this.deptno,this.sal)}
// Add up the salaries emitted for each department
var reduce2=function(deptno,sals){return Array.sum(sals)}
// Write the result to the mrdemo2 collection
db.emp.mapReduce(map2,reduce2,{out:"mrdemo2"})
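As a cross-check on what the two output collections should contain, the same totals can be computed with an ordinary loop in plain JavaScript. This sketch does not call MongoDB: `emp` copies only the `job`, `deptno`, and `sal` fields from the test data above, and `tally` is a hypothetical helper.

```javascript
// Only the fields used by Case 1 and Case 2, copied from the test data,
// in the order [job, deptno, sal].
const emp = [
  ["CLERK", 20, 800], ["SALESMAN", 30, 1600], ["SALESMAN", 30, 1250],
  ["MANAGER", 20, 2975], ["SALESMAN", 30, 1250], ["MANAGER", 30, 2850],
  ["MANAGER", 10, 2450], ["ANALYST", 20, 3000], ["PRESIDENT", 10, 5000],
  ["SALESMAN", 30, 1500], ["CLERK", 20, 1100], ["CLERK", 30, 950],
  ["ANALYST", 20, 3000], ["CLERK", 10, 1300],
];

// Hypothetical helper: accumulate amountOf(row) under keyOf(row).
function tally(rows, keyOf, amountOf) {
  const totals = {};
  for (const row of rows) {
    const key = keyOf(row);
    totals[key] = (totals[key] || 0) + amountOf(row);
  }
  return totals;
}

// Case 1: one per employee, grouped by job.
const headcountByJob = tally(emp, (row) => row[0], () => 1);

// Case 2: salary, grouped by department number.
const salaryByDept = tally(emp, (row) => row[1], (row) => row[2]);

console.log(headcountByJob); // CLERK 4, SALESMAN 4, MANAGER 3, ANALYST 2, PRESIDENT 1
console.log(salaryByDept);   // dept 10: 8750, dept 20: 10875, dept 30: 9400
```

In the output collections each of these totals appears as one document of the form `{ _id: <key>, value: <total> }`.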
(Case 3) Troubleshoot the Map Function
Define your own emit function so that the map function can be run directly in the shell:
var emit = function(key, value) {
print("emit");
print("key: " + key + " value: " + tojson(value));
}
Test a single document:
var emp7839=db.emp.findOne({_id:7839})
map2.apply(emp7839)
This prints the following:
emit
key: 10 value: 5000
Test multiple documents:
var myCursor=db.emp.find()
while (myCursor.hasNext()) {
var doc = myCursor.next();
print ("document _id= " + tojson(doc._id));
map2.apply(doc);
print();
}
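A variation on the stub above, sketched in plain JavaScript without a database connection: instead of printing, the `emit` replacement records each pair in an array so the map output can be inspected or asserted on. The `emitted` array and the hand-written `king` document are assumptions for illustration; the document copies a few fields of employee 7839 from the test data.

```javascript
// Collect emitted pairs instead of printing them.
const emitted = [];
function emit(key, value) {
  emitted.push({ key: key, value: value });
}

// Same map function as in Case 2.
const map2 = function () {
  emit(this.deptno, this.sal);
};

// Hand-written stand-in for db.emp.findOne({_id:7839}).
const king = { _id: 7839, ename: "KING", deptno: 10, sal: 5000 };

// apply() binds `this` to the document, as mapReduce does for each document.
map2.apply(king);

console.log(emitted); // [ { key: 10, value: 5000 } ]
```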
(Case 4) Troubleshoot the Reduce Function
A simple test case:
var myTestValues = [ 5, 5, 10 ];
var reduce1=function(key,values){return Array.sum(values)}
reduce1("mykey",myTestValues)
Test: the values passed to Reduce are objects with several fields
Test data (salary and commission):
var myTestObjects = [
{ sal: 1000, comm: 5 },
{ sal: 2000, comm: 10 },
{ sal: 3000, comm: 15 }
];
Write the reduce function:
var reduce2=function(key,values) {
var reducedValue = { sal: 0, comm: 0 };
for(var i=0;i<values.length;i++) {
reducedValue.sal += values[i].sal;
reducedValue.comm += values[i].comm;
}
return reducedValue;
}
Run the test:
reduce2("aa",myTestObjects)
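MongoDB may call the reduce function more than once for the same key and feed an earlier result back in as one of the values, so the return value must have the same shape as the emitted values. The following plain-JavaScript sketch checks that property for the object-valued reduce function above. The variable names `once`, `partial`, `rereduced`, and `twice` are introduced here for illustration.

```javascript
// Same logic as the reduce function above, written so it runs outside the shell.
const reduce2 = function (key, values) {
  const reducedValue = { sal: 0, comm: 0 };
  for (let i = 0; i < values.length; i++) {
    reducedValue.sal += values[i].sal;
    reducedValue.comm += values[i].comm;
  }
  return reducedValue;
};

const myTestObjects = [
  { sal: 1000, comm: 5 },
  { sal: 2000, comm: 10 },
  { sal: 3000, comm: 15 },
];

// Reduce everything in one call.
const once = reduce2("aa", myTestObjects);

// Reduce the first two values, then feed that result back in with the third.
const partial = reduce2("aa", myTestObjects.slice(0, 2));
const rereduced = reduce2("aa", [partial, myTestObjects[2]]);

// Reduce an already reduced value again.
const twice = reduce2("aa", [once]);

console.log(once);      // { sal: 6000, comm: 30 }
console.log(rereduced); // { sal: 6000, comm: 30 }
console.log(twice);     // { sal: 6000, comm: 30 }
```

All three results agree, which is what a reduce function needs in order to give stable output regardless of how the values are batched.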
