[Teacher Zhao Yuqiang] Computing aggregations with MapReduce in MongoDB
2022-07-03 05:45:00 [Teacher Zhao Yuqiang]

MapReduce can express very complex aggregation logic and is very flexible. However, MapReduce is very slow and should not be used for real-time data analysis. MapReduce can run in parallel on multiple servers: each server handles only part of the workload, and the partial results are then sent to the master server, which merges them, computes the final result set, and returns it to the client.
The basic idea of MapReduce is shown in the figure below:

Take a summation as an example. The Map stage comes first: one large task is split into several small tasks, and each small task runs on a different node, which is what makes distributed computation possible (the blue box in the figure). The outputs of the small tasks are then combined in a further calculation to give the final result, 55. This stage is called Reduce (the red box in the figure).
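The summation can be sketched in plain JavaScript. This is only an illustration: it assumes the figure sums the integers 1 through 10 (which matches the stated result of 55), and the chunk size of 3 is an arbitrary, hypothetical choice standing in for "one small task per node".

```javascript
// Assumed input: the integers 1..10, whose sum is 55.
const numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];

// Map stage: split the big task into small tasks (hypothetical chunk size).
const chunkSize = 3;
const chunks = [];
for (let i = 0; i < numbers.length; i += chunkSize) {
  chunks.push(numbers.slice(i, i + chunkSize));
}

// Each small task computes a partial sum; on a cluster these would run on different nodes.
const add = (a, b) => a + b;
const partialSums = chunks.map((chunk) => chunk.reduce(add, 0));

// Reduce stage: combine the partial sums into the final result.
const total = partialSums.reduce(add, 0);

console.log(partialSums); // [ 6, 15, 24, 10 ]
console.log(total);       // 55
```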
Computing an aggregation with MapReduce takes three main steps: Map, Shuffle (grouping), and Reduce. You define Map and Reduce explicitly; Shuffle is implemented by MongoDB.
- Map: applies the map operation to each document, producing a key and a value
- Shuffle: groups by key and combines the values that share a key into an array
- Reduce: reduces the array of values to a single value
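The three steps above can be sketched in plain JavaScript, without a database. This is a simplified model, not MongoDB's implementation: `runMapReduce` is a hypothetical helper, and it passes `emit` to the map function as an argument, whereas in the MongoDB shell `emit` is a global.

```javascript
// Hypothetical helper that models Map -> Shuffle -> Reduce in memory.
function runMapReduce(docs, mapFn, reduceFn) {
  // Map: call mapFn with `this` bound to each document and collect the emitted pairs.
  const pairs = [];
  const emit = function (key, value) { pairs.push([key, value]); };
  docs.forEach(function (doc) { mapFn.call(doc, emit); });

  // Shuffle: group the values by key into arrays.
  const groups = new Map();
  pairs.forEach(function (pair) {
    if (!groups.has(pair[0])) groups.set(pair[0], []);
    groups.get(pair[0]).push(pair[1]);
  });

  // Reduce: collapse each array to a single value.
  // Like MongoDB, skip the reduce call when a key has only one value.
  const result = {};
  groups.forEach(function (values, key) {
    result[key] = values.length === 1 ? values[0] : reduceFn(key, values);
  });
  return result;
}

// Usage with three made-up documents.
const docs = [{ job: "CLERK" }, { job: "CLERK" }, { job: "MANAGER" }];
const counts = runMapReduce(
  docs,
  function (emit) { emit(this.job, 1); },
  function (key, values) {
    return values.reduce(function (a, b) { return a + b; }, 0);
  }
);
console.log(JSON.stringify(counts)); // {"CLERK":2,"MANAGER":1}
```

Because the reduce step may be skipped for single-value keys, the value returned by reduce should have the same shape as the values emitted by map.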
The examples below use the following test data (employee records).
db.emp.insert(
[
{_id:7369,ename:'SMITH' ,job:'CLERK' ,mgr:7902,hiredate:'17-12-80',sal:800,comm:0,deptno:20},
{_id:7499,ename:'ALLEN' ,job:'SALESMAN' ,mgr:7698,hiredate:'20-02-81',sal:1600,comm:300 ,deptno:30},
{_id:7521,ename:'WARD' ,job:'SALESMAN' ,mgr:7698,hiredate:'22-02-81',sal:1250,comm:500 ,deptno:30},
{_id:7566,ename:'JONES' ,job:'MANAGER' ,mgr:7839,hiredate:'02-04-81',sal:2975,comm:0,deptno:20},
{_id:7654,ename:'MARTIN',job:'SALESMAN' ,mgr:7698,hiredate:'28-09-81',sal:1250,comm:1400,deptno:30},
{_id:7698,ename:'BLAKE' ,job:'MANAGER' ,mgr:7839,hiredate:'01-05-81',sal:2850,comm:0,deptno:30},
{_id:7782,ename:'CLARK' ,job:'MANAGER' ,mgr:7839,hiredate:'09-06-81',sal:2450,comm:0,deptno:10},
{_id:7788,ename:'SCOTT' ,job:'ANALYST' ,mgr:7566,hiredate:'19-04-87',sal:3000,comm:0,deptno:20},
{_id:7839,ename:'KING' ,job:'PRESIDENT',mgr:0,hiredate:'17-11-81',sal:5000,comm:0,deptno:10},
{_id:7844,ename:'TURNER',job:'SALESMAN' ,mgr:7698,hiredate:'08-09-81',sal:1500,comm:0,deptno:30},
{_id:7876,ename:'ADAMS' ,job:'CLERK' ,mgr:7788,hiredate:'23-05-87',sal:1100,comm:0,deptno:20},
{_id:7900,ename:'JAMES' ,job:'CLERK' ,mgr:7698,hiredate:'03-12-81',sal:950,comm:0,deptno:30},
{_id:7902,ename:'FORD' ,job:'ANALYST' ,mgr:7566,hiredate:'03-12-81',sal:3000,comm:0,deptno:20},
{_id:7934,ename:'MILLER',job:'CLERK' ,mgr:7782,hiredate:'23-01-82',sal:1300,comm:0,deptno:10}
]
);
(Case 1) Count the employees in each job
// Emit the value 1 for every employee, keyed by job
var map1=function(){emit(this.job,1)}
// Add up the 1s collected for each job
var reduce1=function(job,count){return Array.sum(count)}
// Write the result to the collection mrdemo1
db.emp.mapReduce(map1,reduce1,{out:"mrdemo1"})
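(Case 2) Compute the total salary of each department
// Emit each employee's salary, keyed by department number
var map2=function(){emit(this.deptno,this.sal)}
// Add up the salaries collected for each department
var reduce2=function(deptno,sal){return Array.sum(sal)}
// Write the result to the collection mrdemo2
db.emp.mapReduce(map2,reduce2,{out:"mrdemo2"})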
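As a cross-check, the results that Cases 1 and 2 should produce for the test data can be worked out in plain JavaScript. This is a sketch, not a MongoDB call: `simulate` and `sum` are hypothetical helpers (`sum` stands in for the shell's `Array.sum`), and the array keeps only the three fields the two cases read.

```javascript
// Subset of the emp test data: job, sal and deptno only.
const emp = [
  { job: "CLERK", sal: 800, deptno: 20 },
  { job: "SALESMAN", sal: 1600, deptno: 30 },
  { job: "SALESMAN", sal: 1250, deptno: 30 },
  { job: "MANAGER", sal: 2975, deptno: 20 },
  { job: "SALESMAN", sal: 1250, deptno: 30 },
  { job: "MANAGER", sal: 2850, deptno: 30 },
  { job: "MANAGER", sal: 2450, deptno: 10 },
  { job: "ANALYST", sal: 3000, deptno: 20 },
  { job: "PRESIDENT", sal: 5000, deptno: 10 },
  { job: "SALESMAN", sal: 1500, deptno: 30 },
  { job: "CLERK", sal: 1100, deptno: 20 },
  { job: "CLERK", sal: 950, deptno: 30 },
  { job: "ANALYST", sal: 3000, deptno: 20 },
  { job: "CLERK", sal: 1300, deptno: 10 }
];

// Stand-in for the mongo shell helper Array.sum.
const sum = (values) => values.reduce((a, b) => a + b, 0);

// Group the mapped values by key, then sum each group.
function simulate(docs, keyOf, valueOf) {
  const grouped = {};
  for (const doc of docs) {
    const key = keyOf(doc);
    (grouped[key] = grouped[key] || []).push(valueOf(doc));
  }
  const out = {};
  for (const key of Object.keys(grouped)) out[key] = sum(grouped[key]);
  return out;
}

const countByJob = simulate(emp, (d) => d.job, () => 1);       // Case 1
const salByDept = simulate(emp, (d) => d.deptno, (d) => d.sal); // Case 2

console.log(JSON.stringify(countByJob));
// {"CLERK":4,"SALESMAN":4,"MANAGER":3,"ANALYST":2,"PRESIDENT":1}
console.log(JSON.stringify(salByDept));
// {"10":8750,"20":10875,"30":9400}
```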
(Case 3) Troubleshoot the Map function
Define your own emit function:
var emit = function(key, value) {
  print("emit");
  print("key: " + key + " value: " + tojson(value));
}
Test a single document:
var emp7839=db.emp.findOne({_id:7839})
map2.apply(emp7839)
This prints:
emit
key: 10 value: 5000
Test multiple documents:
var myCursor=db.emp.find()
while (myCursor.hasNext()) {
  var doc = myCursor.next();
  print("document _id= " + tojson(doc._id));
  map2.apply(doc);
  print();
}
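The same idea also works when the emitted pairs are recorded instead of printed, so they can be inspected or compared programmatically. The sketch below runs in plain JavaScript outside the shell; the `emitted` array and the two-document `sampleDocs` array are hypothetical names introduced for illustration.

```javascript
// Hypothetical harness: record each emitted pair instead of printing it.
const emitted = [];
function emit(key, value) {
  emitted.push({ key: key, value: value });
}

// Same map function as in Case 2.
const map2 = function () { emit(this.deptno, this.sal); };

// Two documents copied (in abbreviated form) from the test data.
const sampleDocs = [
  { _id: 7839, ename: "KING", sal: 5000, deptno: 10 },
  { _id: 7369, ename: "SMITH", sal: 800, deptno: 20 }
];

// apply() binds `this` to the document, as MongoDB does when it calls map.
sampleDocs.forEach(function (doc) { map2.apply(doc); });

console.log(JSON.stringify(emitted));
// [{"key":10,"value":5000},{"key":20,"value":800}]
```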
(Case 4) Troubleshoot the Reduce function
A simple test case:
var myTestValues = [ 5, 5, 10 ];
var reduce1=function(key,values){return Array.sum(values)}
reduce1("mykey",myTestValues)   // returns 20
Test: each value passed to reduce contains multiple fields
Test data (salary and bonus):
var myTestObjects = [
{ sal: 1000, comm: 5 },
{ sal: 2000, comm: 10 },
{ sal: 3000, comm: 15 }
];
Write the reduce function:
var reduce2=function(key,values) {
  // Declare with var so the accumulator does not leak into the global scope
  var reducedValue = { sal: 0, comm: 0 };
  for(var i=0;i<values.length;i++) {
    reducedValue.sal += values[i].sal;
    reducedValue.comm += values[i].comm;
  }
  return reducedValue;
}
Test:
reduce2("aa",myTestObjects)   // returns { sal: 6000, comm: 30 }
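MongoDB may call reduce more than once for the same key, feeding earlier outputs back in as inputs, so it is worth checking that the function gives the same answer regardless of order and when it is applied to its own output. A minimal sketch of such checks in plain JavaScript follows; the `same` helper and the variable names are hypothetical.

```javascript
// Same reduce function as above.
const reduce2 = function (key, values) {
  var reducedValue = { sal: 0, comm: 0 };
  for (var i = 0; i < values.length; i++) {
    reducedValue.sal += values[i].sal;
    reducedValue.comm += values[i].comm;
  }
  return reducedValue;
};

const values = [
  { sal: 1000, comm: 5 },
  { sal: 2000, comm: 10 },
  { sal: 3000, comm: 15 }
];

// Hypothetical helper: compare two results field by field.
const same = (a, b) => a.sal === b.sal && a.comm === b.comm;

const direct = reduce2("aa", values);

// Order should not matter.
const reversed = reduce2("aa", values.slice().reverse());

// Re-reduce: a partial result mixed with a remaining value.
const partial = reduce2("aa", values.slice(0, 2));
const reReduced = reduce2("aa", [partial, values[2]]);

// Idempotence: reducing the result again should change nothing.
const again = reduce2("aa", [direct]);

console.log(JSON.stringify(direct));   // {"sal":6000,"comm":30}
console.log(same(direct, reversed));   // true
console.log(same(direct, reReduced));  // true
console.log(same(direct, again));      // true
```

These checks pass here because the value returned by `reduce2` has the same shape (`sal` and `comm`) as each input value.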
