[Teacher Zhao Yuqiang] Computing aggregations with MapReduce in MongoDB
2022-07-03 05:45:00 【Teacher Zhao Yuqiang】

MapReduce can express very complex aggregation logic and is very flexible. However, MapReduce is slow and should not be used for real-time data analysis. MapReduce can run in parallel on multiple servers: each server handles only part of the workload and then sends its partial result to the master server, which merges the partial results, computes the final result set, and returns it to the client.
The basic idea of MapReduce is shown in the figure below:

This example computes a sum. The first stage is Map: a large task is split into several small tasks, and each small task runs on a different node, which is what makes distributed computation possible (the blue box in the figure). The outputs of the small tasks are then combined in a second computation that produces the final result, 55. This stage is called Reduce (the red box in the figure).
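The summation example can be sketched in plain JavaScript. The figure is not reproduced here, so the input (the integers 1 to 10) and the chunk size of 3 are assumptions chosen only because they give the stated total of 55; `chunk` and `sum` are hypothetical helpers, not MongoDB functions.

```javascript
// Assumed input: the integers 1..10, whose sum is the 55 mentioned above.
var numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];

// Hypothetical helper: split an array into pieces of at most `size` items.
function chunk(items, size) {
  var pieces = [];
  for (var i = 0; i < items.length; i += size) {
    pieces.push(items.slice(i, i + size));
  }
  return pieces;
}

// Hypothetical helper: add up an array of numbers.
function sum(values) {
  return values.reduce(function (acc, v) { return acc + v; }, 0);
}

// Map stage: each small task sums its own piece (on a cluster, on its own node).
var partialSums = chunk(numbers, 3).map(sum);

// Reduce stage: combine the partial results into the final answer.
var total = sum(partialSums);

console.log(partialSums, total);
```

Changing the chunk size changes the partial sums but not the total, which is why the work can be divided among nodes freely.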
Computing an aggregation with MapReduce involves three main steps: Map, Shuffle (grouping), and Reduce. Map and Reduce must be defined explicitly; Shuffle is implemented by MongoDB.
- Map: apply the map operation to each document, producing a key and a value
- Shuffle: group by key, and combine the values that share a key into an array
- Reduce: reduce the array of values to a single value
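The three steps can be imitated in memory to see how they fit together. This is only a sketch under stated assumptions: `runMapReduce` is a hypothetical stand-in, not MongoDB's implementation, and it passes `emit` to the map function as an argument, whereas in MongoDB `emit` is available globally inside the map function.

```javascript
// Hypothetical in-memory stand-in for mapReduce; not MongoDB's implementation.
function runMapReduce(docs, mapFn, reduceFn) {
  var groups = {}; // key -> array of emitted values (keys become strings here)

  // Map: call mapFn with `this` bound to each document.
  docs.forEach(function (doc) {
    mapFn.call(doc, function emit(key, value) {
      // Shuffle: collect the values that share a key into one array.
      if (!Object.prototype.hasOwnProperty.call(groups, key)) {
        groups[key] = [];
      }
      groups[key].push(value);
    });
  });

  // Reduce: collapse each array of values to a single value.
  // Simplification: this sketch calls reduceFn even for single-value keys.
  var result = {};
  Object.keys(groups).forEach(function (key) {
    result[key] = reduceFn(key, groups[key]);
  });
  return result;
}

// Illustrative documents (made up for this sketch).
var docs = [
  { job: "CLERK", sal: 800 },
  { job: "SALESMAN", sal: 1600 },
  { job: "CLERK", sal: 1100 }
];

var totals = runMapReduce(
  docs,
  function (emit) { emit(this.job, this.sal); },
  function (key, values) {
    return values.reduce(function (a, b) { return a + b; }, 0);
  }
);

console.log(totals);
```

Only the two functions passed in correspond to what you write for MongoDB; the grouping in the middle is the part MongoDB does for you.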
The examples below use the following test data (employee data).
db.emp.insert(
[
{_id:7369,ename:'SMITH' ,job:'CLERK' ,mgr:7902,hiredate:'17-12-80',sal:800,comm:0,deptno:20},
{_id:7499,ename:'ALLEN' ,job:'SALESMAN' ,mgr:7698,hiredate:'20-02-81',sal:1600,comm:300 ,deptno:30},
{_id:7521,ename:'WARD' ,job:'SALESMAN' ,mgr:7698,hiredate:'22-02-81',sal:1250,comm:500 ,deptno:30},
{_id:7566,ename:'JONES' ,job:'MANAGER' ,mgr:7839,hiredate:'02-04-81',sal:2975,comm:0,deptno:20},
{_id:7654,ename:'MARTIN',job:'SALESMAN' ,mgr:7698,hiredate:'28-09-81',sal:1250,comm:1400,deptno:30},
{_id:7698,ename:'BLAKE' ,job:'MANAGER' ,mgr:7839,hiredate:'01-05-81',sal:2850,comm:0,deptno:30},
{_id:7782,ename:'CLARK' ,job:'MANAGER' ,mgr:7839,hiredate:'09-06-81',sal:2450,comm:0,deptno:10},
{_id:7788,ename:'SCOTT' ,job:'ANALYST' ,mgr:7566,hiredate:'19-04-87',sal:3000,comm:0,deptno:20},
{_id:7839,ename:'KING' ,job:'PRESIDENT',mgr:0,hiredate:'17-11-81',sal:5000,comm:0,deptno:10},
{_id:7844,ename:'TURNER',job:'SALESMAN' ,mgr:7698,hiredate:'08-09-81',sal:1500,comm:0,deptno:30},
{_id:7876,ename:'ADAMS' ,job:'CLERK' ,mgr:7788,hiredate:'23-05-87',sal:1100,comm:0,deptno:20},
{_id:7900,ename:'JAMES' ,job:'CLERK' ,mgr:7698,hiredate:'03-12-81',sal:950,comm:0,deptno:30},
{_id:7902,ename:'FORD' ,job:'ANALYST' ,mgr:7566,hiredate:'03-12-81',sal:3000,comm:0,deptno:20},
{_id:7934,ename:'MILLER',job:'CLERK' ,mgr:7782,hiredate:'23-01-82',sal:1300,comm:0,deptno:10}
]
);
(Case 1) Count the employees in each job
var map1=function(){emit(this.job,1)}
var reduce1=function(job,count){return Array.sum(count)}
db.emp.mapReduce(map1,reduce1,{out:"mrdemo1"})
(Case 2) Sum the salaries in each department
var map2=function(){emit(this.deptno,this.sal)}
var reduce2=function(deptno,sal){return Array.sum(sal)}
db.emp.mapReduce(map2,reduce2,{out:"mrdemo2"})
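To sanity-check what the two output collections should contain, the same groupings can be recomputed outside MongoDB. The sketch below copies only the `job`, `deptno` and `sal` fields from the test data; `addTo` is a hypothetical helper, and the plain objects it fills are not the format MongoDB writes to `mrdemo1` or `mrdemo2`.

```javascript
// job, deptno and sal copied from the 14 test documents above.
var rows = [
  ["CLERK", 20, 800], ["SALESMAN", 30, 1600], ["SALESMAN", 30, 1250],
  ["MANAGER", 20, 2975], ["SALESMAN", 30, 1250], ["MANAGER", 30, 2850],
  ["MANAGER", 10, 2450], ["ANALYST", 20, 3000], ["PRESIDENT", 10, 5000],
  ["SALESMAN", 30, 1500], ["CLERK", 20, 1100], ["CLERK", 30, 950],
  ["ANALYST", 20, 3000], ["CLERK", 10, 1300]
];

// Hypothetical helper: add `amount` to the running total stored under `key`.
function addTo(totals, key, amount) {
  totals[key] = (totals[key] || 0) + amount;
}

var countByJob = {};
var salaryByDept = {};
rows.forEach(function (row) {
  addTo(countByJob, row[0], 1);        // Case 1: emit(this.job, 1)
  addTo(salaryByDept, row[1], row[2]); // Case 2: emit(this.deptno, this.sal)
});

console.log(countByJob);
console.log(salaryByDept);
```

For this data the counts are CLERK 4, SALESMAN 4, MANAGER 3, ANALYST 2 and PRESIDENT 1, and the salary totals are 8750, 10875 and 9400 for departments 10, 20 and 30. These should match what you see when you query `mrdemo1` and `mrdemo2`.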
(Case 3) Troubleshoot the Map function
Define your own emit function:
var emit = function(key, value) {
    print("emit");
    print("key: " + key + " value: " + tojson(value));
}
Test a single document:
var emp7839 = db.emp.findOne({_id:7839})
map2.apply(emp7839)
This prints the following:
emit
key: 10 value: 5000
Test multiple documents:
var myCursor = db.emp.find()
while (myCursor.hasNext()) {
    var doc = myCursor.next();
    print("document _id= " + tojson(doc._id));
    map2.apply(doc);
    print();
}
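A variation on the same technique is to have the stand-in `emit` record each pair instead of printing it, so the pairs can be inspected or compared afterwards. This is a sketch that runs without a database: the `emitted` array and the two sample documents (trimmed to the fields `map2` reads) are assumptions made for illustration.

```javascript
// Collected (key, value) pairs; this array is an assumption of the sketch.
var emitted = [];

// Stand-in emit that records each pair for later inspection.
var emit = function (key, value) {
  emitted.push({ key: key, value: value });
};

// Same map function as in Case 2.
var map2 = function () { emit(this.deptno, this.sal); };

// Two documents from the test data, reduced to the fields map2 reads.
var sample = [
  { _id: 7839, deptno: 10, sal: 5000 },
  { _id: 7369, deptno: 20, sal: 800 }
];

sample.forEach(function (doc) { map2.apply(doc); });

console.log(JSON.stringify(emitted));
```

Collecting the pairs makes it easy to spot a document that emits an unexpected key or an undefined value.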
(Case 4) Troubleshoot the Reduce function
A simple test case:
var myTestValues = [ 5, 5, 10 ];
var reduce1=function(key,values){return Array.sum(values)}
reduce1("mykey",myTestValues)   // returns 20
Test: the values passed to reduce are documents with several fields
Test data, salary (sal) and bonus (comm):
var myTestObjects = [
{ sal: 1000, comm: 5 },
{ sal: 2000, comm: 10 },
{ sal: 3000, comm: 15 }
];
Write the reduce function:
var reduce2=function(key,values) {
    // Declare with var so the accumulator is not an implicit global
    var reducedValue = { sal: 0, comm: 0 };
    for(var i=0;i<values.length;i++) {
        reducedValue.sal += values[i].sal;
        reducedValue.comm += values[i].comm;
    }
    return reducedValue;
}
Test:
reduce2("aa",myTestObjects)
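Beyond a single call, it is worth checking that the reduce function gives the same answer when its own output is fed back in and when the values arrive in a different order, because MongoDB may call reduce more than once for the same key. The sketch below repeats `reduce2` and the test data so that it runs on its own; the variable names `direct`, `rereduced` and `reversed` are hypothetical.

```javascript
var reduce2 = function (key, values) {
  var reducedValue = { sal: 0, comm: 0 };
  for (var i = 0; i < values.length; i++) {
    reducedValue.sal += values[i].sal;
    reducedValue.comm += values[i].comm;
  }
  return reducedValue;
};

var myTestObjects = [
  { sal: 1000, comm: 5 },
  { sal: 2000, comm: 10 },
  { sal: 3000, comm: 15 }
];

// All values reduced in one call.
var direct = reduce2("aa", myTestObjects);

// Re-reduce: reduce the first two values, then feed that result back in
// together with the remaining value.
var partial = reduce2("aa", myTestObjects.slice(0, 2));
var rereduced = reduce2("aa", [partial, myTestObjects[2]]);

// Order check: the same values in reverse order.
var reversed = reduce2("aa", myTestObjects.slice().reverse());

console.log(JSON.stringify(direct));
console.log(JSON.stringify(rereduced));
console.log(JSON.stringify(reversed));
```

All three calls should print the same object. This works because the value returned by `reduce2` has the same shape (`sal` and `comm`) as the values it receives.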
