[Teacher Zhao Yuqiang] Computing aggregations with MapReduce in MongoDB
2022-07-03 05:45:00 [Teacher Zhao Yuqiang]
MapReduce can express very complex aggregation logic and is very flexible. However, MapReduce is slow and should not be used for real-time data analysis. MapReduce can run in parallel on multiple servers: each server handles only part of the workload, then sends its partial results to the master server, which merges them, computes the final result set, and returns it to the client.
The figure that accompanied the original post illustrates the basic idea of MapReduce with a summation example.
The Map stage runs first: a large task is divided into several small tasks, and each small task runs on a different node to support distributed computing (the blue box in the figure). The outputs of the small tasks are then combined to produce the final result, 55; this stage is called Reduce (the red box in the figure).
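The summation example can be sketched in plain JavaScript. This is only an illustration: it assumes the input is the integers 1 through 10 (which sum to 55, matching the result above), and the chunk size and the helper names `splitIntoChunks` and `sum` are hypothetical. The chunks are processed in a loop here rather than on separate nodes.

```javascript
// Assumption: the input is 1..10; the text only states the final result, 55.
var numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];

// Hypothetical helper: split an array into chunks of at most `size` items.
function splitIntoChunks(items, size) {
  var chunks = [];
  for (var i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Hypothetical helper: add up an array of numbers.
function sum(values) {
  return values.reduce(function (acc, v) { return acc + v; }, 0);
}

// "Map" stage: each small task sums its own chunk.
var partialSums = splitIntoChunks(numbers, 3).map(sum);

// "Reduce" stage: combine the partial results into the final answer.
var total = sum(partialSums);

console.log(JSON.stringify(partialSums)); // [6,15,24,10]
console.log(total);                       // 55
```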
Computing an aggregation with MapReduce involves three main steps: Map, Shuffle (grouping), and Reduce. Map and Reduce must be defined explicitly; Shuffle is handled by MongoDB.
- Map: apply the map function to each document, producing key/value pairs
- Shuffle: group the pairs by key, combining the values that share a key into an array
- Reduce: reduce each array of values to a single value
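The three steps can be simulated outside MongoDB to see how they fit together. The sketch below is plain JavaScript, not MongoDB's implementation: `runMapReduce` and the sample documents are hypothetical, and `emit` is passed to the map function as an argument, whereas in MongoDB it is available globally inside map. Like MongoDB, the sketch skips the reduce call for a key that has only one value.

```javascript
// Hypothetical simulation of Map, Shuffle and Reduce in plain JavaScript.
function runMapReduce(docs, mapFn, reduceFn) {
  // Map: call mapFn with each document as `this` and collect the emitted pairs.
  var pairs = [];
  function emit(key, value) { pairs.push({ key: key, value: value }); }
  docs.forEach(function (doc) { mapFn.call(doc, emit); });

  // Shuffle: group the values that share a key into one array per key.
  var groups = new Map();
  pairs.forEach(function (pair) {
    if (!groups.has(pair.key)) groups.set(pair.key, []);
    groups.get(pair.key).push(pair.value);
  });

  // Reduce: collapse each array to a single value.
  // A key with a single value is passed through without calling reduceFn.
  var results = [];
  groups.forEach(function (values, key) {
    var value = values.length === 1 ? values[0] : reduceFn(key, values);
    results.push({ _id: key, value: value });
  });
  return results;
}

// Made-up sample documents, for illustration only.
var sampleDocs = [
  { deptno: 10, sal: 100 },
  { deptno: 20, sal: 200 },
  { deptno: 10, sal: 300 }
];

var results = runMapReduce(
  sampleDocs,
  function (emit) { emit(this.deptno, this.sal); },
  function (key, values) {
    return values.reduce(function (acc, v) { return acc + v; }, 0);
  }
);

console.log(JSON.stringify(results));
// [{"_id":10,"value":400},{"_id":20,"value":200}]
```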
The examples below use the following test data (employee data).
db.emp.insert(
[
{_id:7369,ename:'SMITH' ,job:'CLERK' ,mgr:7902,hiredate:'17-12-80',sal:800,comm:0,deptno:20},
{_id:7499,ename:'ALLEN' ,job:'SALESMAN' ,mgr:7698,hiredate:'20-02-81',sal:1600,comm:300 ,deptno:30},
{_id:7521,ename:'WARD' ,job:'SALESMAN' ,mgr:7698,hiredate:'22-02-81',sal:1250,comm:500 ,deptno:30},
{_id:7566,ename:'JONES' ,job:'MANAGER' ,mgr:7839,hiredate:'02-04-81',sal:2975,comm:0,deptno:20},
{_id:7654,ename:'MARTIN',job:'SALESMAN' ,mgr:7698,hiredate:'28-09-81',sal:1250,comm:1400,deptno:30},
{_id:7698,ename:'BLAKE' ,job:'MANAGER' ,mgr:7839,hiredate:'01-05-81',sal:2850,comm:0,deptno:30},
{_id:7782,ename:'CLARK' ,job:'MANAGER' ,mgr:7839,hiredate:'09-06-81',sal:2450,comm:0,deptno:10},
{_id:7788,ename:'SCOTT' ,job:'ANALYST' ,mgr:7566,hiredate:'19-04-87',sal:3000,comm:0,deptno:20},
{_id:7839,ename:'KING' ,job:'PRESIDENT',mgr:0,hiredate:'17-11-81',sal:5000,comm:0,deptno:10},
{_id:7844,ename:'TURNER',job:'SALESMAN' ,mgr:7698,hiredate:'08-09-81',sal:1500,comm:0,deptno:30},
{_id:7876,ename:'ADAMS' ,job:'CLERK' ,mgr:7788,hiredate:'23-05-87',sal:1100,comm:0,deptno:20},
{_id:7900,ename:'JAMES' ,job:'CLERK' ,mgr:7698,hiredate:'03-12-81',sal:950,comm:0,deptno:30},
{_id:7902,ename:'FORD' ,job:'ANALYST' ,mgr:7566,hiredate:'03-12-81',sal:3000,comm:0,deptno:20},
{_id:7934,ename:'MILLER',job:'CLERK' ,mgr:7782,hiredate:'23-01-82',sal:1300,comm:0,deptno:10}
]
);
(Case 1) Count the employees in each job
var map1=function(){emit(this.job,1)}
var reduce1=function(job,count){return Array.sum(count)}
db.emp.mapReduce(map1,reduce1,{out:"mrdemo1"})
(Case 2) Compute the total salary of each department
var map2=function(){emit(this.deptno,this.sal)}
var reduce2=function(deptno,sal){return Array.sum(sal)}
db.emp.mapReduce(map2,reduce2,{out:"mrdemo2"})
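To know what totals to expect in `mrdemo1` and `mrdemo2`, the same figures can be computed directly from the test data. The sketch below is plain JavaScript that runs without MongoDB. It copies only the `job`, `deptno` and `sal` fields from the employee data above, and `groupTotals` is a hypothetical helper, not a MongoDB function.

```javascript
// job, deptno and sal copied from the employee test data, in the same order.
var emp = [
  { job: 'CLERK', deptno: 20, sal: 800 },
  { job: 'SALESMAN', deptno: 30, sal: 1600 },
  { job: 'SALESMAN', deptno: 30, sal: 1250 },
  { job: 'MANAGER', deptno: 20, sal: 2975 },
  { job: 'SALESMAN', deptno: 30, sal: 1250 },
  { job: 'MANAGER', deptno: 30, sal: 2850 },
  { job: 'MANAGER', deptno: 10, sal: 2450 },
  { job: 'ANALYST', deptno: 20, sal: 3000 },
  { job: 'PRESIDENT', deptno: 10, sal: 5000 },
  { job: 'SALESMAN', deptno: 30, sal: 1500 },
  { job: 'CLERK', deptno: 20, sal: 1100 },
  { job: 'CLERK', deptno: 30, sal: 950 },
  { job: 'ANALYST', deptno: 20, sal: 3000 },
  { job: 'CLERK', deptno: 10, sal: 1300 }
];

// Hypothetical helper: add valueOf(doc) into a running total per keyOf(doc).
function groupTotals(docs, keyOf, valueOf) {
  var totals = {};
  docs.forEach(function (doc) {
    var key = keyOf(doc);
    totals[key] = (totals[key] || 0) + valueOf(doc);
  });
  return totals;
}

// Case 1: one count per employee, grouped by job.
var countByJob = groupTotals(emp,
  function (d) { return d.job; },
  function () { return 1; });

// Case 2: salary, grouped by department number.
var salByDept = groupTotals(emp,
  function (d) { return d.deptno; },
  function (d) { return d.sal; });

console.log(JSON.stringify(countByJob));
// {"CLERK":4,"SALESMAN":4,"MANAGER":3,"ANALYST":2,"PRESIDENT":1}
console.log(JSON.stringify(salByDept));
// {"10":8750,"20":10875,"30":9400}
```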
(Case 3) Troubleshoot the Map function
Define your own emit function:
var emit = function(key, value) {
print("emit");
print("key: " + key + " value: " + tojson(value));
}
Test a single document:
var emp7839=db.emp.findOne({_id:7839})
map2.apply(emp7839)
This prints the following output:
emit
key: 10 value: 5000
Test multiple documents:
var myCursor=db.emp.find()
while (myCursor.hasNext()) {
var doc = myCursor.next();
print ("document _id= " + tojson(doc._id));
map2.apply(doc);
print();
}
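A variation on the emit stub above collects the emitted pairs instead of printing them, so that they can be checked programmatically. The sketch below is plain JavaScript with no MongoDB connection: the `emitted` array is a hypothetical name, and the hard-coded document stands in for the result of `findOne`, with a few fields copied from employee 7839 in the test data.

```javascript
// Collected key/value pairs (hypothetical name).
var emitted = [];

// Stub that records each call instead of printing it.
var emit = function (key, value) {
  emitted.push({ key: key, value: value });
};

// Same map function as in the department salary example.
var map2 = function () { emit(this.deptno, this.sal); };

// Stand-in for db.emp.findOne({_id:7839}); values copied from the test data.
var emp7839 = { _id: 7839, ename: 'KING', deptno: 10, sal: 5000 };

// Run the map function with the document bound to `this`.
map2.apply(emp7839);

console.log(JSON.stringify(emitted)); // [{"key":10,"value":5000}]
```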
(Case 4) Troubleshoot the Reduce function
A simple test case:
var myTestValues = [ 5, 5, 10 ];
var reduce1=function(key,values){return Array.sum(values)}
reduce1("mykey",myTestValues)
Test a reduce function whose values are objects with several fields
Test data (salary and commission):
var myTestObjects = [
{ sal: 1000, comm: 5 },
{ sal: 2000, comm: 10 },
{ sal: 3000, comm: 15 }
];
Write the reduce function:
var reduce2=function(key,values) {
  // Declare with var so the accumulator does not leak into the global scope
  var reducedValue = { sal: 0, comm: 0 };
  for(var i=0;i<values.length;i++) {
    reducedValue.sal += values[i].sal;
    reducedValue.comm += values[i].comm;
  }
  return reducedValue;
}
Test:
reduce2("aa",myTestObjects)
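Besides checking the return value, it can be worth checking that the reduce function gives the same answer when it is applied to its own output, because MongoDB may call reduce more than once for the same key. The sketch below is plain JavaScript that runs without MongoDB. `reduceSalComm` is a hypothetical stand-in for the `reduce2` function above, and comparing via `JSON.stringify` is only a convenience for this small example.

```javascript
// Same test data as above: salary and commission.
var myTestObjects = [
  { sal: 1000, comm: 5 },
  { sal: 2000, comm: 10 },
  { sal: 3000, comm: 15 }
];

// Hypothetical stand-in for reduce2: sums sal and comm across all values.
var reduceSalComm = function (key, values) {
  var reducedValue = { sal: 0, comm: 0 };
  for (var i = 0; i < values.length; i++) {
    reducedValue.sal += values[i].sal;
    reducedValue.comm += values[i].comm;
  }
  return reducedValue;
};

// Reduce everything in one call.
var direct = reduceSalComm("aa", myTestObjects);

// Reduce the first two values, then feed that result back in with the third.
var partial = reduceSalComm("aa", myTestObjects.slice(0, 2));
var rereduced = reduceSalComm("aa", [partial, myTestObjects[2]]);

console.log(JSON.stringify(direct));    // {"sal":6000,"comm":30}
console.log(JSON.stringify(rereduced)); // {"sal":6000,"comm":30}
console.log(JSON.stringify(direct) === JSON.stringify(rereduced)); // true
```

The check passes here because the reduce function returns an object of the same shape as the values it receives, so its output can be used as input again.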