[Teacher Zhao Yuqiang] Computing aggregations with MapReduce in MongoDB
2022-07-03 05:45:00 【Teacher zhaoyuqiang】

MapReduce can express very complex aggregation logic and is very flexible. However, it is also very slow and should not be used for real-time data analysis. MapReduce can run in parallel on multiple servers: each server handles only part of the workload, and the partial results are then sent to the master server, which merges them, computes the final result set, and returns it to the client.
The basic idea of MapReduce is shown in the figure below:

Take summation as an example. The Map stage comes first: one large task is split into several small tasks, and each small task runs on a different node so that the computation can be distributed (the blue box in the figure). The outputs of the small tasks are then combined in a second computation that yields the final result, 55; this stage is called Reduce (the red box in the figure).
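The summation idea can be sketched in plain JavaScript, outside MongoDB. The figure is not reproduced here, so the input 1 to 10 and the chunk size of 3 are assumptions chosen to arrive at the total of 55 mentioned above; `sumChunk`, `chunks` and `partialSums` are hypothetical names.

```javascript
// Assumed input: the integers 1..10 (their sum is 55).
var numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];

// Add up one chunk of numbers; stands in for the work done by one node.
function sumChunk(chunk) {
  return chunk.reduce(function (acc, n) { return acc + n; }, 0);
}

// Map stage: split the big task into small tasks and compute each partial sum.
var chunkSize = 3; // assumed; any positive size gives the same total
var chunks = [];
for (var i = 0; i < numbers.length; i += chunkSize) {
  chunks.push(numbers.slice(i, i + chunkSize));
}
var partialSums = chunks.map(sumChunk);

// Reduce stage: combine the partial results into the final answer.
var total = sumChunk(partialSums);

console.log(partialSums); // [ 6, 15, 24, 10 ]
console.log(total);       // 55
```

Here the chunks are processed one after another; in a real cluster each chunk would be handled by a different node.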
Computing an aggregation with MapReduce involves three main steps: Map, Shuffle (grouping) and Reduce. Map and Reduce must be defined explicitly; Shuffle is carried out by MongoDB.
- Map: applied to each document; it emits a key and a value.
- Shuffle: groups the emitted pairs by key and collects the values that share a key into an array.
- Reduce: reduces each array of values to a single value.
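The three steps can be imitated in a few lines of plain JavaScript. This is only a sketch of the idea, not MongoDB's implementation: `runMapReduce` is a hypothetical helper, `emit` is passed to the map function as an argument instead of being a global, keys are turned into strings because they are used as object properties, and reduce is called for every key, whereas MongoDB skips the reduce call for a key that has only one value.

```javascript
// Plain-JavaScript imitation of Map, Shuffle and Reduce (illustrative only).
function runMapReduce(docs, mapFn, reduceFn) {
  // Map: call mapFn with `this` bound to each document; collect emitted pairs.
  var pairs = [];
  var emit = function (key, value) { pairs.push({ key: key, value: value }); };
  docs.forEach(function (doc) { mapFn.call(doc, emit); });

  // Shuffle: group by key, gathering values with the same key into an array.
  var groups = {};
  pairs.forEach(function (p) {
    if (!Object.prototype.hasOwnProperty.call(groups, p.key)) {
      groups[p.key] = [];
    }
    groups[p.key].push(p.value);
  });

  // Reduce: collapse each array of values into a single value.
  var result = {};
  Object.keys(groups).forEach(function (key) {
    result[key] = reduceFn(key, groups[key]);
  });
  return result;
}

// Usage: count documents per job in a tiny, made-up sample.
var sampleDocs = [{ job: "CLERK" }, { job: "MANAGER" }, { job: "CLERK" }];
var counts = runMapReduce(
  sampleDocs,
  function (emit) { emit(this.job, 1); },
  function (key, values) {
    return values.reduce(function (a, b) { return a + b; }, 0);
  }
);
console.log(counts); // { CLERK: 2, MANAGER: 1 }
```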
The examples below use the following test data (employee data).
db.emp.insert(
[
{_id:7369,ename:'SMITH' ,job:'CLERK' ,mgr:7902,hiredate:'17-12-80',sal:800,comm:0,deptno:20},
{_id:7499,ename:'ALLEN' ,job:'SALESMAN' ,mgr:7698,hiredate:'20-02-81',sal:1600,comm:300 ,deptno:30},
{_id:7521,ename:'WARD' ,job:'SALESMAN' ,mgr:7698,hiredate:'22-02-81',sal:1250,comm:500 ,deptno:30},
{_id:7566,ename:'JONES' ,job:'MANAGER' ,mgr:7839,hiredate:'02-04-81',sal:2975,comm:0,deptno:20},
{_id:7654,ename:'MARTIN',job:'SALESMAN' ,mgr:7698,hiredate:'28-09-81',sal:1250,comm:1400,deptno:30},
{_id:7698,ename:'BLAKE' ,job:'MANAGER' ,mgr:7839,hiredate:'01-05-81',sal:2850,comm:0,deptno:30},
{_id:7782,ename:'CLARK' ,job:'MANAGER' ,mgr:7839,hiredate:'09-06-81',sal:2450,comm:0,deptno:10},
{_id:7788,ename:'SCOTT' ,job:'ANALYST' ,mgr:7566,hiredate:'19-04-87',sal:3000,comm:0,deptno:20},
{_id:7839,ename:'KING' ,job:'PRESIDENT',mgr:0,hiredate:'17-11-81',sal:5000,comm:0,deptno:10},
{_id:7844,ename:'TURNER',job:'SALESMAN' ,mgr:7698,hiredate:'08-09-81',sal:1500,comm:0,deptno:30},
{_id:7876,ename:'ADAMS' ,job:'CLERK' ,mgr:7788,hiredate:'23-05-87',sal:1100,comm:0,deptno:20},
{_id:7900,ename:'JAMES' ,job:'CLERK' ,mgr:7698,hiredate:'03-12-81',sal:950,comm:0,deptno:30},
{_id:7902,ename:'FORD' ,job:'ANALYST' ,mgr:7566,hiredate:'03-12-81',sal:3000,comm:0,deptno:20},
{_id:7934,ename:'MILLER',job:'CLERK' ,mgr:7782,hiredate:'23-01-82',sal:1300,comm:0,deptno:10}
]
);
(Case 1) Count the number of employees in each job
var map1=function(){emit(this.job,1)}
var reduce1=function(job,count){return Array.sum(count)}
db.emp.mapReduce(map1,reduce1,{out:"mrdemo1"})
(Case 2) Compute the total salary of each department
var map2=function(){emit(this.deptno,this.sal)}
var reduce2=function(deptno,sal){return Array.sum(sal)}
db.emp.mapReduce(map2,reduce2,{out:"mrdemo2"})
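To cross-check what cases 1 and 2 should produce, the same grouping can be computed in plain JavaScript on a trimmed copy of the test data (only job, deptno and sal are kept). This is a sketch that runs outside the mongo shell; `rows`, `groupSum`, `headCount` and `salaryByDept` are hypothetical names, and `groupSum` merely stands in for the combination of `emit` and `Array.sum`.

```javascript
// Trimmed copy of the emp test data: [job, deptno, sal] per employee.
var rows = [
  ["CLERK", 20, 800], ["SALESMAN", 30, 1600], ["SALESMAN", 30, 1250],
  ["MANAGER", 20, 2975], ["SALESMAN", 30, 1250], ["MANAGER", 30, 2850],
  ["MANAGER", 10, 2450], ["ANALYST", 20, 3000], ["PRESIDENT", 10, 5000],
  ["SALESMAN", 30, 1500], ["CLERK", 20, 1100], ["CLERK", 30, 950],
  ["ANALYST", 20, 3000], ["CLERK", 10, 1300]
];

// Sum valueOf(row) per keyOf(row).
function groupSum(rows, keyOf, valueOf) {
  var out = {};
  rows.forEach(function (row) {
    var key = keyOf(row);
    out[key] = (out[key] || 0) + valueOf(row);
  });
  return out;
}

// Case 1: emit(job, 1), then sum -> head count per job.
var headCount = groupSum(rows,
  function (r) { return r[0]; },
  function () { return 1; });

// Case 2: emit(deptno, sal), then sum -> total salary per department.
var salaryByDept = groupSum(rows,
  function (r) { return r[1]; },
  function (r) { return r[2]; });

console.log(headCount);
// { CLERK: 4, SALESMAN: 4, MANAGER: 3, ANALYST: 2, PRESIDENT: 1 }
console.log(salaryByDept);
// { '10': 8750, '20': 10875, '30': 9400 }
```

In MongoDB the results land in the output collections instead, as documents of the form `{ _id: <key>, value: <result> }`, and can be inspected with `db.mrdemo1.find()` and `db.mrdemo2.find()`.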
(Case 3) Troubleshoot the Map function
Define your own emit function:
var emit = function(key, value) {
print("emit");
print("key: " + key + " value: " + tojson(value));
}
Test a single document:
emp7839=db.emp.findOne({_id:7839})
map2.apply(emp7839)
This prints the following output:
emit
key: 10 value: 5000
Test multiple documents:
var myCursor=db.emp.find()
while (myCursor.hasNext()) {
var doc = myCursor.next();
print ("document _id= " + tojson(doc._id));
map2.apply(doc);
print();
}
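Printing is convenient in the shell. When an automated check is preferred, a variant of the same trick is to have the stub `emit` record its arguments instead. The sketch below is plain JavaScript that runs outside the mongo shell; `emitted` is a hypothetical name, and the document literal copies a few fields of employee 7839 from the test data rather than reading it from the collection.

```javascript
// Stub emit: record each (key, value) pair instead of printing it.
var emitted = [];
var emit = function (key, value) {
  emitted.push({ key: key, value: value });
};

// Same map function as in case 2.
var map2 = function () { emit(this.deptno, this.sal); };

// Apply the map function to one document, with `this` bound to it.
var emp7839 = { _id: 7839, ename: "KING", job: "PRESIDENT", sal: 5000, deptno: 10 };
map2.apply(emp7839);

console.log(JSON.stringify(emitted)); // [{"key":10,"value":5000}]
```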
(Case 4) Troubleshoot the Reduce function
A simple test case:
var myTestValues = [ 5, 5, 10 ];
var reduce1=function(key,values){return Array.sum(values)}
reduce1("mykey",myTestValues)
Test a reduce function whose values contain multiple fields
Test data with salary and bonus fields:
var myTestObjects = [
{ sal: 1000, comm: 5 },
{ sal: 2000, comm: 10 },
{ sal: 3000, comm: 15 }
];
Write the reduce function:
var reduce2=function(key,values) {
var reducedValue = { sal: 0, comm: 0 }; // declared with var to avoid an implicit global
for(var i=0;i<values.length;i++) {
reducedValue.sal += values[i].sal;
reducedValue.comm += values[i].comm;
}
return reducedValue;
}
Test:
reduce2("aa",myTestObjects)
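MongoDB may call the reduce function more than once for the same key, feeding an earlier result back in as one of the values. It is therefore worth checking that the reduce function gives the same answer regardless of the order of the values and when part of the input has already been reduced. The sketch below does this in plain JavaScript for the salary and bonus example; `sameResult`, `direct`, `reordered` and `rereduced` are hypothetical names.

```javascript
// Reduce function for values of the form { sal, comm }.
var reduce2 = function (key, values) {
  var reducedValue = { sal: 0, comm: 0 };
  for (var i = 0; i < values.length; i++) {
    reducedValue.sal += values[i].sal;
    reducedValue.comm += values[i].comm;
  }
  return reducedValue;
};

var a = { sal: 1000, comm: 5 };
var b = { sal: 2000, comm: 10 };
var c = { sal: 3000, comm: 15 };

// Compare two reduced values field by field.
function sameResult(x, y) {
  return x.sal === y.sal && x.comm === y.comm;
}

// Baseline: reduce all three values at once.
var direct = reduce2("aa", [a, b, c]);

// The order of the values must not matter.
var reordered = reduce2("aa", [c, a, b]);

// Re-reducing a partial result must give the same answer.
var rereduced = reduce2("aa", [c, reduce2("aa", [a, b])]);

console.log(JSON.stringify(direct));        // {"sal":6000,"comm":30}
console.log(sameResult(direct, reordered)); // true
console.log(sameResult(direct, rereduced)); // true
```

The re-reduce check only works because the value returned by `reduce2` has the same shape as the values it receives.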