
[Experience Sharing] MongoDB aggregate and mapReduce

Posted on 2016-8-11 09:55:03
Aggregate
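In MongoDB, aggregation (aggregate) is mainly used to process data (for example, computing averages or sums) and return the computed result. It is somewhat similar to aggregate functions such as count(*) in SQL.
The syntax is as follows: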
db.collection.aggregate()
db.collection.aggregate(pipeline,options)
db.runCommand({
aggregate: "<collection>",
pipeline: [ <stage>, <...> ],
explain: <boolean>,
allowDiskUse: <boolean>,
cursor: <document>
})

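Before using aggregate, let's first look at a few commonly used pipeline stage operators.
$project:  renames keys in the result set, controls which keys are shown, and computes values from fields
$match:  filters the result set
$group:  groups documents and aggregates them (sum, average, and so on)
$skip:  skips the first N documents of the result
$sort:  sorts the result set
$limit:  limits the number of documents in the result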
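The stages listed above run in order, each one consuming the output of the previous one. As a rough illustration only, here is a plain-JavaScript sketch of that idea on an in-memory array. It is not how MongoDB implements the pipeline, and the helper names (match, sort, skip, limit, runPipeline) are made up for this sketch.

```javascript
// Plain-JavaScript stand-ins for a few stages; the helper names are hypothetical.
const match = (pred) => (docs) => docs.filter(pred);
const sort = (key, dir) => (docs) => [...docs].sort((a, b) => dir * (a[key] - b[key]));
const skip = (n) => (docs) => docs.slice(n);
const limit = (n) => (docs) => docs.slice(0, n);

// Run the stages left to right, feeding each one the previous result.
const runPipeline = (docs, stages) => stages.reduce((acc, stage) => stage(acc), docs);

const emp = [
  { ename: "tom", age: 25 },
  { ename: "robin", age: 30 },
  { ename: "Mark", age: 22 },
  { ename: "hellen", age: 32 },
];

// Roughly: {$match:{age:{$gt:22}}}, {$sort:{age:-1}}, {$skip:1}, {$limit:2}
const result = runPipeline(emp, [
  match((d) => d.age > 22),
  sort("age", -1),
  skip(1),
  limit(2),
]);
console.log(result.map((d) => d.ename)); // [ 'robin', 'tom' ]
```

Because each stage only sees what the previous stage produced, changing the order of the stages changes the result.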

Example:
db.createCollection("emp")
db.emp.insert({_id:1,"ename":"tom","age":25,"department":"Sales","salary":6000})
db.emp.insert({_id:2,"ename":"eric","age":24,"department":"HR","salary":4500})
db.emp.insert({_id:3,"ename":"robin","age":30,"department":"Sales","salary":8000})
db.emp.insert({_id:4,"ename":"jack","age":28,"department":"Development","salary":8000})
db.emp.insert({_id:5,"ename":"Mark","age":22,"department":"Development","salary":6500})
db.emp.insert({_id:6,"ename":"marry","age":23,"department":"Planning","salary":5000})
db.emp.insert({_id:7,"ename":"hellen","age":32,"department":"HR","salary":6000})
db.emp.insert({_id:8,"ename":"sarah","age":24,"department":"Development","salary":7000})

> use company
switched to db company
> db.emp.aggregate(
... {$group:{_id:"$department",dpct:{$sum:1}}}
... )
{ "_id" : "Development", "dpct" : 3 }
{ "_id" : "HR", "dpct" : 2 }
{ "_id" : "Planning", "dpct" : 1 }
{ "_id" : "Sales", "dpct" : 2 }
> db.emp.aggregate(
... {$group:{_id:"$department",salct:{$sum:"$salary"},salavg:{$avg:"$salary"}}}
... )
{ "_id" : "Development", "salct" : 21500, "salavg" : 7166.666666666667 }
{ "_id" : "HR", "salct" : 10500, "salavg" : 5250 }
{ "_id" : "Planning", "salct" : 5000, "salavg" : 5000 }
{ "_id" : "Sales", "salct" : 14000, "salavg" : 7000 }
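To make the accumulators in the two $group examples above concrete, here is a plain-JavaScript sketch of what {$sum:1}, {$sum:"$salary"} and {$avg:"$salary"} compute per department. It works on an in-memory copy of the sample data and only illustrates the arithmetic; groupByDepartment is a hypothetical helper, not a MongoDB API.

```javascript
// Only the fields needed for grouping are copied from the sample data.
const emp = [
  { department: "Sales", salary: 6000 },
  { department: "HR", salary: 4500 },
  { department: "Sales", salary: 8000 },
  { department: "Development", salary: 8000 },
  { department: "Development", salary: 6500 },
  { department: "Planning", salary: 5000 },
  { department: "HR", salary: 6000 },
  { department: "Development", salary: 7000 },
];

function groupByDepartment(docs) {
  const groups = new Map();
  for (const doc of docs) {
    const g = groups.get(doc.department) || { _id: doc.department, dpct: 0, salct: 0 };
    g.dpct += 1;           // {$sum: 1} adds 1 per document
    g.salct += doc.salary; // {$sum: "$salary"} adds the field value
    groups.set(doc.department, g);
  }
  // {$avg: "$salary"} is the sum divided by the number of documents
  return [...groups.values()].map((g) => ({ ...g, salavg: g.salct / g.dpct }));
}

const byDept = groupByDepartment(emp);
console.log(byDept.find((g) => g._id === "HR"));
// { _id: 'HR', dpct: 2, salct: 10500, salavg: 5250 }
```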
> db.emp.aggregate(
... {$match:{age:{$lt:25}}}
... )
{ "_id" : 2, "ename" : "eric", "age" : 24, "department" : "HR", "salary" : 4500 }
{ "_id" : 5, "ename" : "Mark", "age" : 22, "department" : "Development", "salary" : 6500 }
{ "_id" : 6, "ename" : "marry", "age" : 23, "department" : "Planning", "salary" : 5000 }
{ "_id" : 8, "ename" : "sarah", "age" : 24, "department" : "Development", "salary" : 7000 }
> db.emp.aggregate(
... {$match:{age:{$gt:25}}},
... {$group:{_id:"$department",salct:{$sum:"$salary"},salavg:{$avg:"$salary"}}}
... )
{ "_id" : "HR", "salct" : 6000, "salavg" : 6000 }
{ "_id" : "Development", "salct" : 8000, "salavg" : 8000 }
{ "_id" : "Sales", "salct" : 8000, "salavg" : 8000 }
> db.emp.aggregate(
... {$group:{_id:"$department",salct:{$sum:"$salary"},salavg:{$avg:"$salary"}}},
... {$match:{salavg:{$gt:6000}}}
... )
{ "_id" : "Development", "salct" : 21500, "salavg" : 7166.666666666667 }
{ "_id" : "Sales", "salct" : 14000, "salavg" : 7000 }
>
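The last two commands differ only in where $match sits: before $group it filters employees (like WHERE in SQL), after $group it filters the computed groups (like HAVING). The sketch below replays both orders in plain JavaScript on a copy of the sample data; avgByDepartment is a hypothetical helper written for this illustration.

```javascript
const emp = [
  { ename: "tom", age: 25, department: "Sales", salary: 6000 },
  { ename: "eric", age: 24, department: "HR", salary: 4500 },
  { ename: "robin", age: 30, department: "Sales", salary: 8000 },
  { ename: "jack", age: 28, department: "Development", salary: 8000 },
  { ename: "Mark", age: 22, department: "Development", salary: 6500 },
  { ename: "marry", age: 23, department: "Planning", salary: 5000 },
  { ename: "hellen", age: 32, department: "HR", salary: 6000 },
  { ename: "sarah", age: 24, department: "Development", salary: 7000 },
];

// Average salary per department, like {$group:{_id:"$department",salavg:{$avg:"$salary"}}}.
function avgByDepartment(docs) {
  const acc = new Map();
  for (const d of docs) {
    const g = acc.get(d.department) || { sum: 0, n: 0 };
    g.sum += d.salary;
    g.n += 1;
    acc.set(d.department, g);
  }
  return [...acc].map(([_id, g]) => ({ _id, salavg: g.sum / g.n }));
}

// $match first: only employees older than 25 take part in the grouping.
const matchThenGroup = avgByDepartment(emp.filter((d) => d.age > 25));

// $match last: all employees are grouped, then groups with salavg <= 6000 are dropped.
const groupThenMatch = avgByDepartment(emp).filter((g) => g.salavg > 6000);

console.log(matchThenGroup.length, groupThenMatch.length); // 3 2
```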
> db.emp.aggregate(
... {$sort:{age:1}},{$limit:3}
... )
{ "_id" : 5, "ename" : "Mark", "age" : 22, "department" : "Development", "salary" : 6500 }
{ "_id" : 6, "ename" : "marry", "age" : 23, "department" : "Planning", "salary" : 5000 }
{ "_id" : 2, "ename" : "eric", "age" : 24, "department" : "HR", "salary" : 4500 }
> db.emp.aggregate( {$sort:{age:-1}},{$limit:3} )
{ "_id" : 7, "ename" : "hellen", "age" : 32, "department" : "HR", "salary" : 6000 }
{ "_id" : 3, "ename" : "robin", "age" : 30, "department" : "Sales", "salary" : 8000 }
{ "_id" : 4, "ename" : "jack", "age" : 28, "department" : "Development", "salary" : 8000 }
> db.emp.aggregate( {$sort:{age:-1}},{$skip:4} )
{ "_id" : 2, "ename" : "eric", "age" : 24, "department" : "HR", "salary" : 4500 }
{ "_id" : 8, "ename" : "sarah", "age" : 24, "department" : "Development", "salary" : 7000 }
{ "_id" : 6, "ename" : "marry", "age" : 23, "department" : "Planning", "salary" : 5000 }
{ "_id" : 5, "ename" : "Mark", "age" : 22, "department" : "Development", "salary" : 6500 }
>
> db.emp.aggregate( {$project:{"姓名":"$ename","年龄":"$age","部门":"$department","工资":"$salary",_id:0}})
{ "姓名" : "tom", "年龄" : 25, "部门" : "Sales", "工资" : 6000 }
{ "姓名" : "eric", "年龄" : 24, "部门" : "HR", "工资" : 4500 }
{ "姓名" : "robin", "年龄" : 30, "部门" : "Sales", "工资" : 8000 }
{ "姓名" : "jack", "年龄" : 28, "部门" : "Development", "工资" : 8000 }
{ "姓名" : "Mark", "年龄" : 22, "部门" : "Development", "工资" : 6500 }
{ "姓名" : "marry", "年龄" : 23, "部门" : "Planning", "工资" : 5000 }
{ "姓名" : "hellen", "年龄" : 32, "部门" : "HR", "工资" : 6000 }
{ "姓名" : "sarah", "年龄" : 24, "部门" : "Development", "工资" : 7000 }
> db.emp.aggregate( {$project:{"姓名":"$ename","年龄":"$age","部门":"$department","工资":"$salary",_id:0}},{$match:{"工资":{$gt:6000}}})
{ "姓名" : "robin", "年龄" : 30, "部门" : "Sales", "工资" : 8000 }
{ "姓名" : "jack", "年龄" : 28, "部门" : "Development", "工资" : 8000 }
{ "姓名" : "Mark", "年龄" : 22, "部门" : "Development", "工资" : 6500 }
{ "姓名" : "sarah", "年龄" : 24, "部门" : "Development", "工资" : 7000 }
>
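The $project examples above only rename keys and hide _id. The operator list also mentions computing values from fields, which could look like the following plain-JavaScript sketch. The output keys name and yearly are hypothetical, and the corresponding stage in the comment assumes the $multiply expression operator.

```javascript
const emp = [
  { _id: 1, ename: "tom", age: 25, department: "Sales", salary: 6000 },
  { _id: 2, ename: "eric", age: 24, department: "HR", salary: 4500 },
];

// Roughly: {$project:{name:"$ename", yearly:{$multiply:["$salary",12]}, _id:0}}
// name and yearly are hypothetical output keys, not taken from the post.
const projected = emp.map((doc) => ({
  name: doc.ename,         // rename ename -> name
  yearly: doc.salary * 12, // computed field
}));                       // _id, age and department are left out
console.log(projected);
```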

Map Reduce
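Map-Reduce is a computation model: put simply, a large job (data set) is split up and processed in pieces (MAP), and the partial results are then merged into the final result (REDUCE).
The Map-Reduce that MongoDB provides is very flexible and is also quite practical for large-scale data analysis.
The basic syntax of mapReduce is as follows: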
>db.collection.mapReduce(
   function() {emit(key,value);},  // map function
   function(key,values) {return reduceFunction},   // reduce function
   {
      out: collection,
      query: document,
      sort: document,
      limit: number
   }
)
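To use mapReduce you implement two functions, a map function and a reduce function. MongoDB runs the map function once for every document in the collection (with this bound to the document); map calls emit(key, value), and the emitted values are grouped by key and passed to the reduce function.
The map function produces its key-value pairs by calling emit(key, value).
Parameters:
map: the mapping function (generates the sequence of key-value pairs that feed the reduce function).
reduce: the aggregation function. Its job is to turn key-values into key-value, that is, to collapse the values array into a single value.
out: the collection that stores the results. In early MongoDB versions, omitting it meant a temporary collection that was deleted when the client disconnected; later versions require out, and { inline: 1 } returns the results without storing them.
query: a filter; only documents that match it are passed to the map function. (query, limit and sort can be combined freely.)
sort: sorts the documents before they are sent to the map function; it is typically combined with limit and can also make grouping more efficient.
limit: the maximum number of documents sent to the map function (without limit, sort on its own is of little use).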
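The flow described above (map emits pairs, values are grouped by key, reduce collapses each group) can be sketched as a toy in-memory model in plain JavaScript. This is an illustration under simplifying assumptions, not MongoDB's implementation: runMapReduce is a hypothetical helper, emit is passed to map as an argument instead of being a global, and query, sort, limit and out are left out.

```javascript
// Toy in-memory model of mapReduce; runMapReduce is a hypothetical helper.
function runMapReduce(docs, map, reduce) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  // map runs once per document with `this` bound to it.
  // (In MongoDB emit is a global; here it is passed in as an argument.)
  for (const doc of docs) map.call(doc, emit);

  return [...groups].map(([key, values]) => ({
    _id: key,
    // MongoDB does not call reduce for a key that has a single value.
    value: values.length === 1 ? values[0] : reduce(key, values),
  }));
}

const emp = ["Sales", "HR", "Sales", "Development", "Development", "Planning", "HR", "Development"]
  .map((department) => ({ department }));

const counts = runMapReduce(
  emp,
  function (emit) { emit(this.department, 1); },
  (key, values) => values.reduce((a, b) => a + b, 0) // same idea as Array.sum(values)
);
console.log(counts);
```

The single-value shortcut in the model matters in practice: whatever map emits can end up in the output unchanged, so reduce should return a value of the same shape.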

> db.emp.mapReduce( function() { emit(this.department,1); }, function(key,values) { return Array.sum(values) }, { out:"depart_summary" } ).find()
{ "_id" : "Development", "value" : 3 }
{ "_id" : "HR", "value" : 2 }
{ "_id" : "Planning", "value" : 1 }
{ "_id" : "Sales", "value" : 2 }
    Uses the built-in Array.sum function to return the number of employees in each department

> db.emp.mapReduce( function() { emit(this.department,this.salary); }, function(key,values) {  return Array.avg(values) }, { out:"depart_summary" } ).find()
{ "_id" : "Development", "value" : 7166.666666666667 }
{ "_id" : "HR", "value" : 5250 }
{ "_id" : "Planning", "value" : 5000 }
{ "_id" : "Sales", "value" : 7000 }
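    Uses the built-in Array.avg function to return the average salary of each department. Note that on larger data sets reduce may be called more than once for the same key, and an average of partial averages is then not the true average.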

> db.emp.mapReduce( function() { emit(this.department,this.salary); }, function(key,values) {  return Array.avg(values).toFixed(2) }, { out:"depart_summary" } ).find()
{ "_id" : "Development", "value" : "7166.67" }
{ "_id" : "HR", "value" : "5250.00" }
{ "_id" : "Planning", "value" : 5000 }
{ "_id" : "Sales", "value" : "7000.00" }
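    Keeps two decimal places. Planning still shows 5000 because reduce is not called for a key that has only one value; note also that toFixed returns a string, not a number.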

> db.emp.mapReduce( function() { emit(this.department,this.salary); }, function(key,values) {  return Array.sum(values) }, { out:"depart_summary" } ).find()
{ "_id" : "Development", "value" : 21500 }
{ "_id" : "HR", "value" : 10500 }
{ "_id" : "Planning", "value" : 5000 }
{ "_id" : "Sales", "value" : 14000 }
    Uses the built-in Array.sum function to return the total salary of each department

> db.emp.mapReduce( function() { emit(this.department,{count:1}); }, function(key,values) { var sum=0; values.forEach(function(val){sum+=val.count}); return sum; }, { out:"depart_summary" } ).find()
{ "_id" : "Development", "value" : 3 }
{ "_id" : "HR", "value" : 2 }
{ "_id" : "Planning", "value" : { "count" : 1 } }
{ "_id" : "Sales", "value" : 2 }
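    Counts the employees of each department by hand. Planning shows { "count" : 1 } because reduce is not called for a key that has only one value, so the emitted value is stored unchanged; for consistent output, reduce should return the same shape that map emits.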
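One possible fix for the inconsistent Planning row is to have reduce return the same shape that map emits. The sketch below runs outside MongoDB in plain JavaScript and calls the reduce functions by hand; reduceCount and brokenReduce are names chosen for this illustration. It also shows why the shape matters when MongoDB feeds an earlier reduce result back into reduce.

```javascript
// reduce returns the same shape that map emits, so its output can be reduced again.
function reduceCount(key, values) {
  let sum = 0;
  values.forEach((val) => { sum += val.count; });
  return { count: sum };
}

// The reduce from the post returns a bare number instead.
function brokenReduce(key, values) {
  let sum = 0;
  values.forEach((val) => { sum += val.count; });
  return sum;
}

const single = { count: 1 }; // Planning: reduce never runs, the emitted value is kept
const partial = reduceCount("Development", [{ count: 1 }, { count: 1 }]);
const total = reduceCount("Development", [partial, { count: 1 }]); // re-reduce
console.log(single, total); // { count: 1 } { count: 3 }

// Feeding the bare number back in loses the count: (2).count is undefined.
const badTotal = brokenReduce("Development", [brokenReduce("Development", [{ count: 1 }, { count: 1 }]), { count: 1 }]);
console.log(badTotal); // NaN
```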

> db.emp.mapReduce( function() { emit(this.department,{salct:this.salary,count:1}); }, function(key,values) { var res={salct:0,sum:0}; values.forEach(function(val){res.sum+=val.count;res.salct+=val.salct}); return res; }, { out:"depart_summary" } ).find()
{ "_id" : "Development", "value" : { "salct" : 21500, "sum" : 3 } }
{ "_id" : "HR", "value" : { "salct" : 10500, "sum" : 2 } }
{ "_id" : "Planning", "value" : { "salct" : 5000, "count" : 1 } }
{ "_id" : "Sales", "value" : { "salct" : 14000, "sum" : 2 } }
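    Computes the headcount and total salary of each department by hand. Planning keeps the emitted key count while the other rows show sum, because reduce is not called for a key that has only one value and the reduce output uses a different key name than map emits.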

> db.emp.mapReduce( function() { emit(this.department,{salct:this.salary,count:1}); }, function(key,values) { var res={salct:0,sum:0}; values.forEach(function(val){res.sum+=val.count;res.salct+=val.salct}); return res.salct/res.sum; }, { out:"depart_summary" } ).find()
{ "_id" : "Development", "value" : 7166.666666666667 }
{ "_id" : "HR", "value" : 5250 }
{ "_id" : "Planning", "value" : { "salct" : 5000, "count" : 1 } }
{ "_id" : "Sales", "value" : 7000 }
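    Computes the average salary of each department by hand. Planning shows the raw emitted document instead of a number because reduce is not called for a key that has only one value.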
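A common way to avoid the mixed output above is to let reduce only accumulate totals and to compute the average in a separate finalize step, which mapReduce accepts as an option. The sketch below simulates that split in plain JavaScript with hand-written emitted values for two departments; reduceSalary and finalizeAvg are hypothetical names, and the loop only mimics what the server would do.

```javascript
// reduce keeps the emitted shape {salct, count}, so partial results can be merged again.
function reduceSalary(key, values) {
  const res = { salct: 0, count: 0 };
  values.forEach((val) => {
    res.salct += val.salct;
    res.count += val.count;
  });
  return res;
}

// finalize runs once per key after all reducing is done, including single-value keys.
function finalizeAvg(key, reduced) {
  return reduced.salct / reduced.count;
}

// Values as map would emit them for two of the departments in the post.
const emitted = {
  Development: [{ salct: 8000, count: 1 }, { salct: 6500, count: 1 }, { salct: 7000, count: 1 }],
  Planning: [{ salct: 5000, count: 1 }],
};

const averages = {};
for (const [key, values] of Object.entries(emitted)) {
  // Mimic MongoDB: skip reduce when there is only one value for the key.
  const reduced = values.length === 1 ? values[0] : reduceSalary(key, values);
  averages[key] = finalizeAvg(key, reduced);
}
console.log(averages);
```

In the shell, this would correspond to passing the second function as the finalize option next to out. Rounding with toFixed would also fit in finalize, since it no longer affects what reduce receives.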

> db.emp.mapReduce( function() { emit(this.department,this.salary); }, function(key,values) {  return Array.avg(values) }, { out:"depart_summary" } ).find({value:{$gt:5000}})
{ "_id" : "Development", "value" : 7166.666666666667 }
{ "_id" : "HR", "value" : 5250 }
{ "_id" : "Sales", "value" : 7000 }
    Filters the grouped results, showing only departments whose average salary is greater than 5000

> db.emp.mapReduce( function() { emit(this.department,this.salary); }, function(key,values) {  return Array.avg(values) }, { out:"depart_summary" } ).find({value:{$gt:5000}}).sort({value:1})
{ "_id" : "HR", "value" : 5250 }
{ "_id" : "Sales", "value" : 7000 }
{ "_id" : "Development", "value" : 7166.666666666667 }
     Sorts the grouped results in ascending order (value:1)

> db.emp.mapReduce( function() { emit(this.department,this.salary); }, function(key,values) {  return Array.avg(values) }, { out:"depart_summary" } ).find({value:{$gt:5000}}).sort({value:-1})
{ "_id" : "Development", "value" : 7166.666666666667 }
{ "_id" : "Sales", "value" : 7000 }
{ "_id" : "HR", "value" : 5250 }
    Sorts the grouped results in descending order, specified explicitly (value:-1)

> db.emp.mapReduce( function() { emit(this.department,this.salary); }, function(key,values) {  return Array.avg(values) }, { out:"depart_summary" } ).find({value:{$gt:5000}}).sort({value:-1}).limit(2)
{ "_id" : "Development", "value" : 7166.666666666667 }
{ "_id" : "Sales", "value" : 7000 }
    Sorts the grouped results in descending order and takes the first two

> db.emp.mapReduce( function() { emit(this.department,{count:1}); }, function(key,values) { var sum=0; values.forEach(function(val){sum+=val.count}); return sum; }, { out:"depart_summary",query:{age:{$gt:25}} } ).find()
{ "_id" : "Development", "value" : { "count" : 1 } }
{ "_id" : "HR", "value" : { "count" : 1 } }
{ "_id" : "Sales", "value" : { "count" : 1 } }
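    Filters the data before grouping, then groups and computes. Each department has only one employee older than 25, so reduce is never called and every value is the emitted { "count" : 1 }.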

> db.emp.mapReduce( function() { emit(this.department,{count:1}); }, function(key,values) { var sum=0; values.forEach(function(val){sum+=val.count}); return sum; }, { out:"depart_summary",query:{age:{$gt:22}},sort:{age:1} } ).find()
{ "_id" : "Development", "value" : 2 }
{ "_id" : "HR", "value" : 2 }
{ "_id" : "Planning", "value" : { "count" : 1 } }
{ "_id" : "Sales", "value" : 2 }
    Filters and sorts the data before grouping, then groups and computes (the sort has no practical effect in this example)

