MongoDB: Learning Aggregation Operations by Example

华风 · Posted 2018-10-26 07:00:56

　　The official MongoDB website provides a US population data set (one document per ZIP code), which can be downloaded from:
　　http://media.mongodb.org/zips.json
　　Sample data:
[root@localhost cluster]# head zips.json
{ "_id" : "01001", "city" : "AGAWAM", "loc" : [ -72.622739, 42.070206 ], "pop" : 15338, "state" : "MA" }
{ "_id" : "01002", "city" : "CUSHMAN", "loc" : [ -72.51564999999999, 42.377017 ], "pop" : 36963, "state" : "MA" }
{ "_id" : "01005", "city" : "BARRE", "loc" : [ -72.10835400000001, 42.409698 ], "pop" : 4546, "state" : "MA" }
{ "_id" : "01007", "city" : "BELCHERTOWN", "loc" : [ -72.41095300000001, 42.275103 ], "pop" : 10579, "state" : "MA" }
{ "_id" : "01008", "city" : "BLANDFORD", "loc" : [ -72.936114, 42.182949 ], "pop" : 1240, "state" : "MA" }
{ "_id" : "01010", "city" : "BRIMFIELD", "loc" : [ -72.188455, 42.116543 ], "pop" : 3706, "state" : "MA" }
{ "_id" : "01011", "city" : "CHESTER", "loc" : [ -72.988761, 42.279421 ], "pop" : 1688, "state" : "MA" }
{ "_id" : "01012", "city" : "CHESTERFIELD", "loc" : [ -72.833309, 42.38167 ], "pop" : 177, "state" : "MA" }
{ "_id" : "01013", "city" : "CHICOPEE", "loc" : [ -72.607962, 42.162046 ], "pop" : 23396, "state" : "MA" }
{ "_id" : "01020", "city" : "CHICOPEE", "loc" : [ -72.576142, 42.176443 ], "pop" : 31495, "state" : "MA" }
　　Use mongoimport to load the data into the MongoDB database:
[root@localhost cluster]# mongoimport -d test -c "zipcodes" --file zips.json -h 192.168.199.219:27020
2016-01-16T18:31:29.424+0800    connected to: 192.168.199.219:27020
2016-01-16T18:31:32.420+0800    [################........] test.zipcodes    2.1 MB/3.0 MB (68.5%)
2016-01-16T18:31:34.471+0800    [########################] test.zipcodes    3.0 MB/3.0 MB (100.0%)
2016-01-16T18:31:34.471+0800    imported 29353 documents
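　　I. Single-purpose aggregation operations
　　Simple operations such as count and distinct.
　　Example 1.1: Count the documents in the zipcodes collection
db.zipcodes.count()
　　Example 1.2: Count the documents for the state MA
db.zipcodes.count({state:"MA"})
　　Example 1.3: List the states that appear in zipcodes
db.zipcodes.distinct("state")
　　II. Using the aggregate framework for more complex aggregations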
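　　Example 2.1: Total population of each state
db.zipcodes.aggregate(
[
   { $group: { _id: "$state", total: { $sum: "$pop" } } }
]
)
　　Aggregation queries are run with the collection's aggregate method.
　　The $group stage specifies the field to group by (a field reference must always carry the $ prefix) and the accumulator functions to apply.
　　_id is a required key of $group: it holds the grouping expression and becomes the _id of each document in the result.
　　The equivalent SQL is:
select state as _id, sum(pop) as total
  from zipcodes
 group by state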
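　　To make the grouping step concrete, here is a minimal plain-JavaScript sketch of what { $group: { _id: "$state", total: { $sum: "$pop" } } } computes. It is only an illustration, not how MongoDB implements the stage. The helper name groupSumByState is hypothetical, the first three documents come from the zips.json excerpt, and the fourth (state "XX") is made up so that there are two groups.

```javascript
// Sample documents: the first three are copied from the zips.json excerpt;
// the last one is MADE UP (state "XX") purely to produce a second group.
const docs = [
  { _id: "01001", city: "AGAWAM", pop: 15338, state: "MA" },
  { _id: "01002", city: "CUSHMAN", pop: 36963, state: "MA" },
  { _id: "01005", city: "BARRE", pop: 4546, state: "MA" },
  { _id: "x0001", city: "EXAMPLE", pop: 100, state: "XX" },
];

// Hypothetical helper: rough equivalent of
// { $group: { _id: "$state", total: { $sum: "$pop" } } }.
function groupSumByState(documents) {
  const totals = new Map(); // group key -> running sum
  for (const doc of documents) {
    totals.set(doc.state, (totals.get(doc.state) || 0) + doc.pop);
  }
  // One output document per group, shaped like the aggregate result.
  return Array.from(totals, ([state, total]) => ({ _id: state, total }));
}

const stateTotals = groupSumByState(docs);
console.log(stateTotals);
```

　　Each distinct state value becomes one output document whose _id is the group key, which mirrors the role of _id described above.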
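　　Example 2.2: Total population of each city in each state
db.zipcodes.aggregate(
[
   { $group: { _id: {state:"$state",city:"$city"}, pop: { $sum: "$pop" } } }
]
)
　　When grouping by more than one field, _id is a document and every field in it must be given a name, e.g. state:"$state".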
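　　Example 2.3: For each state, the total population of the areas with more than 10000 people
db.zipcodes.aggregate(
[
   { $match: {"pop":{$gt: 10000} }},
   { $group: { _id: {state:"$state"}, pop: { $sum: "$pop" } } }
]
)
　　The $match stage takes the filter condition applied to the collection. Because each document is one ZIP code, the filter is applied per ZIP code before grouping. The statement is equivalent to the following SQL:
select state, sum(pop) as pop
  from zipcodes
 where pop > 10000
 group by state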
　　Example 2.4: States whose total population exceeds 10 million
db.zipcodes.aggregate(
[
   { $group: { _id: {state:"$state"}, pop: { $sum: "$pop" } } },
   { $match: {"pop":{$gt: 1000*10000} }}
]
)
　　Placing $match after $group means the grouping runs first and the filter is then applied to the grouped result. The equivalent SQL is:
select state, sum(pop) as pop
  from zipcodes
 group by state
having sum(pop) > 1000*10000
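　　The difference between a $match before $group (SQL WHERE) and a $match after $group (SQL HAVING) can be sketched in plain JavaScript as follows. This is an illustration under assumptions: the helper groupByState is hypothetical, it uses the plain state string as the group key rather than a {state: ...} document, and the same threshold of 10000 is used in both variants so that the two results can be compared on four documents from the excerpt.

```javascript
// Four documents copied from the zips.json excerpt (all in MA).
const docs = [
  { city: "AGAWAM", pop: 15338, state: "MA" },
  { city: "CUSHMAN", pop: 36963, state: "MA" },
  { city: "BARRE", pop: 4546, state: "MA" },
  { city: "BLANDFORD", pop: 1240, state: "MA" },
];

// Hypothetical helper: sum pop per state, like
// { $group: { _id: "$state", pop: { $sum: "$pop" } } }.
function groupByState(documents) {
  const sums = new Map();
  for (const d of documents) {
    sums.set(d.state, (sums.get(d.state) || 0) + d.pop);
  }
  return Array.from(sums, ([state, pop]) => ({ _id: state, pop }));
}

const threshold = 10000; // assumption: same threshold in both variants

// $match first (WHERE): documents are filtered before they are summed.
const whereStyle = groupByState(docs.filter((d) => d.pop > threshold));

// $match last (HAVING): groups are filtered after they are summed.
const havingStyle = groupByState(docs).filter((g) => g.pop > threshold);

console.log(whereStyle);
console.log(havingStyle);
```

　　In the first variant only the two larger areas contribute to the MA total, while in the second all four are summed and the filter looks at the finished total.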
　　Example 5: Average city population of each state
db.zipcodes.aggregate(
[
   { $group: { _id: {state:"$state",city:"$city"}, pop: { $sum: "$pop" } } },
   { $group: {_id:"$_id.state",avgPop:{$avg: "$pop"}}}
]
)
　　An aggregate pipeline can chain several stages of the same kind; here the output of the first $group is the input of the second. The equivalent SQL is:
select state, avg(pop) as avgPop
  from
  (select state, city, sum(pop) pop
     from zipcodes
    group by state, city) t
 group by state
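　　The two chained $group stages can be traced by hand with a small plain-JavaScript sketch. It is an illustration only: the variable names are hypothetical and the composite key is encoded with JSON.stringify, which is a convenience of this sketch and not something MongoDB does. The three documents are taken from the zips.json excerpt, where CHICOPEE appears under two ZIP codes.

```javascript
// Three documents copied from the zips.json excerpt.
const docs = [
  { city: "AGAWAM", pop: 15338, state: "MA" },
  { city: "CHICOPEE", pop: 23396, state: "MA" },
  { city: "CHICOPEE", pop: 31495, state: "MA" },
];

// Stage 1: sum pop per (state, city) pair.
const cityTotals = new Map();
for (const d of docs) {
  const key = JSON.stringify([d.state, d.city]); // composite group key
  cityTotals.set(key, (cityTotals.get(key) || 0) + d.pop);
}

// Stage 2: average the city totals per state.
const perState = new Map(); // state -> { sum, n }
for (const [key, pop] of cityTotals) {
  const [state] = JSON.parse(key);
  const acc = perState.get(state) || { sum: 0, n: 0 };
  acc.sum += pop;
  acc.n += 1;
  perState.set(state, acc);
}

const avgByState = Array.from(perState, ([state, a]) => ({
  _id: state,
  avgPop: a.sum / a.n,
}));
console.log(avgByState);
```

　　The average is taken over cities, not over ZIP codes, because the two CHICOPEE documents are merged in the first stage.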
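　　Example 2.5: The most and least populous city of each state, with their populations
db.zipcodes.aggregate(
[
   { $group: { _id: {state:"$state",city:"$city"}, cityPop: { $sum: "$pop" } } },
   { $sort: { cityPop: 1 } },
   { $group: {
      _id:"$_id.state",
      biggestCity:{$last:"$_id.city"},
      biggestPop:{$last:"$cityPop"},
      smallestCity:{$first:"$_id.city"},
      smallestPop:{$first:"$cityPop"}
   }}
]
)
　　The first $group computes the population of each (state, city) pair.
　　The $sort stage orders these documents by population, ascending.
　　The second $group groups by state. The documents reaching it are already sorted by cityPop, so the first document of each group (taken with $first) is the least populous city and the last one (taken with $last) is the most populous.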
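　　The sort-then-pick-first-and-last idea can be sketched in plain JavaScript. This is an illustration, not MongoDB's implementation. The input array stands for the output of the first $group and was derived by hand from the zips.json excerpt (CHICOPEE is the sum of its two ZIP codes); the variable names are hypothetical.

```javascript
// City totals as the first $group would produce them for three cities
// from the excerpt (CHICOPEE = 23396 + 31495).
const cityTotals = [
  { _id: { state: "MA", city: "AGAWAM" }, cityPop: 15338 },
  { _id: { state: "MA", city: "CHICOPEE" }, cityPop: 54891 },
  { _id: { state: "MA", city: "BARRE" }, cityPop: 4546 },
];

// { $sort: { cityPop: 1 } }: ascending by population.
const sorted = [...cityTotals].sort((a, b) => a.cityPop - b.cityPop);

// Second $group: per state, remember the first and the last row seen.
const byState = new Map();
for (const row of sorted) {
  const state = row._id.state;
  if (!byState.has(state)) {
    // First row of the group: this is what $first returns.
    byState.set(state, {
      _id: state,
      smallestCity: row._id.city,
      smallestPop: row.cityPop,
    });
  }
  // Overwritten on every row, so the last row wins: this is $last.
  const group = byState.get(state);
  group.biggestCity = row._id.city;
  group.biggestPop = row.cityPop;
}

const extremes = Array.from(byState.values());
console.log(extremes);
```

　　The result is only meaningful because the rows arrive sorted; without the $sort stage, $first and $last would pick arbitrary cities.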
　　Example 2.6: Reshaping the result with $project
db.zipcodes.aggregate(
[
   { $group: { _id: {state:"$state",city:"$city"}, cityPop: { $sum: "$pop" } } },
   { $sort: { cityPop: 1 } },
   {
      $group: {
         _id:"$_id.state",
         biggestCity:{$last:"$_id.city"},
         biggestPop:{$last:"$cityPop"},
         smallestCity:{$first:"$_id.city"},
         smallestPop:{$first:"$cityPop"}
      }
   },
   {
      $project: {
         _id:0,
         state: "$_id",
         biggestCity: { name: "$biggestCity", pop: "$biggestPop" },
         smallestCity: { name: "$smallestCity", pop: "$smallestPop" }
      }
   }
]
)
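　　Example 2.7: Aggregating over the contents of an array
　　Suppose we have a collection of students' course selections, with sample data as follows (the values of course are subject names):
db.course.insert({name:"张三",age:10,grade:"四年级",course:["数学","英语","政治"]})
db.course.insert({name:"李四",age:9,grade:"三年级",course:["数学","语文","自然"]})
db.course.insert({name:"王五",age:11,grade:"四年级",course:["数学","英语","语文"]})
db.course.insert({name:"赵六",age:9,grade:"四年级",course:["数学","历史","政治"]})
　　Count how many students take each course:
db.course.aggregate(
[
   { $unwind: "$course" },
   { $group: { _id: "$course", sum: { $sum: 1 } } },
   { $sort: { sum: -1 } }
]
)
　　$unwind expands the array, producing one document per element, and the expanded documents are then grouped. When this article was written the aggregation framework had no $count keyword, so { $sum: 1 } is used to count; later releases added a $count stage (MongoDB 3.4) and a $count accumulator for $group (MongoDB 5.0).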
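　　What $unwind followed by $group with { $sum: 1 } does can be sketched in plain JavaScript. This is an illustration only, with hypothetical variable names; the data is the course sample above, reduced to the name and course fields.

```javascript
// The four sample students, with only the fields this sketch needs.
const students = [
  { name: "张三", course: ["数学", "英语", "政治"] },
  { name: "李四", course: ["数学", "语文", "自然"] },
  { name: "王五", course: ["数学", "英语", "语文"] },
  { name: "赵六", course: ["数学", "历史", "政治"] },
];

// $unwind: one output document per array element; the other fields are copied.
const unwound = students.flatMap((s) =>
  s.course.map((c) => ({ ...s, course: c }))
);

// $group with { $sum: 1 }: add 1 for every document in the group.
const counts = new Map();
for (const row of unwound) {
  counts.set(row.course, (counts.get(row.course) || 0) + 1);
}

// $sort: { sum: -1 }: most popular course first.
const ranked = Array.from(counts, ([course, sum]) => ({ _id: course, sum }))
  .sort((a, b) => b.sum - a.sum);

console.log(unwound.length); // 12 documents after unwinding
console.log(ranked);
```

　　Four students with three courses each become twelve documents, and counting them per course gives the enrolment numbers.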
　　Example 2.8: The cities in each state
db.zipcodes.aggregate(
[
   { $group: { _id: "$state", cities: { $addToSet: "$city"} } }
]
)
　　$addToSet collects the city values of each group into an array, keeping each distinct value only once.
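　　A rough plain-JavaScript counterpart of $addToSet is a Set per group, as in the sketch below. It is an illustration with hypothetical variable names, using three documents from the zips.json excerpt. Note that a JavaScript Set keeps insertion order, whereas MongoDB does not specify the order of the elements $addToSet returns.

```javascript
// Three documents from the excerpt; CHICOPEE appears under two ZIP codes.
const docs = [
  { city: "AGAWAM", state: "MA" },
  { city: "CHICOPEE", state: "MA" },
  { city: "CHICOPEE", state: "MA" },
];

// One Set of city names per state: duplicates are dropped automatically.
const citiesByState = new Map();
for (const d of docs) {
  if (!citiesByState.has(d.state)) citiesByState.set(d.state, new Set());
  citiesByState.get(d.state).add(d.city);
}

// Shape the result like the aggregate output: { _id, cities: [...] }.
const grouped = Array.from(citiesByState, ([state, set]) => ({
  _id: state,
  cities: [...set],
}));
console.log(grouped);
```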
　　Suppose we have the following data structure:
db.book.insert({
  _id: 1,
  title: "MongoDB Documentation",
  tags: [ "Mongodb", "NoSQL" ],
  year: 2014,
  subsections: [
    {
      subtitle: "Section 1: Install MongoDB",
      tags: [ "NoSQL", "Document" ],
      content: "Section 1: This is the content of section 1."
    },
    {
      subtitle: "Section 2: MongoDB CRUD Operations",
      tags: [ "Insert","Mongodb" ],
      content: "Section 2: This is the content of section 2."
    },
    {
      subtitle: "Section 3: Aggregation",
      tags: [ "Aggregate" ],
      content: {
        text: "Section 3: This is the content of section3.",
        tags: [ "MapReduce","Aggregate" ]
      }
    }
  ]
})
　　The document describes the sections of a book. Each section has a tags field, and the book itself has a tags field as well.
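　　Suppose a client needs to find the books tagged Mongodb and, within them, to show only the sections tagged Mongodb. A plain find() query cannot do this:
db.book.find(
   {
      $or: [
         {tags:{$in: ['Mongodb']}},
         {"subsections.tags":{$in: ['Mongodb']}}
      ]
   }
)
　　A query like the one above returns every part of each matching document, so the sections without the Mongodb tag are shown too.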
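　　The aggregation framework provides a $redact stage that can trim the result.
db.book.aggregate(
[
   {$redact: {
      $cond: {
         if: {
            $gt:[ {$size: {$setIntersection: ["$tags",["Mongodb"]] }},0]
         },
         then: "$$DESCEND",
         else: "$$PRUNE"
      }
   }}
]
)
　　$$DESCEND: when the condition holds, the fields at the current document level are kept, and the same condition is then applied to every embedded document, including embedded documents inside arrays.
　　$$PRUNE: when the condition does not hold, the current document or embedded document is dropped entirely, without looking at anything nested inside it.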
　　The query result is as follows:
{
  "_id" : 1,
  "title" : "MongoDB Documentation",
  "tags" : [
    "Mongodb",
    "NoSQL"
  ],
  "year" : 2014,
  "subsections" : [
    {
      "subtitle" : "Section 2: MongoDB CRUD Operations",
      "tags" : [
        "Insert",
        "Mongodb"
      ],
      "content" : "Section 2: This is the content of section 2."
    }
  ]
}
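　　The descend-or-prune behaviour can be imitated with a short recursive function in plain JavaScript. This is a sketch under assumptions, not MongoDB's implementation: the names redact, isPlainObject and hasMongodbTag are hypothetical, and a level without a tags array is simply treated as a non-match, which is a simplification made for this sketch. The book document is abbreviated from the insert shown earlier.

```javascript
const book = {
  _id: 1,
  title: "MongoDB Documentation",
  tags: ["Mongodb", "NoSQL"],
  year: 2014,
  subsections: [
    { subtitle: "Section 1: Install MongoDB", tags: ["NoSQL", "Document"] },
    { subtitle: "Section 2: MongoDB CRUD Operations", tags: ["Insert", "Mongodb"] },
    {
      subtitle: "Section 3: Aggregation",
      tags: ["Aggregate"],
      content: { text: "...", tags: ["MapReduce", "Aggregate"] },
    },
  ],
};

const isPlainObject = (v) =>
  v !== null && typeof v === "object" && !Array.isArray(v);

// Condition of the sketch: does this level carry the tag "Mongodb"?
const hasMongodbTag = (d) => Array.isArray(d.tags) && d.tags.includes("Mongodb");

function redact(doc, keep) {
  if (!keep(doc)) return undefined; // like $$PRUNE: drop this whole level
  const out = {};                   // like $$DESCEND: keep fields, test children
  for (const [field, value] of Object.entries(doc)) {
    if (Array.isArray(value)) {
      // Embedded documents in arrays are tested one by one; scalars are kept.
      out[field] = value
        .map((v) => (isPlainObject(v) ? redact(v, keep) : v))
        .filter((v) => v !== undefined);
    } else if (isPlainObject(value)) {
      const sub = redact(value, keep);
      if (sub !== undefined) out[field] = sub;
    } else {
      out[field] = value;
    }
  }
  return out;
}

const redacted = redact(book, hasMongodbTag);
console.log(JSON.stringify(redacted, null, 2));
```

　　Only Section 2 survives in subsections, which matches the result printed above: the book itself passes the test, and each section is then tested on its own tags.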
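　　III. Using mapReduce
　　Example 3.1: Total population of each state
db.zipcodes.mapReduce(
   function () {emit(this.state, this.pop)}, //mapFunction
   (key, values)=>{return Array.sum(values)},//reduceFunction
   { out: "zipcodes_groupby_state"}
)
　　mapReduce takes at least three arguments: the map function, the reduce function, and the out option.
　　In the map function, this refers to the document currently being processed. The emit function passes a key-value pair on to the reduce function.
　　reduce takes the output of map as its input. Its values parameter is an array: in the example above the key is the state, and the pop values of all documents with the same state form the array, for example:
　　state = "CA" values=[51841,40629,...]
　　The key is always part of the output. The return value of reduce, here the sum of the elements of values, becomes the value stored for that key.
　　out: the collection in which the result is saved.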
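　　The flow from map through emit to reduce can be sketched in plain JavaScript. This is an illustration under assumptions, not MongoDB's implementation: the function mapReduce below is a hypothetical in-memory stand-in, emit is passed to the map function as an argument (in the mongo shell it is available inside map without being passed), and Array.prototype.reduce replaces the shell helper Array.sum. Two documents come from the zips.json excerpt and the third (state "XX") is made up.

```javascript
// Hypothetical in-memory stand-in for db.collection.mapReduce.
function mapReduce(documents, mapFn, reduceFn) {
  const emitted = new Map(); // key -> array of emitted values
  const emit = (key, value) => {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  };
  // Run map once per document, with `this` bound to that document.
  for (const doc of documents) mapFn.call(doc, emit);
  // Run reduce once per key; a key with a single value is passed through,
  // since MongoDB does not call reduce for such keys.
  return Array.from(emitted, ([key, values]) => ({
    _id: key,
    value: values.length === 1 ? values[0] : reduceFn(key, values),
  }));
}

const docs = [
  { city: "AGAWAM", pop: 15338, state: "MA" },
  { city: "CUSHMAN", pop: 36963, state: "MA" },
  { city: "EXAMPLE", pop: 100, state: "XX" }, // MADE UP, for a second key
];

const popByState = mapReduce(
  docs,
  function (emit) { emit(this.state, this.pop); },     // map
  (key, values) => values.reduce((a, b) => a + b, 0)   // reduce
);
console.log(popByState);
```

　　For the key "MA" the reduce function receives the array [15338, 36963], which corresponds to the state = "CA" values=[...] form described above.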
　　Example 3.2: Population and number of documents of each city
db.zipcodes.mapReduce(
   function () {
      var key = {state:this.state,city:this.city}
      emit(key, {count:1,pop:this.pop})
   }, //mapFunction
   (key, values)=>{
      var retval = {count:0,pop:0}
      for (var i =0;i< values.length;i++){
            // values is an array, so each element must be read by index
            retval.count += values[i].count
            retval.pop += values[i].pop
      }
      return retval
   },//reduceFunction
   { out: "zipcodes_groupby_state_city"}
)
　　In the map function, the object {state, city} is passed to emit as the key, and the object {count: 1, pop: this.pop} as the value.
　　The reduce function then adds up count and pop over all elements of values and returns the totals.
　　The equivalent SQL is:
select state, city, count(*) as count, sum(pop) as pop
  from zipcodes
 group by state, city
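　　MongoDB may call the reduce function more than once for the same key, feeding an earlier result back in as one of the values, so the returned object should have the same shape as the emitted values. The plain-JavaScript sketch below checks this property for the reduce logic of this example. The function name reduceCityStats is hypothetical, and the two emitted values correspond to the two CHICOPEE documents in the zips.json excerpt.

```javascript
// Same logic as the reduce function of Example 3.2, as a named function.
function reduceCityStats(key, values) {
  const retval = { count: 0, pop: 0 };
  for (const v of values) {
    retval.count += v.count;
    retval.pop += v.pop;
  }
  return retval;
}

const key = { state: "MA", city: "CHICOPEE" };
// What map would emit for the two CHICOPEE documents in the excerpt.
const emitted = [
  { count: 1, pop: 23396 },
  { count: 1, pop: 31495 },
];

// Reduce everything in one call.
const allAtOnce = reduceCityStats(key, emitted);

// Reduce in two steps: a partial result is fed back in as a value.
const partial = reduceCityStats(key, [emitted[0]]);
const inSteps = reduceCityStats(key, [partial, emitted[1]]);

console.log(allAtOnce, inSteps);
```

　　Both paths give the same totals because count is summed from the values rather than derived from values.length, which would undercount once a partial result is reduced again.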
