Trees in MongoDB

蔷薇525 · 发表于 2015-7-7 10:48:28

　　引用地址：http://api.mongodb.org/wiki/current/Trees%20in%20MongoDB.html
　　
　　树结构存储最好的方式常常依赖于要执行的操作；下面讨论一下不同的存储方案。在实践中，许多开发人员找到了一些使用起来很方便的模式：“单文档存储整根树(Full Tree in single Document)”，“父连接(Parent Links)”和“祖先数组(Array of Ancestors)”。

1       模式

1.1    单文档存储整根树(Full Tree in Signle Document)
　　{
　　  comments: [
　　 {by: "mathias", text: "...", replies: []}
　　 {by: "eliot", text: "...", replies: [
　　    {by: "mike", text: "...", replies: []}
　　 ]}
　　  ]
　　}
　　优点：

单文档来获取每页
整根树存储在磁盘的同一位置
很容易查看整个树结构

　　缺点：

搜索困难
获取局部结果困难
对于大型树结构读取难处理，进一步来说，MongoDB v1.8对于文档大小有限制，16MB（未来版本可能会提高限制）。

1.2    (父连接)Parent Links
　　用单个集合来存储所有节点，每个节点包含他父节点的ID，是一种简单的解决方案。这种方法最大的问题是获取完整子树时需要查找多次数据库（或使用db.eval函数）。

> t = db.tree1;

> t.find()
{ "_id" : 1 }
{ "_id" : 2, "parent" : 1 }
{ "_id" : 3, "parent" : 1 }
{ "_id" : 4, "parent" : 2 }
{ "_id" : 5, "parent" : 4 }
{ "_id" : 6, "parent" : 4 }

> // find children of node 4
> t.ensureIndex({parent:1})
> t.find( {parent : 4 } )
{ "_id" : 5, "parent" : 4 }
{ "_id" : 6, "parent" : 4 }
1.3    (子链接)Child Links
　　另一种选择是在每个节点文档中存储所有子节点的ID。这个方法是有限制的，如果不操作完整子树是没有问题。他可能也是用于存储一个节点有多个父节点情况的最有效方法。

> t = db.tree2
> t.find()
{ "_id" : 1, "children" : [ 2, 3 ] }
{ "_id" : 2 }
{ "_id" : 3, "children" : [ 4 ] }
{ "_id" : 4 }

> // find immediate children of node 3
> t.findOne({_id:3}).children
[ 4 ]

> // find immediate parent of node 3
> t.ensureIndex({children:1})
> t.find({children:3})
{ "_id" : 1, "children" : [ 2, 3 ] }
1.4    (祖先数组)Array of Ancestors
　　在这种方法中将一个节点的所有祖先节点存储到一个数组中。这使得类似于“获取X节点的所有子节点”的操作快且容易。

> t = db.mytree;

> t.find()
{ "_id" : "a" }
{ "_id" : "b", "ancestors" : [ "a" ], "parent" : "a" }
{ "_id" : "c", "ancestors" : [ "a", "b" ], "parent" : "b" }
{ "_id" : "d", "ancestors" : [ "a", "b" ], "parent" : "b" }
{ "_id" : "e", "ancestors" : [ "a" ], "parent" : "a" }
{ "_id" : "f", "ancestors" : [ "a", "e" ], "parent" : "e" }
{ "_id" : "g", "ancestors" : [ "a", "b", "d" ], "parent" : "d" }

> t.ensureIndex( { ancestors : 1 } )

> // find all descendents of b:
> t.find( { ancestors : 'b' })
{ "_id" : "c", "ancestors" : [ "a", "b" ], "parent" : "b" }
{ "_id" : "d", "ancestors" : [ "a", "b" ], "parent" : "b" }
{ "_id" : "g", "ancestors" : [ "a", "b", "d" ], "parent" : "d" }

> // get all ancestors of f:
> anc = db.mytree.findOne({_id:'f'}).ancestors
[ "a", "e" ]
> db.mytree.find( { _id : { $in : anc } } )
{ "_id" : "a" }
{ "_id" : "e", "ancestors" : [ "a" ], "parent" : "a" }
　　ensureIndex和MongoDB的multikey特性可以使上面的查询更高效。
　　除了祖先数组，我们也存储了节点的直接父节点，使得查找节点的直接父节点更容易。

1.5    物化路径(Materialized Path[Full Path in Each Node))
　　物化路径使得对树的特定查询容易。我们在每个节点中存储文档在树中位置的全路径。通常情况下上面提到的“祖先数组”方法都工作很好；当不得不处理字符串建造、正则表达式，字符逃逸，物化路径更容易。（理论上，物化路径将会更快。）
　　MongoDB实现物化路径最好的方式是将路径存储成字符串，然后采用正则表达式查询。以“^”开头的正则表达可以被高效执行。把数据看作一个字符串，你需要选择一个分隔符，我们采用“,”。举例：

> t = db.tree
test.tree

> // get entire tree -- we use sort() to make the order nice
> t.find().sort({path:1})
{ "_id" : "a", "path" : "a," }
{ "_id" : "b", "path" : "a,b," }
{ "_id" : "c", "path" : "a,b,c," }
{ "_id" : "d", "path" : "a,b,d," }
{ "_id" : "g", "path" : "a,b,g," }
{ "_id" : "e", "path" : "a,e," }
{ "_id" : "f", "path" : "a,e,f," }
{ "_id" : "g", "path" : "a,b,g," }

> t.ensureIndex( {path:1} )

> // find the node 'b' and all its descendents:
> t.find( { path : /^a,b,/ } )
{ "_id" : "b", "path" : "a,b," }
{ "_id" : "c", "path" : "a,b,c," }
{ "_id" : "d", "path" : "a,b,d," }
{ "_id" : "g", "path" : "a,b,g," }

> // find the node 'b' and its descendents, where path to 'b' is not already known:
> nodeb = t.findOne( { _id : "b" } )
{ "_id" : "b", "path" : "a,b," }
> t.find( { path : new RegExp("^" + nodeb.path) } )
{ "_id" : "b", "path" : "a,b," }
{ "_id" : "c", "path" : "a,b,c," }
{ "_id" : "d", "path" : "a,b,d," }
{ "_id" : "g", "path" : "a,b,g," }
　　Ruby实例：http://github.com/banker/newsmonger/blob/master/app/models/comment.rb
　　嵌套数据集：
　　http://ar.rubyonrails.org/classes/ActiveRecord/Acts/NestedSet/ClassMethods.html
　　这种模式适用于很少需要修改数据集中文档的情况。

2       用例(Use Cases)

2.1    根据局部路径查找节点(Find Nodes by a Partial Path)
　　假设我们想要查找树中的给定节点，给出从树的某部分到这个点的路径，然后再到这个点，也许就向下面描述的一样。
　　用物化路径方法，我们可以实现上面所提到这种情况。如果“a..b..c..d..e”是到文档的路径，我们想要查找文档用“..b..c..d..”，主要做的事情是使操作快。如果我们是从最上面开始他是很容易的（就像在物化路径部分描述的）。一种方法是使用物化路径加上祖先数据，像这样：
　　{ path : ",a,b,c,d,e,",  ancestor : ['a','b','c','d','e'] }
　　我们可以在祖先上创建multikey索引。然后我们会做一个查找“..b,c,d..”路径上节点的查询，像这样：
　　find({ path : /,b,c,d,/, ancestor : 'd',  })
　　上面提到祖先的索引会被使用，仅仅“d”以下的文档会被检查。接下来尝试更好的依赖聪明的查询优化。
　　find( { path : /,b,c,d,/, ancestor : { $all : ['a','d'] }, ... } )

账号		自动登录	找回密码
密码			立即注册

wirelessnetview好用的无线分析工具

Red Hat RHCE 8 (EX294) Cert Guide

Shell从入门到精通（阿良）

亿图图示专家(EDraw Max) V7.9 中文破解版

zabbix3.4.1安装部署+微信推送信息+大屏显

Red Hat OpenShift I: Containers & Kubern

2025 年，C++ 还能“硬核”多久？

[经验分享] Trees in MongoDB

扫码加入运维网微信交流群