|
引用地址:http://api.mongodb.org/wiki/current/Trees%20in%20MongoDB.html
树结构存储最好的方式常常依赖于要执行的操作;下面讨论一下不同的存储方案。在实践中,许多开发人员找到了一些使用起来很方便的模式:“单文档存储整根树(Full Tree in single Document)”,“父连接(Parent Links)”和“祖先数组(Array of Ancestors)”。
1 模式
1.1 单文档存储整根树(Full Tree in Signle Document)
{
comments: [
{by: "mathias", text: "...", replies: []}
{by: "eliot", text: "...", replies: [
{by: "mike", text: "...", replies: []}
]}
]
}
优点:
- 单文档来获取每页
- 整根树存储在磁盘的同一位置
- 很容易查看整个树结构
缺点:
- 搜索困难
- 获取局部结果困难
- 对于大型树结构读取难处理,进一步来说,MongoDB v1.8对于文档大小有限制,16MB(未来版本可能会提高限制)。
1.2 (父连接)Parent Links
用单个集合来存储所有节点,每个节点包含他父节点的ID,是一种简单的解决方案。这种方法最大的问题是获取完整子树时需要查找多次数据库(或使用db.eval函数)。
> t = db.tree1;
> t.find()
{ "_id" : 1 }
{ "_id" : 2, "parent" : 1 }
{ "_id" : 3, "parent" : 1 }
{ "_id" : 4, "parent" : 2 }
{ "_id" : 5, "parent" : 4 }
{ "_id" : 6, "parent" : 4 }
> // find children of node 4
> t.ensureIndex({parent:1})
> t.find( {parent : 4 } )
{ "_id" : 5, "parent" : 4 }
{ "_id" : 6, "parent" : 4 }
1.3 (子链接)Child Links
另一种选择是在每个节点文档中存储所有子节点的ID。这个方法是有限制的,如果不操作完整子树是没有问题。他可能也是用于存储一个节点有多个父节点情况的最有效方法。
> t = db.tree2
> t.find()
{ "_id" : 1, "children" : [ 2, 3 ] }
{ "_id" : 2 }
{ "_id" : 3, "children" : [ 4 ] }
{ "_id" : 4 }
> // find immediate children of node 3
> t.findOne({_id:3}).children
[ 4 ]
> // find immediate parent of node 3
> t.ensureIndex({children:1})
> t.find({children:3})
{ "_id" : 1, "children" : [ 2, 3 ] }
1.4 (祖先数组)Array of Ancestors
在这种方法中将一个节点的所有祖先节点存储到一个数组中。这使得类似于“获取X节点的所有子节点”的操作快且容易。
> t = db.mytree;
> t.find()
{ "_id" : "a" }
{ "_id" : "b", "ancestors" : [ "a" ], "parent" : "a" }
{ "_id" : "c", "ancestors" : [ "a", "b" ], "parent" : "b" }
{ "_id" : "d", "ancestors" : [ "a", "b" ], "parent" : "b" }
{ "_id" : "e", "ancestors" : [ "a" ], "parent" : "a" }
{ "_id" : "f", "ancestors" : [ "a", "e" ], "parent" : "e" }
{ "_id" : "g", "ancestors" : [ "a", "b", "d" ], "parent" : "d" }
> t.ensureIndex( { ancestors : 1 } )
> // find all descendents of b:
> t.find( { ancestors : 'b' })
{ "_id" : "c", "ancestors" : [ "a", "b" ], "parent" : "b" }
{ "_id" : "d", "ancestors" : [ "a", "b" ], "parent" : "b" }
{ "_id" : "g", "ancestors" : [ "a", "b", "d" ], "parent" : "d" }
> // get all ancestors of f:
> anc = db.mytree.findOne({_id:'f'}).ancestors
[ "a", "e" ]
> db.mytree.find( { _id : { $in : anc } } )
{ "_id" : "a" }
{ "_id" : "e", "ancestors" : [ "a" ], "parent" : "a" }
ensureIndex和MongoDB的multikey特性可以使上面的查询更高效。
除了祖先数组,我们也存储了节点的直接父节点,使得查找节点的直接父节点更容易。
1.5 物化路径(Materialized Path[Full Path in Each Node))
物化路径使得对树的特定查询容易。我们在每个节点中存储文档在树中位置的全路径。通常情况下上面提到的“祖先数组”方法都工作很好;当不得不处理字符串建造、正则表达式,字符逃逸,物化路径更容易。(理论上,物化路径将会更快。)
MongoDB实现物化路径最好的方式是将路径存储成字符串,然后采用正则表达式查询。以“^”开头的正则表达可以被高效执行。把数据看作一个字符串,你需要选择一个分隔符,我们采用“,”。举例:
> t = db.tree
test.tree
> // get entire tree -- we use sort() to make the order nice
> t.find().sort({path:1})
{ "_id" : "a", "path" : "a," }
{ "_id" : "b", "path" : "a,b," }
{ "_id" : "c", "path" : "a,b,c," }
{ "_id" : "d", "path" : "a,b,d," }
{ "_id" : "g", "path" : "a,b,g," }
{ "_id" : "e", "path" : "a,e," }
{ "_id" : "f", "path" : "a,e,f," }
{ "_id" : "g", "path" : "a,b,g," }
> t.ensureIndex( {path:1} )
> // find the node 'b' and all its descendents:
> t.find( { path : /^a,b,/ } )
{ "_id" : "b", "path" : "a,b," }
{ "_id" : "c", "path" : "a,b,c," }
{ "_id" : "d", "path" : "a,b,d," }
{ "_id" : "g", "path" : "a,b,g," }
> // find the node 'b' and its descendents, where path to 'b' is not already known:
> nodeb = t.findOne( { _id : "b" } )
{ "_id" : "b", "path" : "a,b," }
> t.find( { path : new RegExp("^" + nodeb.path) } )
{ "_id" : "b", "path" : "a,b," }
{ "_id" : "c", "path" : "a,b,c," }
{ "_id" : "d", "path" : "a,b,d," }
{ "_id" : "g", "path" : "a,b,g," }
Ruby实例:http://github.com/banker/newsmonger/blob/master/app/models/comment.rb
嵌套数据集:
http://ar.rubyonrails.org/classes/ActiveRecord/Acts/NestedSet/ClassMethods.html
这种模式适用于很少需要修改数据集中文档的情况。
2 用例(Use Cases)
2.1 根据局部路径查找节点(Find Nodes by a Partial Path)
假设我们想要查找树中的给定节点,给出从树的某部分到这个点的路径,然后再到这个点,也许就向下面描述的一样。
用物化路径方法,我们可以实现上面所提到这种情况。如果“a..b..c..d..e”是到文档的路径,我们想要查找文档用“..b..c..d..”,主要做的事情是使操作快。如果我们是从最上面开始他是很容易的(就像在物化路径部分描述的)。一种方法是使用物化路径加上祖先数据,像这样:
{ path : ",a,b,c,d,e,", ancestor : ['a','b','c','d','e'] }
我们可以在祖先上创建multikey索引。然后我们会做一个查找“..b,c,d..”路径上节点的查询,像这样:
find({ path : /,b,c,d,/, ancestor : 'd', })
上面提到祖先的索引会被使用,仅仅“d”以下的文档会被检查。接下来尝试更好的依赖聪明的查询优化。
find( { path : /,b,c,d,/, ancestor : { $all : ['a','d'] }, ... } ) |
|