elasticsearch parent-child

搜ijsio · 发表于 2017-5-20 11:59:35

介绍下ElasticSearch里Parent-Child特性的使用。

//首先创建一系列新闻的索引，这里我们将hot类型作为parent-chid关系里面的parent。

curl -XPUT 'http://localhost:9200/news/hot/1'  -d '{ "uname" : "medcl", "content" : "河南警方：“南阳老板遭逼供致残与狗同笼”纯属谎言"}'
curl -XPUT 'http://localhost:9200/news/hot/2'  -d '{ "uname" : "medcl", "content" : "马英九打两岸牌反制绿营"}'
curl -XPUT 'http://localhost:9200/news/hot/3'  -d '{ "uname" : "medcl", "content" : "专题：中共十七届六中全会公报"}'

//假设我们对每个新闻的评论也分别建立索引，那么新闻的评论和新闻之间会存在一个关系，一篇新闻肯定是存在多个评论，那么我们将评论comment存一个索引，并且和前面的hot新闻索引建立parent-chid关系，评论在这里为child。

//将索引类型hot与comment之间建立parent-child的mapping关系

curl -XPUT 'http://localhost:9200/news/comment/_mapping' -d '{ "comment" : {       "_parent" : {          "type" : "hot"       } }}'

//直接试试创建一个comment评论看看，发现已经不行了，原来指定了类型拥有parent-child关系之后，必须要带上parent参数

curl -XPUT 'http://localhost:9200/news/comment/1' -d '
{ "uname" : "凤凰网安徽省六安市网友", "content" : "河南警方的话没人信"}'
{"error":"RoutingMissingException[routing is required for [news]/[comment]/[1]]"
,"status":500}

正确的方式创建几条评论，并且和前面的第一条新闻建立关系,如下：

curl -XPUT 'http://localhost:9200/news/comment/1?parent=1' -d '{ "uname" : "凤凰网安徽省六安市网友", "content" : "河南警方的话没人信"}'
curl -XPUT 'http://localhost:9200/news/comment/2?parent=1' -d '{ "uname" : "凤凰网湖北省武汉市网友", "content" : "没有监督权\n没有调查\n一切当然只能是谣言"}'
curl -XPUT 'http://localhost:9200/news/comment/3?parent=1' -d '{ "uname" : "ladygaga", "content" : "双下肢不活动，存在废用性肌肉萎缩。请问，这个是怎么做到的？"}'
curl -XPUT 'http://localhost:9200/news/comment/4?parent=1' -d '{ "uname" : "medcl", "content" : "额"}'

我们使用has_child查询一下：

curl -XGET 'http://localhost:9200/news/_search' -d '{
"query" : {
      "has_child" : {
         "type" : "comment",
         "query" :{"term" : {"uname" : "medcl"} }
      }
}
}'

结果如下：

{
"took": 10,
"timed_out": false,
"_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
},
"hits": {
      "total": 1,
      "max_score": 1,
      "hits": [
         {
            "_index": "news",
            "_type": "hot",
            "_id": "1",
            "_score": 1,
            "_source": {
                  "uname": "medcl",
                  "content": "河南警方：“南阳老板遭逼供致残与狗同笼”纯属谎言"
            }
         }
      ]
}
}

注意，如果是中文可能需要调整analyzer，默认查询不出来

curl -XGET 'http://localhost:9200/news/_search' -d '{
"query" : {
      "has_child" : {
         "type" : "comment",
         "query" :{"term" : {"uname" : "凤凰网湖北省武汉市网友"} }
      }
}
}'

结果：

{
"took": 0,
"timed_out": false,
"_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
},
"hits": {
      "total": 0,
      "max_score": null,
      "hits": []
}
}

原因分析：
hot类型没有指定使用的分析器，所以中文默认的standard analyzer拆成了一个个的字，但是has_child模式里面的DSL的term query为区配查询，所以查询不出来

curl -XGET 'http://localhost:9200/news/_search' -d '{
"query" : {
      "has_child" : {
         "type" : "comment",
         "query" :{"term" : {"uname" : "凤"} }
      }
}
}'

试一下，果然如此，所以mapping的配置还是很重要的。

那怎样解决呢？两种方案
1.给hot类型mapping设置analyzer为not_analyzer（在这里不适用）
2.设置查询时使用的分析器，那就不能使用term query了，term query不支持分析器，我们这里使用可以text query

curl -XGET 'http://localhost:9200/news/_search' -d '{
"query" : {
      "has_child" : {
         "type" : "comment",
         "query" :{"text" : {"uname" : "凤凰网"} }
      }
}
}'

当然中文分词用standard analyzer肯定是不行的，你可以灵活替换成其他的中文分析器。
关于text query的其他参数，可以看这里：http://www. elasticsearch .org/guide/reference/query-dsl/text-query.html

ok，我们再试试top_children查询（http://www.elasticsearch.org/guide/reference/query-dsl/top-children-query.html）
topchildren在child进行查询的时候，会预估一个hit size，然后对这个hit size大小的hit结果与parent做查询结果合并聚集，如果合并之后的结果文档数小于查询条件中的from/size参数设置，es则会进行更深度的搜索，这样做的好处是显而易见的，它不会对所有的child索引文档进行处理，可以节省一些额外的开销，但是注意由此造成的total_hits值的不准确。

调用如下：

curl -XGET 'http://localhost:9200/news/_search' -d '{
"query": {
      "top_children": {
         "type": "comment",
         "query": {
            "text": {
                  "uname": "凤凰网"
            }
         },
         "score": "max",
         "factor": 5,
         "incremental_factor": 2
      }
}
}'

结果：

{
"took": 30,
"timed_out": false,
"_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
},
"hits": {
      "total": 1,
      "max_score": 0.60276437,
      "hits": [
         {
            "_index": "news",
            "_type": "hot",
            "_id": "1",
            "_score": 0.60276437,
            "_source": {
                  "uname": "medcl",
                  "content": "河南警方：“南阳老板遭逼供致残与狗同笼”纯属谎言"
            }
         }
      ]
}
}

有了parent-chid这个特性，我们可以做很多事情，比如，如果要给索引数据加上权限，一般来说索引内容本身更新不是很频繁，但是权限信息更新很频繁，我们也可以采用parent-child这种方式来做，如下：

//建立权限子索引

curl -XPUT 'http://localhost:9200/news/role/_mapping' -d '{ "role" : {       "_parent" : {          "type" : "hot"       } }}'

//创建每个新闻的用户权限索引记录

curl -XPUT 'http://localhost:9200/news/role/1?parent=1' -d '{ "uid" : "1001"}'
curl -XPUT 'http://localhost:9200/news/role/2?parent=2' -d '{ "uid" : "1001"}'
curl -XPUT 'http://localhost:9200/news/role/3?parent=3' -d '{ "uid" : "1003"}'

//执行查询，返回uid为1001可以查看的新闻集合

curl -XGET 'http://localhost:9200/news/_search' -d '{
"query": {
      "top_children": {
         "type": "role",
         "query": {
            "text": {
                  "uid": "1001"
            }
         },
         "score": "max",
         "factor": 5,
         "incremental_factor": 2
      }
}
}'

结果：

{
"took": 10,
"timed_out": false,
"_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
},
"hits": {
      "total": 2,
      "max_score": 2.098612,
      "hits": [
         {
            "_index": "news",
            "_type": "role",
            "_id": "1",
            "_score": 2.098612,
            "_source": {
                  "uid": "1001"
            }
         },
         {
            "_index": "news",
            "_type": "role",
            "_id": "2",
            "_score": 1,
            "_source": {
                  "uid": "1001"
            }
         }
      ]
}
}

//上面通过用户可以直接对parent索引进行数据的过滤，但是往往我们还需要对parent的其他条件进行查询，怎么做呢？
首先ES支持filtered query,对应lucene的filteredquery，主要就是说先查询，得到一个结果集，然后对这个结果集执行filtered查询再过滤一把，es说明在这里（http://www.elasticsearch.org/guide/reference/query-dsl/filtered-query.html）

下面是一些查询的例子，方便参考：

curl -XGET 'http://localhost:9200/news/_search' -d '{
"query": {
      "filtered": {
         "query": {
            "term": {
                  "content": "河"
            }
         },
         "filter": {
            "or": [
                  {
                     "has_child": {
                        "type": "role",
                        "query": {
                              "term": {
                                 "uid": "1001"
                              }
                        }
                     }
                  },
                  {
                     "has_child": {
                        "type": "comment",
                        "query": {
                              "term": {
                                 "uname": "ladygaga"
                              }
                        }
                     }
                  }
            ]
         }
      }
}
}'

curl -XGET 'http://localhost:9200/news/_search' -d '{
"query":

{
"filtered" : {
      "query" : {
         "top_children": {
                  "type": "role",
                  "query": {
                     "term": {
                        "uid": "1001"
                     }
                  },
                  "score": "max",
                  "factor": 5,
                  "incremental_factor": 2
            }
      },
      "filter" : {
            "fquery" : {
            "query" : {
                  "query_string" : {
                     "query" : "content:马"
                  }
            },
            "_cache" : true
      }
}
}}}'

curl -XGET 'http://localhost:9200/news/_search' -d '{
"query":

{
"filtered" : {
      "query" : {
                  "query_string" : {
                     "query" : "uname:medcl"
                  }

      },
      "filter" : {
            "has_child": {
                  "type": "role",
                  "query": {
                     "term": {
                        "uid": "1001"
                     }
                  },
                  "score": "max",
                  "factor": 5,
                  "incremental_factor": 2
            }
}
}}}'

curl -XGET 'http://localhost:9200/news/_search' -d '{
  "query": {
"bool": {
   "must": [
      {
      "top_children": {
         "type": "role",
         "query": {
            "term": {
            "uid": 1001
            }
         }
      }
      },
      {
      "text": {
         "uname": "medcl"
      }
      }
   ]
}
  }
}'

curl -XGET 'http://localhost:9200/news/_search' -d '{
  "query": {
"bool": {
   "must": [
      {
      "top_children": {
         "type": "role",
         "query": {
            "term": {
            "uid": 1001
            }
         }
      }
      },
      {
      "text": {
         "content": "马英九"
      }
      }
   ]
}
  }
}'

需要注意的是，parent-chid这种特性虽好，但是也要用对地方，服务端会将所有的_id加载到内存中进行关联和过滤，所以必须保证内存足够大才行。

账号		自动登录	找回密码
密码			立即注册

Centos6.5×64安装配置openmeetings3.0.3详

大疆运维招人啦，

C++ :try 语句块和异常处理

C++的多态

Red Hat RHCE 8 (EX294) Cert Guide

Java/C++ 区别：看完这一篇，就够用！

别再用过时库了！这 13 个顶级 C++ 库才是

[经验分享] elasticsearch parent-child

浏览过的版块

扫码加入运维网微信交流群