Defining Elasticsearch mappings

hb120973135 · posted 2017-5-20 13:51:57

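Elasticsearch mappings
Example 1:
An order number such as ATTS000928732 should not be tokenized: index: not_analyzed
An order number that consists entirely of digits, such as 63745345637, can be tokenized without problems.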

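PUT /order_v5
{
  "settings": {
    // 10 shards; I think of a shard as roughly one partition of a partitioned database table, though I am not sure the analogy holds
    "number_of_shards": 10
  },
  "mappings": {
    "trades": {
      // take the document _id from the "id" field (1.x syntax)
      "_id": {
        "path": "id"
      },
      "properties": {
        "id": {
          // id: auto-increment number
          // requirement: must be queryable
          "type": "integer",
          "store": true
        },
        "name": {
          // name: Crest, Johnson's baby body wash, 100W LED energy-saving lamp, outdoor multi-purpose folding chair, etc.
          // requirement: match on keywords, e.g. Crest + toothpaste or toothbrush; Johnson's + body wash; LED + energy-saving + 100W; outdoor + folding chair
          // conclusion: tokenizing may split a brand name such as "佳洁士" (Crest); not tokenizing means the user input has to match closely. Tokenize by default for now and see how it goes.
          "type": "string"
        },
        "brand": {
          // brand: PG, P&G, P&G Group, P&G Co., Lenovo Group, Lenovo Computers, etc.
          "type": "string"
        },
        "orderNo": {
          // order number, e.g. ATTS000928732
          "type": "string",
          "index": "not_analyzed"
        },
        "description": {
          // description: e.g. 2015 rose-scented Johnson's baby body wash, 550ml, free shipping
          // search: highlighting is required, so store is set to true. Keyword weight: body wash -> {Johnson's + body wash or rose + body wash or 550ml + body wash or body wash + free shipping -> {2015 + rose scent...}}
          // setting: must be tokenized, and the tokenization needs to be well controlled
          "type": "string",
          "store": true
        },
        "date": {
          "type": "date"
        },
        "city": {
          "type": "string"
        },
        "qty": {
          // the index setting has no effect here
          "type": "float"
        },
        "price": {
          // price: float; the index setting has no effect here
          "type": "float"
        }
      }
    }
  }
}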
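
The orderNo decision in Example 1 can be illustrated with a toy model. The sketch below is not Elasticsearch code: ToyIndex and its analyze method are hypothetical names, and the "analyzer" is a simplified stand-in that only lowercases and splits on non-alphanumeric characters. It shows why an exact term lookup on a mixed-case order number only works when the value is indexed untouched, while an all-digit order number survives analysis unchanged.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy inverted index (hypothetical): maps each indexed term to the ids of the documents containing it.
class ToyIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final boolean analyzed;

    ToyIndex(boolean analyzed) {
        this.analyzed = analyzed;
    }

    // Simplified stand-in for an analyzer: lowercase, then split on non-alphanumeric characters.
    static List<String> analyze(String value) {
        List<String> tokens = new ArrayList<>();
        for (String t : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Indexes the value either as analyzed tokens or as one untouched term.
    void add(int docId, String value) {
        List<String> terms = analyzed ? analyze(value) : Collections.singletonList(value);
        for (String term : terms) {
            postings.computeIfAbsent(term, k -> new HashSet<>()).add(docId);
        }
    }

    // A term lookup does not analyze its input; it must equal an indexed term exactly.
    Set<Integer> term(String term) {
        return postings.getOrDefault(term, Collections.<Integer>emptySet());
    }

    public static void main(String[] args) {
        ToyIndex analyzedIndex = new ToyIndex(true);
        ToyIndex notAnalyzedIndex = new ToyIndex(false);
        analyzedIndex.add(1, "ATTS000928732");
        notAnalyzedIndex.add(1, "ATTS000928732");
        analyzedIndex.add(2, "63745345637");

        System.out.println(analyzedIndex.term("ATTS000928732"));    // prints []
        System.out.println(notAnalyzedIndex.term("ATTS000928732")); // prints [1]
        System.out.println(analyzedIndex.term("63745345637"));      // prints [2]
    }
}
```

In this model the analyzed index stores the lowercased token atts000928732, so the original mixed-case value no longer matches exactly.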

Example 2
Defining the mapping
When adding a mapping to an index, the analyzers can be specified like this:
{
"page":{
 "properties":{
 "title":{
 "type":"string",
 "indexAnalyzer":"ik",
 "searchAnalyzer":"ik"
 },
 "content":{
 "type":"string",
 "indexAnalyzer":"ik",
 "searchAnalyzer":"ik"
 }
 }
}
}
indexAnalyzer is the analyzer used at index time; searchAnalyzer is the analyzer used at search time.
The equivalent Java mapping code:
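import org.elasticsearch.common.xcontent.XContentBuilder;
import org.elasticsearch.common.xcontent.XContentFactory;

// Builds the same "page" mapping as the JSON above; jsonBuilder() may throw IOException.
XContentBuilder content = XContentFactory.jsonBuilder().startObject()
        .startObject("page")
            .startObject("properties")
                .startObject("title")
                    .field("type", "string")
                    .field("indexAnalyzer", "ik")
                    .field("searchAnalyzer", "ik")
                .endObject()
                // field named "content" to match the JSON mapping above
                .startObject("content")
                    .field("type", "string")
                    .field("indexAnalyzer", "ik")
                    .field("searchAnalyzer", "ik")
                .endObject()
            .endObject()
        .endObject()
    .endObject();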

To test the analyzer, call the API below. Note that indexname is the name of an index; any existing index will do.
http://localhost:9200/indexname/_analyze?analyzer=ik&text=测试elasticsearch分词器
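
When this URL is built in code rather than pasted into a browser, the Chinese text has to be percent-encoded. The following is a minimal sketch using only the JDK; AnalyzeUrl and build are hypothetical names, and the host, index name, and analyzer are the ones from the URL above. The sample text is written with Unicode escapes so the source file stays ASCII; it spells the same string as the text parameter above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper that assembles the _analyze request URL shown above.
class AnalyzeUrl {
    // Percent-encodes the analyzer name and the text as UTF-8 and appends them as query parameters.
    static String build(String host, String index, String analyzer, String text) {
        try {
            return host + "/" + index + "/_analyze"
                    + "?analyzer=" + URLEncoder.encode(analyzer, "UTF-8")
                    + "&text=" + URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Same text as in the URL above, written with Unicode escapes.
        String text = "\u6D4B\u8BD5elasticsearch\u5206\u8BCD\u5668";
        System.out.println(build("http://localhost:9200", "indexname", "ik", text));
    }
}
```

URLEncoder applies HTML form encoding, so a space in the text is written as "+" rather than "%20".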
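A typical Elasticsearch mapping configuration and query example
Example mapping configuration in Elasticsearch
Suppose we are building a search engine for Chinese news. Each news item has five fields: title, content, author, category, and publish date.
We need to provide basic features such as searching the title and content, sorting, highlighting, statistics, and filtering.
ES provides the smartcn Chinese analysis plugin; for testing, the IK analysis plugin is recommended.
In the body below, properties holds the mapping content, with five fields.
type gives the field type. The content and title fields need to be tokenized and highlighted, so they set an analyzer and enable term_vector.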
{
  "news": {
    "properties": {
      "content": { # content
        "type": "string", # field type
        "store": "no", # whether the field is stored
        "term_vector": "with_positions_offsets", # enable term vectors, used for highlighting
        "index_analyzer": "ik", # analyzer used at index time
        "search_analyzer": "ik" # analyzer used at search time
      },
      "title": {
        "type": "string",
        "store": "no",
        "term_vector": "with_positions_offsets",
        "index_analyzer": "ik",
        "search_analyzer": "ik",
        "boost": 5
      },
      "author": {
        "type": "string",
        "index": "not_analyzed" # this field is not tokenized
      },
      "publish_date": {
        "type": "date",
        "format": "yyyy/MM/dd",
        "index": "not_analyzed" # this field is not tokenized
      },
      "category": {
        "type": "string",
        "index": "not_analyzed" # this field is not tokenized
      }
    }
  }
}
Query example. The request has several parts:
paging: from/size, returned fields: fields, sorting: sort, query: query, filtering: filter, highlighting: highlight, statistics: facets
{
 "from": 0,
 "size": 10,
 "fields": [
"title",
"content",
"publish_date",
"category",
"author"
 ],
 "sort": [
{
 "publish_date": {
 "order": "asc"
 }
},
"_score"
 ],
 "query": {
"bool": {
 "should": [
 {
 "term": {
 "title": "中国"
 }
 },
 {
 "term": {
 "content": "中国"
 }
 }
 ]
}
 },
 "filter": {
"range": {
 "publish_date": {
 "from": "2010/07/01",
 "to": "2010/07/21",
 "include_lower": true,
 "include_upper": false
 }
}
 },
 "highlight": {
"pre_tags": [
 "<tag1>",
 "<tag2>"
],
"post_tags": [
 "</tag1>",
 "</tag2>"
],
"fields": {
 "title": {},
 "content": {}
}
 },
 "facets": {
"cate": {
 "terms": {
 "field": "category"
 }
}
 }
}
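The result contains each of the requested parts.
Note that facets are computed over the documents matched by the query, while filter only narrows the returned hits. The filter does not affect the facets; if the statistics should take the filter into account, a filter facet is needed.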
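
The difference between the facet and the top-level filter can be sketched with a small in-memory model. This is not Elasticsearch code: FacetVsFilter, Doc, and the three sample documents are made up for illustration. Only the date range and its inclusive lower, exclusive upper bounds follow the query above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

class FacetVsFilter {
    // Minimal stand-in for a news document (hypothetical, only the fields needed here).
    static final class Doc {
        final String title;
        final String category;
        final String publishDate; // yyyy/MM/dd, so string order equals date order

        Doc(String title, String category, String publishDate) {
            this.title = title;
            this.category = category;
            this.publishDate = publishDate;
        }
    }

    // Keeps the documents that satisfy the predicate.
    static List<Doc> select(List<Doc> docs, Predicate<Doc> predicate) {
        List<Doc> out = new ArrayList<>();
        for (Doc d : docs) {
            if (predicate.test(d)) {
                out.add(d);
            }
        }
        return out;
    }

    // Terms facet on category: number of documents per category value.
    static Map<String, Integer> countByCategory(List<Doc> docs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Doc d : docs) {
            counts.merge(d.category, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Doc> all = new ArrayList<>();
        all.add(new Doc("China news A", "politics", "2010/07/05"));
        all.add(new Doc("China news B", "sports", "2010/08/01"));
        all.add(new Doc("Other news", "sports", "2010/07/10"));

        Predicate<Doc> query = d -> d.title.contains("China");
        // Lower bound inclusive, upper bound exclusive, as in the range filter above.
        Predicate<Doc> filter = d -> d.publishDate.compareTo("2010/07/01") >= 0
                && d.publishDate.compareTo("2010/07/21") < 0;

        List<Doc> matched = select(all, query);   // documents matched by the query
        List<Doc> hits = select(matched, filter); // hits returned after the top-level filter

        System.out.println(hits.size());              // prints 1
        System.out.println(countByCategory(matched)); // prints {politics=1, sports=1}
        System.out.println(countByCategory(hits));    // prints {politics=1}
    }
}
```

The second line of output corresponds to the facet as described above (counted over everything the query matched), and the third to statistics that also respect the filter.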
