Getting Started with Elasticsearch

Posted by lmwtzw6u5l0 on 2019-1-29 08:16:15

Based on Elasticsearch 6.1.2

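About document metadata
See the "Document Metadata" section of the official Definitive Guide.
A document has three required metadata elements:

- _index: the index in which the document is stored;
- _type: the type of object that the document represents;
- _id: the unique identifier of the document.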
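
As a minimal sketch of how these three elements address a single document, the hypothetical helper below (doc_path is an illustrative name, not part of any ES client) joins them into the REST path used by the requests in this post.

```python
from urllib.parse import quote


def doc_path(index: str, doc_type: str, doc_id: str) -> str:
    """Join _index, _type and _id into the path of a single document."""
    # Percent-encode each part so characters such as '/' cannot alter the path.
    parts = (quote(part, safe="") for part in (index, doc_type, doc_id))
    return "/" + "/".join(parts)


print(doc_path("website", "blog", "123"))  # /website/blog/123
```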

1. Indexing a new document

The index API indexes a document: it stores the document and makes it searchable.

1.1 Using a custom ID

The following example indexes a blog post, where the index is website, the type is blog, and the custom ID is 123:

PUT /website/blog/123 HTTP/1.1

{
  "date": "2014/01/01",
  "text": "Still trying this out...",
  "title": "My second blog entry"
}

ES responds as follows:

HTTP/1.1 201 Created
Location: /website/blog/123
content-encoding: gzip
content-length: 143
content-type: application/json; charset=UTF-8

{
  "_id": "123",
  "_index": "website",
  "_primary_term": 1,
  "_seq_no": 0,
  "_shards": {
    "failed": 0,
    "successful": 1,
    "total": 2
  },
  "_type": "blog",
  "_version": 1,
  "result": "created"
}

Every document in ES has a version number (the _version field in the response). _version is incremented each time the document is modified or deleted.
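
The request above could be built from Python roughly as follows. This is only a sketch using the standard library: build_index_request is a hypothetical helper name, and BASE_URL assumes a local node on port 9200, matching the Host header used later in this post. The request is constructed but not sent.

```python
import json
from urllib.request import Request

# Assumption: a node reachable at localhost:9200.
BASE_URL = "http://localhost:9200"


def build_index_request(index, doc_type, doc_id, document):
    """Build (but do not send) a PUT request that indexes `document` under a custom ID."""
    body = json.dumps(document).encode("utf-8")
    return Request(
        url=f"{BASE_URL}/{index}/{doc_type}/{doc_id}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = build_index_request(
    "website",
    "blog",
    "123",
    {
        "date": "2014/01/01",
        "text": "Still trying this out...",
        "title": "My second blog entry",
    },
)
print(req.get_method(), req.full_url)
```

Sending it requires a running node, for example with urllib.request.urlopen(req); the _version field can then be read from the decoded JSON response.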

1.2 Using an ID generated by ES

Submit the index request with POST, leaving out the ID:

POST /website/blog/ HTTP/1.1

{
  "date": "2014/01/01",
  "text": "Still trying this out...",
  "title": "My second blog entry"
}

ES responds as follows:

HTTP/1.1 201 Created
Location: /website/blog/UALcG2EBr-dnzPFB0zH1
content-encoding: gzip
content-length: 165
content-type: application/json; charset=UTF-8

{
  "_id": "UALcG2EBr-dnzPFB0zH1",
  "_index": "website",
  "_primary_term": 1,
  "_seq_no": 0,
  "_shards": {
    "failed": 0,
    "successful": 1,
    "total": 2
  },
  "_type": "blog",
  "_version": 1,
  "result": "created"
}

Apart from _id, which ES generated automatically, the response fields are the same as in the previous example.

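Auto-generated IDs are URL-safe, Base64-encoded GUID strings that are 20 characters long. These GUIDs are produced by a modified FlakeID scheme, which allows multiple nodes to generate unique IDs in parallel with an almost zero chance of collision.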
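
The shape described above (20 characters from the URL-safe Base64 alphabet) can be checked with a short sketch. looks_like_auto_id is a hypothetical name, and the check covers the shape only: a custom ID could match it as well.

```python
import re

# URL-safe Base64 alphabet: letters, digits, '-' and '_'; exactly 20 characters.
AUTO_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]{20}")


def looks_like_auto_id(doc_id: str) -> bool:
    """Return True if doc_id has the shape of an ES auto-generated ID."""
    return AUTO_ID_PATTERN.fullmatch(doc_id) is not None


print(looks_like_auto_id("UALcG2EBr-dnzPFB0zH1"))  # True
print(looks_like_auto_id("123"))                   # False
```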

2. Retrieving documents

2.1 Retrieving a document by ID
The request for the blog post with ID 123:

GET /website/blog/123?pretty HTTP/1.1

The pretty parameter appended to the request makes ES pretty-print its output, so the JSON response body is easier to read.

ES responds as follows:

HTTP/1.1 200 OK
content-encoding: gzip
content-length: 173
content-type: application/json; charset=UTF-8

{
  "_id": "123",
  "_index": "website",
  "_source": {
    "date": "2014/01/01",
    "text": "Still trying this out...",
    "title": "My second blog entry"
  },
  "_type": "blog",
  "_version": 1,
  "found": true
}

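The {"found": true} in the response body indicates that the document was retrieved. If the requested document does not exist, found is false and the status code is 404 Not Found, as follows: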

GET /website/blog/124?pretty HTTP/1.1

HTTP/1.1 404 Not Found
content-encoding: gzip
content-length: 87
content-type: application/json; charset=UTF-8

{
  "_id": "124",
  "_index": "website",
  "_type": "blog",
  "found": false
}

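A client can branch on the found field when handling either response. The sketch below assumes the response body has already been read and decoded to a string; extract_source is a hypothetical helper name, and the sample bodies are shortened versions of the responses shown in this section.

```python
import json
from typing import Optional


def extract_source(response_body: str) -> Optional[dict]:
    """Return the document's _source, or None when ES reports found = false."""
    reply = json.loads(response_body)
    if not reply.get("found", False):
        return None
    return reply["_source"]


hit = (
    '{"_id": "123", "_index": "website", "_type": "blog", "_version": 1, '
    '"found": true, "_source": {"title": "My second blog entry"}}'
)
miss = '{"_id": "124", "_index": "website", "_type": "blog", "found": false}'

print(extract_source(hit))   # {'title': 'My second blog entry'}
print(extract_source(miss))  # None
```
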
2.2 Returning part of a document
The following request returns only the title field of the blog post, instead of all fields (the default):

GET /website/blog/123?pretty&_source=title HTTP/1.1

HTTP/1.1 200 OK
content-encoding: gzip
content-length: 136
content-type: application/json; charset=UTF-8

{
  "_id": "123",
  "_index": "website",
  "_source": {
    "title": "My second blog entry"
  },
  "_type": "blog",
  "_version": 1,
  "found": true
}

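The query string for such a request can be assembled as in the sketch below. get_url is a hypothetical helper name; it writes pretty as pretty=true and joins several field names with commas in one _source parameter.

```python
from urllib.parse import urlencode


def get_url(path: str, source_fields=None, pretty: bool = False) -> str:
    """Append optional pretty and _source query parameters to a document path."""
    params = {}
    if pretty:
        params["pretty"] = "true"
    if source_fields:
        # Several field names are joined with commas in a single _source parameter.
        params["_source"] = ",".join(source_fields)
    # Keep the comma unescaped so the URL stays readable.
    query = urlencode(params, safe=",")
    return f"{path}?{query}" if query else path


print(get_url("/website/blog/123", ["title"], pretty=True))
# /website/blog/123?pretty=true&_source=title
```
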
2.3 Returning only the document content, without metadata
GET /website/blog/123/_source?pretty HTTP/1.1

HTTP/1.1 200 OK
content-encoding: gzip
content-length: 113
content-type: application/json; charset=UTF-8

{
  "date": "2014/01/01",
  "text": "Still trying this out...",
  "title": "My second blog entry"
}

2.4 Checking whether a document exists

Use a HEAD request, which returns only the HTTP response headers and no body:

HEAD /website/blog/123 HTTP/1.1

If the document exists, a 200 OK status code is returned:

HTTP/1.1 200 OK
content-length: 186
content-type: application/json; charset=UTF-8

If it does not exist, a 404 Not Found status code is returned:

HTTP/1.1 404 Not Found
content-length: 61
content-type: application/json; charset=UTF-8
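
A client only needs the status code of the HEAD response to decide whether the document exists. The sketch below uses hypothetical helper names (build_exists_request, exists_from_status) and treats any status other than 200 or 404 as an error, which is an assumption of this example.

```python
from urllib.request import Request


def build_exists_request(url: str) -> Request:
    """Build (but do not send) a HEAD request for a document URL."""
    return Request(url, method="HEAD")


def exists_from_status(status_code: int) -> bool:
    """Map the status code of a HEAD response to a document-exists flag."""
    if status_code == 200:
        return True
    if status_code == 404:
        return False
    raise RuntimeError(f"unexpected status for HEAD request: {status_code}")


head_req = build_exists_request("http://localhost:9200/website/blog/123")
print(head_req.get_method())    # HEAD
print(exists_from_status(200))  # True
print(exists_from_status(404))  # False
```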
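
3. Updating a document

Documents in ES are immutable: they cannot be changed in place. To update an existing document, reindex it, which replaces the whole document, as in the following request: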

PUT /website/blog/123 HTTP/1.1
Accept: application/json, */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Content-Length: 117
Content-Type: application/json
Host: localhost:9200

{
  "date": "2014/01/02",
  "text": "I am starting to get the hang of this...",
  "title": "My first blog entry"
}

ES responds as follows:

HTTP/1.1 200 OK
content-encoding: gzip
content-length: 143
content-type: application/json; charset=UTF-8

{
  "_id": "123",
  "_index": "website",
  "_primary_term": 2,
  "_seq_no": 1,
  "_shards": {
    "failed": 0,
    "successful": 1,
    "total": 2
  },
  "_type": "blog",
  "_version": 2,
  "result": "updated"
}
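
As shown above, the _version field has been incremented. Internally, Elasticsearch has marked the old document as deleted and added an entirely new document. Although the old version of the document can no longer be accessed, it does not disappear immediately. As you continue to index more data, Elasticsearch cleans up these deleted documents in the background.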

4. Deleting a document

Use the DELETE method to delete a document:

DELETE /website/blog/123 HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Content-Length: 0
Host: localhost:9200

ES responds as follows:

HTTP/1.1 200 OK
content-encoding: gzip
content-length: 143
content-type: application/json; charset=UTF-8

{
  "_id": "123",
  "_index": "website",
  "_primary_term": 2,
  "_seq_no": 2,
  "_shards": {
    "failed": 0,
    "successful": 1,
    "total": 2
  },
  "_type": "blog",
  "_version": 3,
  "result": "deleted"
}
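
The _version sequence seen across this post (1 on create, 2 on replace, 3 on delete) can be reproduced with a toy in-memory model. This is an illustration only: ToyVersionedStore is a hypothetical class, it says nothing about how ES stores data internally, and the "not_found" result for a missing document is an assumption of the model.

```python
class ToyVersionedStore:
    """Illustrative in-memory model only; not how ES works internally."""

    def __init__(self):
        self._docs = {}      # live documents by ID
        self._versions = {}  # last version number issued per ID

    def index(self, doc_id, document):
        """Store the whole document, replacing any previous one, and bump the version."""
        result = "updated" if doc_id in self._docs else "created"
        self._docs[doc_id] = dict(document)
        self._versions[doc_id] = self._versions.get(doc_id, 0) + 1
        return {"_id": doc_id, "_version": self._versions[doc_id], "result": result}

    def delete(self, doc_id):
        """Remove the document; deleting also bumps the version."""
        if doc_id not in self._docs:
            # Assumption of this toy model: report a missing document this way.
            return {"_id": doc_id, "result": "not_found"}
        del self._docs[doc_id]
        self._versions[doc_id] += 1
        return {"_id": doc_id, "_version": self._versions[doc_id], "result": "deleted"}


store = ToyVersionedStore()
print(store.index("123", {"title": "My second blog entry"}))  # version 1, created
print(store.index("123", {"title": "My first blog entry"}))   # version 2, updated
print(store.delete("123"))                                    # version 3, deleted
```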
