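Search engine:
    Index component: acquire data --> build documents --> analyze documents --> index documents (inverted index)
    Search component: user search interface --> build query (convert what the user typed into a query object that can be processed) --> run query --> present results
    Index component implementation: Lucene
    Search component implementations: Solr, ElasticSearch
    Note: the MyISAM engine in MySQL supports full-text indexes, but the format is fairly complex, so it is not well suited to serve as a search engine component.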
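The index and search flow above can be sketched in a few lines of Python. This is only an illustration of the idea of an inverted index, not how Lucene stores its data; the function names (analyze, build_inverted_index, search) and the sample documents are made up for the example, and the "analysis" step is reduced to lowercasing and splitting on non-alphanumeric characters.

```python
import re
from collections import defaultdict


def analyze(text):
    """Document analysis: lowercase the text and split it into tokens.
    A stand-in for a real analyzer (assumption for this sketch)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Document indexing: map each term to the sorted ids of the documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search(index, query):
    """Build a query from the user's input and return ids of documents
    that contain every query term (AND semantics)."""
    terms = analyze(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


# Hypothetical sample documents.
docs = {
    1: "Lucene is a search library",
    2: "Solr is a search server built on Lucene",
    3: "Kibana visualizes data",
}
index = build_inverted_index(docs)
print(search(index, "search lucene"))  # prints [1, 2]
```

The lookup touches only the posting lists of the query terms, which is why an inverted index scales better than scanning every document.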
Lucene Core:
Apache Lucene™ is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.
Solr:
Solr™ is a high performance search server built using Lucene Core, with XML/HTTP and JSON/Python/Ruby APIs, hit highlighting, faceted search, caching, replication, and a web admin interface.
ElasticSearch:
Elasticsearch is a distributed, RESTful search and analytics engine capable of solving a growing number of use cases. As the heart of the Elastic Stack, it centrally stores your data so you can discover the expected and uncover the unexpected.
Elastic Stack:
ElasticSearch
Logstash
Logstash is an open source, server-side data processing pipeline that ingests data from a multitude of sources simultaneously, transforms it, and then sends it to your favorite “stash.” (Ours is Elasticsearch, naturally.)
Beats:
    Filebeat: Log Files
    Metricbeat: Metrics
    Packetbeat: Network Data
    Winlogbeat: Windows Event Logs
    Heartbeat: Uptime Monitoring
Kibana:
Kibana lets you visualize your Elasticsearch data and navigate the Elastic Stack, so you can do anything from learning why you're getting paged at 2:00 a.m. to understanding the impact rain might have on your quarterly numbers.
TF/IDF algorithm:
https://zh.wikipedia.org/wiki/Tf-idf
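A small worked example of the textbook TF-IDF formula (term frequency within the document multiplied by the log of total documents over documents containing the term). This is the classic form described on the linked page, not necessarily the exact scoring that Lucene or Elasticsearch applies; the function name tf_idf and the toy corpus are assumptions for illustration.

```python
import math
from collections import Counter


def tf_idf(term, doc_tokens, corpus):
    """Textbook TF-IDF: tf = count(term) / len(doc), idf = ln(N / df)."""
    if not doc_tokens:
        return 0.0
    tf = Counter(doc_tokens)[term] / len(doc_tokens)
    # df: number of documents in the corpus that contain the term.
    df = sum(1 for tokens in corpus if term in tokens)
    idf = math.log(len(corpus) / df) if df else 0.0
    return tf * idf


# Hypothetical, already tokenized corpus.
corpus = [
    ["lucene", "search", "library"],
    ["solr", "search", "server"],
    ["kibana", "dashboard"],
]

# "lucene" occurs in one document, "search" in two, so "lucene" weighs more.
print(tf_idf("lucene", corpus[0], corpus))
print(tf_idf("search", corpus[0], corpus))
```

A term that appears in every document gets idf = ln(1) = 0, so it contributes nothing to the score.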
Core components of ES:
    Physical components:
        Cluster:
            Status: green, yellow, red
        Node:
        Shard:
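A minimal sketch of how the three cluster colors are usually read: green when every primary and replica shard is assigned, yellow when all primaries are assigned but at least one replica is not, and red when at least one primary is unassigned. The shard representation (a dict with "primary" and "assigned" flags) and the function name cluster_status are assumptions made for this example, not an Elasticsearch API.

```python
def cluster_status(shards):
    """Derive a green/yellow/red status from a list of shard copies.

    Each shard copy is a dict: {"primary": bool, "assigned": bool}
    (a made-up structure for this sketch).
    """
    if any(s["primary"] and not s["assigned"] for s in shards):
        return "red"      # some data is not available at all
    if any(not s["primary"] and not s["assigned"] for s in shards):
        return "yellow"   # data is available, but redundancy is reduced
    return "green"        # all primaries and replicas are assigned


shards = [
    {"primary": True, "assigned": True},
    {"primary": False, "assigned": False},  # replica with no node to live on
]
print(cluster_status(shards))  # prints yellow
```

A single-node cluster with replicas configured typically sits in this yellow state, because a replica is never placed on the same node as its primary.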
    Logical components (built on top of Lucene), with their rough relational-database analogues:
        Index: database
        Type: table
        Document: row
        Mapping:
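To make the index / type / document / mapping terms concrete, the sketch below shows how a document is addressed in the ES 5 REST API (/{index}/{type}/{id}) and what a mapping body for one type can look like. The index name "books", the type "book", the field names, and the helper doc_url are all hypothetical; nothing is sent over the network.

```python
import json


def doc_url(base, index, doc_type, doc_id):
    """Build the REST path of one document: /{index}/{type}/{id}."""
    return "{}/{}/{}/{}".format(base.rstrip("/"), index, doc_type, doc_id)


# Mapping: field definitions for the (hypothetical) type "book".
mapping = {
    "mappings": {
        "book": {
            "properties": {
                "title": {"type": "text"},     # analyzed for full-text search
                "year": {"type": "integer"},
            }
        }
    }
}

# Document: one "row", expressed as JSON.
document = {"title": "Example title", "year": 2018}

print(doc_url("http://server1:9200", "books", "book", 1))
print(json.dumps(mapping, indent=2))
print(json.dumps(document))
```

The mapping would be the request body when creating the index, and the document the body of a PUT to the printed URL.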
ElasticSearch 5 program environment:
    Configuration files:
        /etc/elasticsearch/elasticsearch.yml
        /etc/elasticsearch/jvm.options
        /etc/elasticsearch/log4j2.properties
    Unit file: elasticsearch.service
    Program files:
        /usr/share/elasticsearch/bin/elasticsearch
        /usr/share/elasticsearch/bin/elasticsearch-keystore:
        /usr/share/elasticsearch/bin/elasticsearch-plugin: plugin management tool
    Search service:
        9200/tcp
    Cluster service:
        9300/tcp
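How an ES cluster works:
    Discovery: multicast or unicast, over 9300/tcp
    Key factor: cluster.name (nodes join the same cluster only when this name matches)
    All nodes elect one master node, which manages the state of the whole cluster (green/yellow/red) and how the shards are distributed.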
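The election only works safely when a majority of master-eligible nodes can see each other, which is what the discovery.zen.minimum_master_nodes setting expresses. A minimal sketch of the usual rule of thumb (N / 2 + 1, rounded down) and of what happens when a three-node cluster is split by a network partition; the function names are made up for this example.

```python
def minimum_master_nodes(master_eligible):
    """Majority of the master-eligible nodes: floor(N / 2) + 1."""
    return master_eligible // 2 + 1


def can_elect_master(visible_nodes, required):
    """A group of nodes may elect a master only if it reaches the quorum."""
    return visible_nodes >= required


required = minimum_master_nodes(3)
print("quorum for 3 nodes:", required)  # prints quorum for 3 nodes: 2

# A partition splits the 3 nodes into a group of 2 and a group of 1.
for side in (2, 1):
    verdict = "can elect" if can_elect_master(side, required) else "cannot elect"
    print(side, "node(s) visible ->", verdict)
```

With the value set to 1, both sides of the partition could elect their own master (split brain); with 2, only the larger side keeps a master.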
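Plugins:
ELK architecture diagram:
Note: ELK consists of Elasticsearch, Logstash, and Kibana. In the diagram, the darker part in
the middle is implemented by Elasticsearch, the data collection part at the bottom by Logstash,
and Kibana provides the graphical search interface at the top. Logstash runs on JRuby, that is,
Ruby code executed on the JVM, which makes it relatively heavyweight and slow as a collector, so
the lightweight Filebeat component appeared to replace it for collection.
Logstash works by placing an agent on each log server to be collected from; whenever a log
changes, the changed data is pulled to the Logstash server, which turns it into documents and
hands them to the Elasticsearch cluster for processing. According to these notes, Solr, the
other Lucene-based search engine, fell behind on distributed storage for large data sets and
was displaced by ELK.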
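http://lucene.apache.org/    Lucene site (Lucene builds the index once the data has been turned into documents)
https://www.elastic.co/      Elastic site for ELK; the Elasticsearch packages can be downloaded here
https://db-engines.com/en/   site that ranks databases by popularity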
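Elasticsearch cluster (Elasticsearch is written in Java):
Preparation: disable the firewall, configure chrony time synchronization, and resolve host names through a local file (e.g. /etc/hosts)
https://mirrors.cnnic.cn    mirror site for the Elastic Stack packages (noted here as the Tsinghua University mirror); downloads are fast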
yum install java-1.8.0-openjdk-devel -y      # Elasticsearch is written in Java and needs a JDK
rpm -ivh elasticsearch-5.6.8.rpm
scp elasticsearch-5.6.8.rpm server2:/root/   # copy to the other nodes, then install with rpm there as well
scp elasticsearch-5.6.8.rpm server3:/root/
cd /etc/elasticsearch/
vim elasticsearch.yml
    cluster.name: myels
    node.name: server1
    path.data: /els/data
    path.logs: /els/logs        # these directories must be created separately, owned by user and group elasticsearch
    network.host: 192.168.43.60
    discovery.zen.ping.unicast.hosts: ["server1","server2","server3"]
    discovery.zen.minimum_master_nodes: 2    # majority of the 3 nodes: the cluster stays usable while 2 nodes are healthy
vim jvm.options
    # the initial and maximum heap sizes should be the same
    -Xms1g
    -Xmx1g
mkdir -pv /els/{data,logs} && chown -R elasticsearch:elasticsearch /els
scp elasticsearch.yml jvm.options server2:/etc/elasticsearch/
vim elasticsearch.yml           # on server2, change only:
    network.host: 192.168.43.63
    node.name: server2
scp elasticsearch.yml jvm.options server3:/etc/elasticsearch/
vim elasticsearch.yml           # on server3, change only:
    network.host: 192.168.43.62
    node.name: server3
java -version
systemctl daemon-reload && systemctl start elasticsearch
ss -ntl                         # 9200/tcp and 9300/tcp should be listening
curl http://server1:9200/       # check that the node responds
tail /els/logs/myels.log        # check the log for errors
free -m                         # check the memory size in order to choose the JVM heap size
curl -XGET 'http://server1:9200/_cluster/health?pretty=true'    # query the cluster health
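The same health check can be scripted. The sketch below uses only the Python standard library; the host name server1 comes from the notes above, while the helper names (health_url, fetch_health, summarize) and the sample values are assumptions for illustration. The sample dict uses field names that the _cluster/health response contains, but its values are invented, not captured from a real cluster.

```python
import json
from urllib.request import urlopen


def health_url(host, port=9200):
    """URL of the cluster health endpoint on one node."""
    return "http://{}:{}/_cluster/health".format(host, port)


def fetch_health(host, port=9200, timeout=5):
    """GET the health document and decode it (requires a running node)."""
    with urlopen(health_url(host, port), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def summarize(health):
    """One-line summary built from fields of the health response."""
    return ("{cluster_name}: {status}, {number_of_nodes} node(s), "
            "{unassigned_shards} unassigned shard(s)").format(**health)


# Illustrative values only (assumption), shaped like a health response.
sample = {
    "cluster_name": "myels",
    "status": "green",
    "number_of_nodes": 3,
    "unassigned_shards": 0,
}
print(summarize(sample))

# Against a running cluster one would call:
#   print(summarize(fetch_health("server1")))
```

Anything other than green in the summary is a cue to look at /els/logs/myels.log on the nodes.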
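Cluster configuration:
elasticsearch.yml configuration file:
    cluster.name: myels
    node.name: node1
    path.data: /data/els/data
    path.logs: /data/els/logs
    network.host: 0.0.0.0
    http.port: 9200         # 9200 is used by clients; 9300 is used for communication inside the cluster
    discovery.zen.ping.unicast.hosts: ["node1", "node2", "node3"]
    discovery.zen.minimum_master_nodes: 2
    node.attr.rack: r1      # rack label, so that shards can be spread across racks in case a rack's switch loses its network (used together with cluster.routing.allocation.awareness.attributes)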