Solr: SolrCloud

wslhs · posted on 2016-12-14 09:08:29

Distributed indexing

　　Document shard assignment
　　A document is assigned to one and only one shard per collection. Solr uses a component called a document router to determine which shard a document should be assigned to. There are two basic document-routing strategies supported by SolrCloud: compositeId (default) and implicit.
　　With compositeId routing, Solr hashes each document's unique ID with the MurmurHash algorithm. MurmurHash is fast and distributes hash values evenly, which keeps the number of documents in each shard roughly balanced.
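To make the routing idea concrete, here is a minimal sketch of hash-based shard assignment. It implements the 32-bit x86 variant of MurmurHash3 and maps each hash to one of N equal, contiguous hash ranges. The function names (`murmur3_32`, `shard_for`) and the equal-range split are assumptions for illustration; Solr's own router handles details such as composite IDs and configurable hash ranges that are not modeled here.

```python
from collections import Counter


def _rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & 0xFFFFFFFF


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3 (x86, 32-bit variant); returns an unsigned 32-bit int."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & 0xFFFFFFFF
    nblocks = len(data) // 4
    # Body: mix each 4-byte little-endian block into the hash state.
    for i in range(nblocks):
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k = (k * c1) & 0xFFFFFFFF
        k = _rotl32(k, 15)
        k = (k * c2) & 0xFFFFFFFF
        h ^= k
        h = _rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & 0xFFFFFFFF
    # Tail: mix in the remaining 0-3 bytes, if any.
    tail = data[4 * nblocks:]
    if tail:
        k = int.from_bytes(tail, "little")
        k = (k * c1) & 0xFFFFFFFF
        k = _rotl32(k, 15)
        k = (k * c2) & 0xFFFFFFFF
        h ^= k
    # Finalization: make every input bit affect every output bit.
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & 0xFFFFFFFF
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & 0xFFFFFFFF
    h ^= h >> 16
    return h


def shard_for(doc_id: str, num_shards: int) -> int:
    """Map a document ID to a shard index (hypothetical equal-range split)."""
    h = murmur3_32(doc_id.encode("utf-8"))
    # Scaling the 32-bit hash by num_shards yields contiguous, equal ranges.
    return (h * num_shards) >> 32


if __name__ == "__main__":
    # Count how many of 10,000 synthetic IDs land on each of 4 shards.
    counts = Counter(shard_for(f"doc-{i}", 4) for i in range(10_000))
    print(sorted(counts.items()))
```

Running the script prints the per-shard counts, which should come out close to one another if the hash spreads IDs evenly.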
　　

　　Adding documents
　　You can send update requests to any node in the cluster, and the request will be forwarded to the correct shard leader.
　　STEP 1: SEND THE UPDATE REQUEST USING CLOUDSOLRSERVER
　　STEP 2: ROUTE THE DOCUMENT TO THE CORRECT SHARD
　　STEP 3: LEADER ASSIGNS VERSION ID
　　STEP 4: FORWARD REQUEST TO REPLICAS
　　STEP 5: ACKNOWLEDGE WRITE SUCCESS
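The five steps above can be sketched as a small in-memory model. This is a toy simulation, not SolrCloud code: the class names (`Replica`, `Shard`, `Cluster`) are hypothetical, a CRC32 modulo stands in for Solr's hash-range routing, and a simple counter stands in for the version IDs the leader assigns.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of a shard; stores doc id -> (version, document)."""
    name: str
    docs: dict = field(default_factory=dict)

    def apply(self, doc: dict, version: int) -> bool:
        self.docs[doc["id"]] = (version, doc)
        return True  # acknowledge the write


@dataclass
class Shard:
    leader: Replica
    followers: list
    next_version: int = 1  # assumption: a counter instead of real version IDs

    def index(self, doc: dict) -> int:
        # Step 3: the leader assigns a version to the update.
        version = self.next_version
        self.next_version += 1
        self.leader.apply(doc, version)
        # Step 4: the leader forwards the versioned update to its replicas.
        acks = [replica.apply(doc, version) for replica in self.followers]
        # Step 5: report success only once every replica has acknowledged.
        if not all(acks):
            raise RuntimeError("a replica failed to apply the update")
        return version


class Cluster:
    def __init__(self, shards: list):
        self.shards = shards

    def add(self, doc: dict) -> int:
        # Step 1: the client hands the update to the cluster (any node).
        # Step 2: route the document to its shard (stand-in hash, see lead-in).
        index = zlib.crc32(doc["id"].encode("utf-8")) % len(self.shards)
        return self.shards[index].index(doc)


if __name__ == "__main__":
    cluster = Cluster([
        Shard(Replica("shard1-leader"), [Replica("shard1-replica")]),
        Shard(Replica("shard2-leader"), [Replica("shard2-replica")]),
    ])
    for i in range(4):
        print(f"doc-{i} indexed with version", cluster.add({"id": f"doc-{i}"}))
```

The point of the model is the ordering: the version is assigned once, on the leader, and every replica stores the same version for the same document.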
　　Near real-time search
　　NRT makes documents visible in search results within seconds of their being indexed, hence the "near" qualifier. To allow documents to be visible in NRT, Solr provides a soft commit mechanism, which skips the costly aspects of hard commits, such as flushing documents stored in memory to disk.
　　

　　Cache autowarming and warming queries must finish within your soft commit interval; otherwise the next soft commit arrives before the previous warm-up has completed.
　　Although NRT search is a powerful feature, you do not have to use it with SolrCloud. It’s perfectly acceptable to not use soft commits, and we recommend not using them unless you really need indexed documents to be visible in near real-time. One of the drawbacks of soft commits is that your caches are constantly being invalidated.
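For readers who do want soft commits, a request can ask for one explicitly. The sketch below only builds the request URL; `softCommit` and `commitWithin` are request parameters of Solr's update handler, while the helper name `build_update_url`, the host, and the collection name are assumptions for illustration.

```python
from urllib.parse import urlencode


def build_update_url(base_url, collection, soft_commit=False, commit_within_ms=None):
    """Build a URL for a collection's update handler (hypothetical helper)."""
    params = {"wt": "json"}
    if soft_commit:
        # Ask Solr to make the update searchable without a full hard commit.
        params["softCommit"] = "true"
    if commit_within_ms is not None:
        # Alternatively, let Solr commit within the given number of milliseconds.
        params["commitWithin"] = str(commit_within_ms)
    return f"{base_url.rstrip('/')}/{collection}/update?{urlencode(params)}"


if __name__ == "__main__":
    # Host and collection name are placeholders.
    print(build_update_url("http://localhost:8983/solr", "logs", soft_commit=True))
```

Leaving both options off produces a plain update request, which matches the recommendation to avoid soft commits unless near real-time visibility is really needed.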
　　Node recovery process
　　SolrCloud supports two basic recovery scenarios: peer sync and snapshot replication. The recovery process for these two scenarios is differentiated by how many update requests (add, delete, update) the recovering node missed while it was offline.

Peer sync: If the outage was short-lived and the recovering node missed only a few updates, it will recover by pulling updates from the shard leader’s update log. The upper limit on missed updates is currently hardcoded to 100. If the number of missed updates exceeds this limit, the recovering node pulls a full index snapshot from the shard leader.
Snapshot replication: If a node is offline for an extended period of time such that it becomes too far out of sync with the shard leader, it uses Solr’s HTTP-based replication, based on the snapshot of the index.
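The choice between the two recovery scenarios comes down to a single threshold check. A minimal sketch, assuming the limit of 100 missed updates stated above; the names `RecoveryStrategy` and `choose_recovery` are hypothetical and do not correspond to Solr classes.

```python
from enum import Enum

# Limit taken from the text above; described there as hardcoded in Solr.
PEER_SYNC_MAX_MISSED_UPDATES = 100


class RecoveryStrategy(Enum):
    PEER_SYNC = "peer sync"
    SNAPSHOT_REPLICATION = "snapshot replication"


def choose_recovery(missed_updates: int,
                    limit: int = PEER_SYNC_MAX_MISSED_UPDATES) -> RecoveryStrategy:
    """Pick a recovery strategy from the number of updates a node missed."""
    if missed_updates < 0:
        raise ValueError("missed_updates must be non-negative")
    if missed_updates <= limit:
        # Few updates missed: replay them from the leader's update log.
        return RecoveryStrategy.PEER_SYNC
    # Too far behind: pull a full index snapshot from the leader.
    return RecoveryStrategy.SNAPSHOT_REPLICATION


if __name__ == "__main__":
    for missed in (3, 100, 101):
        print(missed, "->", choose_recovery(missed).value)
```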

　　－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－

Distributed search

　　Once you shard your index, you have a new problem: you must query all shards to get a complete result set. Querying across all shards in a collection to create a unified result set is known as a distributed query. The distrib parameter determines if a query is distributed or local; when SolrCloud mode is enabled, distrib defaults to true.
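As a small illustration of the `distrib` parameter, the sketch below builds a query URL that is either distributed (the default in SolrCloud mode) or restricted to the receiving core. The helper name `build_select_url`, the host, and the collection name are assumptions; only the `distrib` parameter itself comes from the text.

```python
from urllib.parse import urlencode


def build_select_url(base_url, collection, query, distrib=True):
    """Build a query URL for a collection's select handler (hypothetical helper)."""
    params = {"q": query, "wt": "json"}
    if not distrib:
        # distrib=false keeps the query local instead of fanning out to all shards.
        params["distrib"] = "false"
    return f"{base_url.rstrip('/')}/{collection}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Host and collection name are placeholders.
    print(build_select_url("http://localhost:8983/solr", "logs", "*:*"))
    print(build_select_url("http://localhost:8983/solr", "logs", "*:*", distrib=False))
```

A local query is mainly useful for debugging, for example to compare document counts between the replicas of one shard.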
　　Multistage query process
　　Distributed queries work differently than nondistributed queries because Solr needs to gather results for all shards, then merge the results into a single response to the client. Solr uses a multistage query process to execute distributed queries.
　　

　　STEP 1: CLIENT SENDS QUERY TO ANY NODE
　　STEP 2: QUERY CONTROLLER RECEIVES REQUEST
　　STEP 3: QUERY STAGE
　　STEP 4: GET FIELDS STAGE
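The query stage and the get fields stage can be sketched with plain dictionaries standing in for shards. This is a simplified model under stated assumptions: the function names, the in-memory shard layout, and the caller-supplied `score_fn` are all hypothetical, and real Solr computes scores on each shard rather than in a coordinating function.

```python
import heapq


def query_stage(shards, score_fn, rows):
    """Stage 1: collect only (score, id, shard) from each shard's local top hits."""
    candidates = []
    for shard_name, docs in shards.items():
        local = [(score_fn(doc), doc_id, shard_name) for doc_id, doc in docs.items()]
        # Each shard contributes at most `rows` candidates.
        candidates.extend(heapq.nlargest(rows, local))
    # The controller merges the per-shard lists into one global top list.
    return heapq.nlargest(rows, candidates)


def get_fields_stage(shards, top_hits):
    """Stage 2: fetch full documents, but only for the merged top hits."""
    return [
        dict(shards[shard_name][doc_id], id=doc_id, score=score)
        for score, doc_id, shard_name in top_hits
    ]


if __name__ == "__main__":
    shards = {
        "shard1": {"a": {"title": "alpha", "hits": 3}, "b": {"title": "beta", "hits": 1}},
        "shard2": {"c": {"title": "gamma", "hits": 5}, "d": {"title": "delta", "hits": 2}},
    }
    top = query_stage(shards, score_fn=lambda doc: doc["hits"], rows=2)
    for doc in get_fields_stage(shards, top):
        print(doc)
```

Splitting the work this way means full documents are fetched only for the hits that survive the merge, not for every candidate on every shard.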
　　Distributed search limitations
　　Unfortunately, not all Solr query features work in distributed mode. Specifically, there are three main limitations you should be aware of:

Inverse document frequency (idf) is based on the frequency of a term in the local index only. It is used when scoring documents, so there can be some bias introduced when ranking documents in a distributed query. Because documents are randomly distributed across shards (by default), the idf for a term in shard1 is typically close to the idf for a term across all shards, so the bias is usually small in practice.
Joins do not work in distributed mode unless you use the custom hashing solution.
To use Solr’s grouping functionality in SolrCloud, you need to use custom hashing to co-locate documents that will be collapsed into the same group on the same shard.
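The idf point in the first limitation can be checked numerically. The sketch below assumes the classic Lucene TF-IDF formula, idf = 1 + ln(numDocs / (docFreq + 1)), and spreads synthetic documents uniformly at random over the shards; the function names, corpus size, and term rate are made up for illustration.

```python
import math
import random


def idf(num_docs: int, doc_freq: int) -> float:
    """Classic Lucene-style idf (assumed formula, see lead-in)."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def compare_idf(num_docs=10_000, term_rate=0.1, num_shards=4, seed=42):
    """Return (global idf, list of per-shard idf values) for one synthetic term."""
    rng = random.Random(seed)
    shard_docs = [0] * num_shards  # documents per shard
    shard_df = [0] * num_shards    # documents per shard that contain the term
    for _ in range(num_docs):
        shard = rng.randrange(num_shards)  # random document placement
        shard_docs[shard] += 1
        if rng.random() < term_rate:
            shard_df[shard] += 1
    global_idf = idf(sum(shard_docs), sum(shard_df))
    local_idfs = [idf(n, df) for n, df in zip(shard_docs, shard_df)]
    return global_idf, local_idfs


if __name__ == "__main__":
    global_idf, local_idfs = compare_idf()
    print("global idf:", round(global_idf, 3))
    print("per-shard idf:", [round(value, 3) for value in local_idfs])
```

With random placement, each shard sees roughly the same fraction of documents containing the term, which is why the per-shard values stay close to the global one. Custom routing that groups related documents on one shard can break that assumption.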
