
[Experience sharing] [Solr extensions] Different ways to implement autosuggest using SOLR


Posted on 2015-7-17 10:29:54
  Reposted from: http://knowlspace.wordpress.com/2011/06/15/different-ways-to-implement-autosuggest-using-solr/
  
  There are currently five techniques that can be used to implement auto-suggest functionality:
  1- The TermsComponent
  2- Facet prefixes
  3- The new Suggester component
  4- Edge N-grams
  5- Wildcard queries
  TermsComponent
  Implementing autosuggest with the TermsComponent is probably the easiest approach. The TermsComponent is a low-level Solr component that returns the terms indexed for a given field across all the documents in the index. It also accepts a parameter called “terms.prefix”, which restricts the returned terms to those that start with that prefix. So using this component for autosuggest is as easy as querying it with “terms.prefix” set to the text entered by the user.
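  As a minimal sketch of such a request, the snippet below only builds the URL. The host, the "/terms" handler path, and the field name "name_exact" are assumptions for illustration and depend on how solrconfig.xml and the schema are set up; "terms.fl", "terms.prefix" and "terms.limit" are TermsComponent parameters.

```python
from urllib.parse import urlencode


def terms_suggest_url(base_url, field, user_text, limit=10):
    """Build a TermsComponent request URL for the text typed so far."""
    params = {
        "terms": "true",
        "terms.fl": field,          # field whose indexed terms are listed
        "terms.prefix": user_text,  # only terms starting with this text
        "terms.limit": limit,       # cap the number of suggestions
        "wt": "json",
    }
    # "/terms" is an assumed handler path; adjust to the local configuration.
    return f"{base_url}/terms?{urlencode(params)}"


url = terms_suggest_url("http://localhost:8983/solr", "name_exact", "Vos")
print(url)
```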
  Unfortunately, this has big limitations. First, the component returns the “indexed” terms, not the stored values, so an extra field with no analysis should be used for it. That leads to a second problem: matching is case-sensitive. If the indexed term is “Vostro” and the user enters “vos” (in lowercase), the TermsComponent won’t return “Vostro”, because it starts with an uppercase letter.
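  The case problem can be reproduced without Solr. The sketch below is a plain-Python simulation of prefix matching over unanalyzed indexed terms, not Solr code; the term list is made up for illustration.

```python
def terms_with_prefix(indexed_terms, prefix, limit=10):
    """Return indexed terms starting with prefix (exact, case-sensitive)."""
    matches = sorted(t for t in indexed_terms if t.startswith(prefix))
    return matches[:limit]


# Terms as they would sit in an unanalyzed field (hypothetical data).
indexed_terms = ["Vostro", "Inspiron", "Latitude"]

print(terms_with_prefix(indexed_terms, "Vos"))  # the prefix matches "Vostro"
print(terms_with_prefix(indexed_terms, "vos"))  # nothing matches: case differs
```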
  Faceting to suggest
Faceting is sometimes used for autosuggest. The idea is similar to the TermsComponent approach, as faceting also has a “facet.prefix” parameter. By faceting on a field that contains the product names and setting facet.prefix to the user-entered text, the returned facet values can serve as the suggestions. Unfortunately, this approach suffers from the same problems as the TermsComponent approach.
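A facet-based request could look roughly like the sketch below. The host, the "/select" handler path, and the field name "name_exact" are assumptions; "rows=0" is used here because only the facet counts are of interest, not the matching documents.

```python
from urllib.parse import urlencode


def facet_suggest_url(base_url, field, user_text, limit=10):
    """Build a faceting request whose facet values act as suggestions."""
    params = {
        "q": "*:*",                 # match everything; only facets matter
        "rows": 0,                  # do not return documents
        "facet": "true",
        "facet.field": field,
        "facet.prefix": user_text,  # keep only values starting with this text
        "facet.limit": limit,
        "facet.mincount": 1,        # skip values with no documents
        "wt": "json",
    }
    return f"{base_url}/select?{urlencode(params)}"


url = facet_suggest_url("http://localhost:8983/solr", "name_exact", "Vos")
print(url)
```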
  Suggester
This is a new component, available since version 3.1 of Solr. The Suggester reuses much of the SpellCheckComponent infrastructure, so it also reuses many common SpellCheck parameters, such as spellcheck=true or spellcheck.build=true. The way this component is configured in solrconfig.xml is also very similar. It is technically a spellchecker, but instead of correcting misspelled words it returns a list of suggested words.
  It was developed with performance and versatility in mind. The other approaches were not designed as suggestion components in the first place; they are components that happen to be usable for the autosuggest use case. The Suggester is a component built from scratch for this purpose. It obtains its suggestions from an external dictionary or from a field.
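  Because the Suggester is driven by SpellCheck parameters, a request could be assembled as in the sketch below. The host and the "/suggest" handler path are assumptions that depend on solrconfig.xml; spellcheck=true and spellcheck.build=true come from the text above, and spellcheck.q and spellcheck.count are standard SpellCheckComponent parameters.

```python
from urllib.parse import urlencode


def suggester_url(base_url, user_text, count=5, build=False):
    """Build a Suggester request using the SpellCheck-style parameters."""
    params = {
        "spellcheck": "true",
        "spellcheck.q": user_text,  # text typed so far
        "spellcheck.count": count,  # maximum number of suggestions
        "wt": "json",
    }
    if build:
        # Ask Solr to (re)build the suggestion dictionary on this request.
        params["spellcheck.build"] = "true"
    # "/suggest" is an assumed handler path; adjust to the local configuration.
    return f"{base_url}/suggest?{urlencode(params)}"


url = suggester_url("http://localhost:8983/solr", "vos")
print(url)
```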
  Edge Ngrams
Edge Ngrams are the substrings of a term that start at its first letter. For example, the Edge Ngrams of the term “house” are “h”, “ho”, “hou”, “hous” and “house”.
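The definition can be written as a one-line function. This is a plain-Python illustration, not Solr's implementation; the name edge_ngrams and the min_size parameter are chosen here for the example.

```python
def edge_ngrams(term, min_size=1):
    """Return every prefix of term that has at least min_size characters."""
    return [term[:n] for n in range(min_size, len(term) + 1)]


print(edge_ngrams("house"))  # ['h', 'ho', 'hou', 'hous', 'house']
```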
The idea is to associate each of these Ngrams with the full word. Usually this is accomplished with a dedicated suggestion field that has a special analysis chain. Suggesting text with multiple words is easy with this approach.
For example, suppose the user is searching for discs, and the system should recommend “Dark side of the moon” when the user begins to type “side”. For this, the schema of the recommendation index would include an Edge Ngrams field, that is, a field that has at least the following analysis components:
  Whitespace tokenizer
Lowercase filter
Edge Ngrams filter
  Applying this chain to the title of that disc will produce:
Original text: Dark side of the moon
Whitespace tokenizer: Dark | side | of | the | moon
Lowercase filter: dark | side | of | the | moon
EdgeNgrams filter: d | da | dar | dark | s | si | sid | side | o | of | t | th | the | m | mo | moo | moon
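The same chain can be simulated in plain Python to check the output above. The three functions are stand-ins for the Solr tokenizer and filters, written only for illustration.

```python
def whitespace_tokenize(text):
    """Split on runs of whitespace, like a whitespace tokenizer."""
    return text.split()


def lowercase_filter(tokens):
    """Lowercase every token."""
    return [t.lower() for t in tokens]


def edge_ngram_filter(tokens, min_size=1):
    """Replace each token with all of its prefixes."""
    return [t[:n] for t in tokens for n in range(min_size, len(t) + 1)]


tokens = whitespace_tokenize("Dark side of the moon")
tokens = lowercase_filter(tokens)
grams = edge_ngram_filter(tokens)
print(" | ".join(grams))
```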
  The best way of implementing this approach for this example is to add an extra field named “edge_title” or similar, indexed with the analysis chain described above (it need not be stored if the title is already stored in another field). The auto-suggest should issue queries like:
…&q=edge_title:[user-entered-text]&fl=title
The query analysis chain should be the same as at indexing time, except for the EdgeNgrams filter, which should not be applied to the query.
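To see why the EdgeNgrams filter is left out at query time, here is a small in-memory simulation. It is not Solr code: SuggestIndex is a hypothetical toy inverted index, and it assumes that every query term must match (AND semantics).

```python
from collections import defaultdict


def index_analyze(text):
    """Index-time chain: whitespace split, lowercase, edge n-grams."""
    tokens = text.lower().split()
    return [t[:n] for t in tokens for n in range(1, len(t) + 1)]


def query_analyze(text):
    """Query-time chain: same as indexing, but without edge n-grams."""
    return text.lower().split()


class SuggestIndex:
    """Toy inverted index standing in for the edge_title field."""

    def __init__(self):
        self.titles = []
        self.postings = defaultdict(set)

    def add(self, title):
        doc_id = len(self.titles)
        self.titles.append(title)
        for gram in index_analyze(title):
            self.postings[gram].add(doc_id)

    def suggest(self, user_text):
        terms = query_analyze(user_text)
        if not terms:
            return []
        # Assumption: a title is suggested only if all query terms match.
        ids = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return [self.titles[i] for i in sorted(ids)]


index = SuggestIndex()
index.add("Dark side of the moon")
index.add("Wish you were here")

print(index.suggest("side"))  # ['Dark side of the moon']
print(index.suggest("SI"))    # ['Dark side of the moon']
print(index.suggest("ide"))   # []
```

The partial word typed by the user is looked up as a whole term, and it matches because the index already contains every prefix of every word. Text that is not a prefix of a word, such as “ide”, finds nothing.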
The drawback of this approach is disk space usage: with edge n-grams, the index grows significantly.
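A rough way to see the growth is to count tokens before and after the EdgeNgrams step. The sketch below counts tokens only; it is not a measure of actual disk usage, which depends on how the index stores and compresses terms.

```python
def token_counts(text):
    """Return (plain token count, edge n-gram token count) for text."""
    tokens = text.lower().split()
    # A token of length n produces n edge n-grams (sizes 1 through n).
    return len(tokens), sum(len(t) for t in tokens)


plain, with_ngrams = token_counts("Dark side of the moon")
print(plain, with_ngrams)  # 5 17
```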
  Execute Wildcard queries
The last approach is to run the user-entered text as a wildcard query against the indexed field. There are two problems with it. First, wildcard queries are not as fast as regular queries. Autosuggest must be fast, and with a relatively large index this approach probably won’t achieve the necessary speed.
The other big issue with this approach is the analysis. When a query contains wildcards, Solr doesn’t analyze it. So, if there is a small difference between the text entered by the user and the indexed text (case, etc.), Solr won’t suggest that document, even when the user enters the text correctly. In the first example, if the user enters “Vostro” or “Dell”, Solr won’t suggest “Dell Vostro”, as that field was lower-cased at index time.
One advantage of this approach over all the others is that when the user enters a part of the word that is not its beginning, like “str”, “Dell Vostro” can still be suggested.
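Both the analysis problem and the infix advantage can be shown with a small simulation. This is plain Python using fnmatch, not Solr's wildcard implementation; the document list is made up, and the indexed terms are assumed to be whitespace-tokenized and lower-cased.

```python
from fnmatch import fnmatchcase

# Hypothetical documents; indexed terms are lower-cased at index time.
documents = ["Dell Vostro", "Dell Inspiron"]
indexed = {doc: doc.lower().split() for doc in documents}


def wildcard_suggest(pattern):
    """Match the raw, unanalyzed pattern against every indexed term."""
    return [
        doc for doc in documents
        if any(fnmatchcase(term, pattern) for term in indexed[doc])
    ]


print(wildcard_suggest("Vostro*"))  # []: the query is not lower-cased
print(wildcard_suggest("vostro*"))  # ['Dell Vostro']
print(wildcard_suggest("*str*"))    # ['Dell Vostro']: infix match
```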
