Solr synonym search (synonyms)

zhaolu · posted on 2016-12-15 06:07:02

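Solr's synonym search is a very useful feature that solves a real problem in product requirements. For example, when a user searches for "刮胡刀" ("razor"), the better result is to show items matching both "刮胡刀" and "剃须刀" ("shaver"), because the two words refer to the same kind of product. The implementation below is based on solr.SynonymFilterFactory: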

Creates SynonymFilter

Matches strings of tokens and replaces them with other strings of tokens.

The synonyms parameter names an external file defining the synonyms.
If ignoreCase is true, matching will lowercase before checking equality.
If expand is true, a synonym will be expanded to all equivalent synonyms. If it is false, all equivalent synonyms will be reduced to the first in the list.
The optional tokenizerFactory parameter names a tokenizer factory class to analyze synonyms (see https://issues.apache.org/jira/browse/SOLR-319 ), which can help with the synonym+stemming problem described in http://search-lucene.com/m/hg9ri2mDvGk1 .
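To make the expand and ignoreCase rules above concrete, here is a minimal Python sketch of the lookup table they imply. It is not Solr's implementation: the real SynonymFilter works on a token stream and can match multi-token sequences, while this sketch only maps single, already-tokenized terms. The function names build_mappings and rewrite are hypothetical.

```python
from typing import Dict, List, Tuple


def build_mappings(groups: List[List[str]],
                   explicit: List[Tuple[List[str], List[str]]],
                   expand: bool = True,
                   ignore_case: bool = True) -> Dict[str, List[str]]:
    """Map each source term to its replacement terms (single tokens only)."""
    # ignoreCase: lowercase before comparing, as the parameter description says
    norm = str.lower if ignore_case else (lambda s: s)
    table: Dict[str, List[str]] = {}

    def add(src: str, targets: List[str]) -> None:
        out = table.setdefault(norm(src), [])
        for t in targets:
            if t not in out:
                out.append(t)

    # Comma-separated groups honour `expand`:
    # True -> every member maps to the whole group, False -> to the first entry
    for group in groups:
        targets = group if expand else group[:1]
        for term in group:
            add(term, targets)

    # "=>" mappings ignore `expand`: every LHS term becomes all RHS terms
    for lhs, rhs in explicit:
        for term in lhs:
            add(term, rhs)
    return table


def rewrite(tokens: List[str], table: Dict[str, List[str]],
            ignore_case: bool = True) -> List[str]:
    """Replace each token by its synonyms; unknown tokens pass through."""
    norm = str.lower if ignore_case else (lambda s: s)
    out: List[str] = []
    for tok in tokens:
        out.extend(table.get(norm(tok), [tok]))
    return out


if __name__ == "__main__":
    groups = [["GB", "gib", "gigabyte", "gigabytes"],
              ["飞利浦刮胡刀", "飞利浦剃须刀"]]
    explicit = [(["aaafoo"], ["aaabar"])]
    expanded = build_mappings(groups, explicit, expand=True)
    reduced = build_mappings(groups, explicit, expand=False)
    print(rewrite(["gb"], expanded))        # ['GB', 'gib', 'gigabyte', 'gigabytes']
    print(rewrite(["gigabytes"], reduced))  # ['GB']
    print(rewrite(["飞利浦剃须刀"], expanded))  # ['飞利浦刮胡刀', '飞利浦剃须刀']
    print(rewrite(["aaafoo"], reduced))     # ['aaabar']
```

With expand=false every member of a comma-separated group collapses to its first entry, which suits index-time normalization; with expand=true all alternatives are emitted. Mappings written with => behave the same either way.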

schema.xml configuration:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.ChineseTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"
            tokenizerFactory="solr.ChineseTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.ChineseTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"
            tokenizerFactory="solr.ChineseTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
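One practical way to check that the fieldType above really expands synonyms is Solr's field analysis handler (/analysis/field), which reports the tokens produced by each stage of the index and query analyzers. The sketch below only builds the request URL. The host, port and core name (collection1) are assumptions to replace with your own, and the handler must be available on your instance (the example solrconfig.xml files register it as solr.FieldAnalysisRequestHandler).

```python
from urllib.parse import urlencode


def analysis_url(base: str, core: str, value: str, query: str = "",
                 field_type: str = "text") -> str:
    """Build a /analysis/field request URL for the given field type.

    base, core and the default field_type are assumptions for illustration.
    """
    params = {
        "analysis.fieldtype": field_type,  # matches <fieldType name="text">
        "analysis.fieldvalue": value,      # analysed with the index analyzer
        "wt": "json",
    }
    if query:
        params["analysis.query"] = query   # analysed with the query analyzer
    return f"{base.rstrip('/')}/{core}/analysis/field?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical local instance and core name
    print(analysis_url("http://localhost:8983/solr", "collection1",
                       "飞利浦刮胡刀", query="飞利浦剃须刀"))
```

If synonyms.txt was loaded, the SynonymFilter stage in the response (or on the Analysis page of the admin UI) should list tokens for both phrases.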
synonyms.txt configuration:

# blank lines and lines starting with pound are comments.
#Explicit mappings match any token sequence on the LHS of "=>"
#and replace with all alternatives on the RHS.  These types of mappings
#ignore the expand parameter in the schema.
#Examples:
#-----------------------------------------------------------------------
#some test synonym mappings unlikely to appear in real input text
aaafoo => aaabar
bbbfoo => bbbfoo bbbbar
cccfoo => cccbar cccbaz
fooaaa,baraaa,bazaaa
# Some synonym groups specific to this example
GB,gib,gigabyte,gigabytes
MB,mib,megabyte,megabytes
Television,Televisions, TV,TVs
#notice we use "gib" instead of "GiB" so any WordDelimiterFilter coming
#after us won't split it into two words.
飞利浦刮胡刀,飞利浦剃须刀
# Synonym mappings can be used for spelling correction too
pixima => pixma
a\,a => b\,b
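The file format above is simple enough to sanity-check with a small script before reloading the core. The following Python sketch approximates how the lines are read; it is an assumption-based illustration, not Solr's own parser. Blank lines and lines starting with # are skipped, => separates an explicit mapping from its replacements, unescaped commas separate alternatives, \, stands for a literal comma, and an entry containing spaces such as bbbfoo bbbbar is kept as one multi-word alternative. parse_synonyms is a hypothetical helper name.

```python
import re
from typing import List, Optional, Tuple

# A comma that is not preceded by a backslash separates alternatives
_UNESCAPED_COMMA = re.compile(r"(?<!\\),")

# (lhs, rhs) for "=>" lines; (None, group) for comma-separated groups
Rule = Tuple[Optional[List[str]], List[str]]


def _split_terms(text: str) -> List[str]:
    parts = _UNESCAPED_COMMA.split(text)
    # Trim whitespace around each entry and turn "\," back into ","
    return [p.strip().replace("\\,", ",") for p in parts if p.strip()]


def parse_synonyms(text: str) -> List[Rule]:
    rules: List[Rule] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # blank line or comment
            continue
        if "=>" in line:
            lhs, rhs = line.split("=>", 1)
            rules.append((_split_terms(lhs), _split_terms(rhs)))
        else:
            rules.append((None, _split_terms(line)))
    return rules


if __name__ == "__main__":
    sample = "\n".join([
        "# comment",
        "aaafoo => aaabar",
        "Television,Televisions, TV,TVs",
        "飞利浦刮胡刀,飞利浦剃须刀",
        r"a\,a => b\,b",
    ])
    for rule in parse_synonyms(sample):
        print(rule)
```

For the lines above, this yields (['aaafoo'], ['aaabar']) for the first mapping and (['a,a'], ['b,b']) for the escaped one.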
