zhaolu posted on 2016-12-15 06:07:02

Solr synonym search (synonyms)

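Solr synonym search is a very useful feature that solves a major product requirement. For example, when a user searches for "刮胡刀" (razor), the better result is to show matches for both "刮胡刀" and "剃须刀" (shaver), since the two words mean the same thing. The implementation relies on solr.SynonymFilterFactory, described here: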

Creates SynonymFilter

Matches strings of tokens and replaces them with other strings of tokens.


- The synonyms parameter names an external file defining the synonyms.
- If ignoreCase is true, matching will lowercase before checking equality.
- If expand is true, a synonym will be expanded to all equivalent synonyms. If it is false, all equivalent synonyms will be reduced to the first in the list.
- The optional tokenizerFactory parameter names a tokenizer factory class to analyze synonyms (see https://issues.apache.org/jira/browse/SOLR-319), which can help with the synonym+stemming problem described in http://search-lucene.com/m/hg9ri2mDvGk1.
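The effect of expand and ignoreCase on an equivalence group can be illustrated with a small Python sketch. This is not Solr code: the function names build_synonym_map and apply_synonyms are made up for illustration, only single-token synonyms are handled, and the output is a flat list, whereas the real filter emits synonyms at the same token position.

```python
def build_synonym_map(groups, expand=True, ignore_case=True):
    """Map every term of each equivalence group to its replacement terms."""
    mapping = {}
    for group in groups:
        terms = [t.lower() if ignore_case else t for t in group]
        # expand=True: each term maps to the whole group.
        # expand=False: each term is reduced to the first term of the group.
        targets = terms if expand else terms[:1]
        for term in terms:
            mapping[term] = list(targets)
    return mapping


def apply_synonyms(tokens, mapping, ignore_case=True):
    """Replace each token that has a mapping; pass other tokens through."""
    out = []
    for tok in tokens:
        key = tok.lower() if ignore_case else tok
        out.extend(mapping.get(key, [tok]))
    return out


groups = [["GB", "gib", "gigabyte", "gigabytes"],
          ["飞利浦刮胡刀", "飞利浦剃须刀"]]
expanded = build_synonym_map(groups, expand=True)
reduced = build_synonym_map(groups, expand=False)

print(apply_synonyms(["4", "Gigabyte"], expanded))
# ['4', 'gb', 'gib', 'gigabyte', 'gigabytes']
print(apply_synonyms(["4", "Gigabyte"], reduced))
# ['4', 'gb']
print(apply_synonyms(["飞利浦剃须刀"], expanded))
# ['飞利浦刮胡刀', '飞利浦剃须刀']
```

With expand=true a search for either razor term matches documents containing the other; with expand=false both terms collapse to the first entry of the group.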

schema.xml configuration

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.ChineseTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" tokenizerFactory="solr.ChineseTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.ChineseTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" tokenizerFactory="solr.ChineseTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

synonyms.txt configuration

# blank lines and lines starting with pound are comments.

#Explicit mappings match any token sequence on the LHS of "=>"
#and replace with all alternatives on the RHS.  These types of mappings
#ignore the expand parameter in the schema.
#Examples:
#-----------------------------------------------------------------------
#some test synonym mappings unlikely to appear in real input text
aaafoo => aaabar
bbbfoo => bbbfoo bbbbar
cccfoo => cccbar cccbaz
fooaaa,baraaa,bazaaa
# Some synonym groups specific to this example
GB,gib,gigabyte,gigabytes
MB,mib,megabyte,megabytes
Television,Televisions, TV,TVs
#notice we use "gib" instead of "GiB" so any WordDelimiterFilter coming
#after us won't split it into two words.
飞利浦刮胡刀,飞利浦剃须刀
# Synonym mappings can be used for spelling correction too
pixima => pixma
a\,a => b\,b
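
To make the file format above concrete, here is a rough Python sketch of how such lines could be read. It is an illustration only, not Solr's actual parser: parse_synonyms and split_escaped are hypothetical names, ignoreCase handling is left out, and each rule is stored as a plain mapping from a left-hand term to a list of alternatives.

```python
import re


def split_escaped(text):
    """Split on commas not preceded by a backslash, then unescape '\\,'."""
    parts = re.split(r'(?<!\\),', text)
    return [p.strip().replace('\\,', ',') for p in parts if p.strip()]


def parse_synonyms(lines, expand=True):
    """Return {term: [alternatives]} for explicit mappings and comma groups."""
    rules = {}
    for raw in lines:
        line = raw.strip()
        # Blank lines and lines starting with '#' are comments.
        if not line or line.startswith('#'):
            continue
        if '=>' in line:
            # Explicit mapping: ignores the expand setting.
            lhs, rhs = line.split('=>', 1)
            targets = split_escaped(rhs)
            for term in split_escaped(lhs):
                rules[term] = targets
        else:
            # Equivalence group: behavior depends on expand.
            terms = split_escaped(line)
            targets = terms if expand else terms[:1]
            for term in terms:
                rules[term] = targets
    return rules


sample = [
    "# a comment line",
    "",
    "aaafoo => aaabar",
    "cccfoo => cccbar cccbaz",
    "fooaaa,baraaa,bazaaa",
    "飞利浦刮胡刀,飞利浦剃须刀",
    "pixima => pixma",
    r"a\,a => b\,b",
]
rules = parse_synonyms(sample)
print(rules["baraaa"])        # ['fooaaa', 'baraaa', 'bazaaa']
print(rules["飞利浦刮胡刀"])   # ['飞利浦刮胡刀', '飞利浦剃须刀']
print(rules["a,a"])           # ['b,b']
```

The last rule shows why the backslash matters: without it, the comma would be read as a separator between synonyms instead of as part of the term.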