白森 发表于 2016-12-14 10:48:20

Solr添加IKAnalysis中文分词

  1.下载中文分词器IKAnalyzer
  地址:http://code.google.com/p/ik-analyzer/downloads/list
  2.修改schema.xml文件,加入以下配置:

<fieldType name="textik" class="solr.TextField" >
<analyzer class="org.wltea.analyzer.lucene.IKAnalyzer"/>
<analyzer type="index">
<tokenizer class="org.wltea.analyzer.solr.IKTokenizerFactory" isMaxWordLength="false"/>
<filter class="solr.StopFilterFactory"
ignoreCase="true" words="stopwords.txt"/>
<filter class="solr.WordDelimiterFilterFactory"
generateWordParts="1"
generateNumberParts="1"
catenateWords="1"
catenateNumbers="1"
catenateAll="0"
splitOnCaseChange="1"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EnglishPorterFilterFactory"
protected="protwords.txt"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="org.wltea.analyzer.solr.IKTokenizerFactory" isMaxWordLength="false"/>
<filter class="solr.StopFilterFactory"
ignoreCase="true" words="stopwords.txt"/>
<filter class="solr.WordDelimiterFilterFactory"
generateWordParts="1"
generateNumberParts="1"
catenateWords="1"
catenateNumbers="1"
catenateAll="0"
splitOnCaseChange="1"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EnglishPorterFilterFactory"
protected="protwords.txt"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
</fieldType>
  然后定义需要使用中文分词功能的字段,比如我这里是title,代码如下:

<fields>
<field name="title" type="textik" indexed="true" stored="true" required="true" />
</fields>
  3. 将下载的IKAnalyzer目录下的IKAnalyzer3.2.8.jar放入 TOMCAT/webapps/该solr工程/WEB-INFO/lib 目录下
  4. 将下载的IKAnalyzer目录下的IKAnalyzer.cfg.xml和ext_stopword.dic文件放入 TOMCAT/webapps/该solr工程/classes 目录下,你也可以自己定义停用词字典,然后在IKAnalyzer.cfg.xml中进行配置,多个停用词字典之间用逗号隔开
  5. 重启tomcat,输入http://域名:端口号/该solr工程/admin/analysis.jsp,效果如下:
  

 
页: [1]
查看完整版本: Solr添加IKAnalysis中文分词