Configuring a Chinese Analyzer in Solr 5.0

wfkjxy · posted 2017-3-2 10:20:27

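By default, Solr handles Chinese text with Lucene's unigram tokenization, which emits one token per character. This post explains how to configure Lucene's SmartCN Chinese analyzer in Solr 5.0.
1. Go to the Solr installation directory (mine is /root/nutch/solr-5.0.0) and copy contrib/analysis-extras/lucene-libs/lucene-analyzers-smartcn-5.0.0.jar into the lib directory of the Solr web application that the server starts: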

# cp ./contrib/analysis-extras/lucene-libs/lucene-analyzers-smartcn-5.0.0.jar ./server/solr-webapp/webapp/WEB-INF/lib/
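Copying the jar into WEB-INF/lib makes it visible to every core. As an alternative that is not part of the original setup, the jar can be loaded for a single core with a `<lib>` directive inside the `<config>` element of that core's solrconfig.xml. This is a sketch: the fallback path after the colon assumes the core's instance directory is server/solr/mycore1 (three levels below the install directory), and the directive would be used instead of the copy, not in addition to it.

```xml
<!-- Load the SmartCN jar for this core only.
     solr.install.dir is normally set by bin/solr; if it is not,
     the relative fallback ../../.. is resolved against the core's
     instance directory (assumed here to be server/solr/mycore1). -->
<lib dir="${solr.install.dir:../../..}/contrib/analysis-extras/lucene-libs"
     regex="lucene-analyzers-smartcn-.*\.jar" />
```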

2. Edit the managed-schema configuration file. It lives in /root/nutch/solr-5.0.0/server/solr/mycore1/conf, where mycore1 is the name of the core you created.
Open /root/nutch/solr-5.0.0/server/solr/mycore1/conf/managed-schema (# vi managed-schema) and, near the end of the file, add the new field type:

<fieldType name="text_smartcn" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.SmartChineseSentenceTokenizerFactory"/>
<!--
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
<filter class="solr.LowerCaseFilterFactory"/>
-->
<!-- in this example, we will only use synonyms at query time
<filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
-->
<filter class="solr.SmartChineseWordTokenFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.SmartChineseSentenceTokenizerFactory"/>
<!--
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
<filter class="solr.LowerCaseFilterFactory"/>
-->
<filter class="solr.SmartChineseWordTokenFilterFactory"/>
</analyzer>
</fieldType>
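Lucene's smartcn module also ships HMMChineseTokenizerFactory, a single tokenizer that performs both the sentence splitting and the word segmentation done by the tokenizer/filter pair above; the Solr Reference Guide's Simplified Chinese example uses it. A possible equivalent field type is sketched below. The name text_smartcn_hmm is hypothetical, and the stopword list is the one bundled inside the smartcn jar, loaded from the classpath.

```xml
<!-- Hypothetical alternative to text_smartcn using the HMM tokenizer.
     A single <analyzer> applies to both indexing and querying. -->
<fieldType name="text_smartcn_hmm" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Sentence splitting and word segmentation in one step -->
    <tokenizer class="solr.HMMChineseTokenizerFactory"/>
    <!-- Chinese stopword list packaged in lucene-analyzers-smartcn -->
    <filter class="solr.StopFilterFactory"
            words="org/apache/lucene/analysis/cn/smart/stopwords.txt"/>
    <!-- Normalize any Latin-script tokens mixed into the text -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```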

For each field that should use text_smartcn, declare it as follows; in my case it is the content field:

<field name="content" type="text_smartcn" indexed="true" stored="true"/>
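To exercise the new field end to end, a small document can be posted to the core's /update handler (for example with curl, Content-Type text/xml, followed by a commit). This is an illustrative sketch: the id value and the sample sentence are made up, and it assumes the core's uniqueKey field is id, as in the stock configsets.

```xml
<!-- Hypothetical test document for mycore1 -->
<add>
  <doc>
    <!-- Assumes "id" is the uniqueKey field -->
    <field name="id">smartcn-test-1</field>
    <!-- Analyzed with text_smartcn at index time because of the field declaration -->
    <field name="content">我是中国人</field>
  </doc>
</add>
```

Once it is committed, a word-level query on the content field, such as content:中国, would be expected to match it if the text was segmented as intended.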

3. Restart Solr:

# ./bin/solr restart
Sending stop command to Solr running on port 8983 ... waiting 5 seconds to allow Jetty process 50325 to stop gracefully.
Waiting to see Solr listening on port 8983 []
Started Solr server on port 8983 (pid=50745). Happy searching!

4. Verify the result.

[Screenshot: analysis output using SmartCN]

[Screenshot: analysis output using Solr's default tokenizer]