Apache SOLR and Carrot2 integration strategies 1
deploy carrot2-webapp
1. download source code
#git clone https://github.com/carrot2/carrot2.git
2. compile
#cd carrot2
#ant webapp
3. deploy
#cp tmp/webapp/carrot2-webapp.war /path/to/tomcat/webapps
4. configure carrot2
#cd /path/to/tomcat/webapps/carrot2-webapp/WEB-INF/suites
#mv suite-webapp.xml suite-webapp.xml.old
#cp source-solr.xml suite-webapp.xml
then edit suite-webapp.xml so that it reads:
<component-suite>
<sources>
<source component-class="org.carrot2.source.solr.SolrDocumentSource" id="solr"
attribute-sets-resource="source-solr-attributes.xml">
<label>Solr</label>
<title>Solr Search Engine</title>
<icon-path>icons/solr.png</icon-path>
<mnemonic>s</mnemonic>
<description>Solr document source queries an instance of Apache Solr search engine.</description>
<example-queries>
<example-query>test</example-query>
<example-query>solr</example-query>
</example-queries>
</source>
</sources>
<include suite="algorithm-lingo.xml"></include>
</component-suite>
5. edit source-solr-attributes.xml
<attribute-sets default="overridden-attributes">
<attribute-set id="overridden-attributes">
<value-set>
<label>overridden-attributes</label>
<attribute key="SolrDocumentSource.serviceUrlBase">
<value type="java.lang.String" value="http://192.168.10.204:8983/inokarticle/clustering"/>
</attribute>
<attribute key="SolrDocumentSource.solrSummaryFieldName">
<value type="java.lang.String" value="content"/>
</attribute>
<attribute key="SolrDocumentSource.solrTitleFieldName">
<value type="java.lang.String" value="content"/>
</attribute>
</value-set>
</attribute-set>
</attribute-sets>
6. edit algorithm-lingo-attributes.xml and algorithm-lingo.xml
----------------------------------------------------
integrate with solr
1. configure solrconfig.xml
a. import related jars
<lib dir="../contrib/clustering/lib/" regex=".*\.jar" />
<lib dir="../dist/" regex="solr-clustering-\d.*\.jar" />
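Before restarting Solr it can help to confirm that both lib directives will actually find something. The sketch below is a hypothetical helper, not part of Solr. It assumes the dir paths are resolved relative to the core's instance directory, and the shell glob only approximates the regex of the second directive.

```shell
# check_clustering_jars INSTANCE_DIR
# Hypothetical helper: lists the jars the two <lib> directives should pick up
# and returns non-zero when either location yields nothing.
check_clustering_jars() {
  instance_dir=$1
  status=0
  # Mirrors <lib dir="../contrib/clustering/lib/" regex=".*\.jar" />
  if ! ls "$instance_dir"/../contrib/clustering/lib/*.jar 2>/dev/null; then
    echo "no jars found in contrib/clustering/lib" >&2
    status=1
  fi
  # Approximates <lib dir="../dist/" regex="solr-clustering-\d.*\.jar" />
  if ! ls "$instance_dir"/../dist/solr-clustering-[0-9]*.jar 2>/dev/null; then
    echo "no solr-clustering jar found in dist" >&2
    status=1
  fi
  return $status
}
```

Run it as check_clustering_jars /path/to/solr/core; a non-zero exit status means at least one of the two directives would load nothing.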
b. add the clustering search component and the /clustering request handler
<searchComponent name="clustering"
enable="true"
class="solr.clustering.ClusteringComponent" >
<lst name="engine">
<str name="name">lingo</str>
<str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
<str name="carrot.resourcesDir">clustering/carrot2</str>
<str name="MultilingualClustering.defaultLanguage">CHINESE_SIMPLIFIED</str>
<str name="PreprocessingPipeline.tokenizerFactory">org.carrot2.text.linguistic.DefaultTokenizerFactory</str>
</lst>
</searchComponent>
<requestHandler name="/clustering"
startup="lazy"
enable="true"
class="solr.SearchHandler">
<lst name="defaults">
<bool name="clustering">true</bool>
<str name="clustering.engine">lingo</str>
<bool name="clustering.results">true</bool>
<!-- Field name with the logical "title" of each document (optional) -->
<str name="carrot.title">content</str>
<!-- Field name with the logical "URL" of each document (optional) -->
<str name="carrot.url">id</str>
<!-- Field name with the logical "content" of each document (optional) -->
<str name="carrot.snippet">content</str>
<!-- Apply highlighter to the title/ content and use this for clustering. -->
<bool name="carrot.produceSummary">true</bool>
<!-- the maximum number of labels per cluster -->
<int name="carrot.numDescriptions">5</int>
<!-- produce sub clusters -->
<bool name="carrot.outputSubClusters">true</bool>
<str name="MultilingualClustering.defaultLanguage">CHINESE_SIMPLIFIED</str>
<!-- Configure the remaining request handler parameters. -->
<str name="defType">edismax</str>
<str name="q.alt">*:*</str>
<str name="rows">10</str>
<str name="fl">*,score</str>
</lst>
<arr name="last-components">
<str>clustering</str>
</arr>
</requestHandler>
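Once Solr has reloaded the configuration, the new handler can be exercised from the command line. The following is a minimal sketch: the function names and the SOLR_CORE_URL variable are made up for illustration, the host and core path are taken from the serviceUrlBase value used in the carrot2 configuration, and the query term is assumed to be URL-encoded already.

```shell
# Base URL of the Solr core (assumption: serviceUrlBase without /clustering).
# Override it through the environment to target another host or core.
SOLR_CORE_URL=${SOLR_CORE_URL:-http://192.168.10.204:8983/inokarticle}

# clustering_url QUERY [ROWS]
# Prints the URL of a request to the /clustering handler.
# QUERY must already be URL-encoded; ROWS defaults to 10.
clustering_url() {
  printf '%s/clustering?q=%s&rows=%s&wt=json&indent=true\n' \
    "$SOLR_CORE_URL" "$1" "${2:-10}"
}

# query_clusters QUERY [ROWS]
# Sends the request with curl and prints the raw JSON response.
query_clusters() {
  curl -s "$(clustering_url "$@")"
}
```

For example, query_clusters solr 20 should return the regular result list together with a clusters section if the component is wired in correctly.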
2. custom Chinese tokenizer for clustering
a. modify the related Carrot2 source code and recompile
b. copy the related jars and lexicon to the Solr web lib dir
For details, see Apache SOLR and Carrot2 integration strategies 2
References
http://wiki.apache.org/solr/ClusteringComponent
http://www.cnblogs.com/cy163/archive/2010/05/07/1730172.html
http://carrot2.github.io/solr-integration-strategies/carrot2-3.8.0/index.html
http://download.carrot2.org/head/manual/index.html#section.advanced-topics.building-from-source-code
http://www.cnblogs.com/shm10/p/3700604.html