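TokenizerFactory changed in Solr 1.4, so older Solr tokenizer extensions no longer work. Its create method must now return a Tokenizer, but PaodingTokenizer does not extend Tokenizer, so it cannot be returned directly. The fix is a small wrapper.
The new class, SolrPaodingTokenizer, extends Tokenizer and holds a PaodingTokenizer as a field, for example: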
    package com.chenlb.solr.paoding;

    import java.io.IOException;
    import java.io.Reader;

    import net.paoding.analysis.analyzer.PaodingTokenizer;
    import net.paoding.analysis.analyzer.TokenCollector;
    import net.paoding.analysis.knife.Knife;

    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.Tokenizer;

    /**
     * Wrapper around PaodingTokenizer for use in Solr 1.4.
     *
     * @author chenlb 2009-12-18 16:46:06
     */
    public class SolrPaodingTokenizer extends Tokenizer {

        private PaodingTokenizer paodingTokenizer;

        // Kept so that reset(Reader) can build a fresh PaodingTokenizer.
        private Knife knife;
        private TokenCollector tokenCollector;

        public SolrPaodingTokenizer(Reader input, Knife knife, TokenCollector tokenCollector) {
            paodingTokenizer = new PaodingTokenizer(input, knife, tokenCollector);
            this.input = input;
            this.knife = knife;
            this.tokenCollector = tokenCollector;
        }

        // Delegate token production to the wrapped PaodingTokenizer.
        public Token next() throws IOException {
            return paodingTokenizer.next();
        }

        public void close() throws IOException {
            paodingTokenizer.close();
        }

        // Re-create the wrapped tokenizer for the new Reader.
        public void reset(Reader input) throws IOException {
            paodingTokenizer = new PaodingTokenizer(input, knife, tokenCollector);
            this.input = input;
        }
    }
Next, write a PaodingTokenizerFactory. I won't list it here; download: solr-1.4-paoding.zip
It wraps paoding 2.0.4-beta so that it can be used in solr 1.4.
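The wrapper idea, together with a factory whose create method returns the required base type, can be sketched without the Solr or paoding jars. This is only an illustration under stated assumptions: `BaseTokenizer`, `LegacyTokenizer`, `WrappingTokenizer`, `WrappingTokenizerFactory` and `WrapperDemo` are hypothetical stand-ins, not classes from Solr, Lucene, or paoding, and the whitespace splitting merely stands in for real word segmentation.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the base type a factory must return
// (the role Lucene's Tokenizer plays in the post).
abstract class BaseTokenizer {
    protected Reader input;
    abstract String next() throws IOException;          // null means end of input
    abstract void reset(Reader input) throws IOException;
}

// Hypothetical stand-in for a tokenizer that does NOT extend the base type
// (the role PaodingTokenizer plays). It just splits on whitespace.
class LegacyTokenizer {
    private final Reader reader;

    LegacyTokenizer(Reader reader) {
        this.reader = reader;
    }

    String next() throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {
            if (Character.isWhitespace(c)) {
                if (sb.length() > 0) {
                    break;                               // end of the current token
                }
            } else {
                sb.append((char) c);
            }
        }
        return sb.length() > 0 ? sb.toString() : null;
    }
}

// The wrapper: extends the required base type and delegates to the legacy one.
class WrappingTokenizer extends BaseTokenizer {
    private LegacyTokenizer delegate;

    WrappingTokenizer(Reader input) {
        this.input = input;
        this.delegate = new LegacyTokenizer(input);
    }

    @Override
    String next() throws IOException {
        return delegate.next();
    }

    @Override
    void reset(Reader input) {
        // The legacy tokenizer cannot be re-pointed, so build a new one.
        this.input = input;
        this.delegate = new LegacyTokenizer(input);
    }
}

// The factory can now satisfy a "must return BaseTokenizer" contract.
class WrappingTokenizerFactory {
    BaseTokenizer create(Reader input) {
        return new WrappingTokenizer(input);
    }
}

class WrapperDemo {
    // Collect every token until the tokenizer signals end of input.
    static List<String> drain(BaseTokenizer t) throws IOException {
        List<String> out = new ArrayList<String>();
        for (String tok = t.next(); tok != null; tok = t.next()) {
            out.add(tok);
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        BaseTokenizer t = new WrappingTokenizerFactory().create(new StringReader("solr paoding"));
        System.out.println(drain(t));                    // prints [solr, paoding]
        t.reset(new StringReader("reused tokenizer"));
        System.out.println(drain(t));                    // prints [reused, tokenizer]
    }
}
```

Re-creating the delegate in reset mirrors what SolrPaodingTokenizer does, which is why the real wrapper keeps its Knife and TokenCollector as fields.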
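Usage:
Replace apache-solr-1.4.0.war with apache-solr-1.4.0-paoding.war, which bundles paoding-2.0.4-beta, the dictionaries, and the wrapper solr-1.4-paoding.jar.
The source is in solr-1.4-paoding-src. solr-1.4-paoding.jar was compiled against solr 1.3; it has been tested and works in solr 1.4.
solr/conf includes the schema.xml configuration:
If you only use PaodingAnalyzer, no wrapper is needed. For example: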