Posted by yuanhaoliang on 2015-11-12 09:46:42

Apache Solr Standalone Setup (Including Chinese Word Segmentation and Java API Usage)

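I. Use Case

The project needed an address search box, similar to typing an address into Google Maps: as the user types, addresses matching predefined rules are retrieved and displayed in real time (though far simpler than that). To reduce database load and speed up retrieval, this was implemented with Apache Solr. Since there were no strict requirements for performance, reliability, or availability, clustering was not considered; this is just a simple standalone environment.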

II. Concepts



III. Installation, Startup, and Chinese Word Segmentation Configuration

1. Download solr-4.10.2.zip and extract it to get the solr-4.10.2 directory.
2. Copy the solr directory under solr-4.10.2\example to serve as SOLR_HOME (the directory that holds each collection's configuration and data). Here it is copied to C:\Users\jiayu\Desktop\solr\ and renamed solr_home.
3. Copy solr-4.10.2.war from solr-4.10.2\dist into Tomcat's webapps directory. Tomcat unpacks it automatically on startup, after which the war file can be deleted. The unpacked directory is named solr-4.10.2 by default; for convenience it is renamed solr here. Then create context.xml under tomcat\webapps\solr\META-INF (Tomcat accepts either an XML file named after the application context in \conf\Catalina\localhost, or a context.xml in the application's META-INF directory) with the following content:


<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Context docBase="D:\setups\apache-tomcat-7.0.54-others\webapps\solr.war" debug="0" crossContext="true">
    <Environment name="solr/home" type="java.lang.String" value="C:\Users\jiayu\Desktop\solr\solr_home" override="true" />
</Context>


  
4. Start Tomcat. It reports an error: Error filterStart. The Tomcat log localhost.log shows that the slf4j jars are missing. Copy all jars from solr-4.10.2\example\lib\ext (which of them are strictly required was not checked) into solr/WEB-INF/lib, then create a classes directory under WEB-INF and add a log4j.properties file with the following content:


# Logging level
solr.log=logs/
log4j.rootLogger=INFO, file, CONSOLE
log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
log4j.appender.CONSOLE.layout.ConversionPattern=%-4r [%t] %-5p %c %x \u2013 %m%n
#- size rotation with log cleanup.
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.MaxFileSize=4MB
log4j.appender.file.MaxBackupIndex=9
#- File to log to and log format
log4j.appender.file.File=${solr.log}/solr.log
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%-5p - %d{yyyy-MM-dd HH:mm:ss.SSS}; %C; %m\n
log4j.logger.org.apache.zookeeper=WARN
log4j.logger.org.apache.hadoop=WARN
# set to INFO to enable infostream log messages
log4j.logger.org.apache.solr.update.LoggingInfoStream=OFF


  
Restart Tomcat and open localhost:8080/solr/admin.html. If the Solr admin page appears, startup succeeded.





Configuring Chinese word segmentation
Download IKAnalyzer2012FF_u1.jar and place it in solr/WEB-INF/lib.
Edit schema.xml under solr_home/collection1/conf and add the following:


<fieldType name="text_ik" class="solr.TextField">
    <analyzer type="index" isMaxWordLength="false" class="org.wltea.analyzer.lucene.IKAnalyzer"/>
    <analyzer type="query" isMaxWordLength="true" class="org.wltea.analyzer.lucene.IKAnalyzer"/>
</fieldType>


<field name="quesContent" type="text_ik" />


  
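This declares that the quesContent field is tokenized by the Chinese analyzer. Open the Solr admin page, select collection1 in the core selector, choose the newly configured quesContent as the field to analyze, enter some Chinese text, and run the analysis to see the segmentation result.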



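Configuration is now complete. The following sections show how to import data from a database and how to insert and query data through the Java API that Solr provides.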

IV. Importing Data from a Database

Copy solr-dataimporthandler-4.10.2.jar and solr-dataimporthandler-extras-4.10.2.jar from solr-4.10.2\dist into the WEB-INF/lib directory of the Solr web application. Because the data is imported from Oracle here, also copy the Oracle driver oracle6.jar into lib.
Create data-config.xml under solr_home/collection1/conf with the following content:


<dataConfig>
    <dataSource driver="oracle.jdbc.driver.OracleDriver" url="jdbc:oracle:thin:@192.168.32.152:1521:star" user="ecms" password="ecms"/>
    <document>
        <entity name="addressen" query="select id,name from addressen">
            <field column="id" name="id"/>
            <field column="name" name="name"/>
        </entity>
    </document>
</dataConfig>


  
Add the following to /collection1/conf/solrconfig.xml:


<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
        <str name="config">data-config.xml</str>
    </lst>
</requestHandler>


  
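After startup, log in to the Solr admin UI, select collection1, open the Dataimport tab, and click the Execute button. If the configuration is correct, the data is imported.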
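The Execute button sends a request to the /dataimport handler registered above, so the same import can also be started from code, for example from a scheduled job. The following is only a sketch under assumptions: the class and method names are made up for illustration, the base URL and core name are taken from the values used earlier in this post, and command=full-import, clean, and commit are the standard DataImportHandler request parameters.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper (not part of Solr): builds and calls the DIH full-import URL.
class DataImportTrigger {

    // Builds e.g. http://localhost:8080/solr/collection1/dataimport?command=full-import&clean=true&commit=true
    static String buildImportUrl(String solrBaseUrl, String coreName, boolean clean) {
        // Drop a trailing slash so the path is not doubled
        String base = solrBaseUrl.endsWith("/")
                ? solrBaseUrl.substring(0, solrBaseUrl.length() - 1)
                : solrBaseUrl;
        return base + "/" + coreName
                + "/dataimport?command=full-import&clean=" + clean + "&commit=true";
    }

    // Sends the request and returns the HTTP status code.
    // The import itself runs asynchronously inside Solr after the request returns.
    static int triggerFullImport(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(5000);
        conn.setReadTimeout(30000);
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }
}
```

A caller would pass the result of buildImportUrl("http://localhost:8080/solr", "collection1", true) to triggerFullImport; with clean=true the existing index content is removed before the import, so set it to false if that is not wanted.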



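V. Using the Java API Provided by solr-solrj-4.10.2.jar

To keep the data in Solr in sync with the database, a scheduled task periodically calls the API to copy the database data into Solr. The address table holds data for many countries, and the data must be stored separately per country when syncing, so each country's addresses get their own core. A preconfigured default_core directory exists under solr_home. When an address is inserted and its country (say country01) has no core yet, a country01 directory is created and the configuration from default_core is copied into it.
The complete code follows: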
package com.star.basic.system.domain;

import java.io.Serializable;
import java.util.List;
import org.apache.solr.client.solrj.beans.Field;

/** Address bean stored in Solr; each @Field maps to a schema field of the same name. */
public class SolrStoredAddress implements Serializable {
    private static final long serialVersionUID = 6192432007244661406L;

    @Field
    private String id;
    @Field
    private String name;
    @Field
    private String countryName;
    @Field
    private List<String> saleAreas;
    @Field
    private List<String> operators;

    public String getId() {
        return id;
    }
    public void setId(String id) {
        this.id = id;
    }
    public String getName() {
        return name;
    }
    public void setName(String name) {
        this.name = name;
    }
    public String getCountryName() {
        return countryName;
    }
    public void setCountryName(String countryName) {
        this.countryName = countryName;
    }
    public List<String> getSaleAreas() {
        return saleAreas;
    }
    public void setSaleAreas(List<String> saleAreas) {
        this.saleAreas = saleAreas;
    }
    public List<String> getOperators() {
        return operators;
    }
    public void setOperators(List<String> operators) {
        this.operators = operators;
    }
}


package com.star.basic.system.solr.utils;

import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import com.star.basic.system.domain.SolrStoredAddress;

public class SolrAddressUtils {
    // False while a sync is running; queries return null during that window
    public static volatile boolean available = true;
    private static final String SOLR_URL = "http://127.0.0.1:8081/solr";
    // Shared client; setBaseURL() retargets it per core, so concurrent use is not safe
    private static HttpSolrServer server = new HttpSolrServer(SOLR_URL);

    /**
     * Syncs the given addresses into Solr, one core per country.
     * @param addresses addresses to store
     * @throws SolrServerException
     * @throws IOException
     */
    public static void addAddresses(List<SolrStoredAddress> addresses)
            throws SolrServerException, IOException {
        available = false;
        try {
            // Group the addresses by country
            Map<String, List<SolrStoredAddress>> map = new HashMap<String, List<SolrStoredAddress>>();
            for (SolrStoredAddress address : addresses) {
                if (!map.containsKey(address.getCountryName())) {
                    map.put(address.getCountryName(),
                            new ArrayList<SolrStoredAddress>());
                }
                map.get(address.getCountryName()).add(address);
            }
            for (Entry<String, List<SolrStoredAddress>> entry : map.entrySet()) {
                // Core admin requests go to the root URL
                server.setBaseURL(SOLR_URL);
                if (!SolrUtils.hasCore(entry.getKey(), server)) {
                    SolrUtils.createCore(entry.getKey(), server);
                }
                // Document requests go to the country's core
                server.setBaseURL(SOLR_URL + "/" + entry.getKey());
                server.deleteByQuery("*:*"); // CAUTION: deletes everything!
                server.addBeans(entry.getValue());
                server.commit();
            }
        } finally {
            // Re-enable queries even if the sync failed
            available = true;
        }
    }

    /** Returns up to 5 addresses whose name matches addrName, or null if unavailable or on error. */
    public static List<SolrStoredAddress> queryAddresses(String addrName,
            String countryName) {
        if (!available) {
            return null;
        }
        SolrQuery query = new SolrQuery();
        query.setQuery("name:" + addrName);
        query.setStart(0);
        query.setRows(5);
        server.setBaseURL(SOLR_URL + "/" + countryName);
        QueryResponse response;
        try {
            response = server.query(query);
            return response.getBeans(SolrStoredAddress.class);
        } catch (SolrServerException e) {
            e.printStackTrace();
        }
        return null;
    }

    public static void main(String[] args) throws SolrServerException,
            IOException {
        server.setBaseURL(SOLR_URL + "/" + "country1");
        // Insert 10000 test addresses into the country1 core
        List<SolrStoredAddress> addrs = new ArrayList<SolrStoredAddress>();
        for (int i = 0; i < 10000; i++) {
            SolrStoredAddress add = new SolrStoredAddress();
            add.setId(String.valueOf(i));
            add.setName("add" + i);
            add.setCountryName("country1");
            addrs.add(add);
        }
        SolrAddressUtils.addAddresses(addrs);
    }
}
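One thing worth noting about queryAddresses: it concatenates the user's input straight into the query string, and characters such as : or ( have special meaning in Solr query syntax, so some inputs will produce a parse error or an unintended query. SolrJ ships ClientUtils.escapeQueryChars for this purpose. The sketch below is not that method; it is a standalone approximation written with the JDK only so it can be run without Solr, and the class name and the exact character set are assumptions made for illustration.

```java
// Hypothetical stand-in for SolrJ's ClientUtils.escapeQueryChars, JDK only.
class QueryEscaper {

    // Characters treated as query syntax in this sketch (assumed set)
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&;/";

    // Prefixes every special character and every whitespace character with a backslash
    static String escape(String input) {
        StringBuilder sb = new StringBuilder(input.length() + 8);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (SPECIAL.indexOf(c) >= 0 || Character.isWhitespace(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Builds the same "name:<value>" query as queryAddresses, with the value escaped
    static String nameQuery(String addrName) {
        return "name:" + escape(addrName);
    }
}
```

In the real class the simpler route would probably be to call the SolrJ utility directly, as in query.setQuery("name:" + ClientUtils.escapeQueryChars(addrName)). Note that escaping whitespace makes the whole input one value for the name field, which changes how multi-word input is matched compared with the unescaped version.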


package com.star.basic.system.solr.utils;

import java.io.File;
import java.io.IOException;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.request.CoreAdminRequest;
import org.apache.solr.common.util.NamedList;
import com.star.osgi.utils.FileUtils; // project-internal helper, not part of Solr

public class SolrUtils {
    private static final String DEFAULT_CORE_NAME = "default_core";

    /**
     * Checks whether the server already has a core with the given name.
     *
     * @param coreName name of the core to look for
     * @param server client pointing at the Solr root URL
     * @return true if the core exists
     * @throws SolrServerException
     * @throws IOException
     */
    public static boolean hasCore(String coreName, SolrServer server)
            throws SolrServerException, IOException {
        NamedList<Object> responseData = CoreAdminRequest
                .getStatus(coreName, server).getCoreStatus().get(coreName);
        if (responseData == null) {
            return false;
        }
        // The core exists only if the status response reports its name
        return coreName.equals(responseData.get("name"));
    }

    /**
     * Creates a core: copies the required files from the default core,
     * then calls the core admin API.
     *
     * @param coreName name of the core to create
     * @param server client pointing at the Solr root URL
     * @throws SolrServerException
     * @throws IOException
     */
    public static void createCore(String coreName, SolrServer server)
            throws SolrServerException, IOException {
        NamedList<Object> list = CoreAdminRequest
                .getStatus(DEFAULT_CORE_NAME, server).getCoreStatus()
                .get(DEFAULT_CORE_NAME);
        String defaultCorePath = (String) list.get("instanceDir");
        // solr_home is the part of the path before "default_core"
        String solrHome = defaultCorePath.substring(0,
                defaultCorePath.indexOf(DEFAULT_CORE_NAME));
        File corePath = new File(solrHome, coreName);
        if (!corePath.exists()) {
            corePath.mkdirs();
        }
        File confPath = new File(corePath.getAbsolutePath(), "conf");
        if (!confPath.exists()) {
            confPath.mkdirs();
        }
        // Copy default_core/conf into the new core's conf directory
        FileUtils.copyDir(new File(defaultCorePath, "conf"), confPath);
        CoreAdminRequest.createCore(coreName, corePath.getAbsolutePath(),
                server);
    }
}
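The FileUtils.copyDir helper used in createCore comes from the project's own com.star.osgi.utils package and is not shown in this post. For readers without that class, a recursive directory copy can be written with java.nio.file alone. The following is a sketch of one possible replacement, not the original helper; the class name is hypothetical and it assumes Java 7 or later.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.StandardCopyOption;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical replacement for the project-internal FileUtils.copyDir.
class ConfDirCopier {

    // Recursively copies the contents of source into target, creating directories as needed
    static void copyDir(final Path source, final Path target) throws IOException {
        Files.walkFileTree(source, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs)
                    throws IOException {
                // Mirror the directory structure before its files are visited
                Files.createDirectories(target.resolve(source.relativize(dir)));
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                    throws IOException {
                // Overwrite files that already exist in the target
                Files.copy(file, target.resolve(source.relativize(file)),
                        StandardCopyOption.REPLACE_EXISTING);
                return FileVisitResult.CONTINUE;
            }
        });
    }
}
```

With this in place, the call in createCore could become ConfDirCopier.copyDir(new File(defaultCorePath, "conf").toPath(), confPath.toPath()). The copy runs on the machine executing the Java code, so like the original it only works when that code has file-system access to solr_home.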


Query
@RequestMapping("indexSearch.do")
@ResponseBody
public List<Address> indexSearch(@RequestParam(value = "name", required = false) String name,
        HttpServletRequest request) throws SolrServerException {
    Country country = (Country) request.getSession().getAttribute("country");
    if (name == null) {
        return null;
    }
    // The core name is hard-coded to "country1" here
    List<SolrStoredAddress> addrs = SolrAddressUtils.queryAddresses(name, "country1");
    return convertToAddress(addrs);
}




Spring Data Solr also wraps the Solr API to simplify calls, and additionally offers features such as transaction support. A rough test configuration is shown below; see the official documentation for details.
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:solr="http://www.springframework.org/schema/data/solr"
    xsi:schemaLocation="http://www.springframework.org/schema/beans
        http://www.springframework.org/schema/beans/spring-beans.xsd
        http://www.springframework.org/schema/data/solr
        http://www.springframework.org/schema/data/solr/spring-solr-1.0.xsd">
    <solr:repositories base-package="solr.test.spring.solr" />
    <solr:solr-server id="solrServer" url="http://localhost:8081/solr" />
    <bean id="solrTemplate" class="org.springframework.data.solr.core.SolrTemplate">
        <constructor-arg index="0" ref="solrServer" />
    </bean>
    <!-- <solr:embedded-solr-server id="solrServer" solrHome="classpath:solr/test/spring/solr"
        /> -->
</beans>

package solr.test.spring.solr;

import java.util.List;
import org.springframework.data.repository.CrudRepository;
import com.star.basic.system.domain.SolrStoredAddress;

// The ID type is String to match SolrStoredAddress.id
public interface AddressSolrRepository extends CrudRepository<SolrStoredAddress, String> {
    List<SolrStoredAddress> findByName(String name);
}







References
  Solr wiki: http://wiki.apache.org/solr/Solrj
  Apache Solr 4.5.1 Environment Setup and MySQL Data Import: http://blog.iyunv.com/weijonathan/article/details/16961299
  Spring Data Solr: http://docs.spring.io/spring-data/data-solr/docs/current/reference/html/
  

Copyright notice: This is an original article by the author and may not be reproduced without the author's permission.