apache solr之我们的应用
一.安装准备:#下载 JDK
wget http://download.oracle.com/otn-pub/java/jdk/7u40-b43/jdk-7u40-linux-x64.tar.gz
#下载 tomcat
wget http://mirrors.cnnic.cn/apache/tomcat/tomcat-7/v7.0.42/bin/apache-tomcat-7.0.42.tar.gz
#下载 solr
wget http://mirrors.hust.edu.cn/apache/lucene/solr/4.4.0/solr-4.4.0.zip
#下载分词器mmseg
wget https://mmseg4j.googlecode.com/files/mmseg4j-1.9.1.zip
#下载分词库
wget https://mmseg4j.googlecode.com/files/data.zip
二.安装过程:
2.1安装JDk(1.6.32),环境中已经安装好了,这里省略:
$ cat /etc/profile
exportJAVA_HOME=/usr/local/java
exportCLASSPATH=$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
exportPATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH:$HOMR/bin
2.2 操作系统网络参数优化:
$ cat /etc/sysctl.conf
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_tw_reuse= 1
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_fin_timeout = 15
2.3 tomcat安装:
$ tar-zxf apache-tomcat-7.0.42.tar.gz -C /u1/javaServer
$ mv/u1/javaServer/apache-tomcat-7.0.42 /u1/javaServer/tomcat4solr
# config the environment tomcat for solr
Method1—基于JNDI:
# catconf/Catalina/localhost/solr.xml
Method2 —基于environment:
#cat bin/catalina.sh
exportJAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=
/u1/pr/gysolr/apache-solr-3.6.1/example/solr"
或者:
#cat /etc/profile #cat~/.bash_profile
exportJAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=
/u1/pr/gysolr/apache-solr-3.6.1/example/solr"
# cat conf/server.xml
注:如果没有设置URIEncoding="UTF-8",在提交查询的select的url会出现乱码;
useBodyEncodingForURI="true" tomcact中的server.xml如果如果配置了此项,则有可能导致中文乱码。
2.4 solr安装与配置:
#安装solr:
$ tar zxf apache-solr-3.6.1.tar.gz-C /u1/pr/gysolr #注意:这里使用的版本为3.6.1,4.4有改变
#配置solr.war:
$ cp /u1/pr/gysolr/apache-solr-3.6.1/dist/apache-solr-3.6.1.war\
/u1/pr/gysolr/apache-solr-3.6.1/example/webapps/solr.war
#拷贝下载好的分词器
$ cp mmseg4j-all-1.9.1.jar \
/u1/pr/gysolr/apache-solr-3.6.1/example/webapps/WEB-INF/lib
注意:如果需要连接数据库,也需要将mysql-connector-java-5.1.17.jar(举例)放在该目录
#拷贝下载好的分词库
$ cp words.dic /u1/pr/gysolr/apache-solr-3.6.1/example/solr/dic
#配置schema.xml使分词器生效
$ cat /u1/pr/gysolr/apache-solr-3.6.1/example/solr/conf/schema.xml
#配置db
$ cat/u1/pr/gysolr/apache-solr-3.6.1/example/solr/db/db-data-config.xml
附:表index_movie结构
CREATE TABLE `index_movie` (
`index_id` int(11) NOT NULL AUTO_INCREMENT,
`movie_id` int(11) DEFAULT NULL,
`movie_name` varchar(32) DEFAULT NULL,
`movie_name_alias` varchar(255) DEFAULT NULL,
`movie_name_pinyin` varchar(255) DEFAULT NULL,
`starin` varchar(128) DEFAULT NULL,
`starin_pinyin` varchar(256) DEFAULT NULL,
`director` varchar(64) DEFAULT NULL,
`director_pinyin` varchar(64) DEFAULT NULL,
`movie_type` varchar(64) DEFAULT NULL,
`show_time` varchar(10) DEFAULT NULL,
`region` varchar(32) DEFAULT NULL,
`movie_desc` varchar(1024) DEFAULT NULL,
`weight` int(11) DEFAULT '0',
`dp_class_type` varchar(128) DEFAULT NULL,
`dp_district_type` varchar(128) DEFAULT NULL,
`dp_age_type` varchar(64) DEFAULT NULL,
`show_time_format` varchar(64) DEFAULT NULL,
`resource_flag` int(10) DEFAULT '0' COMMENT '1有0无',
PRIMARY KEY (`index_id`),
KEY `index_movie_id` (`movie_id`)
) ENGINE=InnoDB AUTO_INCREMENT=196606 DEFAULT CHARSET=utf8
#在schema.xml中对需要创建索引和使用分词的表字段进行设置:
#测试下分词效果,访问:http://localhost:8090/solr/admin/analysis.jsp
#其他管理命令:
http://localhost:8090/solr/admin/ //登录
http://localhost:8090/solr/dataimport?command=full-import //更新数据(重建索引)
http://localhost:8090/solr/dataimport?command=status//查看服务状态
http://localhost:8090/solr/select/?q=movie_name:007&version=2.2&start=0&rows=10&indent=on //查看新数据是否生效
http://localhost:8090/solr/select/?q=movie_name:1942ANDmovie_type:ai&version=2.2&start=0&rows=10&indent=on&sort=diff_show_timeasc//asc升序,desc降序
#问题排查:
出现乱码或者查不到结果的排查方法:
1) Tomcat的server.xml需要保证:
如果没有设置URIEncoding="UTF-8",在提交查询的select的url会出现乱码,当然也就查不到了。
2) 添加索引的时候,xml数据文件需要包含utf-8声明,也就是:
然后使用Solr自带的post.jar或者curl来进行post索引:
java: java-Durl=http://localhost:8090/solr/update -Dcommit=yes -jar post.jar data.xml
curl:
curl -F "file=@data.xml"http://localhost:8090/solr/update
curl -H "Content-Type: text/xml"-d "" http://localhost:8090/solr/update
该步骤如果没有设置好,出错的表现是,查询的结果是乱码(可以设置查询关键词为id:[* TO *]确保显示出所有的结果).
3) 如果确保了如上的两个步骤都设置正确,但是使用http://localhost:8090/solr/admin/的查询表单(该表单支持utf-8没有问题)仍旧查不到结果,则问题可能出在分词器,可以尝试不同的分词器。)
三. 我们的应用:
# cat/u1/script/update_solr.sh
#!/bin/bash
HOSTNAME="dbip"
PORT="3306"
USERNAME="dbuser"
PASSWORD="dbpassword"
DBNAME="dbname"
#update solr for dapian
command_sql1="TRUNCATE TABLEindex_movie;"
command_sql2="INSERT INTOindex_movie(movie_id,movie_name,region,movie_type,show_time,director,starin,movie_name_pinyin,movie_desc)
SELECTmovie_id,movie_name,region,movie_type,show_time,director,starin,movie_name_pinyin,movie_descFROM md_movieinfo;"
command_sql3="UPDATEindex_movie,(SELECT movie_id,GROUP_CONCAT(md_moviename_ext.filmname) alias FROMmd_moviename_ext GROUP BY movie_id) b SET index_movie.movie_name_alias =b.alias WHERE index_movie.movie_id=b.movie_id;"
command_sql4="UPDATEindex_movie,(SELECT movie_id,GROUP_CONCAT(md_moviename_ext.pinying) alias FROMmd_moviename_ext GROUP BY movie_id) b SET index_movie.movie_name_pinyin =b.alias WHERE index_movie.movie_id=b.movie_id;"
command_sql5="UPDATEindex_movie,(SELECT movie_id,GROUP_CONCAT(NAME) people FROM md_moviepeople_extWHERE TYPE='主演' GROUP BY movie_id) b SET index_movie.starin=b.people WHEREindex_movie.movie_id=b.movie_id;"
command_sql6="UPDATEindex_movie,(SELECT movie_id,GROUP_CONCAT(pinying) people FROMmd_moviepeople_ext WHERE TYPE='主演' GROUP BY movie_id) b SET index_movie.starin_pinyin=b.people WHEREindex_movie.movie_id=b.movie_id;"
command_sql7="UPDATEindex_movie,(SELECT movie_id,GROUP_CONCAT(NAME) people FROM md_moviepeople_extWHERE TYPE='导演' GROUP BY movie_id) b SET index_movie.director=b.people WHEREindex_movie.movie_id=b.movie_id;"
command_sql8="UPDATEindex_movie,(SELECT movie_id,GROUP_CONCAT(pinying) people FROMmd_moviepeople_ext WHERE TYPE='导演' GROUP BY movie_id) b SET index_movie.director_pinyin=b.peopleWHERE index_movie.movie_id=b.movie_id;"
command_sql9="UPDATE index_movie SETresource_flag = 1 WHERE movie_id IN (SELECT DISTINCT(movie_id) FROMmd_resource2down WHERE LENGTH(resource_url) > 10) AND resource_flag =0;"
mysql -h${HOSTNAME} -P${PORT} -u${USERNAME} -p${PASSWORD} ${DBNAME} -e"${command_sql1};"
mysql -h${HOSTNAME} -P${PORT} -u${USERNAME} -p${PASSWORD} ${DBNAME} -e"${command_sql2};"
mysql -h${HOSTNAME} -P${PORT} -u${USERNAME} -p${PASSWORD} ${DBNAME} -e"${command_sql3};"
mysql -h${HOSTNAME} -P${PORT} -u${USERNAME} -p${PASSWORD} ${DBNAME} -e"${command_sql4};"
mysql -h${HOSTNAME} -P${PORT} -u${USERNAME} -p${PASSWORD} ${DBNAME} -e"${command_sql5};"
mysql -h${HOSTNAME} -P${PORT} -u${USERNAME} -p${PASSWORD} ${DBNAME} -e"${command_sql6};"
mysql -h${HOSTNAME} -P${PORT} -u${USERNAME} -p${PASSWORD} ${DBNAME} -e"${command_sql7};"
mysql -h${HOSTNAME} -P${PORT} -u${USERNAME} -p${PASSWORD} ${DBNAME} -e"${command_sql8};"
mysql -h${HOSTNAME} -P${PORT} -u${USERNAME} -p${PASSWORD} ${DBNAME} -e"${command_sql9};"
在crontab里加入:
#update solr
30 3 * * * /u1/script/update_solr.sh
0 4 * * * /usr/bin/curl http://localhost:8090/solr/dataimport?command=full-import > /dev/null 2>&1
附上述脚本中描述的表结构:
#电影资讯信息表
CREATE TABLE `md_movieinfo` (
`movie_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`movie_name` varchar(64) DEFAULT NULL,
`image` varchar(255) DEFAULT NULL,
`warn_url` varchar(256) DEFAULT NULL,
`region` varchar(32) DEFAULT NULL COMMENT '发行的地区',
`show_time_raw` varchar(32) DEFAULT NULL,
`show_time` date NOT NULL DEFAULT '0000-00-00',
`flags` tinyint(4) DEFAULT NULL COMMENT '2:上影过,1:上影中,0:将上影',
`director` varchar(64) DEFAULT NULL,
`starin` varchar(255) DEFAULT NULL,
`movie_type` varchar(20) DEFAULT NULL,
`play_type` enum('3D','2D') DEFAULT '2D',
`time_len` varchar(32) DEFAULT NULL,
`movie_desc` varchar(1024) DEFAULT NULL,
`movie_grade` tinyint(3) unsigned DEFAULT '0',
`enjoyed` int(10) unsigned DEFAULT '0',
`want_see` int(10) unsigned DEFAULT NULL,
`have_see` int(11) DEFAULT NULL,
`play_count` int(11) DEFAULT '0' COMMENT '播放次数累计',
`score_count` int(11) DEFAULT '0' COMMENT '评分累计',
`comment_num` int(10) unsigned NOT NULL DEFAULT '0',
`weburl` varchar(64) NOT NULL DEFAULT '0',
`play_cinemanum` mediumint(9) DEFAULT NULL,
`add_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`admin_id` int(10) unsigned DEFAULT NULL,
`t_count` int(10) unsigned DEFAULT NULL,
`t_price` mediumint(8) unsigned DEFAULT NULL,
`movie_feature` text,
`is_group` enum('Y','N') DEFAULT 'N',
`outline` varchar(64) DEFAULT NULL COMMENT '一句话话评论',
`copr_price` decimal(8,2) DEFAULT NULL,
`movie_name_pinyin` varchar(100) NOT NULL DEFAULT '',
`isdel` enum('Y','N') DEFAULT 'N',
`dp_class_type` varchar(128) DEFAULT '0',
`dp_district_type` varchar(128) DEFAULT '0',
`dp_age_type` varchar(64) DEFAULT '0',
`show_time_format` varchar(64) DEFAULT '0',
PRIMARY KEY (`movie_id`),
KEY `idx_movie_name_piyin` (`movie_name_pinyin`),
KEY `idx_show_time` (`show_time`)
) ENGINE=InnoDB AUTO_INCREMENT=2000010984 DEFAULT CHARSET=utf8;
#片名信息表
CREATE TABLE `md_moviename_ext` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`movie_id` int(10) unsigned NOT NULL,
`sortser` int(10) unsigned NOT NULL,
`filmname` varchar(256) DEFAULT NULL,
`country` varchar(32) DEFAULT NULL,
`fromweb` varchar(32) DEFAULT NULL,
`pinying` varchar(128) NOT NULL,
PRIMARY KEY (`id`),
KEY `idx_movied_id` (`movie_id`),
KEY `idx_movie_name` (`filmname`(255),`pinying`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=30578103 DEFAULT CHARSET=utf8;
#影人信息表
CREATE TABLE `md_moviepeople_ext` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`movie_id` int(10) unsigned NOT NULL,
`people_id` int(10) unsigned NOT NULL,
`type` varchar(32) DEFAULT NULL,
`sortser` int(10) unsigned NOT NULL,
`name` varchar(64) DEFAULT NULL,
`pinying` varchar(128) NOT NULL,
PRIMARY KEY (`id`),
KEY `idx_move_id` (`movie_id`),
KEY `idx_people_id` (`people_id`)
) ENGINE=InnoDB AUTO_INCREMENT=70119773 DEFAULT CHARSET=utf8;
#影片资源表
CREATE TABLE `md_resource2down` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`movie_id` int(10) unsigned DEFAULT NULL,
`movie_name` varchar(64) DEFAULT '',
`resource_type` varchar(16) DEFAULT '',
`resource_type_name` varchar(16) DEFAULT NULL,
`resource_url` varchar(1280) DEFAULT '',
`resource_from` varchar(256) DEFAULT '',
`add_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`weight` int(11) DEFAULT '999',
`vid` int(10) unsigned DEFAULT NULL,
`xsoname` varchar(128) DEFAULT NULL,
`hashid` bigint(20) unsigned DEFAULT NULL,
`file_type` varchar(8) DEFAULT NULL,
`time_len` varchar(16) NOT NULL,
`speed` tinyint(4) unsigned DEFAULT '6',
`movie_type` varchar(128) DEFAULT NULL COMMENT '视频格式,如:高清,超清',
`movie_size` varchar(16) DEFAULT NULL COMMENT '视频文件大小,如 700M',
`client_mtype` varchar(128) DEFAULT NULL COMMENT '视频格式规范化后的格式类型',
PRIMARY KEY (`id`),
KEY `idx_movie_id` (`movie_id`,`id`),
KEY `idx_resource_type` (`resource_type`),
KEY `idx_hashid` (`hashid`)
) ENGINE=InnoDB AUTO_INCREMENT=10424156 DEFAULT CHARSET=utf8;
#结果索引集表
CREATE TABLE `index_movie` (
`index_id` int(11) NOT NULL AUTO_INCREMENT,
`movie_id` int(11) DEFAULT NULL,
`movie_name` varchar(32) DEFAULT NULL,
`movie_name_alias` varchar(255) DEFAULT NULL,
`movie_name_pinyin` varchar(255) DEFAULT NULL,
`starin` varchar(128) DEFAULT NULL,
`starin_pinyin` varchar(256) DEFAULT NULL,
`director` varchar(64) DEFAULT NULL,
`director_pinyin` varchar(64) DEFAULT NULL,
`movie_type` varchar(64) DEFAULT NULL,
`show_time` varchar(10) DEFAULT NULL,
`region` varchar(32) DEFAULT NULL,
`movie_desc` varchar(1024) DEFAULT NULL,
`weight` int(11) DEFAULT '0',
`dp_class_type` varchar(128) DEFAULT NULL,
`dp_district_type` varchar(128) DEFAULT NULL,
`dp_age_type` varchar(64) DEFAULT NULL,
`show_time_format` varchar(64) DEFAULT NULL,
`resource_flag` int(10) DEFAULT '0' COMMENT '1有0无',
PRIMARY KEY (`index_id`),
KEY `index_movie_id` (`movie_id`)
) ENGINE=InnoDB AUTO_INCREMENT=196606 DEFAULT CHARSET=utf8;
参考博客:
http://blog.csdn.net/chenjia3615349/article/details/8112289
http://www.iteye.com/blogs/tag/solr
http://www.solr.cc/blog/
http://www.blogjava.net/RongHao/archive/2007/11/06/158621.html
页:
[1]