[Experience Sharing] How to Build Your Own Search Service with Solr

I have recently been learning Solr, so I'm taking the chance to fill out my rather empty blog and reinforce my own memory at the same time. A note to readers: I have only just started working with Solr and my understanding of it is not deep, so if anything here is inaccurate or wrong, please point it out.
Before building a Solr search service, let's look at two questions.

  • What is Solr?
  • What can Solr do?

What is Solr
Solr is a full-text search server built on top of Lucene. It is implemented in Java, which makes it easy to extend and modify, and it communicates over standard HTTP and XML, so Java knowledge is useful for working with Solr but not strictly required.
Solr's main features include:

  • powerful full-text search,
  • highlighting of search results,
  • dynamic clustering,
  • database integration and processing of rich documents (Word, PDF, etc.).

Solr is also highly extensible, supporting distributed search and index replication.
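To make the "standard HTTP and XML" point concrete, the sketch below sends a plain HTTP query to a Solr core from Java and prints the XML response. The host, port, core name and field used here are hypothetical placeholders; any HTTP client (a browser, curl, or code) can talk to Solr the same way.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLEncoder;

public class RawHttpQuerySketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical core URL and field name; wt=xml asks Solr to answer in XML.
        String q = URLEncoder.encode("name:solr", "UTF-8");
        URL url = new URL("http://localhost:8983/solr/collection1/select?q=" + q + "&wt=xml");

        BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream(), "UTF-8"));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(line);   // an XML <response> element listing the matching documents
        }
        in.close();
    }
}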
What can Solr do
Many sites now use Solr as their search server, not only because it is an Apache open-source project, but also because it provides the features listed above out of the box: full-text search, highlighting, distributed search and index replication.
Download the latest Solr and Resin
Solr:http://lucene.apache.org/solr/
Resin:http://www.caucho.com/download/
The Solr version downloaded here is 4.3.0; for Resin I use 3.0.25, which I have been using for a long time. Tomcat works just as well; for setting up a Solr search server on Tomcat, see:
http://www.originsoft.net/archives/32
The data lives in MySQL; MySQL itself is not covered here.
Solr package directory layout
Extract the Solr archive to D:/solr-4.3.0

  • build: directory where compiled files are placed during the Solr build.
  • client: client programs for calling Solr from specific languages; currently only Ruby is offered. The Java client, SolrJ, lives in src/solrj.
  • dist: holds the JAR and WAR files produced by the Solr build, along with the JARs Solr depends on.
  • example: a ready-to-run Jetty installation containing some sample data and Solr configuration.
                example/etc: Jetty configuration files.
                example/multicore: holds the multiple Solr home directories used when Solr multicore is set up.
                example/solr: the default Solr home directory.
                example/webapps: where the Solr WAR file is deployed.
  • src: Solr source code.
                src/java: Solr's Java source.
                src/scripts: Unix bash shell scripts that are useful for large production deployments.
                src/solrj: Solr's Java client.
                src/test: Solr's test source and test files.
                src/webapp: the Solr web admin interface. The admin JSP files live under web/admin/ and can be modified as needed.

  The Solr source is not kept in a single directory: src/java holds most of the files, src/common contains code shared by the server and the client, src/test holds Solr's test programs, and the servlet code lives in src/webapp/src.
Install Resin
The Resin download here is an archive, so it only needs to be extracted, to D:/resin-pro-3.0.25.
Setting up the Solr search environment
With the preparation done, the following walks through setting up the Solr search environment in detail.
1. Open Eclipse and create a new web project
Here I used a new Maven project and extracted D:\solr-4.3.0\example\webapps\solr.war into the project's webapp directory, as shown in the figure below:
(Screenshot: project structure after extracting solr.war into the webapp directory)
Create a solr_home directory under webapp; this directory will be configured as the solr/home JNDI entry below, i.e. the Lucene index directory.
2. Edit web.xml to configure the solr/home JNDI entry

<env-entry>
<env-entry-name>solr/home</env-entry-name>
<env-entry-value>src/main/webapp/solr_home</env-entry-value>
<env-entry-type>java.lang.String</env-entry-type>
</env-entry>

Alternatively, solr/home can be configured in the servlet container instead.
For example, with Tomcat it can be set in %CATALINA_HOME%/conf/Catalina/localhost/solr.xml:
<?xml version='1.0' encoding='utf-8'?>
<Context docBase="D:/eclipse-4.2/workspace/acme-search/src/main/webapp" debug="0" crossContext="true">
<Environment name="solr/home" type="java.lang.String" value="D:/eclipse-4.2/workspace/acme-search/src/main/webapp/solr_home" override="true"/>
</Context>

3. Configure the index service
The directory configured as solr/home contains solr.xml, which declares every index (core) hosted by this index service. Each <core> tag represents one search service; its name can be chosen freely, but it is best to keep it identical to instanceDir, the root directory of that index.
instanceDir is the index instance directory under solr/home. It must contain a conf subdirectory holding data-config.xml, schema.xml and solrconfig.xml; these three XML files are covered in detail below. (The resulting directory layout is sketched after the following configuration.)
<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true">
<cores adminPath="/admin/cores" defaultCoreName="collection1" host="${host:}" hostPort="${jetty.port:8983}" hostContext="${hostContext:solr}" zkClientTimeout="${zkClientTimeout:15000}">
<core name="collection1" instanceDir="collection1" />
</cores>
</solr>
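With the configuration above, solr_home ends up looking roughly like this (a sketch for the single collection1 core; the data directory is created by Solr itself on first start, and the .txt files are the ones referenced by schema.xml below):

solr_home/
    solr.xml
    collection1/
        conf/
            data-config.xml
            schema.xml
            solrconfig.xml
            stopwords.txt, synonyms.txt, protwords.txt
        data/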

4. Configure the data source for building the index
This step involves data-config.xml, schema.xml, solrconfig.xml and the DataImportHandler.
Configuring the index data source: data-config.xml
<?xml version="1.0" encoding="UTF-8"?>
<dataConfig>
<dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"  url="jdbc:mysql://192.168.1.101:3306/wms"  user="root"  password="123"/>  
<document name="content">
<entity name="node" pk="USER_ID" query="SELECT A.USER_ID, A.ADDRESS, DATE_FORMAT(A.BIRTHDAY,'%Y-%m-%d') BIRTHDAY, A.CARD_NO, A.EMAIL, A.NAME, A.PHONE, A.PRE_MONEY, DATE_FORMAT(A.REG_DATE,'%Y-%m-%d') REG_DATE, A.SCORE, A.SEX, A.STATUS, A.TYPE FROM USERS A">
<field column="USER_ID" name="id" />
<field column="ADDRESS" name="address" />
<field column="BIRTHDAY" name="birthday" />
<field column="CARD_NO" name="cardNo" />
<field column="EMAIL" name="email" />
<field column="NAME" name="name" />
<field column="PHONE" name="phone" />
<field column="PRE_MONEY" name="preMoney" />
<field column="REG_DATE" name="redDate" />
<field column="SCORE" name="score" />
<field column="SEX" name="sex" />
<field column="STATUS" name="status" />
<field column="TYPE" name="type" />
</entity>
</document>
</dataConfig>
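Each row returned by the query above is turned into one Solr document by the DataImportHandler. Purely for reference, a roughly equivalent document built by hand through SolrJ would look like the sketch below; the core URL and all field values are made-up placeholders.

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class ManualIndexSketch {
    public static void main(String[] args) throws Exception {
        // Core URL is an assumption; adjust host, port and core name to your deployment.
        HttpSolrServer server = new HttpSolrServer("http://localhost:8089/solr/collection1");

        // One document per USERS row; the values here are hypothetical.
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "1001");                // USER_ID
        doc.addField("name", "Zhang San");         // NAME
        doc.addField("email", "zs@example.com");   // EMAIL
        doc.addField("birthday", "1990-01-01");    // BIRTHDAY, already formatted by the SQL
        doc.addField("score", 10);                 // SCORE
        doc.addField("sex", "M");                  // SEX

        server.add(doc);
        server.commit();    // make the document visible to searches
        server.shutdown();
    }
}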

Configuring field types and the index/query analyzers: schema.xml

<?xml version="1.0" encoding="UTF-8" ?>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements.  See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License.  You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<!--  
This is the Solr schema file. This file should be named "schema.xml" and
should be in the conf directory under the solr home
(i.e. ./solr/conf/schema.xml by default)
or located where the classloader for the Solr webapp can find it.
This example schema is the recommended starting point for users.
It should be kept correct and concise, usable out-of-the-box.
For more information, on how to customize this file, please see
http://wiki.apache.org/solr/SchemaXml
-->
<schema name="db" version="1.1">
<!-- attribute "name" is the name of this schema and is only used for display purposes.
Applications should change this to reflect the nature of the search collection.
version="1.1" is Solr's version number for the schema syntax and semantics.  It should
not normally be changed by applications.
1.0: multiValued attribute did not exist, all fields are multiValued by nature
1.1: multiValued attribute introduced, false by default -->
<types>
<!-- field type definitions. The "name" attribute is
just a label to be used by field definitions.  The "class"
attribute and any other attributes determine the real
behavior of the fieldType.
Class names starting with "solr" refer to java classes in the
org.apache.solr.analysis package.
-->
<!-- The StrField type is not analyzed, but indexed/stored verbatim.  
- StrField and TextField support an optional compressThreshold which
limits compression (if enabled in the derived fields) to values which
exceed a certain size (in characters).
-->
<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
<!-- boolean type: "true" or "false" -->
<fieldType name="boolean" class="solr.BoolField" sortMissingLast="true" omitNorms="true"/>
<!-- The optional sortMissingLast and sortMissingFirst attributes are
currently supported on types that are sorted internally as strings.
- If sortMissingLast="true", then a sort on this field will cause documents
without the field to come after documents with the field,
regardless of the requested sort order (asc or desc).
- If sortMissingFirst="true", then a sort on this field will cause documents
without the field to come before documents with the field,
regardless of the requested sort order.
- If sortMissingLast="false" and sortMissingFirst="false" (the default),
then default lucene sorting will be used which places docs without the
field first in an ascending sort and last in a descending sort.
-->   

<!-- numeric field types that store and index the text
value verbatim (and hence don't support range queries, since the
lexicographic ordering isn't equal to the numeric ordering) -->
<fieldType name="integer" class="solr.IntField" omitNorms="true"/>
<fieldType name="long" class="solr.LongField" omitNorms="true"/>
<fieldType name="float" class="solr.FloatField" omitNorms="true"/>
<fieldType name="double" class="solr.DoubleField" omitNorms="true"/>

<!-- Numeric field types that manipulate the value into
a string value that isn't human-readable in its internal form,
but with a lexicographic ordering the same as the numeric ordering,
so that range queries work correctly. -->
<fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="slong" class="solr.SortableLongField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="sfloat" class="solr.SortableFloatField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="sdouble" class="solr.SortableDoubleField" sortMissingLast="true" omitNorms="true"/>

<!-- The format for this date field is of the form 1995-12-31T23:59:59Z, and
is a more restricted form of the canonical representation of dateTime
http://www.w3.org/TR/xmlschema-2/#dateTime   
The trailing "Z" designates UTC time and is mandatory.
Optional fractional seconds are allowed: 1995-12-31T23:59:59.999Z
All other components are mandatory.
Expressions can also be used to denote calculations that should be
performed relative to "NOW" to determine the value, ie...
NOW/HOUR
... Round to the start of the current hour
NOW-1DAY
... Exactly 1 day prior to now
NOW/DAY+6MONTHS+3DAYS
... 6 months and 3 days in the future from the start of
the current day
Consult the DateField javadocs for more information.
-->
<fieldType name="date" class="solr.DateField" sortMissingLast="true" omitNorms="true"/>

<!-- The "RandomSortField" is not used to store or search any
data.  You can declare fields of this type it in your schema
to generate psuedo-random orderings of your docs for sorting
purposes.  The ordering is generated based on the field name
and the version of the index, As long as the index version
remains unchanged, and the same field name is reused,
the ordering of the docs will be consistent.  
If you want differend psuedo-random orderings of documents,
for the same version of the index, use a dynamicField and
change the name
-->
<fieldType name="random" class="solr.RandomSortField" indexed="true" />
<!-- solr.TextField allows the specification of custom text analyzers
specified as a tokenizer and a list of token filters. Different
analyzers may be specified for indexing and querying.
The optional positionIncrementGap puts space between multiple fields of
this type on the same document, with the purpose of preventing false phrase
matching across fields.
For more info on customizing your analyzer chain, please see
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
-->
<!-- One can also specify an existing Analyzer class that has a
default constructor via the class attribute on the analyzer element
<fieldType name="text_greek" class="solr.TextField">
<analyzer class="org.apache.lucene.analysis.el.GreekAnalyzer"/>
</fieldType>
-->
<!-- A text field that only splits on whitespace for exact matching of words -->
<fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
</analyzer>
</fieldType>
<!-- A text field that uses WordDelimiterFilter to enable splitting and matching of
words on case-change, alpha numeric boundaries, and non-alphanumeric chars,
so that a query of "wifi" or "wi fi" could match a document containing "Wi-Fi".
Synonyms and stopwords are customized by external files, and stemming is enabled.
Duplicate tokens at the same position (which may result from Stemmed Synonyms or
WordDelim parts) are removed.
-->
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<!-- in this example, we will only use synonyms at query time
<filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
-->
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
<filter class="solr.PorterStemFilterFactory"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
<filter class="solr.PorterStemFilterFactory"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
</fieldType>

<!-- Less flexible matching, but less false matches.  Probably not ideal for product names,
but may be good for SKUs.  Can insert dashes in the wrong place and still match. -->
<fieldType name="textTight" class="solr.TextField" positionIncrementGap="100" >
<analyzer>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
<filter class="solr.EnglishMinimalStemFilterFactory"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
</fieldType>
<!-- This is an example of using the KeywordTokenizer along
With various TokenFilterFactories to produce a sortable field
that does not include some properties of the source text
-->
<fieldType name="alphaOnlySort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
<analyzer>
<!-- KeywordTokenizer does no actual tokenizing, so the entire
input string is preserved as a single token
-->
<tokenizer class="solr.KeywordTokenizerFactory"/>
<!-- The LowerCase TokenFilter does what you expect, which can be useful
when you want your sorting to be case insensitive
-->
<filter class="solr.LowerCaseFilterFactory" />
<!-- The TrimFilter removes any leading or trailing whitespace -->
<filter class="solr.TrimFilterFactory" />
<!-- The PatternReplaceFilter gives you the flexibility to use
Java Regular expression to replace any sequence of characters
matching a pattern with an arbitrary replacement string,
which may include back references to portions of the original
string matched by the pattern.
See the Java Regular Expression documentation for more
information on pattern and replacement string syntax.
http://java.sun.com/j2se/1.6.0/docs/api/java/util/regex/package-summary.html
-->
<filter class="solr.PatternReplaceFilterFactory"
pattern="([^a-z])" replacement="" replace="all"
/>
</analyzer>
</fieldType>
<!-- since fields of this type are by default not stored or indexed, any data added to
them will be ignored outright
-->
<fieldtype name="ignored" stored="false" indexed="false" class="solr.StrField" />
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
<!-- in this example, we will only use synonyms at query time
<filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
-->
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
</types>

<fields>
<!-- Valid attributes for fields:
name: mandatory - the name for the field
type: mandatory - the name of a previously defined type from the <types> section
indexed: true if this field should be indexed (searchable or sortable)
stored: true if this field should be retrievable
multiValued: true if this field may contain multiple values per document
omitNorms: (expert) set to true to omit the norms associated with
this field (this disables length normalization and index-time
boosting for the field, and saves some memory).  Only full-text
fields or fields that need an index-time boost need norms.
termVectors: [false] set to true to store the term vector for a given field.
When using MoreLikeThis, fields used for similarity should be stored for
best performance.
-->
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="address" type="string" indexed="true" stored="true"/>
<field name="cardNo" type="string" indexed="true" stored="false"/>
<field name="name" type="string" indexed="true" stored="true"/>
<field name="email" type="text" indexed="true" stored="true"/>
<field name="phone" type="string" indexed="true" stored="true"/>
<field name="preMoney"  type="sfloat" indexed="true" stored="true"/>
<!-- "default" values can be specified for fields, indicating which
value should be used if no value is specified when adding a document.
-->
<field name="birthday" type="string" indexed="true" stored="true"/>
<field name="score" type="sint" indexed="true" stored="true" default="0"/>
<field name="sex" type="string" indexed="true" stored="true"/>
<field name="type" type="string" indexed="true" stored="true"/>
<field name="status" type="string" indexed="true" stored="false"/>
<field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>
<field name="_version_" type="long" indexed="true" stored="true"/>
<!-- Dynamic field definitions.  If a field name is not found, dynamicFields
will be used if the name matches any of the patterns.
RESTRICTION: the glob-like pattern in the name attribute must have
a "*" only at the start or the end.
EXAMPLE:  name="*_i" will match any field ending in _i (like myid_i, z_i)
Longer patterns will be matched first.  if equal size patterns
both match, the first appearing in the schema will be used.  -->
<dynamicField name="*_i"  type="sint"    indexed="true"  stored="true"/>
<dynamicField name="*_s"  type="string"  indexed="true"  stored="true"/>
<dynamicField name="*_l"  type="slong"   indexed="true"  stored="true"/>
<dynamicField name="*_t"  type="text"    indexed="true"  stored="true"/>
<dynamicField name="*_b"  type="boolean" indexed="true"  stored="true"/>
<dynamicField name="*_f"  type="sfloat"  indexed="true"  stored="true"/>
<dynamicField name="*_d"  type="sdouble" indexed="true"  stored="true"/>
<dynamicField name="*_dt" type="date"    indexed="true"  stored="true"/>
<dynamicField name="random*" type="random" />
<!-- uncomment the following to ignore any fields that don't already match an existing
field name or dynamic field, rather than reporting them as an error.
alternately, change the type="ignored" to some other type e.g. "text" if you want
unknown fields indexed and/or stored by default -->
<!--dynamicField name="*" type="ignored" multiValued="true" /-->
</fields>
<!-- Field to use to determine and enforce document uniqueness.
Unless this field is marked with required="false", it will be a required field
-->
<uniqueKey>id</uniqueKey>
<!-- field for the QueryParser to use when an explicit fieldname is absent -->
<defaultSearchField>name</defaultSearchField>
<!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
<solrQueryParser defaultOperator="OR"/>
<!-- Similarity is the scoring routine for each document vs. a query.
A custom similarity may be specified here, but the default is fine
for most applications.  -->
<!-- <similarity class="org.apache.lucene.search.similarities.DefaultSimilarity"/> -->
</schema>


Configuring the DataImportHandler
Open solrconfig.xml and add the content below. Note that the DataImportHandler classes and the MySQL JDBC driver are not bundled inside solr.war, so their JARs (solr-dataimporthandler-4.3.0.jar from the dist directory of the Solr package, plus the mysql-connector-java driver) also need to be placed on the webapp's classpath, for example in WEB-INF/lib.

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
<lst name="defaults">
<str name="config">data-config.xml</str>
</lst>
</requestHandler>


At this point the configuration is complete. Restart the application and open the following URL in a browser:
http://localhost:8089/solr/
(Screenshot: the Solr admin console shown in the browser)
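From the admin console you can trigger a full import on the Dataimport screen, or simply open /dataimport?command=full-import on the core's URL. It can also be done programmatically; below is a minimal SolrJ sketch. The host, port 8089 and core name collection1 follow the configuration above and may need adjusting for your environment, and the query string is just an example.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class ImportAndSearchSketch {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8089/solr/collection1");

        // Kick off a full import through the /dataimport handler configured above.
        SolrQuery importCmd = new SolrQuery();
        importCmd.setRequestHandler("/dataimport");
        importCmd.set("command", "full-import");
        server.query(importCmd);   // the import itself runs asynchronously on the server

        // Once the import has finished (check command=status), run a search.
        // With no field prefix, the query goes against defaultSearchField, i.e. "name".
        SolrQuery query = new SolrQuery("张三");
        query.setRows(10);
        QueryResponse rsp = server.query(query);
        for (SolrDocument doc : rsp.getResults()) {
            System.out.println(doc.getFieldValue("id") + " : " + doc.getFieldValue("name"));
        }
        server.shutdown();
    }
}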
