The schema.xml file contains all of the details about which fields your
documents can contain, and how those fields should be dealt with when
adding documents to the index, or when querying those fields.
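schema.xml lives in the solr/conf/ directory. It plays a role similar to a database table definition file: it defines the data types of the data added to the index, and consists mainly of types, fields, and some other default settings.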
Data Types
The <types> section allows you to define a list of <fieldtype>
declarations you wish to use in your schema, along with the underlying
Solr class that should be used for that type, as well as the default
options you want for fields that use that type.
The types node holds the fieldType child nodes, each of which takes parameters such as name, class, and positionIncrementGap.
<fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
</analyzer>
</fieldType>
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
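<!-- This tokenizer splits on whitespace. When a value is added to a field of type text, Solr first tokenizes it on whitespace,
then passes the tokens through each of the configured filters in order; only what remains is added to the index and becomes searchable.
Note: Solr's analysis package does not ship with Chinese support, so you need to add a Chinese tokenizer yourself (search online for one).
-->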
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<!-- in this example, we will only use synonyms at query time
<filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt"
ignoreCase="true" expand="false"/>
-->
<!-- Case insensitive stop word removal.
add enablePositionIncrements=true in both the index and query
analyzers to leave a 'gap' for more accurate phrase queries.
-->
<filter class="solr.StopFilterFactory"
ignoreCase="true"
words="stopwords.txt"
enablePositionIncrements="true"
/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
generateNumberParts="1" catenateWords="1" catenateNumbers="1"
catenateAll="0" splitOnCaseChange="1"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.SnowballPorterFilterFactory" language="English"
protected="protwords.txt"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true"
expand="true"/>
<filter class="solr.StopFilterFactory"
ignoreCase="true"
words="stopwords.txt"
enablePositionIncrements="true"
/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
generateNumberParts="1" catenateWords="0" catenateNumbers="0"
catenateAll="0" splitOnCaseChange="1"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.SnowballPorterFilterFactory" language="English"
protected="protwords.txt"/>
</analyzer>
</fieldType>
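Not every type needs an analyzer chain. The dynamic field rule later in this document refers to a type named integer that is not defined in the snippet above. As a hedged sketch, assuming a Solr 1.x-era schema where the solr.StrField and solr.IntField classes are available, simple types of that kind might be declared as follows. The omitNorms and sortMissingLast settings are illustrative choices, not taken from this document's schema.
<!-- Assumed example: untokenized string type, values indexed verbatim -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
<!-- Assumed example: plain integer type; solr.IntField is a legacy class in later Solr versions -->
<fieldType name="integer" class="solr.IntField" omitNorms="true"/>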
Fields
The <fields> section is where you list the individual <field> declarations you
wish to use in your documents. Each <field> has a name that you will use to
reference it when adding documents or executing searches, and an associated type
which identifies the name of the fieldtype you wish to use for this
field. There are various field options that apply to a field. These can
be set in the field type declarations, and can also be overridden at an
individual field's declaration.
The fields node defines the concrete fields (much like columns in a database table), each with the following attributes:
<schema name="eshequn.post.db_post.0" version="1.1"
xmlns:xi="http://www.w3.org/2001/XInclude">
<fields>
<!-- for title -->
<field name="t" type="text" indexed="true" stored="false" />
<!-- for abstract -->
<field name="a" type="text" indexed="true" stored="false" />
<!-- for title and abstract -->
<field name="ta" type="text" indexed="true" stored="false" multiValued="true"/>
</fields>
<copyField source="t" dest="ta" />
<copyField source="a" dest="ta" />
</schema>
Copy Fields
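Field t is the article title, field a is the article abstract, and field ta combines the title and the abstract. When adding a document to the index, you only need to supply the contents of t and a; Solr indexes ta automatically.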
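As a rough illustration of that behaviour, an update message posted to Solr for the schema above might look like the following. The field values are made up for this example. Only t and a are supplied, and the two copyField rules fill ta at index time.
<add>
  <doc>
    <!-- Hypothetical sample values; ta is deliberately not sent -->
    <field name="t">Getting started with Solr</field>
    <field name="a">A short introduction to configuring schema.xml.</field>
  </doc>
</add>
Because ta receives one value from each source field, it has to be declared multiValued="true", as it is in the schema above.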
Dynamic Fields
One of the powerful features of Lucene is that you don't have to
pre-define every field when you first create your index. Even though
Solr provides strong datatyping for fields, it still preserves that
flexibility using "Dynamic Fields". Using <dynamicField>
declarations, you can create field rules that Solr will use to
understand what datatype should be used whenever it is given a field
name that is not explicitly defined, but matches a prefix or suffix used
in a dynamicField.
For example the following dynamic field declaration tells Solr that
whenever it sees a field name ending in "_i" which is not an explicitly
defined field, then it should dynamically create an integer field with
that name...
<dynamicField name="*_i" type="integer" indexed="true" stored="true"/>
The Unique Key Field
The <uniqueKey> declaration can be used to inform Solr that there is a field in your
index which should be unique for all documents. If a document is added
that contains the same value for this field as an existing document, the
old document will be deleted. It is not mandatory for a schema to have a uniqueKey field.
The comments in schema.xml itself provide the following information: