[Experience Sharing] Solr 6.7 Study Notes (03)



<httpCaching never304="true" >
<cacheControl>max-age=30, public</cacheControl>

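To have Solr generate HTTP caching headers automatically and respond correctly to cache validation requests, set never304="false".

This causes Solr to generate Last-Modified and ETag headers based on the properties of the index.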

The following options can also be specified to affect the
values of these headers...

lastModifiedFrom - the default value is "openTime", which means the
Last-Modified value (and validation against If-Modified-Since
requests) will all be relative to when the current Searcher
was opened.  You can change it to lastModifiedFrom="dirLastMod" if
you want the value to exactly correspond to when the physical
index was last modified.

etagSeed="..." is an option you can change to force the ETag
header (and validation against If-None-Match requests) to be
different even if the index has not changed (ie: when making
significant changes to your config file)

(lastModifiedFrom and etagSeed are both ignored if you use
the never304="true" option)

<httpCaching lastModifiedFrom="openTime"
<cacheControl>max-age=30, public</cacheControl>
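As a minimal sketch, a complete block with validation enabled might look like the following. It assumes the element sits inside `<requestDispatcher>`; the etagSeed value "Solr" and the 30-second max-age are placeholders, and never304="false" is spelled out only for clarity.

```xml
<requestDispatcher>
  <!-- never304="false": Solr emits Last-Modified / ETag and may answer 304 -->
  <httpCaching never304="false"
               lastModifiedFrom="openTime"
               etagSeed="Solr">
    <!-- also produces a Cache-Control header; 30 seconds is an arbitrary example -->
    <cacheControl>max-age=30, public</cacheControl>
  </httpCaching>
</requestDispatcher>
```

Changing etagSeed after a significant config change gives every response a new ETag, so clients holding cached copies have to fetch fresh results.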

<!-- Request Handlers




Legacy behavior: If the request path uses "/select" but no Request
Handler has that name, and if handleSelect="true" has been specified in
the requestDispatcher, then the Request Handler is dispatched based on
the qt parameter.  Handlers without a leading '/' are accessed this way,
for example: http://host/app/[core/]select?qt=name (if no qt is
given, the requestHandler that declares default="true" is
used, or else the one named "standard").

If a Request Handler is declared with startup="lazy", then it will
not be initialized until the first request that uses it.
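The legacy dispatch and lazy startup described above could be combined as in this sketch. The handler name "legacySearch" is hypothetical, and the sketch assumes no handler named "/select" is registered, since otherwise qt is not consulted.

```xml
<!-- handleSelect="true" re-enables qt-based dispatch on /select -->
<requestDispatcher handleSelect="true" />

<!-- No leading '/', so reachable only via /select?qt=legacySearch;
     startup="lazy" defers initialization until the first such request -->
<requestHandler name="legacySearch" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
  </lst>
</requestHandler>
```

A request such as http://host/app/core/select?qt=legacySearch&q=ipod would then be routed to this handler, which is initialized on that first request.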


<!-- SearchHandler    http://wiki.apache.org/solr/SearchHandler


For processing Search Queries, the primary Request Handler
provided with Solr is "SearchHandler". It delegates to a sequence
of SearchComponents (see below) and supports distributed
queries across multiple shards.

<requestHandler name="/select">
<!-- Default values for query parameters; parameters in the request override them -->
<lst name="defaults">
<str name="echoParams">explicit</str>
<int name="rows">10</int>
Consider making 'preferLocalShards' true when:
1) maxShardsPerNode > 1
2) Number of shards > 1
3) CloudSolrClient or LbHttpSolrServer is used by clients.
Without this option, every core broadcasts the distributed query to
a replica of each shard where the replicas are chosen randomly.
This option directs the cores to prefer cores hosted locally, thus
preventing network delays between machines.
This behavior also immunizes a bad/slow machine from slowing down all
the good machines (if those good machines were querying this bad one).

Set this option to false for clients connecting through HttpSolrServer.

<bool name="preferLocalShards">false</bool>
<lst name="appends">
<str name="fq">inStock:true</str>
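<!-- "invariants" let the Solr maintainer lock down the options
available to Solr clients.  Any param values specified here are used
regardless of the values given in the request, the "defaults", or
the "appends" params.

In this example, the facet.field and facet.query params would
be fixed, limiting the facets clients can use.  Faceting is
not turned on by default - but if the client does specify
facet=true in the request, these are the only facets they
will be able to see counts for; regardless of what other
facet.field or facet.query params they may specify.
-->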


<lst name="invariants">
<str name="facet.field">cat</str>
<str name="facet.field">manu_exact</str>
<str name="facet.query">price:[* TO 500]</str>
<str name="facet.query">price:[500 TO *]</str>
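The three parameter lists discussed above can sit side by side in one handler. The following is a reduced sketch rather than the stock configuration, cut down to one parameter per list to make the precedence visible.

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <!-- used only when the request does not supply the parameter -->
  <lst name="defaults">
    <int name="rows">10</int>
  </lst>
  <!-- added to whatever fq values the request already carries -->
  <lst name="appends">
    <str name="fq">inStock:true</str>
  </lst>
  <!-- always win over the request, the defaults, and the appends -->
  <lst name="invariants">
    <str name="facet.field">cat</str>
  </lst>
</requestHandler>
```

With this sketch, a request like /select?q=ipod&rows=5&fq=cat:electronics&facet=true&facet.field=manu would run with rows=5, with both fq filters applied, and with facet counts for cat only.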
<!-- If the default list of SearchComponents is not desired, that
list can either be overridden completely, or components can be
prepended or appended to the default list.  (see below)
<arr name="components">

<requestHandler name="/query">
<lst name="defaults">
<str name="echoParams">explicit</str>
<str name="wt">json</str>
<str name="indent">true</str>
<str name="df">text</str>

<!-- A Robust Example

This example SearchHandler declaration shows off usage of the
SearchHandler with many defaults declared

Note that the same Request Handler (SearchHandler) can be
registered multiple times with different names (and different
init parameters)
<requestHandler name="/browse">
<lst name="defaults">
<str name="echoParams">explicit</str>

<!-- VelocityResponseWriter settings -->
<str name="wt">velocity</str>
<str name="v.template">browse</str>
<str name="v.layout">layout</str>
<str name="title">Solritas</str>

<!-- Query settings -->
<str name="defType">edismax</str>
<str name="qf">

text^0.5 features^1.0 name^1.2 sku^1.5
title^10.0 description^5.0 keywords^5.0 author^2.0 resourcename^1.0
<str name="mm">100%</str>
<str name="q.alt">*:*</str>
<str name="rows">10</str>
<str name="fl">*,score</str>

<str name="mlt.qf">

text^0.5 features^1.0 name^1.2 sku^1.5
title^10.0 description^5.0 keywords^5.0 author^2.0 resourcename^1.0
<str name="mlt.fl">text,features,name,sku,id,manu,cat,title,description,keywords,author,resourcename</str>
<int name="mlt.count">3</int>

<str name="facet">on</str>
<str name="facet.missing">true</str>
<str name="facet.field">cat</str>
<str name="facet.field">manu_exact</str>
<str name="facet.field">content_type</str>
<str name="facet.field">author_s</str>
<str name="facet.query">ipod</str>
<str name="facet.query">GB</str>
<str name="facet.mincount">1</str>
<str name="facet.pivot">cat,inStock</str>
<str name="facet.range.other">after</str>
<str name="facet.range">price</str>
<int name="f.price.facet.range.start">0</int>
<int name="f.price.facet.range.end">600</int>
<int name="f.price.facet.range.gap">50</int>
<str name="facet.range">popularity</str>
<int name="f.popularity.facet.range.start">0</int>
<int name="f.popularity.facet.range.end">10</int>
<int name="f.popularity.facet.range.gap">3</int>
<str name="facet.range">manufacturedate_dt</str>
<str name="f.manufacturedate_dt.facet.range.start">NOW/YEAR-10YEARS</str>
<str name="f.manufacturedate_dt.facet.range.end">NOW</str>
<str name="f.manufacturedate_dt.facet.range.gap">+1YEAR</str>
<str name="f.manufacturedate_dt.facet.range.other">before</str>
<str name="f.manufacturedate_dt.facet.range.other">after</str>

<!-- Highlighting defaults -->
<str name="hl">on</str>

<str name="hl.fl">content features title name</str>
<str name="hl.preserveMulti">true</str>
<str name="hl.encoder">html</str>
<str name="hl.simple.pre">&lt;b&gt;</str>
<str name="hl.simple.post">&lt;/b&gt;</str>
<str name="f.title.hl.fragsize">0</str>
<str name="f.title.hl.alternateField">title</str>
<str name="f.name.hl.fragsize">0</str>
<str name="f.name.hl.alternateField">name</str>
<str name="f.content.hl.snippets">3</str>
<str name="f.content.hl.fragsize">200</str>
<str name="f.content.hl.alternateField">content</str>
<str name="f.content.hl.maxAlternateFieldLength">750</str>

<!-- Spell checking defaults -->
<str name="spellcheck">on</str>
<str name="spellcheck.extendedResults">false</str>      
<str name="spellcheck.count">5</str>
<str name="spellcheck.alternativeTermCount">2</str>
<str name="spellcheck.maxResultsForSuggest">5</str>      
<str name="spellcheck.collate">true</str>
<str name="spellcheck.collateExtendedResults">true</str>  
<str name="spellcheck.maxCollationTries">5</str>
<str name="spellcheck.maxCollations">3</str>           

<!-- append spellchecking to our list of components -->
<arr name="last-components">


<initParams path="/update/**,/query,/select,/tvrh,/elevate,/spell,/browse">
<lst name="defaults">
<str name="df">text</str>

<initParams path="/update/json/docs">
<lst name="defaults">
<!--this ensures that the entire json doc will be stored verbatim into one field-->
<str name="srcField">_src_</str>
<!-- This means the uniqueKeyField will be extracted from the fields and
all fields go into the 'df' field. In this config df is already configured to be 'text' -->
<str name="mapUniqueKeyOnly">true</str>


<!-- The following are implicitly added
<requestHandler name="/update/json">
<lst name="defaults">
<str name="stream.contentType">application/json</str>
<requestHandler name="/update/csv">
<lst name="defaults">
<str name="stream.contentType">application/csv</str>

<!-- Solr Cell Update Request Handler


<requestHandler name="/update/extract"
class="solr.extraction.ExtractingRequestHandler" >
<lst name="defaults">
<str name="lowernames">true</str>
<str name="uprefix">ignored_</str>

<!-- capture link hrefs but ignore div attributes -->
<str name="captureAttr">true</str>
<str name="fmap.a">links</str>
<str name="fmap.div">ignored_</str>


<!-- Field Analysis Request Handler


Request parameters are:
analysis.fieldname - field name whose analyzers are to be used

analysis.fieldtype - field type whose analyzers are to be used
analysis.fieldvalue - text for index-time analysis
q (or analysis.q) - text for query time analysis
analysis.showmatch (true|false) - When set to true and when
query analysis is performed, the tokens produced by the
field value analysis will be marked as "matched" for every
token that is produced by the query analysis
<requestHandler name="/analysis/field"
class="solr.FieldAnalysisRequestHandler" />


<!-- Document Analysis Handler


An analysis handler that provides a breakdown of the analysis
process of provided documents. This handler expects a (single)
content stream with the following format:

<field name="id">1</field>
<field name="name">The Name</field>
<field name="text">The Text Value</field>

Note: Each document must contain a field which serves as the
unique key. This key is used in the returned response to associate
an analysis breakdown to the analyzed document.

Like the FieldAnalysisRequestHandler, this handler also supports
query analysis by sending either an "analysis.query" or "q"
request parameter that holds the query text to be analyzed. It
also supports the "analysis.showmatch" parameter; when it is set to
true, all field tokens that match the query tokens are marked
as a "match".
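The field list shown earlier appears to have lost its wrapper elements in this listing. A sketch of a complete content stream follows, assuming the usual `<docs>`/`<doc>` wrappers and that "id" is the schema's unique key; the second document is made up for illustration.

```xml
<docs>
  <doc>
    <!-- "id" is assumed to be the schema's unique key field -->
    <field name="id">1</field>
    <field name="name">The Name</field>
    <field name="text">The Text Value</field>
  </doc>
  <doc>
    <field name="id">2</field>
    <field name="name">Another Name</field>
    <field name="text">Another Text Value</field>
  </doc>
</docs>
```

Posting this body to /analysis/document together with q=name&analysis.showmatch=true should return one analysis breakdown per id, with tokens that match the query marked.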
<requestHandler name="/analysis/document"
startup="lazy" />

<!-- Echo the request contents back to the client -->
<requestHandler name="/debug/dump" >
<lst name="defaults">
<str name="echoParams">explicit</str>
<str name="echoHandler">true</str>

<!-- Search Components

Search components are registered with SolrCore and used by SearchHandler.

<searchComponent name="query"     />
<searchComponent name="facet"     />
<searchComponent name="mlt"       />
<searchComponent name="highlight" />
<searchComponent name="stats"     />
<searchComponent name="debug"     />

Default configuration in a requestHandler would look like:

<arr name="components">

If you register a searchComponent under one of the standard names, it is used in place of the default.

To insert components before or after the 'standard' components, use:

<arr name="first-components">

<arr name="last-components">

NOTE: The component registered with the name "debug" will
always be executed after the "last-components"
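As a sketch of the two array forms just mentioned, the component names below are hypothetical placeholders and would need matching `<searchComponent>` declarations.

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <!-- runs before the standard components -->
  <arr name="first-components">
    <str>myFirstComponentName</str>
  </arr>
  <!-- runs after the standard components, but still before "debug" -->
  <arr name="last-components">
    <str>myLastComponentName</str>
  </arr>
</requestHandler>
```

Using `<arr name="components">` instead replaces the default list entirely, so it fits only when the standard components are not wanted.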


<!-- Spell Check

<searchComponent name="spellcheck">

<str name="queryAnalyzerFieldType">text_general</str>

<!-- Multiple "Spell Checkers" can be declared and used by this component -->

<!-- a spellchecker built from a field of the main index -->
<lst name="spellchecker">
<str name="name">default</str>
<str name="field">text</str>
<str name="classname">solr.DirectSolrSpellChecker</str>
<!-- the spellcheck distance measure used, the default is the internal levenshtein -->
<str name="distanceMeasure">internal</str>
<!-- minimum accuracy needed to be considered a valid spellcheck suggestion -->
<float name="accuracy">0.5</float>
<!-- the maximum #edits we consider when enumerating terms: can be 1 or 2 -->
<int name="maxEdits">2</int>
<!-- the minimum shared prefix when enumerating terms -->
<int name="minPrefix">1</int>
<!-- maximum number of inspections per result. -->
<int name="maxInspections">5</int>
<!-- minimum length of a query term to be considered for correction -->
<int name="minQueryLength">4</int>
<!-- maximum threshold of documents a query term can appear in to be considered for correction -->
<float name="maxQueryFrequency">0.01</float>
<!-- uncomment this to require suggestions to occur in 1% of the documents
<float name="thresholdTokenFrequency">.01</float>

<!-- a spellchecker that can break or combine words.  See "/spell" handler below for usage -->
<lst name="spellchecker">
<str name="name">wordbreak</str>
<str name="classname">solr.WordBreakSolrSpellChecker</str>      
<str name="field">name</str>
<str name="combineWords">true</str>
<str name="breakWords">true</str>
<int name="maxChanges">10</int>

<!-- a spellchecker that uses a different distance measure -->
<lst name="spellchecker">
<str name="name">jarowinkler</str>
<str name="field">spell</str>
<str name="classname">solr.DirectSolrSpellChecker</str>
<str name="distanceMeasure">

<!-- a spellchecker that uses an alternate comparator

comparatorClass can be one of:
1. score (default)
2. freq (Frequency first, then score)
3. A fully qualified class name
-->
<lst name="spellchecker">
<str name="name">freq</str>
<str name="field">lowerfilt</str>
<str name="classname">solr.DirectSolrSpellChecker</str>
<str name="comparatorClass">freq</str>

<!-- A spellchecker that reads the list of words from a file -->
<lst name="spellchecker">
<str name="classname">solr.FileBasedSpellChecker</str>
<str name="name">file</str>
<str name="sourceLocation">spellings.txt</str>
<str name="characterEncoding">UTF-8</str>
<str name="spellcheckIndexDir">spellcheckerFile</str>

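<!-- Example usage of the spellcheck component.

NOTE: This is purely an example.  The point of the SpellCheckComponent is to be hooked into the request handler that serves your normal user queries, so that no separate request is needed to get suggestions.

See http://wiki.apache.org/solr/SpellCheckComponent for details
on the request parameters.
-->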
<requestHandler name="/spell" startup="lazy">
<lst name="defaults">
<!-- Solr will use suggestions from both the 'default' spellchecker
and from the 'wordbreak' spellchecker and combine them.
collations (re-written queries) can include a combination of
corrections from both spellcheckers -->
<str name="spellcheck.dictionary">default</str>
<str name="spellcheck.dictionary">wordbreak</str>
<str name="spellcheck">on</str>
<str name="spellcheck.extendedResults">true</str>      
<str name="spellcheck.count">10</str>
<str name="spellcheck.alternativeTermCount">5</str>
<str name="spellcheck.maxResultsForSuggest">5</str>      
<str name="spellcheck.collate">true</str>
<str name="spellcheck.collateExtendedResults">true</str>  
<str name="spellcheck.maxCollationTries">10</str>
<str name="spellcheck.maxCollations">5</str>         
<arr name="last-components">
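The last-components array above has lost its contents in this listing. A shortened sketch of a complete handler follows, assuming the array names the "spellcheck" searchComponent declared earlier.

```xml
<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="spellcheck">on</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.dictionary">wordbreak</str>
    <str name="spellcheck.collate">true</str>
  </lst>
  <!-- "spellcheck" must match the searchComponent name declared earlier -->
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```

Because the component is appended rather than substituted, the normal query still runs and the suggestions arrive in the same response.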

<!-- The SuggestComponent in Solr provides users with automatic suggestions for query terms.
You can use this to implement a powerful auto-suggest feature in your search application.
As with the rest of this solrconfig.xml file, the configuration of this component is purely
an example that applies specifically to this configset and example documents.

More information about this component and other configuration options are described in the
"Suggester" section of the reference guide available at
<searchComponent name="suggest">
<lst name="suggester">
<str name="name">mySuggester</str>
<str name="lookupImpl">FuzzyLookupFactory</str>      
<str name="dictionaryImpl">DocumentDictionaryFactory</str>
<str name="field">cat</str>
<str name="weightField">price</str>
<str name="suggestAnalyzerFieldType">string</str>
<str name="buildOnStartup">false</str>

<requestHandler name="/suggest"
startup="lazy" >
<lst name="defaults">
<str name="suggest">true</str>
<str name="suggest.count">10</str>
<arr name="components">
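Unlike the spellcheck handler, this handler uses a "components" array, which replaces the default component list. A sketch with the array filled in, assuming it names the "suggest" searchComponent declared above:

```xml
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.count">10</str>
  </lst>
  <!-- replaces the default list: only the suggester runs, no query or facet -->
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
```

Since buildOnStartup is false, the dictionary has to be built once before it returns anything, for example with /suggest?suggest.dictionary=mySuggester&suggest.build=true&suggest.q=elec.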


<!-- Term Vector Component
<searchComponent name="tvComponent"/>

<!-- A request handler for demonstrating the term vector component

This is purely as an example.

In reality you will likely want to add the component to your
already specified request handlers.
<requestHandler name="/tvrh" startup="lazy">
<lst name="defaults">
<bool name="tv">true</bool>
<arr name="last-components">

<!-- Clustering Component

You'll need to set the solr.clustering.enabled system property
when running solr to run with clustering enabled:

<searchComponent name="clustering"
class="solr.clustering.ClusteringComponent" >
Declaration of "engines" (clustering algorithms).

The open source algorithms from Carrot2.org project:
* org.carrot2.clustering.lingo.LingoClusteringAlgorithm
* org.carrot2.clustering.stc.STCClusteringAlgorithm
* org.carrot2.clustering.kmeans.BisectingKMeansClusteringAlgorithm
See http://project.carrot2.org/algorithms.html for more information.

Commercial algorithm Lingo3G (needs to be installed separately):
* com.carrotsearch.lingo3g.Lingo3GClusteringAlgorithm

<lst name="engine">
<str name="name">lingo3g</str>
<bool name="optional">true</bool>
<str name="carrot.algorithm">com.carrotsearch.lingo3g.Lingo3GClusteringAlgorithm</str>
<str name="carrot.resourcesDir">clustering/carrot2</str>

<lst name="engine">
<str name="name">lingo</str>
<str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
<str name="carrot.resourcesDir">clustering/carrot2</str>

<lst name="engine">
<str name="name">stc</str>
<str name="carrot.algorithm">org.carrot2.clustering.stc.STCClusteringAlgorithm</str>
<str name="carrot.resourcesDir">clustering/carrot2</str>

<lst name="engine">
<str name="name">kmeans</str>
<str name="carrot.algorithm">org.carrot2.clustering.kmeans.BisectingKMeansClusteringAlgorithm</str>
<str name="carrot.resourcesDir">clustering/carrot2</str>

<!-- A request handler for demonstrating the clustering component.
This is meant as an example.
In reality you will likely want to add the component to your
already specified request handlers.
<requestHandler name="/clustering"
<lst name="defaults">
<bool name="clustering">true</bool>
<bool name="clustering.results">true</bool>
<!-- Field name with the logical "title" of each document (optional) -->
<str name="carrot.title">name</str>
<!-- Field name with the logical "URL" of each document (optional) -->
<str name="carrot.url">id</str>
<!-- Field name with the logical "content" of each document (optional) -->
<str name="carrot.snippet">features</str>

<!-- Apply highlighter to the title/content and use this for clustering. -->
<bool name="carrot.produceSummary">true</bool>
<!-- the maximum number of labels per cluster -->
<!--<int name="carrot.numDescriptions">5</int>-->
<!-- produce sub clusters -->
<bool name="carrot.outputSubClusters">false</bool>

<!-- Configure the remaining request handler parameters. -->
<str name="defType">edismax</str>
<str name="qf">

text^0.5 features^1.0 name^1.2 sku^1.5
<str name="q.alt">*:*</str>
<str name="rows">100</str>
<str name="fl">*,score</str>
<arr name="last-components">

<!-- Terms Component


A component to return terms and the document frequency of those terms
<searchComponent name="terms"/>

<!-- A request handler for demonstrating the terms component -->
<requestHandler name="/terms" startup="lazy">
<lst name="defaults">
<bool name="terms">true</bool>
<bool name="distrib">false</bool>
<arr name="components">


<!-- Query Elevation Component


a search component that enables you to configure the top
results for a given query regardless of the normal lucene scoring.
<searchComponent name="elevator" >
<!-- pick a fieldType to analyze queries -->
<str name="queryFieldType">string</str>
<str name="config-file">elevate.xml</str>
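The config-file named above holds the per-query overrides. A minimal sketch of such an elevate.xml follows; the document ids "doc-1" and "doc-2" are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<elevate>
  <!-- for the query "ipod": pin doc-1 to the top, drop doc-2 from results -->
  <query text="ipod">
    <doc id="doc-1" />
    <doc id="doc-2" exclude="true" />
  </query>
</elevate>
```

Because queryFieldType is "string" here, the incoming query has to match the text attribute exactly for the rule to apply; an analyzed field type would make the matching more lenient.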

<!-- A request handler for demonstrating the elevator component -->
<requestHandler name="/elevate" startup="lazy">
<lst name="defaults">
<str name="echoParams">explicit</str>
<arr name="last-components">

<!-- Highlighting Component

<searchComponent name="highlight">
<!-- Configure the standard fragmenter -->
<!-- This could most likely be commented out in the "default" case -->
<fragmenter name="gap"
<lst name="defaults">
<int name="hl.fragsize">100</int>

<!-- A regular-expression-based fragmenter
(for sentence extraction)
<fragmenter name="regex"
<lst name="defaults">
<!-- slightly smaller fragsizes work better because of slop -->
<int name="hl.fragsize">70</int>

<!-- allow 50% slop on fragment sizes -->
<float name="hl.regex.slop">0.5</float>
<!-- a basic sentence pattern -->
<str name="hl.regex.pattern">[-\w ,/\n\&quot;&apos;]{20,200}</str>

<!-- Configure the standard formatter -->
<formatter name="html"
<lst name="defaults">
<str name="hl.simple.pre"><![CDATA[<em>]]></str>
<str name="hl.simple.post"><![CDATA[</em>]]></str>

<!-- Configure the standard encoder -->
<encoder name="html"
class="solr.highlight.HtmlEncoder" />

<!-- Configure the standard fragListBuilder -->
<fragListBuilder name="simple"

<!-- Configure the single fragListBuilder -->
<fragListBuilder name="single"

<!-- Configure the weighted fragListBuilder -->
<fragListBuilder name="weighted"

<!-- default tag FragmentsBuilder -->
<fragmentsBuilder name="default"
<lst name="defaults">
<str name="hl.multiValuedSeparatorChar">/</str>

<!-- multi-colored tag FragmentsBuilder -->
<fragmentsBuilder name="colored"
<lst name="defaults">
<str name="hl.tag.pre"><![CDATA[
<str name="hl.tag.post"><![CDATA[</b>]]></str>

<boundaryScanner name="default"
<lst name="defaults">
<str name="hl.bs.maxScan">10</str>
<str name="hl.bs.chars">.,!? &#9;&#10;&#13;</str>

<boundaryScanner name="breakIterator"
<lst name="defaults">
<!-- type should be one of CHARACTER, WORD(default), LINE and SENTENCE -->
<str name="hl.bs.type">WORD</str>
<!-- language and country are used when constructing Locale object.  -->
<!-- And the Locale object will be used when getting instance of BreakIterator -->
<str name="hl.bs.language">en</str>
<str name="hl.bs.country">US</str>

<!-- Update Processors

Chains of Update Processor Factories for dealing with Update
Requests can be declared, and then used by name in Update
Request Processors


<!-- Deduplication

An example dedup update processor that creates the "id" field
on the fly based on the hash code of some other fields.  This
example has overwriteDupes set to false since we are using the
id field as the signatureField and Solr will maintain
uniqueness based on that anyway.  

<updateRequestProcessorChain name="dedupe">
<bool name="enabled">true</bool>
<str name="signatureField">id</str>
<bool name="overwriteDupes">false</bool>
<str name="fields">name,features,cat</str>
<str name="signatureClass">solr.processor.Lookup3Signature</str>
<processor />
<processor />
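The listing above has lost the `<processor>` nesting and class attributes. A sketch of how such a chain is usually laid out follows; the three processor class names are recalled from the stock sample configuration and should be treated as assumptions to verify against your Solr version.

```xml
<updateRequestProcessorChain name="dedupe">
  <!-- computes a signature from the listed fields and stores it in "id" -->
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">id</str>
    <bool name="overwriteDupes">false</bool>
    <str name="fields">name,features,cat</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <!-- logs the update, then hands it to the index -->
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

An update request opts into the chain by name, for example by adding update.chain=dedupe to a request sent to /update.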

<!-- Language identification

This example update chain identifies the language of the incoming
documents using the langid contrib. The detected language is
written to field language_s. No field name mapping is done.

The fields used for detection are text, title, subject and description,
making this example suitable for detecting languages from full-text
rich documents injected via ExtractingRequestHandler.
See more about langId at http://wiki.apache.org/solr/LanguageDetection
-->
<updateRequestProcessorChain name="langid">
<str name="langid.fl">text,title,subject,description</str>
<str name="langid.langField">language_s</str>
<str name="langid.fallback">en</str>
<processor />
<processor />

<!-- Script update processor

This example hooks in an update processor implemented using JavaScript.

See more about the script update processor at http://wiki.apache.org/solr/ScriptUpdateProcessor
<updateRequestProcessorChain name="script">
<str name="script">update-script.js</str>
<lst name="params">
<str name="config_param">example config parameter</str>
<processor />

<!-- Response Writers


Request responses will be written using the writer specified by
the 'wt' request parameter matching the name of a registered
writer.

The "default" writer is the default and will be used if 'wt' is
not specified in the request.
-->
<!-- The following response writers are implicitly configured unless
overridden... -->
<queryResponseWriter name="xml"
class="solr.XMLResponseWriter" />
<queryResponseWriter name="json"/>
<queryResponseWriter name="python"/>
<queryResponseWriter name="ruby"/>
<queryResponseWriter name="php"/>
<queryResponseWriter name="phps"/>
<queryResponseWriter name="csv"/>
<queryResponseWriter name="schema.xml"/>

<queryResponseWriter name="json">
<!-- For the purposes of the tutorial, JSON responses are written as
plain text so that they are easy to read in *any* browser.
If you expect a MIME type of "application/json" just remove this override.
<str name="content-type">text/plain; </str>

Custom response writers can be declared as needed...
<queryResponseWriter name="velocity" startup="lazy">
<str name="template.base.dir">${velocity.template.base.dir:}</str>


<!-- XSLT response writer transforms the XML output by any xslt file found
in Solr's conf/xslt directory.  Changes to xslt files are checked for
every xsltCacheLifetimeSeconds.  
<queryResponseWriter name="xslt">
<int name="xsltCacheLifetimeSeconds">5</int>
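A sketch of the XSLT writer declared in full, assuming the class is solr.XSLTResponseWriter (the class attribute is missing from the listing above):

```xml
<queryResponseWriter name="xslt" class="solr.XSLTResponseWriter">
  <!-- how long a compiled stylesheet is cached before the file is re-checked -->
  <int name="xsltCacheLifetimeSeconds">5</int>
</queryResponseWriter>
```

A request selects the writer with wt=xslt and names the stylesheet with tr, for example /select?q=*:*&wt=xslt&tr=example.xsl, where example.xsl stands for a file in conf/xslt.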

<!-- Query Parsers


Multiple QParserPlugins can be registered by name, and then
used in either the "defType" param for the QueryComponent (used
by SearchHandler) or in LocalParams
<!-- example of registering a query parser -->
<queryParser name="myparser"/>

<!-- Function Parsers


Multiple ValueSourceParsers can be registered by name, and then
used as function names when using the "func" QParser.
<!-- example of registering a custom function parser  -->
<valueSourceParser name="myfunc"
class="com.mycompany.MyValueSourceParser" />


<!-- Document Transformers
Could be something like:
<transformer name="db" >
<int name="connection">jdbc://....</int>

To add a constant value to all docs, use:
<transformer name="mytrans2" >
<int name="value">5</int>

If you want the user to still be able to change it with _value:something_ use this:
<transformer name="mytrans3" >
<double name="defaultValue">5</double>

If you are using the QueryElevationComponent, you may wish to mark documents that get boosted.  The
EditorialMarkerFactory will do exactly that:
<transformer name="qecBooster" />
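The transformer declarations above have lost their class attributes. A sketch of the constant-value case follows, assuming the class is org.apache.solr.response.transform.ValueAugmenterFactory:

```xml
<!-- adds the constant 5 to every returned document -->
<transformer name="mytrans2"
             class="org.apache.solr.response.transform.ValueAugmenterFactory">
  <int name="value">5</int>
</transformer>
```

A transformer is requested through the field list, for example fl=id,[mytrans2], and its output appears as an extra pseudo-field on each document.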


