jxp2002 posted on 2018-11-2 06:07:32

Solr DisjunctionMax Notes

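  Disjunction Max: a disjunction (union, i.e. OR) of clauses, scored by their maximum.
  
  In essence, this is a combined search across multiple fields, where each field can carry a different weight. When a document matches, the score of its best-scoring field is taken as the result's score (plus a fraction of the other fields' scores when tie is non-zero). This gives completely different results from simply summing boosted scores across fields. It is quite complicated to use: you need to inspect the results with debugQuery and experiment repeatedly!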
  http://wiki.apache.org/solr/DisMax
  http://searchhub.org/dev/2010/05/23/whats-a-dismax/
  What’s a “DisMax”? Posted by hossman
  The term “dismax” gets tossed around on the Solr lists frequently, which can be fairly confusing to new users. It originated as a shorthand name for the DisMaxRequestHandler (which I named after the DisjunctionMaxQueryParser, which I named after the DisjunctionMaxQuery class that it uses heavily). In recent years, the DisMaxRequestHandler and the StandardRequestHandler were both refactored into a single SearchHandler class, and now the term “dismax” usually refers to the DisMaxQParser.
  Note: “dismax” now refers to the DisMaxQParser, while the DisMaxRequestHandler and the StandardRequestHandler have been refactored into the SearchHandler.
  
  Clear as mud, right?
  Regardless of whether you use the DisMaxRequestHandler via the qt=dismax parameter, or use the SearchHandler with the DisMaxQParser via defType=dismax, the end result is that your q parameter gets parsed by the DisjunctionMaxQueryParser.
  Note: with qt=dismax the DisMaxRequestHandler is used, while with defType=dismax the SearchHandler uses the DisMaxQParser. In both cases the q parameter is parsed by the DisjunctionMaxQueryParser.
  
  The original goals of dismax (whichever meaning you might infer) have never changed:
  …supports a simplified version of the Lucene QueryParser syntax. Quotes can be used to group phrases, and +/- can be used to denote mandatory and optional clauses … but all other Lucene query parser special characters are escaped to simplify the user experience. The handler takes responsibility for building a good query from the user’s input using BooleanQueries containing DisjunctionMaxQueries across fields and boosts you specify. It also allows you to provide additional boosting queries, boosting functions, and filtering queries to artificially affect the outcome of all searches. These options can all be specified as default parameters for the handler in your solrconfig.xml or overridden in the Solr query URL.
  In short: You worry about what fields and boosts you want to use when you configure it, your users just give you words w/o worrying too much about syntax.
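  Note: the dismax handler is mainly responsible for wrapping DisjunctionMaxQueries in a BooleanQuery. It also lets you manually apply boost queries, boost functions, and filter queries to influence the final search results. All parameters can be configured in solrconfig.xml as defaults for every query, or added as URL parameters to apply dynamically to a single query or a class of queries.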
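To make the parameter list concrete, here is a minimal Python sketch that only builds a request URL and does not contact Solr. defType, q, qf, mm, bq, bf, fq and debugQuery are standard request parameters. The host, the core name `products`, and the fields `in_stock`, `popularity` and `category` are hypothetical placeholders and do not come from the original post.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: host, port and the core name "products" are placeholders.
SOLR_SELECT = "http://localhost:8983/solr/products/select"

def dismax_url(user_input):
    """Build a /select URL whose q parameter is parsed by the DisMax query parser."""
    params = [
        ("defType", "dismax"),        # SearchHandler + DisMaxQParser
        ("q", user_input),            # the user's words, passed through as-is
        ("qf", "features^2 name^3"),  # query fields and their boosts
        ("mm", "50%"),                # minimum 'should' match
        ("bq", "in_stock:true^5"),    # boost query (hypothetical field)
        ("bf", "sqrt(popularity)"),   # boost function (hypothetical field)
        ("fq", "category:software"),  # filter query: restricts results, does not score
        ("debugQuery", "true"),       # include the parsed query in the response
    ]
    return SOLR_SELECT + "?" + urlencode(params)

print(dismax_url('+"apache solr" search server'))
```

The same parameters could instead be set as handler defaults in solrconfig.xml. Parameters sent on the URL override those defaults.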
  
  The magic of dismax (in my opinion) comes from the query structure it produces. What it essentially boils down to is matrix multiplication: a one-column matrix of each “chunk” of your user’s input, multiplied by a one-row matrix of the qf fields, to produce a big matrix of every field:chunk permutation. The matrix is then turned into a BooleanQuery consisting of DisjunctionMaxQueries for each row in the matrix. DisjunctionMaxQuery is used because its score is determined by the maximum score of its subclauses (instead of the sum, like a BooleanQuery), so no one word from the user input dominates the final score. The best way to explain this is with an example, so let’s consider the following input…
  defType = dismax
       mm = 50%
       qf = features^2 name^3
        q = +"apache solr" search server
  First off, we consider the “markup” characters of the parser that appear in this q string:

[*]  whitespace – dividing the input string into chunks
[*]  quotes – makes a single phrase chunk
[*]  + – makes a chunk mandatory
  So we have 3 “chunks” of user input:

[*]  “apache solr” (must match)
[*]  “search” (should match)
[*]  “server” (should match)
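The chunking rules above can be sketched roughly as follows. This is a simplified, hypothetical model. It only recognises whitespace, double quotes and a leading +/- (treating - as Lucene's prohibited clause). It ignores the escaping that the real parser applies to other special characters.

```python
import re

# Simplified chunking: an optional leading +/- followed by either a
# "quoted phrase" or a run of non-whitespace characters.
CHUNK_RE = re.compile(r'([+-]?)(?:"([^"]*)"|(\S+))')

def chunks(q):
    """Split a dismax-style q string into (text, occur) pairs."""
    result = []
    for sign, phrase, word in CHUNK_RE.findall(q):
        occur = {"+": "must", "-": "must_not"}.get(sign, "should")
        result.append((phrase or word, occur))
    return result

print(chunks('+"apache solr" search server'))
# [('apache solr', 'must'), ('search', 'should'), ('server', 'should')]
```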
  If we “multiply” that with our qf list (features, name) we get a matrix like this…
  features:"apache solr"   name:"apache solr"   (must match)
  features:"search"        name:"search"        (should match)
  features:"server"        name:"server"        (should match)
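The “multiplication” can be sketched in a few lines. Every chunk (a row) is paired with every qf field and its boost (the columns). The function names are hypothetical, and the chunk list is written out by hand to match the example.

```python
def parse_qf(qf):
    """Parse a qf value such as 'features^2 name^3' into (field, boost) pairs."""
    fields = []
    for spec in qf.split():
        name, _, boost = spec.partition("^")
        fields.append((name, float(boost) if boost else 1.0))  # no ^ means boost 1.0
    return fields

def field_chunk_matrix(chunk_list, qf):
    """Pair every (chunk, occur) row with every qf field: one cell per field:chunk."""
    fields = parse_qf(qf)
    return [
        ([(field, text, boost) for field, boost in fields], occur)
        for text, occur in chunk_list
    ]

# Chunks of  +"apache solr" search server, written out by hand.
example_chunks = [("apache solr", "must"), ("search", "should"), ("server", "should")]
for cells, occur in field_chunk_matrix(example_chunks, "features^2 name^3"):
    print(occur, cells)
```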
  If we then factor in the mm param to determine the “minimum number of ‘Should Match’ clauses that (ahem) must match” (50% of 2 == 1), we get the following query structure (in pseudo-code)…
q = BooleanQuery(
  minNumberShouldMatch => 1,
  booleanClauses => ClauseList(
    MustMatch(DisjunctionMaxQuery(
      PhraseQuery("features","apache solr")^2,
      PhraseQuery("name","apache solr")^3)
    ),
    ShouldMatch(DisjunctionMaxQuery(
      TermQuery("features","search")^2,
      TermQuery("name","search")^3)
    ),
    ShouldMatch(DisjunctionMaxQuery(
      TermQuery("features","server")^2,
      TermQuery("name","server")^3))
));
  
 
  
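Note: BooleanQuery is the basic building block for combining queries. Other advanced queries are built by combining and wrapping it, and dismax is no exception. As the dismax query parser's decomposition shows, dismax also breaks down into a BooleanQuery, and its field boosts behave the same as ordinary per-field boosts. The difference is that dismax takes the maximum field score as each clause's score, whereas independently boosting multiple fields sums their scores.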
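The difference between “max per chunk” and “sum over everything” can be checked with made-up numbers. The scores below are hypothetical, already-boosted field:chunk scores for a single document. The sketch ignores Lucene details such as the coord factor of older versions and only illustrates how the two totals are combined.

```python
# Hypothetical, already-boosted scores of one document for each field:chunk cell.
cell_scores = {
    ("features", "apache solr"): 1.25, ("name", "apache solr"): 2.75,
    ("features", "search"): 0.75,      ("name", "search"): 0.0,
    ("features", "server"): 0.5,       ("name", "server"): 1.5,
}
chunk_texts = ["apache solr", "search", "server"]
fields = ["features", "name"]

def dismax_score(scores, chunk_texts, fields):
    # Each chunk is one DisjunctionMaxQuery that keeps its best field (tie = 0);
    # the enclosing BooleanQuery then sums the per-chunk maxima.
    return sum(max(scores[(f, c)] for f in fields) for c in chunk_texts)

def summed_score(scores, chunk_texts, fields):
    # Plain multi-field boosting: every field:chunk match adds to the total.
    return sum(scores[(f, c)] for f in fields for c in chunk_texts)

print(dismax_score(cell_scores, chunk_texts, fields))  # 5.0
print(summed_score(cell_scores, chunk_texts, fields))  # 6.75
```

When scores are summed, a chunk that matches in both fields counts twice. Under dismax, only its best field counts.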
  

  With me so far, right?
  Where people tend to get tripped up is in thinking about how Solr’s per-field analysis configuration (in schema.xml) impacts all of this. Our example above was pretty straightforward, but let’s consider for a moment what might happen if:

[*]  The name field uses the WordDelimiterFilter at query time but features does not.
[*]  The features field is configured so that “the” is a stopword, but name is not.
  Now let’s look at what we get when our input parameters are structurally similar to what we had before, but just different enough for WordDelimiterFilter and StopFilter to come into play…
  defType = dismax
       mm = 50%
       qf = features^2 name^3
        q = +"apache solr" the search-server
  Our resulting query is going to be something like…
q = BooleanQuery(
  minNumberShouldMatch => 1,
  booleanClauses => ClauseList(
    MustMatch(DisjunctionMaxQuery(
      PhraseQuery("features","apache solr")^2,
      PhraseQuery("name","apache solr")^3)
    ),
    ShouldMatch(DisjunctionMaxQuery(
      TermQuery("name","the")^3)
    ),
    ShouldMatch(DisjunctionMaxQuery(
      TermQuery("features","search-server")^2,
      PhraseQuery("name","search server")^3))
));
  The use of WordDelimiterFilter hasn’t changed things very much: features is treating “search-server” as a single Term, while in the name field we are searching for the phrase “search server”. Hopefully this shouldn’t surprise anyone given the use of WordDelimiterFilter for the name field (presumably that’s why it’s being used). This DisjunctionMaxQuery still “makes sense”, but other fields with odd analysis that produce fewer/more Tokens than a “typical” field for the same chunk might produce queries that aren’t as easy to understand. In particular, consider what has happened in our example with the word “the”: because “the” is a stopword in the features field, no Query object is produced for that field/chunk combination. But a Query is produced for the name field, which means the total number of “Should Match” clauses in our top-level query is still 2, so our minNumberShouldMatch is still 1 (50% of 2 == 1).
  This type of situation tends to confuse a lot of people: since “the” is a stopword in one field, they don’t expect it to matter in the final query. But as long as at least one qf field produces a Token for it (name in our example), it will be included in the final query and will contribute to the count of “Should Match” clauses.
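The stopword effect on the clause count can be reproduced with two toy analyzers. These are hypothetical stand-ins, not Solr's real StopFilter or WordDelimiterFilter: features drops “the”, and name splits on hyphens. Boosts are left out for brevity. A chunk that yields several tokens becomes a phrase, as in the post. A positive mm percentage is rounded down.

```python
# Hypothetical stand-ins for the two fields' query-time analysis chains.
def analyze_features(text):
    return [t for t in text.split() if t != "the"]   # "the" is a stopword here

def analyze_name(text):
    return text.replace("-", " ").split()            # split on hyphens, roughly like WordDelimiterFilter

ANALYZERS = {"features": analyze_features, "name": analyze_name}

def build_clauses(chunk_list):
    """Return one (cells, occur) clause per chunk; a field that yields no tokens adds no cell."""
    clauses = []
    for text, occur in chunk_list:
        cells = []
        for field, analyze in ANALYZERS.items():
            tokens = analyze(text)
            if not tokens:
                continue  # no Query object for this field:chunk combination
            kind = "TermQuery" if len(tokens) == 1 else "PhraseQuery"
            cells.append((kind, field, " ".join(tokens)))
        if cells:  # this sketch drops a chunk only if every field dropped it
            clauses.append((cells, occur))
    return clauses

def min_should_match(percent, should_count):
    """Positive percentage mm, rounded down: 50% of 2 -> 1."""
    return should_count * percent // 100

clauses = build_clauses([("apache solr", "must"), ("the", "should"), ("search-server", "should")])
should_count = sum(1 for _, occur in clauses if occur == "should")
for clause in clauses:
    print(clause)
print(should_count, min_should_match(50, should_count))  # 2 1
```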
  So, what’s the takeaway from all of this?
  DisMax is a complicated creature. When using it, you need to consider all of its options carefully, and look at the debugQuery=true output while experimenting with different query strings and different analysis configurations, to make really sure you understand how queries from your users will be parsed.
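  Note: dismax builds its queries in a very complicated way. When using it, consider all of its options carefully, and turn on debugQuery=true to check the output against different query strings and analyzers.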
  For qf (Query Fields), pf (Phrase Fields), mm (Minimum ‘Should’ Match), and tie (Tie Breaker), see the Solr Wiki DisMaxQParserPlugin.
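The tie parameter changes a DisjunctionMaxQuery's score from a pure maximum to the maximum plus tie times the scores of the other matching subclauses. Below is a minimal sketch of that formula using hypothetical sub-scores. tie=0.0 (the default) gives the pure max, and tie=1.0 gives the plain sum.

```python
def disjunction_max_score(sub_scores, tie=0.0):
    """Score of one DisjunctionMaxQuery from the scores of its matching subclauses."""
    if not sub_scores:
        return 0.0  # no subclause matched
    best = max(sub_scores)
    # The best subclause counts fully; every other matching subclause adds tie * its score.
    return best + tie * (sum(sub_scores) - best)

# Hypothetical boosted scores of the features and name subclauses for one chunk.
row = [1.25, 2.75]
print(disjunction_max_score(row))            # 2.75
print(disjunction_max_score(row, tie=0.5))   # 3.375
print(disjunction_max_score(row, tie=1.0))   # 4.0
```

A low value such as 0.1 is commonly suggested. That way, documents matching in several fields win ties, but the max still dominates.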
  
  Solr: Forcing items with all query terms to the top of a Solr search | Robot Librarian
  http://robotlibrarian.billdueber.com/solr-forcing-items-with-all-query-terms-to-the-top-of-a-solr-search/
  
  Lucid Imagination: Solr Powered ISFDB – Part #10: Tweaking Relevancy
  http://searchhub.org/dev/2011/06/20/solr-powered-isfdb-part-10/
  
  Lucid Imagination: Solr Powered ISFDB – Part #11: Using DisMax
  http://searchhub.org/dev/2011/08/08/solr-powered-isfdb-part-11/
  
  http://tm.durusau.net/?p=21573
  
  Using Solr’s Dismax Tie Parameter | Another Word For It (tie breaker)
  http://java.dzone.com/articles/using-solrs-dismax-tie
  
  

