
[Experience Sharing] Solr DisjunctionMax, annotated

Posted on 2018-11-2 06:07:32
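  Disjunction Max
  Take the maximum over a disjunction (a union of clauses)
  
  In essence, dismax is a joint search across multiple fields, with a different weight assigned to each field; on a match, the score of the highest-scoring field is used as the result score. This gives completely different results from simply summing per-field boosts. It is complicated to use: you need to inspect the debugQuery output and experiment repeatedly.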
  http://wiki.apache.org/solr/DisMax
  http://searchhub.org/dev/2010/05/23/whats-a-dismax/
  What’s a “DisMax”?
  Posted by hossman
  The term “dismax” gets tossed around on the Solr lists frequently, which can be fairly confusing to new users. It originated as a shorthand name for the DisMaxRequestHandler (which I named after the DisjunctionMaxQueryParser, which I named after the DisjunctionMaxQuery class that it uses heavily). In recent years, the DisMaxRequestHandler and the StandardRequestHandler were both refactored into a single SearchHandler class, and now the term “dismax” usually refers to the DisMaxQParser.
  Note: “dismax” now refers to the DisMaxQParser; the DisMaxRequestHandler and the StandardRequestHandler were both refactored into SearchHandler.
  
  Clear as Mudd, right?
  Regardless of whether you use the DisMaxRequestHandler via the qt=dismax parameter, or use the SearchHandler with the DisMaxQParser via defType=dismax, the end result is that your q parameter gets parsed by the DisjunctionMaxQueryParser.
  Note: qt=dismax selects the DisMaxRequestHandler, while defType=dismax uses the DisMaxQParser inside SearchHandler; in both cases the q parameter is parsed by the DisjunctionMaxQueryParser.
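  The two ways of selecting dismax can be sketched as request URLs. This is a minimal illustration only: the host, port, and core name (mycore) are assumptions, and qt=dismax works only where a request handler with that name is actually registered.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; host, port and core name are assumptions.
SOLR_SELECT = "http://localhost:8983/solr/mycore/select"

def dismax_url(q, use_request_handler=False):
    """Build a select URL that asks Solr to parse q with dismax."""
    params = {"q": q}
    if use_request_handler:
        params["qt"] = "dismax"       # older style: choose the request handler
    else:
        params["defType"] = "dismax"  # choose the query parser on SearchHandler
    return SOLR_SELECT + "?" + urlencode(params)

print(dismax_url("apache solr"))
print(dismax_url("apache solr", use_request_handler=True))
```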
  
  The original goals of dismax (whichever meaning you might infer) have never changed:
  … supports a simplified version of the Lucene QueryParser syntax. Quotes can be used to group phrases, and +/- can be used to denote mandatory and optional clauses … but all other Lucene query parser special characters are escaped to simplify the user experience. The handler takes responsibility for building a good query from the user’s input using BooleanQueries containing DisjunctionMaxQueries across fields and boosts you specify. It also allows you to provide additional boosting queries, boosting functions, and filtering queries to artificially affect the outcome of all searches. These options can all be specified as default parameters for the handler in your solrconfig.xml or overridden in the Solr query URL.
  In short: You worry about what fields and boosts you want to use when you configure it, your users just give you words w/o worrying too much about syntax.
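  Note: the dismax handler is mainly responsible for wrapping DisjunctionMaxQueries in BooleanQueries. It also lets you supply boost queries, boost functions, and filter queries by hand to influence the final search results. All of these parameters can be configured in solrconfig.xml as global defaults, or added as URL parameters so that they apply dynamically to a single query or to a class of queries.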
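  The "defaults in solrconfig.xml, overridden in the URL" behaviour can be pictured as a dictionary merge. The sketch below is not Solr code: HANDLER_DEFAULTS is a hypothetical stand-in for a handler's defaults section, and its values are illustrative.

```python
# Hypothetical handler defaults, standing in for a <lst name="defaults"> block
# in solrconfig.xml; the values are illustrative only.
HANDLER_DEFAULTS = {
    "defType": "dismax",
    "qf": "features^2 name^3",
    "mm": "50%",
}

def effective_params(request_params, defaults=HANDLER_DEFAULTS):
    """Return the parameters a request ends up with: defaults, overridden by the URL."""
    merged = dict(defaults)
    merged.update(request_params)
    return merged

params = effective_params({"q": "solr", "mm": "100%"})
print(params["mm"])  # the per-request value wins
print(params["qf"])  # the configured default is kept
```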
  
  The magic of dismax (in my opinion) comes from the query structure it produces. What it essentially boils down to is matrix multiplication: a one column matrix of each “chunk” of your user’s input, multiplied by a one row matrix of the qf fields to produce a big matrix of every field:chunk permutation. The matrix is then turned into a BooleanQuery consisting of DisjunctionMaxQueries for each row in the matrix. DisjunctionMaxQuery is used because its score is determined by the maximum score of its subclauses (instead of the sum, like a BooleanQuery), so no one word from the user input dominates the final score. The best way to explain this is with an example, so let’s consider the following input…
defType = dismax
     mm = 50%
     qf = features^2 name^3
      q = +"apache solr" search server
  First off, we consider the “markup” characters of the parser that appear in this q string:

  •   whitespace – dividing input string into chunks (tokenization)
  •   quotes – makes a single phrase chunk (grouping)
  •   + – makes a chunk mandatory (how clauses combine)
  So we have 3 “chunks” of user input:

  •   “apache solr” (must match)
  •   “search” (should match)
  •   “server” (should match)
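  The chunking step can be approximated with a regular expression. This is a simplified sketch, not the actual parser: split_chunks and CHUNK_RE are hypothetical names, and the escaping of other special characters is left out.

```python
import re

# One chunk: optional +/- prefix, then a quoted phrase or a run of non-space characters.
CHUNK_RE = re.compile(r'([+-]?)(?:"([^"]*)"|(\S+))')

def split_chunks(q):
    """Split a dismax-style q string into (text, occurrence, is_phrase) tuples."""
    chunks = []
    for prefix, phrase, word in CHUNK_RE.findall(q):
        text = phrase or word
        if not text:
            continue  # skip an empty pair of quotes
        occur = {"+": "must", "-": "must_not"}.get(prefix, "should")
        chunks.append((text, occur, bool(phrase)))
    return chunks

for chunk in split_chunks('+"apache solr" search server'):
    print(chunk)
```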
  If we “multiply” that with our qf list (features, name) we get a matrix like this…
  features:"apache solr"    name:"apache solr"    (must match)
  features:"search"         name:"search"         (should match)
  features:"server"         name:"server"         (should match)
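  The “multiplication” is just every chunk paired with every qf field. A minimal sketch, with field_chunk_matrix as a hypothetical helper name:

```python
def field_chunk_matrix(chunks, qf):
    """Pair every chunk with every qf field: one row per chunk, one column per field."""
    return [[f'{field}:"{chunk}"' for field in qf] for chunk in chunks]

rows = field_chunk_matrix(["apache solr", "search", "server"], ["features", "name"])
for row in rows:
    print("    ".join(row))
```

  Each row later becomes one DisjunctionMaxQuery, which is why a chunk is scored by its best field rather than by all fields added together.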
  If we then factor in the mm param to determine the “minimum number of ‘Should Match’ clauses that (ahem) must match” (50% of 2 == 1) we get the following query structure (in pseudo-code)…
q = BooleanQuery(
  minNumberShouldMatch => 1,
  booleanClauses => ClauseList(
    MustMatch(DisjunctionMaxQuery(
      PhraseQuery("features","apache solr")^2,
      PhraseQuery("name","apache solr")^3)
    ),
    ShouldMatch(DisjunctionMaxQuery(
      TermQuery("features","search")^2,
      TermQuery("name","search")^3)
    ),
    ShouldMatch(DisjunctionMaxQuery(
      TermQuery("features","server")^2,
      TermQuery("name","server")^3))
));
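  The “50% of 2 == 1” arithmetic behind minNumberShouldMatch can be sketched as follows. This is a simplified illustration under stated assumptions: min_should_match is a hypothetical helper that handles only a plain non-negative integer or a positive percentage (rounded down), not the full mm syntax.

```python
def min_should_match(optional_clauses, mm):
    """Resolve a simple mm value ("2" or "50%") to a number of clauses.

    Simplified sketch: only a non-negative integer or a positive percentage
    is handled, and the percentage is rounded down.
    """
    if mm.endswith("%"):
        needed = optional_clauses * int(mm[:-1]) // 100
    else:
        needed = int(mm)
    # Never require more clauses than there are optional clauses.
    return min(needed, optional_clauses)

print(min_should_match(2, "50%"))  # 1, as in the example query
```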
  
 
  
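Note: the boolean query is the most basic, atomic query; other advanced queries are combinations or wrappers built on top of it, and dismax is no exception. Judging from how the dismax query parser decomposes its input, dismax also breaks down into boolean queries, and its field boosts behave like ordinary per-field boosts. The difference is that each DisjunctionMaxQuery takes the maximum score of its sub-clauses as its score, whereas ordinary independent multi-field boosting sums the scores.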
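The difference between “maximum” and “sum” can be shown with a few numbers. The scores below are made up for illustration, and the function names are hypothetical; the tie argument mirrors the tie-breaker idea, where 0.0 means a pure maximum and larger values let the other fields contribute.

```python
def disjunction_max_score(sub_scores, tie=0.0):
    """Best sub-clause score, plus tie times the sum of the remaining scores."""
    if not sub_scores:
        return 0.0
    best = max(sub_scores)
    return best + tie * (sum(sub_scores) - best)

def summed_score(sub_scores):
    """What a plain boolean combination of the same sub-clauses would add up to."""
    return sum(sub_scores)

# Made-up per-field scores for one chunk, e.g. "search" in features and in name.
scores = [0.5, 2.0]
print(disjunction_max_score(scores))           # 2.0
print(disjunction_max_score(scores, tie=0.5))  # 2.25
print(summed_score(scores))                    # 2.5
```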
  

  With me so far, right?
  Where people tend to get tripped up is in thinking about how Solr’s per-field analysis configuration (in schema.xml) impacts all of this. Our example above was pretty straightforward, but let’s consider for a moment what might happen if:

  •   The name field uses the WordDelimiterFilter at query time but features does not.
  •   The features field is configured so that “the” is a stop word, but name is not.
  Now let’s look at what we get when our input parameters are structurally similar to what we had before, but just different enough for WordDelimiterFilter and StopFilter to come into play…
defType = dismax
     mm = 50%
     qf = features^2 name^3
      q = +"apache solr" the search-server
  Our resulting query is going to be something like…
q = BooleanQuery(
  minNumberShouldMatch => 1,
  booleanClauses => ClauseList(
    MustMatch(DisjunctionMaxQuery(
      PhraseQuery("features","apache solr")^2,
      PhraseQuery("name","apache solr")^3)
    ),
    ShouldMatch(DisjunctionMaxQuery(
      TermQuery("name","the")^3)
    ),
    ShouldMatch(DisjunctionMaxQuery(
      TermQuery("features","search-server")^2,
      PhraseQuery("name","search server")^3))
  ));
  The use of WordDelimiterFilter hasn’t changed things very much: features is treating “search-server” as a single Term, while in the name field we are searching for the phrase “search server”. Hopefully this shouldn’t surprise anyone given the use of WordDelimiterFilter for the name field (presumably that’s why it’s being used). This DisjunctionMaxQuery still “makes sense”, but other fields with odd analysis that produce fewer/more Tokens than a “typical” field for the same chunk might produce queries that aren’t as easy to understand. In particular, consider what has happened in our example with the word “the”: because “the” is a stop word in the features field, no Query object is produced for that field/chunk combination. But a Query is produced for the name field, which means the total number of “Should Match” clauses in our top level query is still 2, so our minNumberShouldMatch is still 1 (50% of 2 == 1).
  This type of situation tends to confuse a lot of people: since “the” is a stop word in one field, they don’t expect it to matter in the final query. But as long as at least one qf field produces a Token for it (name in our example) it will be included in the final query, and will contribute to the count of “Should Match” clauses.
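  The stop word effect can be reproduced with toy analyzers. This is a sketch under loud assumptions: analyze_features and analyze_name are hypothetical stand-ins for the per-field analysis configured in schema.xml, and they only mimic the two behaviours discussed here.

```python
# Toy stand-ins for per-field query-time analysis; real analyzers are configured
# in schema.xml and do far more than this.
def analyze_features(chunk):
    """features: drops the stop word "the", keeps anything else as one term."""
    return [] if chunk == "the" else [chunk]

def analyze_name(chunk):
    """name: splits on hyphens, roughly like a word-delimiter filter."""
    return chunk.split("-")

ANALYZERS = {"features": analyze_features, "name": analyze_name}

def should_clauses(chunks):
    """Build one clause per chunk, keeping only the fields that produced tokens."""
    clauses = []
    for chunk in chunks:
        subqueries = {}
        for field, analyze in ANALYZERS.items():
            tokens = analyze(chunk)
            if tokens:
                subqueries[field] = " ".join(tokens)
        if subqueries:  # a chunk disappears only if no field yields a token
            clauses.append(subqueries)
    return clauses

clauses = should_clauses(["the", "search-server"])
print(len(clauses))  # 2: "the" still counts because name kept it
print(clauses)
```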
  So, what’s the takeaway from all of this?
  DisMax is a complicated creature. When using it, you need to consider all of its options carefully, and look at the debugQuery=true output while experimenting with different query strings and different analysis configurations to make really sure you understand how queries from your users will be parsed.
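  Note: dismax query construction is very complicated. When using it, consider every option carefully, and turn on debugQuery=true to check how different query strings and analyzers are handled.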
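  One way to run such experiments is to request the same parameters with debugQuery=true for several query strings and compare the parsed queries. The sketch below assumes a hypothetical endpoint (host, port, and core name are made up) and reads the parsedquery entry of the debug section in the JSON response; parsed_query is defined but not called here, since it needs a running Solr instance.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint; host, port and core name are assumptions.
SOLR_SELECT = "http://localhost:8983/solr/mycore/select"

def debug_url(q, qf="features^2 name^3", mm="50%"):
    """Build a dismax request URL with debugQuery=true and a JSON response."""
    params = {"defType": "dismax", "q": q, "qf": qf, "mm": mm,
              "debugQuery": "true", "wt": "json"}
    return SOLR_SELECT + "?" + urlencode(params)

def parsed_query(q):
    """Send the request and return the parsed query from the debug section."""
    with urlopen(debug_url(q)) as response:
        body = json.load(response)
    return body["debug"]["parsedquery"]

# Print the URLs for the two example inputs from the article.
for q in ['+"apache solr" search server', '+"apache solr" the search-server']:
    print(debug_url(q))
```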
  For qf (Query Fields), pf (Phrase Fields), mm (Minimum ‘Should’ Match), and tie (Tie Breaker), see: the Solr Wiki DisMaxQParserPlugin.
  
  Solr: Forcing items with all query terms to the top of a Solr search (Robot Librarian)
  http://robotlibrarian.billdueber.com/solr-forcing-items-with-all-query-terms-to-the-top-of-a-solr-search/
  
  Lucid Imagination Solr Powered ISFDB – Part #10: Tweaking Relevancy
  http://searchhub.org/dev/2011/06/20/solr-powered-isfdb-part-10/
  
  Lucid Imagination Solr Powered ISFDB – Part #11: Using DisMax
  http://searchhub.org/dev/2011/08/08/solr-powered-isfdb-part-11/
  
  http://tm.durusau.net/?p=21573
  
  Using Solr’s Dismax Tie Parameter, Another Word For It (tie breaker)
  http://java.dzone.com/articles/using-solrs-dismax-tie
  
  


