
[Experience sharing] Must-know scenarios for building a Solr search platform in practice


Posted 2018-11-2 06:06:01
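  [Note]
  This page collects requirements that I came across on mailing lists and in my own work on turning search into a shared, general-purpose platform. To stay clear of obligations such as non-compete clauses, I deliberately gathered representative requirements from public sources. Together they serve as a reference for Solr-based search "strategies" and for query designs in search applications. Performance is a separate matter: for the advanced usages in particular, and at large data volumes, always run load tests so that you know where you stand.
  Almost all of the methods given here rely on Solr's interfaces and configuration. Deep customization is not described in detail, and you will not find those answers here; if you are interested, contact me privately.
  The collection is only a starting point. The individual points have not been argued systematically or exhaustively, and the content comes mostly from the web, but the overall direction and the main points are sound. If you find a detail that is wrong, please point it out. Thanks!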
  Contents
  1 Scoring in Solr 3.4.0
  2 Configuration
  3 Problems and requirements
  4 Payloads
  5 Custom sort (score + custom value)
  6 BoostQParserPlugin
  7 How can I limit by score before sorting in a Solr query?
  8 Score filter
  9 Boost score for early matches
  10 Solr: How can I get all documents ordered by score with a list of keywords?
  11 Solr changes document's score when its random field value altered
  12 Relevance Customization
  13 Modify SOLR scoring
  14 Change order before returning data
  15 Limiting the total number of documents matched
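Scoring in Solr 3.4.0

  (7)
  The scoring factors can be tuned, but adding new factors or extending the scoring formula cannot be done directly through Solr configuration. You can, however, extend Lucene's code or parameters, for example by writing a SpanQuery variant or a new Query class and plugging it into Solr, which is somewhat more work. The community also provides ranking patches such as BM25 and PageRank, which can be used directly once you have some understanding of Lucene.
  (16)
  For ranking, there is no ready-made support yet for de-duplication or for time-based dynamics. De-duplication here means that the top few results may share exactly the same value in one field, or in several fields, so that the leading results look "clustered" around some related field. For some applications that is not the best outcome.
  Time-based dynamics is not directly supported either and can only be achieved indirectly, by sorting on time.
  This is probably not something Lucene or Solr should have to care about; it comes from the particular needs of the application.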

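Configuration
  Global configuration in schema.xml
  Similarity
  A (global) <similarity> declaration can be used to specify a custom Similarity implementation that you want Solr to use when dealing with your index. A Similarity can be specified either by referring directly to the name of a class with a no-arg constructor…

  …or by referencing a SimilarityFactory implementation, which may take optional init params….

  <similarity class="solr.DFRSimilarityFactory">
    <str name="basicModel">P</str>
    <str name="afterEffect">L</str>
    <str name="normalization">H2</str>
    <float name="c">7</float>
  </similarity>

  Beginning with Solr 4.0, Similarity factories such as SchemaSimilarityFactory can also support specifying specific Similarity implementations on individual field types…

  <fieldType …>
    <similarity class="solr.DFRSimilarityFactory">
      <str name="basicModel">I(F)</str>
      <str name="afterEffect">B</str>
      <str name="normalization">H2</str>
    </similarity>
  </fieldType>
  <fieldType …>
    <similarity class="solr.IBSimilarityFactory">
      <str name="distribution">SPL</str>
      <str name="lambda">DF</str>
      <str name="normalization">H2</str>
    </similarity>
  </fieldType>
  …

  If no (global) <similarity> is configured in the schema.xml file, an implicit instance of DefaultSimilarityFactory is used.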
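Problems and requirements

  Three orderings are compared: by DefaultComputerValue; by CustomScore and then by DefaultComputerValue; and by the weighted sum CustomScore*fa + DefaultComputerValue*fb (here fa = 0.8 and fb = 0.2).

  Doc    CustomScore   DefaultComputerValue   CustomScore*fa + DefaultComputerValue*fb
  Doc1   10            100                    10*0.8 + 100*0.2 = 28
  Doc2   1             99                     1*0.8 + 99*0.2 = 20.6
  Doc3   3             98                     3*0.8 + 98*0.2 = 22
  Doc4   20            50                     20*0.8 + 50*0.2 = 26

  Solr 3.4.0 scoring code analysis
  abstract class SimilarityFactory
  Member: public abstract Similarity getSimilarity();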
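  The weighted combination in the table above can be checked with a few lines of plain Java. This is only a sketch of the arithmetic, not Solr code: the class and method names are made up for illustration, and the weights 0.8 and 0.2 are the ones used in the table.

```java
import java.util.Locale;

class WeightedScore {
    // Weighted sum from the table: CustomScore * fa + DefaultComputerValue * fb.
    static double combine(double customScore, double defaultValue, double fa, double fb) {
        return customScore * fa + defaultValue * fb;
    }

    public static void main(String[] args) {
        double fa = 0.8, fb = 0.2; // weights taken from the table
        // Each row is {CustomScore, DefaultComputerValue} for Doc1..Doc4.
        double[][] docs = {{10, 100}, {1, 99}, {3, 98}, {20, 50}};
        for (int i = 0; i < docs.length; i++) {
            double total = combine(docs[i][0], docs[i][1], fa, fb);
            System.out.printf(Locale.ROOT, "Doc%d %.1f%n", i + 1, total);
        }
    }
}
```

  Changing fa and fb shows quickly how much the custom value has to outweigh the default value before the order of the documents flips.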
Payloads
  http://wiki.apache.org/lucene-java/Payloads
  Scoring payloads involves overriding the Similarity.scorePayload() method. For example, if one has implemented storing a Float payload, it could be used for scoring in the following way:
  public float scorePayload(byte[] payload, int offset, int length) {
    assert length == 4;

    // Reassemble the four payload bytes (least significant byte first) into an int.
    int accum = ((payload[0 + offset] & 0xff))
              | ((payload[1 + offset] & 0xff) << 8)
              | ((payload[2 + offset] & 0xff) << 16)
              | ((payload[3 + offset] & 0xff) << 24);
    // Reinterpret the int bit pattern as a float.
    return Float.intBitsToFloat(accum);
  }
  Don't forget to activate your Similarity implementation using IndexSearcher.setSimilarity(). Also, note that even then not all queries will actually make use of your method. For example, you will need to use BoostingTermQuery instead of TermQuery. QueryParser currently (Lucene 2.3.2) always uses TermQuery, so you will need to extend QueryParser and override getFieldQuery().
  Note that this is just one possible way of scoring a payload. Payloads are application specific. For example payload TokenFilters, see the payload package in the contrib/Analysis module.
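  The decoding in scorePayload() only works if the float was written in the matching byte order at index time. The following stand-alone sketch shows both directions without any Lucene dependency; the class and method names are assumptions made for illustration, and the byte layout (least significant byte first) is taken from the snippet above.

```java
class PayloadFloatCodec {
    // Writes the float's bit pattern as four bytes, least significant byte first.
    static byte[] encode(float value) {
        int bits = Float.floatToIntBits(value);
        return new byte[] {
            (byte) bits, (byte) (bits >>> 8), (byte) (bits >>> 16), (byte) (bits >>> 24)
        };
    }

    // Mirrors the scorePayload() body: reads four bytes starting at offset.
    static float decode(byte[] payload, int offset) {
        int accum = (payload[offset] & 0xff)
                  | ((payload[offset + 1] & 0xff) << 8)
                  | ((payload[offset + 2] & 0xff) << 16)
                  | ((payload[offset + 3] & 0xff) << 24);
        return Float.intBitsToFloat(accum);
    }

    public static void main(String[] args) {
        byte[] payload = encode(2.5f);
        System.out.println(decode(payload, 0));
    }
}
```

  If the indexing side uses a different byte order, the decoded value will be wrong without any error being raised, so it is worth round-tripping a known value once.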
Custom sort (score + custom value)

  http://grokbase.com/t/lucene/solr-user/08b25j6ked/custom-sort-score-custom-value
  Hi,
  I want to implement a custom sort in Solr based on a combination of relevance (Solr already gives me this as the score) and a custom value I've calculated previously for each document. I see two options:
  1. Use a function query (I'm using a DisMaxRequestHandler).
  2. Create a component that sets the SortSpec with a sort that has a custom ComparatorSource (similar to QueryElevationComponent).
  The first option has this problem: while the relevance value changes for every query, my custom value is constant for each doc. This implies that queries with documents that have high relevance are less affected by my custom value. On the other hand, queries with low relevance are affected a lot by my custom value. Can it be made proportional with a function query (i.e. docs with low relevance are less affected by my custom value)?
  The second option has this problem: the Solr score isn't normalized. I need it normalized in order to apply my custom value in the sortValue function in ScoreDocComparator.
  What do you think? What's the best option in that case? Another option?
  Thank you in advance,
  George
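  One possible reading of the normalization problem raised above is to scale the scores of a single result list by the top score of that list before mixing in the custom value. The sketch below is not an answer from the thread; the class name, the weight parameter and the assumption that the custom value already lies between 0 and 1 are all illustrative.

```java
import java.util.Arrays;

class NormalizedBlend {
    // scores: raw relevance scores of one result list (any positive scale).
    // custom: precomputed per-document values, assumed to be in [0, 1].
    // weight: share of the final value that comes from the custom value.
    static double[] blend(double[] scores, double[] custom, double weight) {
        double max = 0.0;
        for (double s : scores) {
            max = Math.max(max, s);
        }
        double[] out = new double[scores.length];
        for (int i = 0; i < scores.length; i++) {
            // Normalize against the best score of this query only.
            double normalized = max > 0 ? scores[i] / max : 0.0;
            out[i] = (1 - weight) * normalized + weight * custom[i];
        }
        return out;
    }

    public static void main(String[] args) {
        double[] blended = blend(new double[] {4.0, 1.0}, new double[] {0.0, 1.0}, 0.25);
        System.out.println(Arrays.toString(blended));
    }
}
```

  Because the normalization is relative to one result list, the blended values are comparable within a query but still not across queries.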
BoostQParserPlugin
  http://lucene.apache.org/solr/api-4_0_0-BETA/org/apache/solr/search/BoostQParserPlugin.html
  org.apache.solr.search
  Class BoostQParserPlugin
  http://stackoverflow.com/questions/3035831/solr-lucene-scorer
  Scorers are part of Lucene Queries via the 'weight' query method.
  In short, the framework calls Query.weight(..).scorer(..). Have a look at
  http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Query.html
  http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Weight.html
  http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Scorer.html
  To use your own Query class in Solr, you'll need to implement your own Solr QParserPlugin that uses your own QParser, which generates your previously implemented Lucene Query. You can then use it in Solr as specified here:
  http://wiki.apache.org/solr/SolrPlugins#QParserPlugin
  This part of the implementation should stay simple, as it is just some glue code.
  Enjoy hacking Solr!
  answered Jun 14 '10 at 10:33
  You can override the logic the Solr scorer uses. Solr uses the DefaultSimilarity class for scoring.
  1) Make a class extending DefaultSimilarity.
  2) Override the functions tf(), idf() etc. according to your need.
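  // Package as in Lucene 3.x; Lucene 4.x moved the class to org.apache.lucene.search.similarities.
  import org.apache.lucene.search.DefaultSimilarity;

  public class CustomSimilarity extends DefaultSimilarity {
    public CustomSimilarity() {
      super();
    }

    // Term-frequency factor; returning a constant disables its influence.
    public float tf(int freq) {
      // your code
      return (float) 1.0;
    }

    // Inverse-document-frequency factor; returning a constant disables its influence.
    public float idf(int docFreq, int numDocs) {
      // your code
      return (float) 1.0;
    }
  }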
  3) After creating the class, compile it and make a jar.
  4) Put the jar in the lib folder of the corresponding index or core.
  5) Change the schema.xml of the corresponding index: <similarity class="….CustomSimilarity"/>
  You can also check out the various factors affecting the score.
  For your requirement you can create buckets if your score is in a specific range. Also read about field boosting, document boosting etc. That might be helpful in your case.
  http://stackoverflow.com/questions/11748487/how-can-i-filter-solr-results-by-custom-score
  How can I filter SOLR results by custom score
  I'm using Solr function queries to generate my own custom score. I achieve this using something along these lines:
  q=_val_:"my_custom_function()"
  This populates the score field as expected, but it also includes documents that score 0. I need a way to filter the results so that scores below zero are not included.
  I realize that I'm using score in a non-standard way and that normally the score that Lucene/Solr produce is not absolute. However, producing my own score works really well for my needs.
  I've tried using {!frange l=0} but this causes the score for all documents to be "1.0".
  I suspect pseudo-fields could be used, but since Solr 4 is still alpha, I'm looking for a way to do it using Solr 3.1.
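  One way to keep the custom score while still dropping low-scoring documents is to leave the function in q and repeat it inside an fq with the frange parser, since filter queries restrict the result set without contributing to the score. The sketch below only assembles such a request string; the class and method names are illustrative, my_custom_function() is the placeholder from the question, and incl=false (exclude the lower bound itself) is an assumption about what the asker wants.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class FrangeFilterUrl {
    // URL-encodes one parameter value as UTF-8.
    static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // q carries the scoring function, fq filters on the same function's value.
    static String build(String function, double lowerBound) {
        String q = "_val_:\"" + function + "\"";
        String fq = "{!frange l=" + lowerBound + " incl=false}" + function;
        return "q=" + enc(q) + "&fq=" + enc(fq) + "&fl=" + enc("*,score");
    }

    public static void main(String[] args) {
        System.out.println(build("my_custom_function()", 0));
    }
}
```

  The function is evaluated twice this way, once for scoring and once for filtering, so on a large index the cost is worth measuring.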
How can I limit by score before sorting in a Solr query?

  I am searching "product documents". In other words, my Solr documents are product records. I want to get, say, the top 50 matching products for a query. Then I want to be able to sort the top 50 scoring documents by name or price. I'm not seeing much on how to do this, since sorting by score, then by name or price won't really help, since scores are floats.
  I wouldn't mind if I could do something like map the scores to ranges (like a score of 8.0-8.99 would go in the 8 bucket score), then sort by range, then by names, but since there is basically no normalization to scoring, this would still make things a bit harder.
  Tl;dr How do I exclude low scoring documents from the Solr result set before sorting?
  Tags: solr, scoring
  asked Dec 7 '10 at 22:21
  3 Answers
  You can use frange to achieve this, as long as you don't want to sort on score (in which case I guess you could just do the filtering on the client side). Your query would be something along the lines of:
  q={!frange l=5}query($qq)&qq=[awesome product]&sort=price asc
  Set the l argument in the q-frange-parameter to the lower bound you want to filter score on, and replace the qq parameter with your user query.
  answered Dec 8 '10 at 10:23, Karl Johansson
  thanks, since I can get a reasonable frange from the first time the results are displayed sorted by score alone, this works great! – Zak Dec 9 '10 at 18:40
  I don't think you can simply exclude low scoring documents from the Solr result set before sorting, because the relevance score is only meaningful for a given combination of search query and resulting document list. I.e. scores are only meaningful within a given search and you cannot set some threshold for all searches.
  If you were using Java (or PHP) you could get the top 50 documents and then re-sort this list in your programming language, but I don't think you can do it with just SOLR.
  Anyway, I would recommend you don't go down this route of re-sorting the results from SOLR, as it will simply confuse the user. People expect search results to be like Google (and most other search engines), where results come back in some form of TF-IDF ranking.
  Having said that, you could use some other criteria to separate documents with the same relevance scores by adding an index-time boost factor based on a price range scale.
  I'd suggest you use SOLR to its strengths and use facets. Provide a price range facet on the left (like Ebay, Amazon, et al.) and/or a product category facet, etc. Also provide a "sort" widget to allow the results to be sorted by product name, if the user wants it.
  [EDIT] this question might also be useful: Digg-like search result ranking with Lucene/Solr?
  As observed by Karl Johansson, you could do the filtering on the client side: load the first 50 rows of the response (sorted by score desc) and then manipulate them in JS, for example.
  The jQuery DataTables plugin works fantastically for that kind of thing: sorting, sorting on multiple columns, dynamic filtering, etc. With only 50 rows it would be very fast too, so that users can "play" with the sorting and filtering until they find what they want.
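  The client-side approach described in these answers (take the best N hits by score, then re-sort only those) is small enough to sketch in Java. The Product class and its fields are hypothetical stand-ins for whatever the application maps a Solr document to.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class TopNResort {
    static final class Product {
        final String name;
        final double price;
        final double score;

        Product(String name, double price, double score) {
            this.name = name;
            this.price = price;
            this.score = score;
        }
    }

    // Keeps the n highest-scoring hits, then orders just those by ascending price.
    static List<Product> topByScoreThenSortByPrice(List<Product> hits, int n) {
        return hits.stream()
                .sorted(Comparator.comparingDouble((Product p) -> p.score).reversed())
                .limit(n)
                .sorted(Comparator.comparingDouble((Product p) -> p.price))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Product> hits = Arrays.asList(
                new Product("A", 30.0, 9.0),
                new Product("B", 10.0, 1.0),
                new Product("C", 20.0, 5.0));
        for (Product p : topByScoreThenSortByPrice(hits, 2)) {
            System.out.println(p.name + " " + p.price);
        }
    }
}
```

  In practice the first sort is unnecessary when Solr already returns the rows ordered by score desc with rows=N; it is kept here so the sketch does not depend on the input order.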
Score filter
  http://lucene.472066.n3.nabble.com/score-filter-td493438.html
  Hello, is there a way to set a score filter? I tried "+score:[1.2 TO *]" but it did not work.
  Many thanks,
  Re: score filter
  What's the motivation for wanting to do this? The reason I ask is that score is a relative thing determined by Lucene based on your index statistics. It is only meaningful for comparing the results of a specific query with a specific instance of the index. In other words, it isn't useful to filter on because there is no way of knowing what a good cutoff value would be. So, you won't be able to do score:[1.2 TO *] because score is not an actual Field.
  That being said, you probably could implement a HitCollector at the Lucene level and somehow hook it into Solr to do what you want. Or, of course, just stop processing the results in your app after you see a score below a certain value. Naturally, this still means you have to retrieve the results.
  -Grant
  Re: score filter
  In my case, for example searching a book: some of the returned documents have high relevance (score > 3), but some documents have a low score (nnn). This causes some problems for pagination. For example, if I only need to display the first 10 records, I need to retrieve all 1000 documents to figure out the number of meaningful documents which have score > nnn.
  Thx,
  Kevin
  Re: score filter
  At what point do you draw the line? 0.01 is too low, but what about 0.5 or 0.3? In fact, there may be queries where 0.01 is relevant.
  Relevance is a tricky thing and putting in arbitrary cutoffs is usually not a good thing. An alternative might be to instead look at the difference between scores and see if the gap is larger than some delta, but even that is subject to the vagaries of scoring.
  What kind of relevance testing have you done so far to come up with those values? See also
  http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Debugging-Relevance-Issues-in-Search/
  Re: score filter
  Just did some research. It seems that it's doable with additional code added to Solr but not out of the box. Thank you, Grant.
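  The alternative mentioned in this thread, looking at the gap between neighbouring scores instead of a fixed threshold, could be sketched as below. This is application-side code with made-up names, it assumes the scores arrive sorted in descending order, and, as the thread warns, the choice of delta is just as arbitrary as a threshold.

```java
class ScoreGapCutoff {
    // Returns how many leading results to keep: everything before the first
    // drop between neighbouring scores that is larger than delta.
    static int cutoffIndex(double[] scoresDescending, double delta) {
        for (int i = 1; i < scoresDescending.length; i++) {
            if (scoresDescending[i - 1] - scoresDescending[i] > delta) {
                return i;
            }
        }
        return scoresDescending.length; // no large gap found, keep everything
    }

    public static void main(String[] args) {
        double[] scores = {3.2, 3.0, 2.9, 0.4, 0.3};
        System.out.println(cutoffIndex(scores, 1.0));
    }
}
```

  The scores still have to be fetched before the gap can be found, so this does not remove the pagination cost Kevin describes; it only replaces the fixed cutoff.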
  Re: score filter
  Don't bother doing this. It doesn't work.
  This seems like a good idea, something that would be useful for almost every Lucene installation, but it isn't in Lucene because it does not work in the real world.
  A few problems:
  * Some users want every match and don't care how many pages of results they look at.
  * Some users are very bad at creating queries that match their information needs. Others are merely bad, not very bad. The good matches for their query are on top, but the good matches for their information need are on the third page.
  * Misspellings can put the right match (partial match) at the bottom. I did this yesterday at my library site, typing "Katherine Kerr" instead of the correct "Katharine Kerr". Their search engine showed no matches (grrr), so I had to search again with "Kerr".
  * Most users do not know how to repair their queries, like I did with "Katherine Kerr", changing it to "Kerr". Even if they do, you shouldn't make them. Just show the weakly relevant results.
  * Documents have errors, just like queries. I find bad data on our site about once a month, and we have professional editors. We still haven't fixed our entry for "Betty Page" to read "Bettie Page".
  * People may use non-title words in the query, like searching for "batman" when they want "The Dark Knight".
  So, don't do this. If you are forced to do it, make sure that you measure your search quality before and after it is implemented, because it will get worse. Then you can stop doing it.
  wunder
  Re: score filter
  +1. Of course it is doable, but that doesn't mean you should, which is what I was trying to say before (but I was typing on my iPod so it wasn't fast) and which Walter has now said. It is entirely conceivable to me that someone could search for a very common word such that the scores of all relevant (and thus "good") documents are below your predefined threshold.
  At any rate, proceed at your own peril. To implement it, look into the SearchComponent functionality.
  Re: score filter
  Hello Grant,
  I need to frame a query that is a combination of two query parts and I use a 'function' query to prepare the same. Something like:
  q={!type=func q.op=AND df=text}product(query($uq,0.0),query($cq,0.1))
  where $uq and $cq are two queries.
  Now, I want a search result returned only if I get a hit on $uq. So, I specify the default value of the $uq query as 0.0 in order for the final score to be zero in cases where $uq doesn't record a hit. Even though the scoring works as expected (i.e. documents that don't match $uq have a score of zero), all the documents are returned as search results. Is there a way to filter out search results that have a score of zero?
  Thanks for your help,
  Debdoot
  Re: score filter
  a) you could wrap your query in {!frange} .. but that will make everything that does have a value > 0.0 get the same final score
  b) you could use an fq={!frange} that refers back to your original $q
  c) you could just use an fq that refers directly to your $uq, since that's what you say you actually want to filter on in the first place..
  uq=…
  cq=…
  q={!type=func q.op=AND df=text}product(query($uq,0.0),query($cq,0.1))
  fq={!v=$uq}
Boost score for early matches
  Solr – How to boost score for early matches?
  How can I boost the score for documents in which my query matches a particular field earlier? For example, searching for "superman" should give "superman returns" a higher score than "there is my superman". Is this possible?
  Uh, store the first few words explicitly in another field, and boost matches on this field. – aitchnyu Aug 22 at 9:45
  The problem there is that the size of the query can vary from say 3 characters to say 100 characters, and so determining how many words/chars to index separately can be difficult. – techfoobar Aug 22 at 9:49
  Secondly, suppose I index the first 25 characters, and one record has "my superman blah.." and another record has "superman returns blah..": both will match the query "superman" and both will be boosted when I boost this secondary field. – techfoobar Aug 22 at 9:50
  2 Answers
  Thank you for the answer. But I solved it today by using the approach I've outlined in my answer. – techfoobar Aug 22 at 18:33
  But this is not going to work if the words do not occur at the very start. You may want to check out payloads as well, where you can add index-time suggestions as laid down in the second option. – Jayendra Aug 22 at 18:35
  Will check that out as well. However, the current solution can be made to work to a large extent by fine-tuning the ps parameter to make it more lenient. I currently use 2 (distance between 2 terms in the pf) and it seems to be working quite well for my medium-sized data set (1000s of records, greatly varying in content). Will check out your point and let you know if it helped. – techfoobar Aug 22 at 18:38
  (accepted answer)
  Solved it myself after reading a LOT about this online. What specifically helped me was a reply on nabble which goes like this (I used dismax, so explaining that here):
  1. Create a separate field named, say, 'nameString' which stores the value as "_START_ <name value>".
  2. Change the search query to "_START_ <search terms>".
  3. Add the new field nameString as one of the fields to look in, in the query fields param (qf).
  4. While searching, use the parameter pf (phrase field) with the new field nameString and a phrase slop of 1 or 2 (lower values would mean stricter searching).
  Your final query params will be something like:
  q=_START_ <search terms>
  defType=dismax
  qf=name nameString
  pf=nameString
  ps=2
Solr: How can I get all documents ordered by score with a list of keywords?

  I have a Solr 3.1 database containing Emails with two fields:
  datetime
  text
  For the query I have two parameters:
  date of today
  keyword array ("important thing", "important too", "not so important, but more than average")
  Is it possible to create a query to
  1. get ALL documents of this day AND
  2. sort them by relevancy, ordering them so that the email which contains most of my keywords (important things) scores best?
  The part with the date is not very complicated:
  fq=datetime:[YYYY-MM-DDT00:00:00.000Z TO YYYY-MM-DDT23:59:59.999Z]
  I know that you can boost the keywords this way:
  q=text:"first keyword"^5 OR text:"second one"^2 OR text:"minus scoring"^0.5 OR text:"*"
  But how do I only use the keywords to sort this list and get ALL entries, instead of doing a real query and getting only a few entries back?
  Thanks for help!
  2 Answers
  You need to specify your terms in the main query and then change your date query to be a filter query on these results by adding the following.
  fq=datetime:[YYYY-MM-DDT00:00:00.000Z TO YYYY-MM-DDT23:59:59.999Z]
  So you should have something like this:
  q=<your keywords>&fq=datetime:[YYYY-MM-DDT00:00:00.000Z TO YYYY-MM-DDT23:59:59.999Z]
  Edit: A little more about filter queries (as suggested by rfeak).
  From Solr Wiki – FilterQueryGuidance: "Now, what is a filter query? It is simply a part of a query that is factored out for special treatment. This is achieved in Solr by specifying it using the fq (filter query) parameter instead of the q (main query) parameter. The same result could be achieved leaving that query part in the main query. The difference will be in query efficiency. That's because the result of a filter query is cached and then used to filter a primary query result using set intersection."
  These should be sorted by relevancy score already; that is just the default behavior of Solr. You can see the score by adding that field.
  fl=*,score
  If you use the Full Interface for Make A Query on the Admin Interface on your Solr installation at http:////admin/form.jsp you will see where you can specify the filter query, fields, and other options. You can check out the Solr Wiki for more details on the options and how they are used.
  I hope that this helps you.
  +1 The filter query is an excellent suggestion. You may consider adding a bit about the advantage of using the filter query there. – rfeak May 27 '11 at 14:55
  Thank you! The filter query is working as expected. But unfortunately I still don't know how to handle the keywords, because they filter the emails instead of only sorting them. – Daniel May 27 '11 at 16:06
  Sorting by relevance is the default behavior in Solr/Lucene. If your results are unsatisfying, try to put the keywords in quotes.
  // Edit: Following the answer from Paige Cook, use something like this:
  q="important thing"&fq=datetime:[YYYY-MM-DDT00:00:00.000Z TO YYYY-MM-DDT23:59:59.999Z]
  // 2nd update. Thinking about this answer again: quotes are not a good idea, because in this case you will only receive "important thing" mails, but no "important too".
  The point is which keywords you are using. Searching for "important thing" results in the highest scores for "important thing" mails. But Lucene does not know how to score "important too" or "not so important, but more than average" in relation to your keywords. Another idea would be searching only for "important". But the field values "important thing" and "important too" give nearly the same score, because 50% of the searched keywords (in this case: "important") are part of the field value. So you probably have to change your keywords. It could work after changing "important too" into "also an important mail", to get the best ratio of the search word "important" to the field value, so that the shortest mail description gets the highest value.
  Thanks for your answer! You point exactly to my problem, because the keywords filter the documents instead of only sorting them all and influencing the relevancy score. I do not know how to handle this. – Daniel May 27 '11 at 16:13
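  The open problem in this thread is that the keywords end up filtering the mails instead of only ranking them. One common way around that with the standard Lucene query syntax is to OR the boosted keyword clauses with the match-all clause *:*, so every document of the day matches and the keywords only raise the score. The sketch below just builds such a q string; the class name is made up, and combining it with the datetime fq from the answers is assumed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class BoostedKeywordQuery {
    // Produces: field:"kw1"^b1 OR field:"kw2"^b2 ... OR *:*
    // The trailing *:* matches every document, so keywords only affect ranking.
    static String build(String field, Map<String, Double> boosts) {
        StringBuilder q = new StringBuilder();
        for (Map.Entry<String, Double> e : boosts.entrySet()) {
            String phrase = e.getKey().replace("\"", "\\\""); // escape embedded quotes
            q.append(field).append(":\"").append(phrase).append("\"^")
             .append(e.getValue()).append(" OR ");
        }
        return q.append("*:*").toString();
    }

    public static void main(String[] args) {
        Map<String, Double> boosts = new LinkedHashMap<>();
        boosts.put("important thing", 5.0);
        boosts.put("important too", 2.0);
        boosts.put("not so important, but more than average", 0.5);
        System.out.println("q=" + build("text", boosts));
    }
}
```

  The resulting string still has to be URL-encoded before it is sent, and whether the boosts produce the intended order depends on the analyzer and field lengths, so it should be checked with debugQuery=on.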
Solr changes document's score when its random field value altered

  http://stackoverflow.com/questions/6254587/solr-changes-documents-score-when-its-random-field-value-altered
  I need to navigate forth and back in a Solr results set ordered by score, viewing documents one by one. To visualise that: first a list of document titles is presented to the user, then he or she can click one of the titles to see more details, and then needs to have an opportunity to move to the next document in the original list without going back and clicking another title.
  During viewing, documents get changed: their dynamic field is modified (or created if it does not exist yet) to mark that the document has already been viewed (used in another search).
  The problem I face is that when the document is altered and re-indexed to keep those changes, sometimes (and not always, which is very disturbing) its place in the results set for the same query changes (in other words, its score changes, as that doesn't happen when browsing results sorted by one of the documents' fields). So "Previous" / "Next" navigation doesn't work properly.
  I'm not using any custom weighting or boosters on fields for score calculation. Also, the dynamic field changed during browsing doesn't participate in the query used to get the record set being browsed.
  So, the questions are: can the modification of a document's field that is not included in the query change its relevance score? And if it can, then how can I control that?
  UPDATE
  I did some tests and can add the following:
  1. The document changes its place in the result set even if no field is amended: just requesting the document and re-indexing it without any changes to its fields makes it take another place the next time the same query over the same index is executed.
  2. That happens even if the result set is sorted explicitly ("first_name DESC"), so score (which depends on the update date) is not involved. The document stays the same, the field the result set is sorted by is the same, yet its position changes.
  Still have no idea how to avoid that.
  2Answers
  InSolr,ifyourfieldis
  “indexed”,itwillhaveaneffectontherelevancyranking
  (“stored”fieldsshowupinsearchresultsbutarenotnecessarily
  searchable).Ifthefieldsinquestionaren’tmarkedasindexed
  thenyouaregoodtogo.Notethat“indexed”and“stored”arenot
  necessarilythesame,henceyouconfusionaboutresultslists
  changingeventhoughnotallfieldsareshown(afieldcanbe
  “indexed”andnot“stored”aswell).
  InthiscaseIthinkyou
  wantyour“viewed”fieldtobe“stored”butnot“indexed”.Ifyou
  reallywanttocontrolthequery,youcanusecopyFieldtocopythe
  relevantresultsintoasinglesearchablefield.Youcanalsoboost
  termsordocumentssothatcertainfieldsare“lessimportant”to
  thesearchquery.
  Ifyouwanttoseehowthe
  relevancyrankingsarecalculated,youcanadd“debugQuery=on”to
  theendofyourSolrQuery(seetheRelevancyFAQformore
  info).
  However,allthatbeing
  said,Iwouldrecommendyoucacheyoursearchresultquery(at
  leastforthefirstpageforyourresults),sinceyouwillalways
  haveresultschanging(documentsadded,removedbyotherusers,
  etc).YourbestbetistodesignaUIthatanticipatesthis,orat
  leastbatchesauser’squery.
  Thanks, for some reason I was sure changes to fields not participating in the query don't affect the calculated score. In my case it is necessary to have this field indexed, as there is another query where I need to filter documents, searching only those viewed or only those not viewed before. Caching is also not suitable, as users are supposed to navigate through the whole result set, not only through the page (well, caching is still possible and, to be honest, bearable in terms of resources, but just not elegant). I'll try to boost the field being searched and tell if that works. – Yuriy Jun 7 '11 at 7:45
  Just noticed that it also happens when the results are sorted by a field other than score. How is that possible? I thought that if ordering is specified and score is not in the clause explicitly (say, ordering is like "first_name DESC"), it doesn't influence the ordering. However, it seems it does. How can I get rid of that? – Yuriy Jun 8 '11 at 14:11
  Okay, looks like boosting works, but has no effect. If I boost the field I am searching in, all the matches are boosted equally and still the recently re-indexed documents get some delta in their relevance which makes a difference. There should be a way to exclude the date of last update from the ordering completely but I can't find it yet… – Yuriy Jun 8 '11 at 14:50
  I've found the solution which doesn't eliminate the problem completely but makes it much less likely to happen.
  So the problem happens when the documents are sorted by some field and there is a number of them with the same value in this field (e.g. the result set is sorted by first name, and there are 100 entries for "John").
  This is when the indexed time gets involved: apparently Solr uses it to sort the documents when their main sorting fields are identical. To make this case much less probable, you need to add more sorting fields, e.g. "first_name desc" should become "first_name desc, last_name desc, register_date asc".
  Also, adding the document's unique id as the last sorting field should remove the problem completely (the set of sorting fields will never be identical for any two documents in the index).
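  The effect of the unique-id tie-breaker can be illustrated outside Solr with a plain comparator: once the last sort key is unique, the order no longer depends on the order in which the rows happen to arrive. The Person class and its fields below are hypothetical; in Solr the equivalent would be a sort parameter such as "first_name desc, id asc", assuming id is the unique key field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class TieBreakerSort {
    static final class Person {
        final String id;
        final String firstName;

        Person(String id, String firstName) {
            this.id = id;
            this.firstName = firstName;
        }
    }

    // first_name desc, then unique id asc as the final tie-breaker.
    static final Comparator<Person> STABLE_ORDER =
            Comparator.comparing((Person p) -> p.firstName).reversed()
                      .thenComparing((Person p) -> p.id);

    static List<Person> sorted(List<Person> input) {
        List<Person> copy = new ArrayList<>(input);
        copy.sort(STABLE_ORDER);
        return copy;
    }

    public static void main(String[] args) {
        List<Person> people = new ArrayList<>(Arrays.asList(
                new Person("2", "John"), new Person("1", "John"), new Person("3", "Zed")));
        Collections.shuffle(people); // arrival order changes, as after a re-index
        for (Person p : sorted(people)) {
            System.out.println(p.firstName + " " + p.id);
        }
    }
}
```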
Relevance Customization
  http://lucene.472066.n3.nabble.com/Relevance-Customization-td501310.html
  Hi all.
  I want to know if it's possible to customize the Solr relevance, something like this:
  1 – I create a static score for each document and index it.
  2 – I change the relevance to Score(Solr) + Score(Static), where the Solr score is equal to 30% of the total score, mixing the two scores into only one.
  This is different from sorting by my static score and afterwards by Solr score, because I don't want to kill the Solr score, just give it a little less importance.
  Is there a way to do this?
  Thanks
  Re: Relevance Customization
  It can be done with something like q=yourQuery _val_:yourStaticScoreField
  http://wiki.apache.org/solr/FunctionQuery#fieldvalue
  But this adds the Solr score to the static score. I am not sure how to get 30% of the Solr score. Maybe something like this?
  q=yourQuery^0.3 _val_:yourStaticScoreField^0.7
Modify SOLR scoring

  Hi everybody,
  I'm using SOLR with a schema (for example) like this:
  parutiondate, date, indexed, not stored
  fulltext, stemmed, indexed, not stored
  I know it's possible to order by one or more fields, but I want to order by score and modify the "score" formula. I want to keep the SOLR score but add a new parameter to the formula to boost the score of the most recent documents.
  What is the best way to do this?
  Thanks.
  Excuse my English.
  RE: modify SOLR scoring
  I believe you can use a function query to do this:
  http://wiki.apache.org/solr/FunctionQuery
  If you embed the following in your query, you should get a boost for more recent date values:
  _val_:"ord(dateField)"
  Where "dateField" is the field name of the date you want to use.
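  To see why ord(dateField) favours recent documents, it helps to recall that ord yields the 1-based position of a field value among the distinct indexed values in sorted order, so later dates get larger numbers. The sketch below imitates that ranking in plain Java; it is not Solr code, the names are made up, and it assumes dates are stored as ISO strings that sort chronologically.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeSet;

class OrdBoostSketch {
    // Maps each distinct value to its 1-based rank in sorted order,
    // which is the number ord(field) would add to the score.
    static Map<String, Integer> ordinals(Collection<String> indexedValues) {
        Map<String, Integer> ord = new LinkedHashMap<>();
        int next = 1;
        for (String value : new TreeSet<>(indexedValues)) {
            ord.put(value, next++);
        }
        return ord;
    }

    public static void main(String[] args) {
        Map<String, Integer> ord = ordinals(Arrays.asList(
                "2010-05-01", "2009-01-01", "2010-05-01", "2011-03-15"));
        System.out.println(ord);
    }
}
```

  Two consequences follow from this: the added value grows with the number of distinct dates in the index, so it can easily swamp the text relevance unless it is scaled down, and it changes for existing documents whenever new dates are indexed.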
  Re: modify SOLR scoring
  http://lucene.472066.n3.nabble.com/modify-SOLR-scoring-td497348.html
  I am interested in a topic very similar to yours. I want to modify the field named "score" and the document boost, but not reindex all the fields, since that would take too much power.
  Please let me know if you find a solution to this.
  Kindly
Change order before returning data

  http://stackoverflow.com/questions/4965172/change-order-before-returning-data
  Is there any way to change the order of results in SOLR? E.g. when I query SOLR I will get the 1000 records with the highest score, then within those 1000 records I will use my own function to change the order again and just get 10 of those records. I can get 1000 records and process them with PHP or Java, but then I have to transfer 1000 records from the SOLR server to the web server and I don't want that; I just want to get 10 records after changing the order, and use paging. Does SOLR support this kind of custom function?
  Answers
  If your function can be applied when the records are initially indexed, you can do it there and add the result as a value on the record. Then sort the result set by the precalculated value. If not, I haven't worked with it directly, but this thread seems to have the answer you're looking for.
  Hi, my case is very special; I already have a pre-indexed score in the database. Let me give one example: I have a shopping site. When I search for "TV LCD 32 inch", I get many results from different brands like LG, Toshiba… and many results for LG appear consecutively. I want to separate them, e.g. I don't want 3 results for LG sitting next to each other. Currently I get the 1000 best records (based on score) and change the order again using PHP; now I want to move this job to SOLR (I don't want to transfer too much data between SOLR and the web server, I just need 10 records to display). – user612433 Feb 11 '11 at 3:45
  Yes, you can create a column with the info you want to be taken into account in the score.
  For example, for a "popularity" column, your query would be:
  yourquery && _val_:"popularity"^0.7
  0.7 being the boost factor in the final score. You can also filter the result set to get fewer results:
  yourquery&fq=popularity:[10 TO *]
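  The re-ordering the asker currently does in PHP (avoid several hits of the same brand sitting next to each other) could look roughly like the round-robin below. This is a sketch of the application-side logic only, with hypothetical Hit objects; moving it into Solr would need custom code such as a SearchComponent, which is not shown here.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class BrandInterleave {
    static final class Hit {
        final String brand;
        final String title;

        Hit(String brand, String title) {
            this.brand = brand;
            this.title = title;
        }
    }

    // byScore must already be ordered by score desc. Brands take turns in the
    // order of their best hit; each brand keeps its own score order.
    static List<Hit> interleave(List<Hit> byScore, int limit) {
        Map<String, Deque<Hit>> perBrand = new LinkedHashMap<>();
        for (Hit h : byScore) {
            perBrand.computeIfAbsent(h.brand, k -> new ArrayDeque<>()).add(h);
        }
        List<Hit> out = new ArrayList<>();
        while (out.size() < limit && !perBrand.isEmpty()) {
            Iterator<Deque<Hit>> it = perBrand.values().iterator();
            while (it.hasNext() && out.size() < limit) {
                Deque<Hit> queue = it.next();
                out.add(queue.poll());
                if (queue.isEmpty()) {
                    it.remove(); // this brand has no hits left
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
                new Hit("LG", "LG 1"), new Hit("LG", "LG 2"), new Hit("LG", "LG 3"),
                new Hit("Toshiba", "Toshiba 1"), new Hit("Sony", "Sony 1"));
        for (Hit h : interleave(hits, 4)) {
            System.out.println(h.title);
        }
    }
}
```

  When one brand dominates the list, its remaining hits still end up adjacent at the tail, so the limit should be small relative to the number of hits fetched.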
Limiting the total number of documents matched

  http://search-lucene.com/m/4AHNF17wIJW1/
  Re: limiting the total number of documents matched
  Yonik Seeley 2010-07-17, 00:55
  On Wed, Jul 14, 2010 at 5:46 PM, Paul wrote:
  > I thought of another way to do it, but I still have one thing I don't know how to do. I could do the search without sorting for the 50th page, then look at the relevancy score on the first item on that page, then repeat the search, but add score > that relevancy as a parameter. Is it possible to do a search with "score:[5 TO *]"? It didn't work in my first attempt.
  frange could possibly help (range query on an arbitrary function).
  http://www.lucidimagination.com/blog/tag/frange/
  So perhaps something like
  q={!frange l=0.85}query($qq)
  qq=
  where 0.85 is the lower bound you want for scores and qq is the normal relevancy query
  -Yonik
  http://www.lucidimagination.com
  > On Wed, Jul 14, 2010 at 5:34 PM, Paul wrote:
  > I was hoping for a way to do this purely by configuration and making the correct GET requests, but if there is a way to do it by creating a custom RequestHandler, I suppose I could plunge into that. Would that yield the best results, and would that be particularly difficult?
  >> On Wed, Jul 14, 2010 at 4:37 PM, Nagelberg, Kallin wrote:
  >> So you want to take the top 1000 sorted by score, then sort those by another field. It's a strange case, and I can't think of a clean way to accomplish it. You could do it in two queries, where the first is by score and you only request your IDs to keep it snappy, then do a second query against the IDs and sort by your other field. 1000 seems like a lot for that approach, but who knows until you try it on your data.
  >> -Kallin Nagelberg
  >>> Subject: limiting the total number of documents matched
  >>> I'd like to limit the total number of documents that are returned for a search, particularly when the sort order is not based on relevancy. In other words, if the user searches for a very common term, they might get tens of thousands of hits, and if they sort by "title", then very high relevancy documents will be interspersed with very low relevancy documents. I'd like to set a limit to the 1000 most relevant documents, then sort those by title. Is there a way to do this?
  >>> I guess I could always retrieve the top 1000 documents and sort them in the client, but that seems particularly inefficient. I can't find any other way to do this, though.


