[Experience Sharing] Custom SOLR Search Components

Posted on 2015-7-17 08:43:41
  I've been building some custom search components for SOLR lately, so I wanted to share a couple of things I learned in the process. This is probably old hat to people who have been doing this for a while, but I thought I'd share it in case it benefits someone.

Passing State
  In a previous post, I described a custom SOLR search handler that returns layered search results for a given query term (and optional filters). As I went further, though, I realized that I also needed to return information about facets and category clusters. I could have added this to the handler itself, but splitting the logic across a chain of search components seemed preferable for readability and reusability, so I went that route.
  So the first step was to refactor my custom search handler into a SearchComponent. There is not much to do there: subclass SearchComponent instead of RequestHandlerBase, and move the body of handleRequestBody(SolrQueryRequest,SolrQueryResponse) into a process(ResponseBuilder) method. The request and response objects are accessible from the ResponseBuilder as properties, i.e., ResponseBuilder.req and ResponseBuilder.rsp. I then declared this component and an enclosing handler in solrconfig.xml, something like this:
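
<searchComponent name="component1" class="...">
  <str name="...">value1</str>
  <str name="...">value2</str>
</searchComponent>
<searchComponent name="component2" class="...">
  <str name="...">1</str>
  <str name="...">2</str>
</searchComponent>
<requestHandler name="..." class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="fl">*,score,id</str>
    <str name="wt">xml</str>
  </lst>
  <arr name="components">
    <str>component1</str>
    <str>component2</str>
  </arr>
</requestHandler>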

  I've also added a second component to the chain above (just so I don't have to show this snippet again later); I hope it's not too confusing. There can of course be multiple components before and after my search-handler-turned-search-component, but for the purposes of this discussion I'll keep things simple, concentrate on this one other component, and pretend that it has multiple unique (and pertinent) requirements.
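For readers new to component chains, here is a rough plain-Java model of how the enclosing handler drives the components listed in the config. It assumes the non-distributed path of SOLR's SearchHandler, which calls prepare() on every component in declared order and then process() on every component, all sharing one ResponseBuilder. FakeResponseBuilder, FakeComponent and FakeSearchHandler are hypothetical stand-ins, not SOLR classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for SOLR's ResponseBuilder: one instance is shared by
// every component in the chain. Here it only records the order of calls.
class FakeResponseBuilder {
    final List<String> trace = new ArrayList<String>();
}

// Hypothetical stand-in for a SearchComponent.
class FakeComponent {
    final String name;

    FakeComponent(String name) { this.name = name; }

    void prepare(FakeResponseBuilder rb) { rb.trace.add("prepare:" + name); }

    void process(FakeResponseBuilder rb) { rb.trace.add("process:" + name); }
}

// Hypothetical stand-in for the enclosing handler.
class FakeSearchHandler {
    private final List<FakeComponent> components;

    FakeSearchHandler(List<FakeComponent> components) { this.components = components; }

    // Assumed to mirror SearchHandler's non-distributed path: prepare() on
    // every component in declared order, then process() on every component.
    FakeResponseBuilder handleRequest() {
        FakeResponseBuilder rb = new FakeResponseBuilder();
        for (FakeComponent c : components) c.prepare(rb);
        for (FakeComponent c : components) c.process(rb);
        return rb;
    }
}
```

Because component2 is processed after component1 and both see the same builder, anything component1 adds to the shared response is visible to component2, which is what the approach in this post relies on.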
  Now assume that the second component needs data that is already available in, or can easily be generated by, component1. This was actually true in my case: my second component needed a BitSet of the document ids in the search results, which I could easily build in my first component by collecting the ids while looping through the SolrDocumentList of results. It seemed wasteful to compute this again, so I updated this snippet of code in component1's process() method (what used to be my handleRequestBody() method):
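
  public void process(ResponseBuilder rb) throws IOException {
    ...
    // build and write response
    ...
    // one bit per document id in the full result set, for use by component2
    OpenBitSet bits = new OpenBitSet(searcher.maxDoc());
    List<SolrDocument> slice = new ArrayList<SolrDocument>();
    for (Iterator<SolrDocument> it = results.iterator(); it.hasNext(); ) {
      SolrDocument sdoc = it.next();
      ...
      // record the (integer) id of every hit, not just those on the current page
      bits.set(((Integer) sdoc.get("id")).longValue());
      // keep only the requested page of results for the response
      if (numFound >= start && numFound < start + rows) {
        slice.add(sdoc);
      }
      numFound++;
    }
    ...
    rb.rsp.add("response", results);
    // temporary entry for downstream components; component2 removes it
    rb.rsp.add("_bits", bits);
  }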
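The single pass above can be tried outside SOLR with a small sketch. It assumes plain integer document ids and uses java.util.BitSet in place of Lucene's OpenBitSet; PageAndBits and collect() are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Sketch of component1's loop: walk the full result list once, record every
// document id in a BitSet (for downstream components), and keep only the
// requested page (start, rows) for the response.
class PageAndBits {
    final List<Integer> slice = new ArrayList<Integer>();
    final BitSet bits = new BitSet();

    static PageAndBits collect(List<Integer> resultIds, int start, int rows) {
        PageAndBits out = new PageAndBits();
        int numFound = 0;
        for (int id : resultIds) {
            out.bits.set(id);                       // every hit goes into the bitset
            if (numFound >= start && numFound < start + rows) {
                out.slice.add(id);                  // only the current page is returned
            }
            numFound++;
        }
        return out;
    }
}
```

For example, collect([7, 3, 9, 1, 4], 1, 2) returns the page [3, 9] while the bitset holds all five ids, so the second component gets the complete set without a second pass over the results.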

  In the next component (component2), I simply grab the OpenBitSet by name from the response NamedList, use it to generate the result for this component, stick the result back into the response, and discard the temporary data. The last step keeps the data out of the response XML (for both aesthetic and performance reasons).

  public void process(ResponseBuilder rb) throws IOException {
    Map cres = new HashMap();
    NamedList nl = rb.rsp.getValues();
    // fetch the temporary data written upstream by component1
    OpenBitSet bits = (OpenBitSet) nl.get("_bits");
    if (bits == null) {
      // component1 is missing from the chain: complain and return an empty result
      logger.warn("Component 1 must write _bits into response");
      rb.rsp.add("component2_result", cres);
      return;
    }
    // do something with bits and generate component response
    doSomething(bits, cres);
    // stick the result into the response and delete temp data
    rb.rsp.add("component2_result", cres);
    nl.remove("_bits");
  }
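Here is a minimal sketch of the same hand-off, assuming a LinkedHashMap as a stand-in for the response NamedList. The "_bits" and "component2_result" keys match the snippets above; StateHandoff, producer() and consumer() are hypothetical, and bits.cardinality() merely stands in for doSomething().

```java
import java.util.BitSet;
import java.util.Map;

// Sketch of passing temporary state between two components through the
// shared response, then deleting it before the response is written.
class StateHandoff {
    // component1: alongside its real output, leave a temporary entry
    static void producer(Map<String, Object> rsp, BitSet bits) {
        rsp.put("_bits", bits);
    }

    // component2: read the temporary entry, publish its own result, then
    // remove the entry so the response writer never serializes it
    static void consumer(Map<String, Object> rsp) {
        Object tmp = rsp.get("_bits");
        if (!(tmp instanceof BitSet)) {
            // upstream producer missing: emit an empty result instead of failing
            rsp.put("component2_result", 0);
            return;
        }
        BitSet bits = (BitSet) tmp;
        rsp.put("component2_result", bits.cardinality());
        rsp.remove("_bits");
    }
}
```

The underscore prefix is only a naming convention here; nothing hides the entry automatically, which is why the consumer has to remove it explicitly.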

  Before I did this, I investigated whether I could subclass XMLResponseWriter to ignore NamedList entries with "hidden" names (i.e., names prefixed with an underscore), but XMLResponseWriter delegates the actual XML generation to XMLWriter, and XMLWriter is final (at least in SOLR 1.4.1). That turned out to be a good thing, since it forced me to look for (and find) a simpler solution :-).
  So there you have it: a simple way to pass data between components in a SOLR SearchHandler. Note that this makes component2 dependent on component1 (or some other component that produces the same data) being upstream of it, so these components are no longer truly independent, reusable pieces of code. But it can be useful if you really need it and you document the requirement (or complain when it is not met, as I've done here).

Reacting to a COMMIT
  The second thing I needed to do in component2 was to give it some reference data that it needs to compute its results. The reference data is generated from the contents of the index, and the generation is fairly heavyweight, so you don't want to do it on every request.
  Now, one of the cool things about SOLR is its built-in incremental indexing (one of the main reasons we considered SOLR in the first place): you can POST data to a running SOLR instance followed by a COMMIT, and voila, your searcher reopens with the new data.
  Of course, this also means that to provide accurate information, the reference data must be regenerated whenever the searcher is reopened. My approach is mostly derived from the way SpellCheckComponent regenerates its dictionaries: by hooking into the SOLR event framework.
  To do this, component2 implements SolrCoreAware in addition to extending SearchComponent. This requires implementing the inform(SolrCore) method, which SOLR invokes after init(NamedList) but before any call to prepare(ResponseBuilder) or process(ResponseBuilder). In inform(SolrCore), I register a listener for the firstSearcher and newSearcher events (the same events that the firstSearcher and newSearcher listeners declared in solrconfig.xml respond to).
  I then built the listener as an inner class implementing SolrEventListener, which requires implementations of init(NamedList), newSearcher() and postCommit(). Since my listener is a query-side listener, postCommit() is empty. The newSearcher() method contains the code that generates the reference data. Here is the relevant snippet of code from the component.

public class MyComponent2 extends SearchComponent implements SolrCoreAware {
  // regenerated on every COMMIT; volatile so that request threads see the
  // new reference as soon as the listener swaps it in
  private volatile RefData refdata = new RefData();
  @Override
  public void init(NamedList args) {
    ...
  }
  @Override
  public void inform(SolrCore core) {
    // fire on SOLR startup (firstSearcher) and on every reopen (newSearcher)
    SolrEventListener listener = new MyComponent2Listener();
    core.registerFirstSearcherListener(listener);
    core.registerNewSearcherListener(listener);
  }
  @Override
  public void prepare(ResponseBuilder rb) throws IOException {
    ...
  }
  @Override
  public void process(ResponseBuilder rb) throws IOException {
    ...
    // do something with refdata
    ...
  }
  private class MyComponent2Listener implements SolrEventListener {
    @Override
    public void init(NamedList args) { /* NOOP */ }
    @Override
    public void newSearcher(SolrIndexSearcher newSearcher,
        SolrIndexSearcher currentSearcher) {
      // build the new data off to the side; requests keep using the old
      // refdata until the finished copy replaces it in a single assignment
      RefData copy = generateRefData(newSearcher);
      refdata = copy;
    }
    @Override
    public void postCommit() { /* NOOP */ }
  }
  ...
}

  Notice that I have registered the listener to listen on both firstSearcher and newSearcher events. This way, it gets called on SOLR startup (reacting to a firstSearcher event), and again each time the searcher is reopened (reacting to a newSearcher event).
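To see why registering the same listener twice covers both cases, here is a small hypothetical model of the event hooks. FakeCore and SearcherListener are stand-ins, not the SOLR API; the register method names mirror the SolrCore calls used above, the model assumes (as in SOLR) that firstSearcher listeners get a null current searcher, and it runs listeners synchronously for simplicity.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical listener: receives the generation of the searcher being
// opened and the generation of the current one (null on startup).
interface SearcherListener {
    void newSearcher(int newSearcherGen, Integer currentSearcherGen);
}

// Hypothetical core: the first searcher notifies firstSearcher listeners,
// every later reopen (e.g. after a COMMIT) notifies newSearcher listeners.
class FakeCore {
    private final List<SearcherListener> first = new ArrayList<SearcherListener>();
    private final List<SearcherListener> later = new ArrayList<SearcherListener>();
    private Integer current = null;  // generation of the open searcher, if any

    void registerFirstSearcherListener(SearcherListener l) { first.add(l); }

    void registerNewSearcherListener(SearcherListener l) { later.add(l); }

    void openSearcher() {
        int gen = (current == null) ? 0 : current + 1;
        List<SearcherListener> targets = (current == null) ? first : later;
        for (SearcherListener l : targets) l.newSearcher(gen, current);
        current = gen;
    }
}
```

A listener registered only for newSearcher would miss startup entirely, so the reference data would stay empty until the first COMMIT.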
  One other thing: since generating RefData takes some time, it's best to have the listener's newSearcher() method build a complete copy first and then swap it in for refdata. That way the component keeps using the old data until the new data is ready.
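A minimal sketch of the build-then-swap idea, assuming a Set<String> as a hypothetical stand-in for RefData and a volatile field as the publication mechanism. Because readers only ever see a finished snapshot, there is no window in which the data is empty or half-built.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Request threads read an immutable snapshot through a volatile reference;
// the listener builds a new snapshot off to the side and swaps it in.
class RefDataHolder {
    private volatile Set<String> refdata = Collections.emptySet();

    // called from request threads (process())
    Set<String> current() { return refdata; }

    // called from the newSearcher listener
    void rebuild(Iterable<String> indexTerms) {
        Set<String> copy = new HashSet<String>();
        for (String t : indexTerms) copy.add(t);       // slow part; readers still see the old set
        refdata = Collections.unmodifiableSet(copy);   // single reference assignment publishes it
    }
}
```

Clearing and refilling the existing object in place would instead expose an empty or partial data set to concurrent requests while the rebuild runs.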
  And that's pretty much it for today. Till next time.
  http://sujitpal.blogspot.com/2011/04/custom-solr-search-components-2-dev.html

Thread URL: https://www.yunweiku.com/thread-87453-1-1.html