[Experience Sharing] Realtime Search: Solr vs Elasticsearch

Posted on 2015-7-16 12:06:37
Realtime Search: Solr vs Elasticsearch | Socialcast Engineering
Tuesday May 31st, 2011 by Ryan Sonnek
19 comments
What is Elasticsearch?
Elasticsearch is a REST-based, distributed search engine powered by the excellent Lucene library. The built-in JSON + HTTP API provides an elegant platform that is easy to integrate with (e.g. the elastic_searchable Ruby gem). It’s simple, scalable and “cool, bonsai cool”.
Why is it better than Solr?
First of all, let’s set the record straight: Solr is fast. I’m serious…it’s really fast! Solr is the de facto search engine for a reason. It’s stable, reliable and, out of the box, it outperforms nearly every search solution for basic vanilla searches (including Elasticsearch).
Unfortunately, it is really easy to break Solr as well. All it takes is performing searches while concurrently updating the index with new content. This is a pretty serious problem if you need to update your search index regularly.
Now throw a few million documents into the index and Solr will be buckling at the knees while Elasticsearch doesn’t break a sweat!
It is painfully apparent that Solr’s architecture was not built for realtime search applications. The demands of realtime web applications require delivery of updates in near realtime as new content is generated by users. The distributed nature of Elasticsearch allows it to keep up with concurrent search + index requests without skipping a beat.
Real-World Results…
After transitioning our search infrastructure from Solr to Elasticsearch, we saw an instant ~50x improvement in search performance!
And now for something a bit more interesting…
The typical realtime search architecture goes something like this:
    index user content into the search engine
    perform a set of queries against the search engine to determine whether the content matches particular criteria
    perform specific logic to notify registered channels that new content is available
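The three steps above can be sketched roughly as follows. This is only a toy, in-memory model: `ToySearchEngine`, `publish`, and the query names are hypothetical and are not part of Solr or Elasticsearch. The point it illustrates is that the application has to go back to the search engine with every saved query each time new content arrives.

```python
from typing import Callable, Dict, List

Document = Dict[str, str]
Query = Callable[[Document], bool]


class ToySearchEngine:
    """Hypothetical in-memory stand-in for a search engine (not a real client)."""

    def __init__(self) -> None:
        self.docs: List[Document] = []

    def index(self, doc: Document) -> None:
        # Store the document so later searches can see it.
        self.docs.append(doc)

    def search(self, query: Query) -> List[Document]:
        # Return every stored document that satisfies the query predicate.
        return [d for d in self.docs if query(d)]


def publish(
    engine: ToySearchEngine,
    doc: Document,
    saved_queries: Dict[str, Query],
    notify: Callable[[str, Document], None],
) -> None:
    # Step 1: index the user content.
    engine.index(doc)
    # Step 2: run every saved query against the engine.
    for name, query in saved_queries.items():
        # Step 3: notify the channel when the new document is among the hits.
        if doc in engine.search(query):
            notify(name, doc)
```

Note that each call to `publish` costs one search per saved query, which is the polling overhead this workflow carries.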
Elasticsearch can support this model quite well, but it also offers a feature that turns this entire workflow on its head.
Introducing: Percolation!
Elasticsearch percolation is similar to webhooks. The idea is to have Elasticsearch notify your application when new content matches your filters instead of having to constantly poll the search engine to check for new updates.
The new workflow looks like this:
    register a specific query (percolation) in Elasticsearch
    index new content (passing a flag to trigger percolation)
    the response to the indexing operation will contain the matched percolations
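To make the inverted workflow concrete, here is a minimal in-memory sketch of the idea. It does not use Elasticsearch's actual API: `ToyPercolator`, its `percolate` flag, and the query names are assumptions made up for illustration, and queries are modelled as plain Python predicates.

```python
from typing import Callable, Dict, List

Document = Dict[str, str]
Query = Callable[[Document], bool]


class ToyPercolator:
    """Hypothetical model of percolation; not the Elasticsearch client."""

    def __init__(self) -> None:
        self.docs: List[Document] = []
        self.registered: Dict[str, Query] = {}

    def register(self, name: str, query: Query) -> None:
        # Step 1: store the query itself, keyed by name.
        self.registered[name] = query

    def index(self, doc: Document, percolate: bool = False) -> List[str]:
        # Step 2: index the new content.
        self.docs.append(doc)
        if not percolate:
            return []
        # Step 3: the indexing response carries the names of matching queries.
        return sorted(
            name for name, query in self.registered.items() if query(doc)
        )
```

In this model the caller learns which registered queries matched directly from the return value of `index`, so no follow-up search is needed.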
This is the perfect architecture for realtime search and a true gamechanger.
The Bottom Line
Solr may be the weapon of choice when building standard search applications, but Elasticsearch takes it to the next level with an architecture for creating modern realtime search applications. Percolation is an exciting and innovative feature that singlehandedly blows Solr right out of the water. Elasticsearch is scalable, speedy and a dream to integrate with. Adios Solr, it was nice knowing you.
Tagged: search
Comments
    David says:
    Cool article. Now I know why I love ES! ;-)
    Commented on May 31, 2011
    jrawlings says:
    Was the ‘Search Fresh Index while Idle’ benchmark performed against an Elasticsearch 5-shard index (the default setup for a newly created index) or a single-shard index?
    Commented on May 31, 2011
        Ryan Sonnek says:
        @jrawlings these benchmarks are for the “out of the box” vanilla install of Elasticsearch and Solr so yes, this is using the 5 shard index setting.
        Commented on May 31, 2011
    umad says:
    Elasticsearch is a peach, when it doesn’t break. I’ve had so many nightmares trying to recover from a broken Elasticsearch cluster that I wouldn’t recommend it to anyone.
    I guess for small sites it’s OK. For serious business, I’ll stick with Solr.
    It would be nice to see a comparison with riaksearch as well.
    Commented on May 31, 2011
        Ryan Sonnek says:
        @umad in our experience, the exact opposite is true. We pushed Solr so hard to try and support realtime search that we constantly had to deal with Java out of memory issues. Elasticsearch is much more stable (even for a beta application) and runs *so* much smoother.
        I’m not sure what you classify as a “small” site. Our search index contains millions of documents and we’re performing hundreds of requests per minute, and Elasticsearch has not had a single hiccup yet.
        Commented on June 1, 2011
    Philip Ingram says:
    That percolation business is awesome. Webhooks make updating realtime data sources easy, and it’s brilliant that Elasticsearch takes that approach. Thanks for sharing.
    Commented on May 31, 2011
    Ben says:
    Good blog post. What were some of the parameters around index sizes (per shard) and commit rates? We have some massive warming times on our Solr indexes that require us to batch our adds before a commit, which is certainly not a position to be in with realtime search. I can see how, without tuning and with default cache warming, you might run into bunches of overlapping warming searchers.
    Commented on May 31, 2011
    MarcMarc says:
    And why not use a master-slave configuration in Solr? Isn’t that the perfect solution for separating add-doc and query operations?
    Commented on June 1, 2011
        Ryan Sonnek says:
        @MarcMarc master-slave really isn’t an option for realtime search applications. The current Solr replication solution is not synchronous, so once your update operation is complete on the master, the data is not yet available on all slaves for subsequent searches.
        Introducing master-slave for the search index also introduces a lot of operational complexity, which you really should avoid if you can. :)
        Commented on June 1, 2011
    Vlad Zloteanu says:
    Ryan, what was the commit strategy you used with Solr? Commit after each request, autocommit after X secs, autocommit after X docs? This can greatly impact update performance. See http://wiki.apache.org/solr/SolrPerformanceFactors#Updates_and_Commit_Frequency_Tradeoffs, http://blog.raspberry.nl/2011/04/08/solr-update-performance/ and http://www.elevatedcode.com/articles/2009/01/14/speeding-up-solr-indexing/
    Commented on June 1, 2011
        Ryan Sonnek says:
        @vlad we require all content to be immediately available for searches after indexing, so we commit after each update operation. This is the nature of the beast when building a true realtime search application and, as you point out, is not the “preferred” way to integrate with Solr.
        Commented on June 1, 2011
    Otis Gospodnetic says:
    Nice post. You’ll need to compare ES and Solr once Solr starts making use of the underlying Lucene NRT mechanism.
    Just to make it clear to readers not familiar with the underlying details:
    It is Lucene that adds the NRT support. ES uses it, while Solr does not use it yet. That is different from Solr using the same Lucene API as ES and still performing poorly.
    Commented on June 1, 2011
    Peter Bengtsson says:
    Having been a Xapian fan for many years, I’d love to see Xapian benchmarked against ES.
    Commented on June 1, 2011
    Andy says:
    What’s the difference between “search fresh index” and “search full index”?
    Were you running Solr and ElasticSearch on the same hardware?
    Commented on June 1, 2011
        Ryan Sonnek says:
        @andy the “fresh index” benchmarks were done against an empty/clean index. The “full index” benchmarks were done after populating the index with a few million documents. The index is never technically “full”, but it was just a quick way of getting more realistic, real-world benchmarks.
        Commented on June 1, 2011
    db says:
    Interesting that umad says he had so many issues with broken clusters, that he stopped recommending ES for production usage. We’ve been running in production for 6 months with significant traffic volume on behalf of demanding clients.
    There have been some nice robustness improvements in ES 0.16
    We evaluated Solr vs ES and for our data with a wide range of queries, ES was significantly faster than Solr. Tuning Solr is challenging.
    David
    Commented on June 7, 2011
    Steven Hildreth says:
    Solr doesn’t support GeoPolygons either, so if you need spatial searches look to ElasticSearch.
    Commented on August 24, 2011
    David says:
    Field collapsing (grouping, or whatever you call it) is still awaited in ES, but exists in Solr.
    In some particular use cases this is a must-have feature (think about SKUs in an index where the search results must be products, not SKUs).
    Commented on September 16, 2011
