Realtime Search: Solr vs Elasticsearch

neversoft · posted on 2015-7-16 12:06:37

Realtime Search: Solr vs Elasticsearch | Socialcast Engineering

Tuesday May 31st, 2011 by Ryan Sonnek
What is Elasticsearch?
Elasticsearch is a REST-based, distributed search engine powered by the excellent Lucene library. The built-in JSON + HTTP API provides an elegant platform that is easy to integrate with (for example, via the elastic_searchable Ruby gem). It’s simple, scalable and “cool, bonsai cool”.
Why is it better than Solr?
First of all, let’s set the record straight: Solr is fast. I’m serious… it’s really fast! Solr is the de facto search engine for a reason. It’s stable, reliable and, out of the box, it outperforms nearly every search solution for basic vanilla searches (including Elasticsearch).
Unfortunately, it is also really easy to break Solr. All it takes is performing searches while concurrently updating the index with new content. This is a pretty serious problem if you need to update your search index regularly.
Now throw a few million documents into the index and Solr will be buckling at the knees while Elasticsearch doesn’t break a sweat!
It is painfully apparent that Solr’s architecture was not built for realtime search applications. The demands of realtime web applications require delivery of updates in near realtime as new content is generated by users. The distributed nature of Elasticsearch allows it to keep up with concurrent search + index requests without skipping a beat.
Realworld Results…
After transitioning our search infrastructure from Solr to Elasticsearch, we saw an instant ~50x improvement in search performance!
And now for something a bit more interesting…
The typical realtime search architecture goes something like this:
index user content into the search engine
perform set of queries against search engine to determine if content matches particular criteria
perform specific logic notifying registered channels that new content is available
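As a rough illustration of the three steps above, here is a toy in-memory model in Python. It does not talk to Solr or Elasticsearch; the `ToyIndex` class, the `saved_searches` mapping and the `notify` callback are hypothetical names invented for this sketch. The only point is that the application itself has to re-run every saved search after each write.

```python
from typing import Callable, Dict, List


class ToyIndex:
    """Hypothetical stand-in for a search engine: keeps documents in memory."""

    def __init__(self) -> None:
        self.docs: Dict[str, str] = {}

    def index(self, doc_id: str, text: str) -> None:
        self.docs[doc_id] = text

    def search(self, term: str) -> List[str]:
        # Naive whole-word match; returns the ids of matching documents.
        return sorted(
            doc_id
            for doc_id, text in self.docs.items()
            if term.lower() in text.lower().split()
        )


def index_and_notify(
    engine: ToyIndex,
    doc_id: str,
    text: str,
    saved_searches: Dict[str, str],
    notify: Callable[[str, str], None],
) -> None:
    engine.index(doc_id, text)                    # step 1: index the content
    for channel, term in saved_searches.items():  # step 2: one query per saved search
        if doc_id in engine.search(term):
            notify(channel, doc_id)               # step 3: notify the channel


# Usage: collect notifications in a list instead of sending them anywhere.
sent = []
engine = ToyIndex()
index_and_notify(
    engine,
    "1",
    "Solr is fast",
    {"solr-fans": "solr", "es-fans": "elasticsearch"},
    lambda channel, doc_id: sent.append((channel, doc_id)),
)
```

Every indexed document costs one search per saved query here, which is the overhead the rest of the post is concerned with.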
Elasticsearch can support this model quite well, but it also offers a feature that turns this entire workflow on its head.
Introducing: Percolation!
Elasticsearch percolation is similar to webhooks. The idea is to have Elasticsearch notify your application when new content matches your filters instead of having to constantly poll the search engine to check for new updates.
The new workflow looks like this:
register specific query (percolation) in Elasticsearch
index new content (passing a flag to trigger percolation)
the response to the indexing operation will contain the matched percolations
This is the perfect architecture for realtime search and a true gamechanger.
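The percolation workflow can be sketched with a small in-memory model. This is not the Elasticsearch API: `ToyPercolator`, its `register` and `index` methods, the `percolate` argument and the shape of the response dictionary are all assumptions made for illustration. It only shows the inversion described above, where queries are stored first and each new document is matched against them at index time.

```python
from typing import Callable, Dict

Doc = Dict[str, str]


class ToyPercolator:
    """Hypothetical model of percolation: stored queries, matched on index."""

    def __init__(self) -> None:
        self.queries: Dict[str, Callable[[Doc], bool]] = {}
        self.docs: Dict[str, Doc] = {}

    def register(self, name: str, predicate: Callable[[Doc], bool]) -> None:
        # Step 1: register a named query (here just a Python predicate).
        self.queries[name] = predicate

    def index(self, doc_id: str, doc: Doc, percolate: bool = False) -> Dict[str, object]:
        # Step 2: index the document, optionally asking for percolation.
        self.docs[doc_id] = doc
        response: Dict[str, object] = {"ok": True, "id": doc_id}
        if percolate:
            # Step 3: the indexing response carries the names of matching queries.
            response["matches"] = sorted(
                name for name, predicate in self.queries.items() if predicate(doc)
            )
        return response


# Usage: one registered query, two indexed documents.
percolator = ToyPercolator()
percolator.register("mentions-solr", lambda d: "solr" in d.get("body", "").lower())
hit = percolator.index("1", {"body": "Adios Solr"}, percolate=True)
miss = percolator.index("2", {"body": "Hello Lucene"}, percolate=True)
```

The caller never issues a follow-up search; whatever needs notifying is read straight off the indexing response.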
The Bottom Line
Solr may be the weapon of choice when building standard search applications, but Elasticsearch takes it to the next level with an architecture for creating modern realtime search applications. Percolation is an exciting and innovative feature that singlehandedly blows Solr right out of the water. Elasticsearch is scalable, speedy and a dream to integrate with. Adios Solr, it was nice knowing you.
Comments
David says:
Cool article. Now, i know why I love ES ! ;-)
Commented on May 31, 2011
jrawlings says:
Was the ‘Search Fresh Index while Idle’ performed against an elasticsearch 5 shard index (the default setup for a newly created index) or a single shard index?
Commented on May 31, 2011
      Ryan Sonnek says:
      @jrawlings these benchmarks are for the “out of the box” vanilla install of Elasticsearch and Solr so yes, this is using the 5 shard index setting.
      Commented on May 31, 2011
umad says:
Elasticsearch is a peach, when it doesn’t break. I’ve had so many nightmares trying to recover from a broken elasticsearch cluster that I wouldn’t recommend it to anyone.
I guess for small sites it’s ok. For serious business, I’ll stick with solr.
It would be nice to see a comparison with riaksearch as well.
Commented on May 31, 2011
      Ryan Sonnek says:
      @umad in our experience, the exact opposite is true. We pushed Solr so hard to try and support realtime search that we constantly had to deal with Java out of memory issues. Elasticsearch is much more stable (even for a beta application) and runs *so* much smoother.
      I’m not sure what you classify as a “small” site. Our search index contains millions of documents, we’re performing hundreds of requests per minute, and Elasticsearch has not had a single hiccup yet.
      Commented on June 1, 2011
Philip Ingram says:
That percolation business is awesome. Webhooks make updating realtime data sources easy, and it’s brilliant that Elasticsearch takes that approach. Thanks for sharing.
Commented on May 31, 2011
Ben says:
Good blog post. What were some of the parameters around index sizes (per shard) and commit rates? We have some massive warming times on our Solr indexes that require us to batch our adds before a commit, certainly not a position to be in with realtime search though. I can see how, without tuning and with default cache warming, you might run into bunches of overlapping warming searchers.
Commented on May 31, 2011
MarcMarc says:
And why not use a master-slave configuration in Solr? Isn’t that the perfect solution for separating add-doc and query operations?
Commented on June 1, 2011
      Ryan Sonnek says:
      @MarcMarc master-slave really isn’t an option for realtime search applications. The current Solr replication solution is not synchronous so once your update operation is complete on the master, the data is not yet available on all slaves for subsequent searches.
      Introducing master-slave for the search index also introduces a lot of operational complexity that if you can avoid, you really should. :)
      Commented on June 1, 2011
Vlad Zloteanu says:
Ryan, what was the commit strategy you used with Solr? Commit after each request, autocommit after X secs, autocommit after X docs? This can greatly impact update performance. See http://wiki.apache.org/solr/SolrPerformanceFactors#Updates_and_Commit_Frequency_Tradeoffs, http://blog.raspberry.nl/2011/04/08/solr-update-performance/ and http://www.elevatedcode.com/articles/2009/01/14/speeding-up-solr-indexing/
Commented on June 1, 2011
      Ryan Sonnek says:
      @vlad we require all content to be immediately available for searches after indexing, so we commit after each update operation. This is the nature of the beast when building a true realtime search application and, as you point out, is not the “preferred” way to integrate with Solr.
      Commented on June 1, 2011
Otis Gospodnetic says:
Nice post. You’ll need to compare ES and Solr once Solr starts making use of the underlying Lucene NRT mechanism.
Just to make it clear to readers not familiar with the underlying details:
It is Lucene that adds the NRT support. ES uses it, while Solr does not use it yet, which is different from Solr using the same Lucene API as ES and doing it/still performing poorly.
Commented on June 1, 2011
Peter Bengtsson says:
Having been a Xapian fan for many years, I’d love to see Xapian benchmarked against ES.
Commented on June 1, 2011
Andy says:
What’s the difference between “search fresh index” and “search full index”?
Were you running Solr and ElasticSearch on the same hardware?
Commented on June 1, 2011
      Ryan Sonnek says:
      @andy the fresh index benchmarks are done against an empty/clean index. the “full index” benchmarks were done after populating the index with a few million documents. The index is never technically “full”, but it was just a quick way of getting more realistic and real world benchmarks.
      Commented on June 1, 2011
db says:
Interesting that umad says he had so many issues with broken clusters, that he stopped recommending ES for production usage. We’ve been running in production for 6 months with significant traffic volume on behalf of demanding clients.
There have been some nice robustness improvements in ES 0.16
We evaluated Solr vs ES and for our data with a wide range of queries, ES was significantly faster than Solr. Tuning Solr is challenging.
David
Commented on June 7, 2011
Steven Hildreth says:
Solr doesn’t support GeoPolygons either, so if you need spatial searches look to ElasticSearch.
Commented on August 24, 2011
David says:
Field collapsing (grouping, or whatever you call it) is still awaited in ES, but exists in Solr.
In some particular use cases this is a must-have feature (think of SKUs in an index where the search results must be products, not SKUs).
Commented on September 16, 2011
