设为首页 收藏本站
查看: 555|回复: 0

[经验分享] Auto Complete with Redis

[复制链接]

尚未签到

发表于 2015-7-21 12:07:07 | 显示全部楼层 |阅读模式
  http://oldblog.antirez.com/post/autocomplete-with-redis.html
  Hello! This isn't just a blog post, it is actually the Redis weekly update number eight, but we can't tell this to Google: naming posts like Redis Weekly update #... will not play well with Google indexing, so I'm starting to use blog post titles more predictive of what people will actually type into a search engine, in this specific case autocomplete redis, probably.
Too bad that I'm forced to play the game of the SE0 L00z3r, but it's for a good reason. I guess this will also work well when articles are submitted in places like Hacker News, Programming Reddit and so forth. Readers will have a simpler way to tell if they want to actually click or not on the link. Ok... end of the off topic.
I hope the topic of this post will be very interesting for many Redis users, and for many non Redis users that experimented the pain of modeling auto complete for web or mobile applications using a database that is not designed to make this a simple task.
Auto complete is a cool and useful feature for users, but it can be non trivial to implement it with good performances when the set of completions is big.

Simple auto complete
  
In the simplest auto completion schema you have:


  • A set of N words, that are in some way notable words that you want to auto complete. This can be for instance the names of the biggest cities in the world in a subscription form, or the most frequent search terms in a search service. We'll try to complete a list of female names. You can find the raw file name here: female-names.txt.
  • A prefix string, that is, what your user is writing in the input field.
  
Given the prefix, you want to extract a few words that start with the specified prefix.
But in what order? In many applications (like the list of cities in a web form) this order can just be lexicographic, and this is more simple and memory efficient to model in Redis, so we'll start with it, trying to evolve our approach to reach more interesting results.
So for instance if the prefix is mar, what I want to see in the completion list is mara, marabel, marcela, and so forth.
Since we want a scalable schema, we also require that this entries are returned in O(log(N)) time in the average case. A O(N) approach is not viable for us. We'll see the space complexity, that is another very important meter, in a moment.

A little known feature of sorted sets
  
The order of elements into a sorted set is, of course, given by the score. However this is not the only ordering parameter since sorted set elements may have the same score. Elements with the same score are sorted lexicographically, as you can see in this example:

redis> zadd zset 0 foo
(integer) 1
redis> zadd zset 0 bar
(integer) 1
redis> zadd zset 0 x
(integer) 1
redis> zadd zset 0 z
(integer) 1
redis> zadd zset 0 a
(integer) 1
redis> zrange zset 0 -1
1. "a"
2. "bar"
3. "foo"
4. "x"
5. "z"

  This is a very important feature of sorted sets, for instance it guarantees O(log(N)) operations even when large clusters of elements have the same score. And this will be the foundation for our code completion strategy.
Well... that's not enough, but fortunately there is another cool feature of sorted sets, that is the ZRANK command. With this command given an element you can know the position (that is, the index) of an element in the sorted set. So what should I do if I want to know what words follow "bar" in my sorted set? (note that indexes are zero based)

redis> zrank zset bar
(integer) 1
redis> zrange zset 2 -1
1. "foo"
2. "x"
3. "z"

  Hey that seems pretty close to what we want in our first completion strategy... but we need a final trick.

Completion in lexicographical ordering
  So what we are able to do is looking up a word in the sorted set and obtain a list of words lexicographically following this word. This is a bit different from what we actually need, as we want to complete words just having a prefix of the word (or sentence, and in general any string).
We can have all this paying a price, that is: more space.
What we do is the following: instead of adding only full words to the sorted set, we add also all the possible prefixes of any given word. We also need a way to distinguish actual words from prefixes. So this is what we do:


  • For every word, like "bar", we add all the prefixes: "b", "ba", "bar".
  • For every word we finally add the word itself but with a trailing "*" character,
  
so that we can tell this is an actual word (of course you can use a different marker, even binary to avoid collisions). So we also add "bar*". If I add the word "foo", "bar", and "foobar" using the above rules, this is what I obtain:

redis> zrange zset 0 -1
1. "b"
2. "ba"
3. "bar"
4. "bar*"
5. "f"
6. "fo"
7. "foo"
8. "foo*"
9. "foob"
10. "fooba"
11. "foobar"
12. "foobar*"

  Now we are very close to perform the completion!
If the user is writing "fo" we do the following:

redis> zrank zset fo
(integer) 5
redis> zrange zset 6 -1
1. "foo"
2. "foo*"
3. "foob"
4. "fooba"
5. "foobar"
6. "foobar*"

  We just need to get all the items marked with the trailing "*', so we'll be able to show "foo" and "foobar" as possible completions.

Let's try it with a bigger dictionary
  We'll use a list of 4960 female names, female-names.txt and a Ruby script to check how well our approach is working.










[table]

[tr]
[td]123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051[/td]
[td]

# compl1.rb - Redis autocomplete example

# download female-names.txt from http://antirez.com/misc/female-names.txt



require 'rubygems'

require 'redis'



r = Redis.new



# Create the completion sorted set

if !r.exists(:compl)

    puts "Loading entries in the Redis DB\n"

    File.new('female-names.txt').each_line{|n|

        n.strip!

        (1..(n.length)).each{|l|

            prefix = n[0...l]

            r.zadd(:compl,0,prefix)

        }

        r.zadd(:compl,0,n+"*")

    }

else

    puts "NOT loading entries, there is already a 'compl' key\n"

end



# Complete the string "mar"



def complete(r,prefix,count)

    results = []

    rangelen = 50 # This is not random, try to get replies < MTU size

    start = r.zrank(:compl,prefix)

    return [] if !start

    while results.length != count

        range = r.zrange(:compl,start,start+rangelen-1)

        start += rangelen

        break if !range or range.length == 0

        range.each {|entry|

            minlen = [entry.length,prefix.length].min

            if entry[0...minlen] != prefix[0...minlen]

                count = results.count

                break

            end

            if entry[-1..-1] == "*" and results.length != count

                results

运维网声明 1、欢迎大家加入本站运维交流群:群②:261659950 群⑤:202807635 群⑦870801961 群⑧679858003
2、本站所有主题由该帖子作者发表,该帖子作者与运维网享有帖子相关版权
3、所有作品的著作权均归原作者享有,请您和我们一样尊重他人的著作权等合法权益。如果您对作品感到满意,请购买正版
4、禁止制作、复制、发布和传播具有反动、淫秽、色情、暴力、凶杀等内容的信息,一经发现立即删除。若您因此触犯法律,一切后果自负,我们对此不承担任何责任
5、所有资源均系网友上传或者通过网络收集,我们仅提供一个展示、介绍、观摩学习的平台,我们不对其内容的准确性、可靠性、正当性、安全性、合法性等负责,亦不承担任何法律责任
6、所有作品仅供您个人学习、研究或欣赏,不得用于商业或者其他用途,否则,一切后果均由您自己承担,我们对此不承担任何法律责任
7、如涉及侵犯版权等问题,请您及时通知我们,我们将立即采取措施予以解决
8、联系人Email:admin@iyunv.com 网址:www.yunweiku.com

所有资源均系网友上传或者通过网络收集,我们仅提供一个展示、介绍、观摩学习的平台,我们不对其承担任何法律责任,如涉及侵犯版权等问题,请您及时通知我们,我们将立即处理,联系人Email:kefu@iyunv.com,QQ:1061981298 本贴地址:https://www.yunweiku.com/thread-89060-1-1.html 上篇帖子: redis hash 下篇帖子: redis ltrim命令
您需要登录后才可以回帖 登录 | 立即注册

本版积分规则

扫码加入运维网微信交流群X

扫码加入运维网微信交流群

扫描二维码加入运维网微信交流群,最新一手资源尽在官方微信交流群!快快加入我们吧...

扫描微信二维码查看详情

客服E-mail:kefu@iyunv.com 客服QQ:1061981298


QQ群⑦:运维网交流群⑦ QQ群⑧:运维网交流群⑧ k8s群:运维网kubernetes交流群


提醒:禁止发布任何违反国家法律、法规的言论与图片等内容;本站内容均来自个人观点与网络等信息,非本站认同之观点.


本站大部分资源是网友从网上搜集分享而来,其版权均归原作者及其网站所有,我们尊重他人的合法权益,如有内容侵犯您的合法权益,请及时与我们联系进行核实删除!



合作伙伴: 青云cloud

快速回复 返回顶部 返回列表