
[Experience Sharing] 6 Lessons from Dropbox's Python Development (Syncing One Million Files Every 15 Minutes)

Posted on 2017-5-6 14:27:16
  
Dropbox saves one million files every 15 minutes, more than even Twitter users tweet. That mind-blowing statistic was revealed by Rian Hunter, a Dropbox engineer, in his presentation How Dropbox Did It and How Python Helped at PyCon 2011.
The first part of the presentation is some Dropbox lore, origin stories and other foundational myths. We learn that Dropbox is a startup located in San Francisco with probably one of the most popular file synchronization and sharing tools in the world, shipping Python on the desktop, supporting millions of users and growing every day.
About halfway through, the talk turns technical. Not a lot of information was given on how Dropbox handles this massive scale, but there were a number of good lessons to ponder:

  • Use Python

    • 99.9% of their code is in Python. It is used on the server backend, desktop client, website controller logic, API backend, and analytics.
    • Can't use Python on Android due to memory constraints.
    • Runs on a single code base using Python. Dropbox runs on Windows, Mac, and Linux using tools like PyObjC, wxPython, ctypes, py2exe, py2app, and PyWin32.
    • Pros: 

      • Developers talk to each other and express ideas in Python
      • Easy to learn, easy to read, easy to write, easy for new people to pick up.

    • Cons: 

      • Don't be silly. 
      • OK, it can use too much memory and be too slow. Not a big deal on the server side, just buy bigger machines. On the client side you can't get an old PowerPC user to upgrade.
      • Coding in a mixed environment of Python and C creates problems because it's hard to profile across the language boundary, which is exactly what you want to do when fixing memory and CPU problems.
      • Memory fragmentation issues are a reason why scripting languages may not be a good idea for long-running processes.


  • Just Work Baby

    • Shouldn't matter what file system you are on, what OS you are using, what applications you are using. The product should always just work.
    • Python helped them iterate fast through all the different error cases they experienced on the wide variety of platforms they support.

  • Release Early

    • Code something in a day and release it. Python makes that easy.

  • Use C for Inner Loops - Optimizing CPU is easy

    • A way to handle the too slow problem.
    • Optimize inner loops to reduce CPU time. 
    • In the example shown, a loop took 2.88 s in Python versus 1.61 s in C, so roughly 44% of the Python run time was overhead.
    • Python VM bytecode dispatches are really slow. 
    • Many tools exist for profiling CPU. 
    • CPU optimizations are usually limited to small code sections.
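
The inner-loop lesson can be illustrated without writing any C. The sketch below is not from the talk; it simply times an explicit Python loop against the built-in `sum`, whose loop runs inside CPython's C implementation. The function names and the data size are assumptions chosen for illustration, and the timings will vary by machine.

```python
import timeit


def python_sum(values):
    """Sum with an explicit loop; each iteration goes through bytecode dispatch."""
    total = 0
    for v in values:
        total += v
    return total


def builtin_sum(values):
    """Same result, but the loop itself runs in C inside the built-in sum()."""
    return sum(values)


if __name__ == "__main__":
    data = list(range(1_000_000))
    # Both versions must agree before their speed is worth comparing.
    assert python_sum(data) == builtin_sum(data)
    for fn in (python_sum, builtin_sum):
        seconds = timeit.timeit(lambda: fn(data), number=10)
        print(f"{fn.__name__}: {seconds:.3f}s for 10 runs")
```

Only the hot loop moves out of the interpreter here; the rest of the program stays in Python, which matches the point that CPU optimizations are usually limited to small code sections.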

  • Poll - Polling 30 Million Clients All Over the World Doesn't Scale 

    • Created an HTTP notification structure so that clients do not have to keep polling the server for changes.
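
The talk summary does not say how the notification structure works. One common way to avoid repeated polling is long polling: the client makes a single request that the server holds open until something changes or a timeout expires. The sketch below models only that idea in-process with `threading.Condition`; `ChangeNotifier` and its methods are hypothetical names, and no real HTTP or Dropbox API is involved.

```python
import threading


class ChangeNotifier:
    """Hypothetical in-process stand-in for a server-side notification endpoint."""

    def __init__(self):
        self._cond = threading.Condition()
        self._version = 0  # bumped once per change

    def publish(self):
        """Record a change and wake every waiting client."""
        with self._cond:
            self._version += 1
            self._cond.notify_all()
            return self._version

    def wait_for_change(self, seen_version, timeout):
        """Block until the version passes seen_version or the timeout expires.

        Returns the current version; a value equal to seen_version means
        nothing changed during the wait.
        """
        with self._cond:
            self._cond.wait_for(lambda: self._version > seen_version,
                                timeout=timeout)
            return self._version


if __name__ == "__main__":
    notifier = ChangeNotifier()
    # Simulate a file change arriving shortly after the client starts waiting.
    threading.Timer(0.1, notifier.publish).start()
    print("version after wait:", notifier.wait_for_change(0, timeout=5.0))
```

A waiting client costs the server an idle connection rather than a stream of requests that mostly return "nothing new", which is the scaling argument behind the lesson.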

  • Custom Memory Allocator - Optimizing Memory is Hard

    • This was their biggest problem for a while. The client could use huge amounts of memory that would never be freed. A large sync could use up to 1.5GB; now it rarely uses more than 100MB.
    • Hard because: 

      • Few tools exist for profiling memory for Python and C
      • Memory bloat has so many causes: leaks in Python and C code; memory fragmentation; inefficient use of memory.

    • Fixing obvious memory inefficiencies didn't help. They thought there was a memory leak, but there wasn't.
    • The problem turned out to be memory fragmentation, which happens when memory blocks of different sizes are continually allocated and freed until large contiguous blocks can no longer be found. CPython does not have a compacting garbage collector (it relies on reference counting plus a cycle collector and never moves objects), so the fragmented free memory could not be reused and the heap kept growing to satisfy new requests.
    • The solution was to create a custom allocator. The file metadata object grows a lot when doing transfers, so the obvious low-hanging fruit was to create a custom allocator for it in C using mmap.
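
Dropbox's allocator was written in C and its design is not described in the talk summary. The sketch below only illustrates the general idea in Python: keep fixed-size records in one anonymous `mmap` region so that any freed slot fits the next allocation exactly, and so the whole region can be handed back to the operating system at once. The `SlotPool` class and the two-field record layout are assumptions made for illustration.

```python
import mmap
import struct


class SlotPool:
    """Fixed-size record pool backed by a single anonymous mmap region.

    Hypothetical illustration: every record occupies an identical slot, so a
    freed slot can always be reused and the region never fragments.
    """

    RECORD = struct.Struct("<QQ")  # assumed layout: (file size, modification time)

    def __init__(self, capacity):
        # -1 requests an anonymous mapping that is not backed by a file.
        self._buf = mmap.mmap(-1, capacity * self.RECORD.size)
        self._free = list(range(capacity - 1, -1, -1))  # stack of free slot indexes

    def alloc(self, size, mtime):
        """Store one record and return its slot index."""
        if not self._free:
            raise MemoryError("pool exhausted")
        slot = self._free.pop()
        self.RECORD.pack_into(self._buf, slot * self.RECORD.size, size, mtime)
        return slot

    def read(self, slot):
        """Return the (size, mtime) tuple stored in a slot."""
        return self.RECORD.unpack_from(self._buf, slot * self.RECORD.size)

    def free(self, slot):
        """Mark a slot reusable (no double-free check in this sketch)."""
        self._free.append(slot)

    def close(self):
        """Unmap the region, returning all of its memory to the OS."""
        self._buf.close()


if __name__ == "__main__":
    pool = SlotPool(capacity=1024)
    slot = pool.alloc(size=4096, mtime=1300000000)
    print(pool.read(slot))
    pool.free(slot)
    pool.close()
```

Because the records live outside the general-purpose heap, allocating and freeing them during a large transfer does not leave differently sized holes behind for other objects to work around.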

Future Directions

  • Dropbox on toasters. File sharing on toasters will be really big.
  • They see folders as a unifying metaphor for storing, organizing, and accessing data in the cloud and on any device, anywhere, anytime. 
Related Articles 

  • Hacker News Thread
  • Dropbox Blog
  • Slidedeck for the Talk
  • Dropbox - Startup Lessons Learned by Drew Houston.
