
[Experience Sharing] Repost: Python in Google (notes taken at PyCon)


Posted on 2017-5-3 08:46:23
TITLE OF PAPER: Python at Google
URL OF PRESENTATION: --not available--
PRESENTED BY: Greg Stein
REPRESENTING: Google
CONFERENCE: PyCon 2005
DATE: March 25, 2005
LOCATION: Marvin Theater
--------------------------------------------------------------------------
REAL-TIME NOTES / ANNOTATIONS OF THE PAPER:
{If you've contributed, add your name, e-mail & URL at the bottom}
[ A new copy of the O'Reilly Python Success Stories booklet will be produced
    Contact Stephan Deibel @ pythonology.org ]
"Python has been an important part of Google since the beginning, and remains so as the system grows and evolves. Today dozens of Google engineers use Python, and we're looking for more people with skills in this language."
-- Peter Norvig, Director of Search Quality at Google
My background
    Python developer
        10 years
        Contributed to Python itself
        Authored a number of modules and applications   
            ViewCVS
    Open Source Guy
        Contributed to numerous projects (including Python)
        Current chairman of the Apache Software Foundation
        ViewCVS, written entirely in Python
        Contributed to Subversion, Apache server
    "We consider Python to be our 'secret sauce'"
    --Paul Everitt, talking about Digital Creations, circa 1996
    This is a recognition of how Python can help a business.
My view of Python in the workplace
    Python at eShop
        1995 "What in the world is Python?"
        1996 "This is great stuff."
        (MS acquired eShop in '96)
    Python at Microsoft
        1996: "It's called what?"
        1997: "You actually shipped Python code?" (MerchantServer 1.0)
        1998: "Nice prototype. We'll rewrite it in the next version." And they
            did, in C++.
Python in the workplace (continued)
    Python at CollabNet
        2001: "No, we don't really use Python here." (they used Java)
        2003: "Definitely! Write that in Python"
        Python caught on here like a virus, moving from developer to developer.
    Python at Google
        2004 "Of *course* we use Python.  Why wouldn't we?"
Changing attitudes over time
    Small companies eventually "Got it" ahead of the curve
        Champion was needed
    Larger Companies follow Python's growth curve
        Supporting environment was needed
A number of factors now make Python possible in larger organizations. Here's why:
    Python had to grow for it to become "business acceptable"
        Large enough talent pool - "where are we going to be able to find these people?"
        Support services: Books, Consulting, World Wide Web
        Follow the trailblazers
    Python passed the tipping point years ago
        Not a problem to incorporate it into your business, lots of support,   
        consulting
Business advantage
    "These are some of the reasons we use Python at Google"
        Highly adaptable
            Changing requirements
            - You need a language that is very flexible, so you can adapt your tools during development
            Changes in computing environment
        Rapid development
            For new and experienced developers
            The market moves very, very quickly; you want to be able to keep up with it. If it takes two years for you to respond to something that is needed today, you're behind the curve.
    Easy to maintain - most important point in Greg's view
        You can come back a year later, look at that code, and understand what
            is going on
Google's programming environment
    Primary Languages
        C++
        Java
        Python
        If you want to write a piece of something else, like Perl, you have to
            almost get special permission.  (Exceptions in ops, but for actual
            product stuff, see above)
    Miscellaneous
        Some Perl used by Operations (others almost have to get permission to use Perl)
        PHP creeps in for internal webapps
        Saw Ruby sneaking around
        Small amount of C#
        In actual product stuff: C++, Java, Python
SWIG is your friend
    SWIG: Simplified Wrapper Interface Generator
        www.swig.org
        Started by David Beazley
    Multi-language environment
        A lot of people at Google don't know Python and produce C++ code.
        SWIG pulls these "islands together"--they have a lot of stuff lying
            around written in various languages. SWIG examines a C++ header file
            and auto-generates Python bindings
            So for all of our libraries that we have - for parsing HTML,
                crawling HTTP and so on - they are made available to Python
                using SWIG.
            Good for Google programmers who use C++ but don't know Python
        Very fast mechanism for integration
    Integrated into build system
        Makes it very easy for us to add a rule into our build system to just add a library into our Python dependency module
Where do we use it?
    Across our internal network
    Across a system lifecycle
    Live Services
Basic Network
    <diagram servers="" to="" infrastructure="" through="" pushing="" development="" of="">
Some usage to support development
    Wrappers for Version control (Perforce) (JB note: Perforce can output
    marshalled Python objects -- very cool, extremely useful for scripting.  Also see svn SWIG mention in Q&A)
        They improved branch management.
        Running unit tests on checkin
        People "earn" their ability to check in after they understand code
            guidelines, etc.
        Automatically enforce style guidelines
    Build System (itself written in Python)
    Packaging
        We've got giant bundles of code and giant bundles of data which need to
        be delivered up to the servers.
        Packaging system is built in Python
        Third generation of this system
        Ability to roll back a version
        We can keep iterating and moving forward because we're building all this stuff in Python
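
The note above about Perforce emitting marshalled Python objects can be sketched roughly as follows. This is only an illustration, not Google's wrapper: it assumes the bytes come from a client run in a mode such as `p4 -G`, which writes one marshalled dictionary per record, and the helper name `read_marshalled_records` and the sample records are hypothetical.

```python
import io
import marshal


def read_marshalled_records(stream):
    """Yield each marshalled object from a binary stream until it is exhausted."""
    while True:
        try:
            yield marshal.load(stream)
        except EOFError:
            # marshal.load raises EOFError when no further object can be read.
            return


# Stand-in for the bytes a version-control client might write to stdout.
fake_output = io.BytesIO(
    marshal.dumps({"code": "stat", "depotFile": "//depot/a.py"})
    + marshal.dumps({"code": "stat", "depotFile": "//depot/b.py"})
)
records = list(read_marshalled_records(fake_output))
```

In a real script the stream would more likely be the stdout pipe of a subprocess; an in-memory buffer is used here so the sketch runs without a Perforce installation.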
Some usage in the network infrastructure
    Binary/data pusher
        Figures out best way to send stuff from one place to
            another -- dev to data center, etc
        We're on third/fourth generation of this, keep increasing the scale of
        the problem. Python's making that possible - able to iterate quickly
    Package repository
Some usage on production servers
    Monitoring
        Is this thing still alive? Is it running? Does it think it's healthy? Is
        it seeing problems with the hard disk? Is the CPU temperature fine?
        All of this information is gathered with a little Python program running on the server, then collected by another Python program.
    Auto-restart
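
The monitoring arrangement described above (a small program on each server, plus another program that collects the results) might look something like this in miniature. It is a sketch under assumptions: the report fields, the 5% free-disk threshold, and the function names are invented for illustration, and only disk space is checked.

```python
import shutil
import socket
import time


def collect_health(path=".", min_free_fraction=0.05):
    """Build a small health report for the local machine (disk space only)."""
    usage = shutil.disk_usage(path)
    free_fraction = usage.free / usage.total
    return {
        "host": socket.gethostname(),
        "timestamp": time.time(),
        "disk_free_fraction": free_fraction,
        "healthy": free_fraction >= min_free_fraction,
    }


def unhealthy_hosts(reports):
    """Collector side: return the sorted names of hosts that reported a problem."""
    return sorted(r["host"] for r in reports if not r["healthy"])


local_report = collect_health()
```

A collector would gather one such report per server and pass the list to `unhealthy_hosts` to decide which machines need attention.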
Complete the Lifecycle
    Log reporting
        We generate a "large" amount of log information
    Data is pulled back from the servers
    Analyzed using lots of Python tools
        Ad group needs to spot fraudulent clicks. This is a constant
            cat-and-mouse game with the script kiddies writing fraudulent ad clickers.
        Easy to alter the reports based on ever-changing needs
        Every time we find some way people are fraudulently clicking our ads, we
        patch that hole. It's a continuous process.
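
As a toy version of the kind of report described above, the following flags click sources that appear suspiciously often in a log. The record layout, the threshold, and the function name are all assumptions made for the example; the talk did not describe how Google's fraud detection actually works.

```python
from collections import Counter


def suspicious_sources(click_log, max_clicks_per_source=3):
    """Return sources whose click count exceeds the threshold, sorted by name."""
    counts = Counter(source for source, _ad_id in click_log)
    return sorted(s for s, n in counts.items() if n > max_clicks_per_source)


# Hypothetical records: (source address, ad identifier).
sample_log = [("10.0.0.1", "ad-7")] * 5 + [("10.0.0.2", "ad-7"), ("10.0.0.3", "ad-9")]
flagged = suspicious_sources(sample_log)
```

Because the rule is a few lines of Python, changing the threshold or the grouping key when a new abuse pattern shows up is a small edit, which is the point the notes make about ever-changing reports.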
Python-based services
    Google Groups
        "Python Old-timers" David Jeske and Brandon Long (of eGroups and
            Neotonic/ClearSilver) are the leads on Groups.
        All built using Python code
        Highly pythonic
        They didn't use that giant mountain of C++ stuff
    code.google.com
        Stein and DiBona
    Others? We have so much going on...
How code.google.com was built (block diagram)
        /\ \/
    Front end Stuff
        /\ \/
    code.google.com
        SWIG
     Google Stuff
The funky front end stuff deals with denial of service attacks, reporting, blocking IPs known to be bad
    We get to take advantage because we've wrapped this
The HTTP server it's built on has all of the reporting and monitoring things on it - the "Google Stuff"
code.google.com
    goopy package - support for functional style programming
        Functional stuff to start with
        Place to put future modules
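
For readers unfamiliar with "functional style" helpers, here is a small sketch of the genre using only the standard library. These functions are hypothetical illustrations and are not claimed to match goopy's actual names or signatures.

```python
from functools import reduce
from itertools import chain


def compose(*funcs):
    """Compose functions right to left: compose(f, g)(x) == f(g(x))."""
    return lambda x: reduce(lambda acc, f: f(acc), reversed(funcs), x)


def flatten(nested):
    """Flatten one level of nesting into a list."""
    return list(chain.from_iterable(nested))


def every(predicate, items):
    """True if predicate holds for every item."""
    return all(predicate(item) for item in items)


strip_and_upper = compose(str.upper, str.strip)
```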
Closing
    We have a lot of Python code, covering a broad range of needs.
    Python has helped Google for many, many years.
    SWIG is underrated.
        I saw a little rant on Guido's blog (Guido shakes head) - it's kind of difficult to get your head wrapped around it but when you need access to some library of functionality from Python you don't need to go and build it yourself - you can use SWIG to wrap it automatically. This fits the Python ideal of smart reuse.
    We are now starting to open-source some of the pile.
Questions and Answers (a good 25 minutes for these)
Q: When are you going to open source the build system? (Guido)
A: I don't know.  If I recall, Greg has talked about it
   Chris DiBona: We're thinking of releasing some of our wrappers around
   Perforce first
Q: About SWIG, have you looked at the Boost::Python library?
A: I did see that come up recently; I don't think we use it a lot but it has
   been mentioned. I'll take a closer look at it.
Q: What about ctypes?
A: I saw that a while ago on a different project.  As far as I know we don't use
   it, SWIG works well with our build system
Q: elaborates on ctypes/SWIG differences. While SWIG will build a
   Python wrapper for a given C lib, ctypes will let you dynamically load up a C
   lib and call its functions.
A: calldll does something similar for windows environment
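
The difference the questioner describes (loading a C library at run time rather than generating a wrapper at build time) can be illustrated with a minimal ctypes sketch. It assumes a POSIX-like system where the C standard library can be located; the helper names `load_libc` and `c_strlen` are made up for the example.

```python
import ctypes
import ctypes.util


def load_libc():
    """Load the C standard library dynamically (POSIX-style lookup)."""
    path = ctypes.util.find_library("c")
    # CDLL(None) falls back to symbols already loaded into the process.
    return ctypes.CDLL(path) if path else ctypes.CDLL(None)


libc = load_libc()
# Declare the C signature: size_t strlen(const char *s)
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(data: bytes) -> int:
    """Call the C strlen function on a bytes object."""
    return libc.strlen(data)
```

No compilation step or generated wrapper module is involved; the trade-off is that the argument and return types have to be declared by hand in Python.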
Q: Do you do anything in regard to network monitoring / SNMP with Python?
A: We do have a very large internal network, lots of traffic, the Ops guys do
   have monitors to watch the flow, have to schedule moving large (100 GB or 1
   TB-size) files.
Q: (Alex Martelli - who is starting at Google in three days) Back to the
   wrapping issue.  SWIG and ctypes will not help at all with C++ templates -
   Boost is better in this regard.  SWIG has been extended to support templates
   recently.
A: We do use some templates, but we normally try to avoid them and use SWIG. In
   that sense, SWIG works well for us. Some of the template stuff I'd like  
   better access to, and I end up having to do some extra goo to get things
   working.
Q: What is missing from the Python ecosystem?
A: (Anna Ravenscroft, Alex's wife, yells "Alex") But we've solved that problem.
   Today they are mostly using Python 2.2,  trying to figure out how to use
   Python 2.3 -- big upgrade problem
Q: How do you evangelize people who are happy with C++ and SQL and don't seem to
   want to try Python?
A: We make it easy to use any of the languages, and don't really force people to
   use a different language. The different applications are based on what the
   team understands best. We make it easy for all of these things to interact -
   if you have a server written in Java we have a custom RPC system that helps
   bridge the gap and communicate with other servers.
Q: How many software engineers roughly does Google employ (Steve Holden)?
A: I do know that the public employee count is over 3,000 employees as of
   December, but I don't know the break-out in terms of numbers of engineers.
   It's hundreds of engineers but I can't really say any more.
   Some of the apps written in Java (Blogger) can communicate with C++ using
   RPC, so not using Python is not a problem
Q: You must have masses of linguistic data (terabytes). How do you access that
   data so fast?
A: Yes. I don't know, I don't work in that area.  As far as speed, "we just
   throw servers at it."
Q: Within Google, is there anything for which Python is considered inappropriate?
A: Is there anything where Python is not appropriate? Well yeah, something like
   our indexing system where we scan the web pages and produce an index. Python
   is good, and fast, and IronPython is even faster, but it's not fast enough.
   We use C for that.
   For other things, it's based on the engineering team. We make it possible for
   the teams to use what language they like.
   Personally, I'd like to see more Python, so some of the things I've been
   doing have been working on enabling that.
Q: What kind of bug-tracking system do you use?
A: Bug tracking.  Our system is not that good.
   We have one, anybody in their right mind has one
   Bugzilla derivative
   MS has an awesome bug tracking system
   Even what I had at collab.net was better
   Google's looking at different options for fixing that system.
Q: I want to jump in with another comment on wrapping. I have a plotting library
   in C++ with heavy use of templates and I tried wrapping it in three different
   things (cxx, Boost, and SWIG). SWIG is actually pretty good now, swig
   template support is much better than it used to be. Boost makes things way,
   way too big.
A: Based on this feedback it seems like Boost is capable in certain environments
   and is definitely worth looking at. Need to evaluate before using.
Q: SWIG performance in real time environment?
A: It is a non-issue. However, I was challenged about this at MS: someone said
   "Python won't be fast enough!" I said, "how fast does it have to be?  1000
   pages per second?"  He couldn't say. So I said "then just don't worry about
   it unless it proves too slow."
   We did go ahead and rewrite some of the Python stuff into ActiveX COM objects
   and ASP and... it was slower (laughter and applause).
   Much time in Python is spent outside the interpreter loop; much time is
   spent, e.g., in the String object, which is written in C.
   [On code.google.com] There's still that Global Interpreter Lock in there, but  
   I still saw some SERIOUS page performance on that thing. Don't be afraid of
   bringing Python into your projects. Your bottleneck will be the network
   bandwidth (some person on a 56kbps line), not Python
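
The "how fast does it have to be?" question above is easiest to answer by measuring. The following is a rough sketch of such a measurement; the page renderer is a made-up stand-in, and the number it produces depends entirely on the machine it runs on.

```python
import time


def pages_per_second(render, pages=1000):
    """Call render() `pages` times and return the measured throughput."""
    start = time.perf_counter()
    for page_number in range(pages):
        render(page_number)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading from a coarse clock.
    return pages / elapsed if elapsed > 0 else float("inf")


def render_page(page_number):
    """Hypothetical page renderer; str.join does its work in C."""
    rows = ("<li>item %d</li>" % i for i in range(50))
    return "<html><ul>%s</ul><p>page %d</p></html>" % ("".join(rows), page_number)


throughput = pages_per_second(render_page, pages=200)
```

Comparing the measured figure against an actual requirement (for example, the 1000 pages per second mentioned in the answer) is what tells you whether a rewrite is worth considering.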
Q: Mentioned a number of languages used at Google. We use Python because it's
   terser (among other reasons). Can you speculate on lines of code in various  
   languages at Google? (Do you even know total lines of code at Google?)
A: I have no idea. It's a LOT.
   Joke from audience: the code counter is still running!
   C++ is probably the majority, probably followed by Python.
   C++, Python, Java - gut feeling
Q: Five years from now, if people are right about Moore's law, more
   multiprocessor systems. What about the getting rid of the Global Interpreter
   Lock project that you did a few years ago?
A: Wow. Yeah, that was a few years ago.   Back in '96 I made a few patches to
   Python 1.4 to get rid of the GIL.  We used that at MS to make free threaded
   COM objects. We were getting a lot of lock contention.  We had to protect
   different data structures - like in Python there are pools of frame objects
   which had to be protected (??). Things were blocking around those pools. For
   2 processors there was a bonus, but for 3 or 4 it was actually slower.
   Free threading - Python's thread state was one of the benefits from that set
   of patches. sys.exc_info was another.
   The Global Interpreter Lock hasn't actually been a problem.
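
As a small illustration of why `sys.exc_info` matters for threaded code, the sketch below has two threads each catch a different exception and record what `sys.exc_info()` reports in that thread. The function and worker names are invented for the example.

```python
import sys
import threading


def record_exception(exc_type, results, name):
    """Raise and catch exc_type, then store what sys.exc_info() reports here."""
    try:
        raise exc_type(name)
    except exc_type:
        # sys.exc_info() returns (type, value, traceback) for the current thread.
        results[name] = sys.exc_info()[0]


results = {}
threads = [
    threading.Thread(target=record_exception, args=(KeyError, results, "worker-1")),
    threading.Thread(target=record_exception, args=(ValueError, results, "worker-2")),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Each thread sees only the exception it is handling itself, which a single process-wide "current exception" variable could not guarantee.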
Q: Every once in a while, you are going to introduce a bug into the system. How
   do you guys debug across the language boundaries?
A: We don't have any particular tools, or anything like that. Have libraries for
   logging. My favorite technique is adding print statements (applause/
   laughter). It would be wonderful if we had special tools but we don't.
   Some people ask what IDE they should use for cross language Java/Python
   development. Eclipse is quite good, but even that doesn't have any cross-
   language stuff.
Q: Do you have any current hobby projects that you are working on that you can  
   talk about?
A: Stuff outside Google they can't tell me not to talk about.
    Subversion based wiki (subwiki)
        svn exposes its libraries to Python via SWIG
        You could build a new svn client or interact with a server from Python
        ViewCVS does this
        subwiki uses the svn repository to store the wiki pages
    Googly stuff - mostly code.google.com
Q: What does Google have to say about web application frameworks?
A: It's a tough one. Lot of stuff set up in C++. code.google.com was not built
   using an off-the-shelf framework; we used Google's custom HTTP server.
   GMail is not written in Python. I don't actually know if it's C++ or Java. (Chris DiBona: it's Java.)
Q: Followup - is there anything that Google can contribute (via open source) in the web framework arena?
A: Got a lot of stuff we've been talking about moving into the open source arena. Stuff that relies on Google-specific stuff won't be interesting outside of Google.
Q: Tim O'Reilly talked about Google redefining applications.  In this view we're
   sort of moving away from Google 1.0.  When you upgrade, what sort of staging
   environment do you have?
A: We definitely have staging environments. One of the things built in to the
   systems I talked about for moving things out. The main web server -
   www.google.com - is a BIG chunk of code and data - because we have
   translations and stuff for everything. In any case, they're called canary
   servers (chuckles from crowd) - we put stuff on the canary servers and see if
   they're going to fall over. Also, because we get so much traffic we can turn
   a knob and expose something like 1% of our traffic to those servers. If they
   don't fall over, we expose some more.
   The "turning the knob" is a little command line tool written in Python.
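
One way to picture the "knob" is as a percentage that decides, per request, whether traffic goes to the canary servers. The sketch below is an assumption about how such a split could be done (hashing a request identifier into buckets); the talk did not describe Google's actual mechanism, and all names here are hypothetical.

```python
import hashlib


def routes_to_canary(request_id, canary_percent):
    """Deterministically send roughly canary_percent of request ids to canaries."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # Map the first 4 bytes of the hash onto 10,000 buckets (0.01% resolution).
    bucket = int.from_bytes(digest[:4], "big") % 10000
    return bucket < canary_percent * 100


def canary_share(request_ids, canary_percent):
    """Fraction of the given requests that would reach the canary servers."""
    hits = sum(routes_to_canary(r, canary_percent) for r in request_ids)
    return hits / len(request_ids)


ids = ["req-%d" % i for i in range(20000)]
share_at_one_percent = canary_share(ids, 1)
```

Hashing keeps the decision stable for a given request id, so raising the percentage only adds traffic to the canaries and never reshuffles what was already routed there.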
Q: (Alex Martelli) Prompted by your mention of unwrapping pieces so they can be
   open source. It actually sounds like something that's a very good software
   engineering exercise, because it forces decoupling from your proprietary
   stuff. Even if we never open source the actual pieces, just having done the
   unwrapping seems like a big advantage.
A: It would be a big advantage if we were distributing code. For us, a 50 MB
   executable is not a problem, though you'd never try to push that to a client
   too often. While it would be an interesting engineering exercise and would
   improve the code it has not been a priority.
Chris DiBona followup: Opening your code tends to make it better, for example in our (?)malloc library we said it worked faster for these situations, and when we looked at it we found a bug in our code.
--------------------------------------------------------------------------
REFERENCES: {as documents / sites are referenced add them below}
http://www.swig.org
http://code.google.com
--------------------------------------------------------------------------
QUOTES:
"We don't do that at Microsoft; we ship C++ code"
"Python passed the tipping point years ago"
"[You can] read [Python] in 2 hours, program in it in 2 days, be productive for the company in 2 weeks."
"We use a LOT of SWIG"
"We've got quite a few servers..." (laughter)
"I've worked in large environments before, but nothing on the order of this"
"We have a lot of log data"
"Today we're using primarily Python 2.2 deployed on our servers, but we're trying to work out how to move to Python 2.3."
"Our bug tracking system is not that good"
"Pushing bits out to some guy on a 56k modem IS your bottleneck. Pulling records out of a database is your bottleneck. It's very rarely going to be Python."
"I think we probably have more Python code than we have Java" - a guess
"I think we probably have more Python than we do Java, because of all of those tools and things for supporting the environment, wrappers and all these things."
"Mr. Ascher.  That's Dr. Ascher, to you."
"My favourite debugging environment is PRINT."
--------------------------------------------------------------------------
CONTRIBUTORS: {add your name, e-mail address and URL below}
Ted Leung </diagram>
