
[Experience Sharing] Using pycurl in Python


Posted on 2015-4-22 02:51:05
  Reposted from http://www.angryobjects.com/2011/10/15/http-with-python-pycurl-by-example/



HTTP with Python – PycURL by Example


  A colleague of mine recently remarked something along the lines of “whenever I need to do HTTP client stuff in any language, I usually go look for cURL bindings straight away”, and I’m starting to agree on that one. HTTP is hard to do right, and cURL has a track record of doing HTTP right. If you don’t know what cURL is, take a look at http://curl.haxx.se
  Faced with some serious HTTP work in a recent little project, with Python as the language of choice, I naturally went looking for the most popular HTTP client implementation in Python. urllib2 seemed to be the most popular library in that domain. I started fiddling with it for a bit, and while I got more and more confused and annoyed (they tend to go hand in hand, don’t they?) with the API design of this particular module, I ran into an astonishing shortcoming: it doesn’t currently let you do HTTPS over a proxy. Seriously? That was the moment I hit Google to search for Python cURL bindings. I felt lucky, and this brought me to PycURL.
  Using PycURL, I was able to implement my use cases in a snap. Then I wondered – why isn’t this module more popular among Python developers? The PycURL website already states the reasons – it doesn’t have a very Pythonic interface as it is a thin layer over libcurl (which is implemented in C), and it takes some knowledge of the cURL C API to put it to effective use. The first reason I can’t really help with, but I’ll be focusing on the second – gaining some knowledge of the underlying API. I’ll try to explain how this API works by implementing a number of use cases using PycURL.
  The simplest use case possible would probably be to retrieve some content from a URL, so let’s get our feet wet.





import pycurl
c = pycurl.Curl()
c.setopt(c.URL, 'http://news.ycombinator.com')
c.perform()
  This will perform a GET request on the given URL, and spit out the contents of the response body to stdout. In case it isn’t obvious already: we instantiate a new Curl object, call setopt() on it to influence cURL’s behavior, and call perform() to execute the actual HTTP request. This is the typical PycURL workflow: instantiate, set cURL options, perform.
  We probably want to catch the output so we can do interesting things with it before dumping it to stdout, so here I’ll show you how to catch the output into a variable instead.





import pycurl
import cStringIO
buf = cStringIO.StringIO()
c = pycurl.Curl()
c.setopt(c.URL, 'http://news.ycombinator.com')
c.setopt(c.WRITEFUNCTION, buf.write)
c.perform()
print buf.getvalue()
buf.close()
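
  As an aside for readers on Python 3: cStringIO no longer exists there, and PycURL hands response data to the write callback as bytes, so io.BytesIO is the natural buffer. The sketch below is only an illustration of the mechanism, not a real request: the loop over hard-coded chunks is my stand-in for cURL calling the callback once per received chunk, so it runs without PycURL or a network connection.

```python
import io

buf = io.BytesIO()

# With real PycURL this is what you would pass as the callback:
#     c.setopt(c.WRITEFUNCTION, buf.write)
write_callback = buf.write

# Stand-in for cURL delivering the body in pieces (assumed data).
# Note that a multi-byte UTF-8 character may be split across chunks.
simulated_chunks = [b"<p>caf\xc3", b"\xa9</p>"]
for chunk in simulated_chunks:
    write_callback(chunk)

# Decode only after the whole body has arrived.
body = buf.getvalue().decode("utf-8")
buf.close()
print(body)
```

  Decoding once at the end, rather than per chunk, avoids errors when a character straddles two chunks.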
  The cStringIO example uses a string buffer for cURL to write the response to. By setting the WRITEFUNCTION option and pointing it to the write method of our string buffer, we capture the output in the buffer once perform() is called. You can just as well use StringIO instead of cStringIO, but the latter is faster. Again, pretty much any behavior is defined using setopt(). Need to have the request go through a proxy, with a connect timeout and a limit on the total transfer time? Here goes:





import pycurl
import cStringIO
buf = cStringIO.StringIO()
c = pycurl.Curl()
c.setopt(c.URL, 'http://news.ycombinator.com')
c.setopt(c.WRITEFUNCTION, buf.write)
c.setopt(c.CONNECTTIMEOUT, 5)
c.setopt(c.TIMEOUT, 8)
c.setopt(c.PROXY, 'http://inthemiddle.com:8080')
c.perform()
print buf.getvalue()
buf.close()
  Not necessarily Pythonic, but pretty simple, right? Wait, how do we know what options we can use, and know what they do? Easy, you go to http://curl.haxx.se/libcurl/c/curl_easy_setopt.html, find the option you’re looking for, and set it using setopt(), minus the ‘CURLOPT_’ part (the PycURL module sets it for you so you won’t get bored of typing ‘CURLOPT_’ all the time).
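  The naming rule described above can be sketched in a few lines of Python 3. The helper names pycurl_option_name and resolve_option are my own, not part of PycURL, and the namespace with made-up values merely stands in for the pycurl module (or a Curl handle) so the sketch runs without PycURL installed. A few PycURL names may not follow the rule exactly, so treat this as an illustration rather than a complete mapping.

```python
from types import SimpleNamespace

PREFIX = "CURLOPT_"


def pycurl_option_name(libcurl_name):
    """Strip the CURLOPT_ prefix to get the attribute name used with setopt()."""
    if not libcurl_name.startswith(PREFIX):
        raise ValueError("not a CURLOPT_ name: %r" % libcurl_name)
    return libcurl_name[len(PREFIX):]


def resolve_option(namespace, libcurl_name):
    """Look up the option constant on the pycurl module or a Curl handle."""
    return getattr(namespace, pycurl_option_name(libcurl_name))


# Hypothetical stand-in for the pycurl module; the numbers are made up.
fake_pycurl = SimpleNamespace(CONNECTTIMEOUT=1, TIMEOUT=2)

names = [
    pycurl_option_name(n)
    for n in ("CURLOPT_URL", "CURLOPT_CONNECTTIMEOUT", "CURLOPT_PROXY")
]
print(names)
print(resolve_option(fake_pycurl, "CURLOPT_TIMEOUT"))
```

  With the real module you would write something like c.setopt(resolve_option(pycurl, 'CURLOPT_TIMEOUT'), 8), although typing c.TIMEOUT directly is of course shorter.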
  We’ve just scratched the surface of what cURL can do. Here is how you perform a POST request.





import pycurl
c = pycurl.Curl()
c.setopt(c.URL, 'http://myfavpizzaplace.com/order')
c.setopt(c.POSTFIELDS, 'pizza=Quattro+Stagioni&extra=cheese')
c.perform()
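
  The POSTFIELDS string does not have to be written by hand. Here is a minimal sketch of building it with the standard library, assuming Python 3, where the function lives at urllib.parse.urlencode (in Python 2 it is urllib.urlencode). The second field set is an invented example of a value that would break a hand-written query string.

```python
from urllib.parse import urlencode

# Same fields as in the request above; spaces become '+'.
fields = {"pizza": "Quattro Stagioni", "extra": "cheese"}
postfields = urlencode(fields)
print(postfields)

# A value containing '&' and '=' is escaped instead of being
# misread as extra fields (hypothetical field name and value).
tricky = urlencode({"note": "a&b=c"})
print(tricky)
```

  The resulting string is what you would hand to c.setopt(c.POSTFIELDS, postfields).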
  By setting the POSTFIELDS option, the request automatically becomes a POST request. The POST data is obviously set in the value for this option, in the form of a query string containing the variables to send; their values need to be URL-encoded (using urllib.urlencode() helps in demanding cases). So you think something is going wrong, and you want to take a look at the raw request as it’s being carried out? The VERBOSE option will help you there.





import pycurl
c = pycurl.Curl()
c.setopt(c.URL, 'http://myfavpizzaplace.com/order')
c.setopt(c.POSTFIELDS, 'pizza=Quattro+Stagioni&extra=cheese')
c.setopt(c.VERBOSE, True)
c.perform()
  Setting the VERBOSE cURL option makes cURL print verbose information to stderr, so you can see exactly what is going on: from setting up the connection to creating the HTTP request, to the headers and the response that comes back after the request. This is very useful while programming with cURL, maybe even more so with sequences of requests between which you want to preserve cookie state. That is actually fairly simple to implement, so let’s take a look.





import pycurl
c = pycurl.Curl()
c.setopt(c.URL, 'http://myappserver.com/ses1')
c.setopt(c.COOKIEFILE, '')
c.setopt(c.VERBOSE, True)
c.perform()
c.setopt(c.URL, 'http://myappserver.com/ses2')
c.perform()
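
  The same pattern generalizes to any number of requests on one handle. The Python 3 sketch below wraps it in a helper; run_session is my own name, and RecordingHandle is a hypothetical stand-in for pycurl.Curl that only records calls, so the flow can be followed without PycURL or a server. With the real library you would pass pycurl.Curl() instead.

```python
def run_session(curl, urls):
    """Perform each URL in order on one handle with cookie handling on."""
    # Empty string: enable cURL's cookie handling without reading a file.
    curl.setopt(curl.COOKIEFILE, "")
    for url in urls:
        # Only the URL changes; every other option stays set on the handle.
        curl.setopt(curl.URL, url)
        curl.perform()


class RecordingHandle:
    """Hypothetical stand-in for pycurl.Curl; records calls, sends nothing."""

    URL = "URL"
    COOKIEFILE = "COOKIEFILE"

    def __init__(self):
        self.calls = []
        self._url = None

    def setopt(self, option, value):
        self.calls.append((option, value))
        if option == self.URL:
            self._url = value

    def perform(self):
        self.calls.append(("perform", self._url))


handle = RecordingHandle()
run_session(handle, ["http://myappserver.com/ses1",
                     "http://myappserver.com/ses2"])
performed = [value for name, value in handle.calls if name == "perform"]
print(performed)
```
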
  Let’s assume here that myappserver.com/ses1 initializes a session by using a session cookie, and possibly sets other cookies as well, which will be needed to proceed to myappserver.com/ses2, so they will need to be sent back to the server upon the second request. What is interesting about the ses1/ses2 code is that we create a Curl object once and use it to perform two requests. When used like this, all the options set on the object carry over to subsequent requests until they are overridden, for as long as the handle exists.
  In this case we’re setting the COOKIEFILE option, which can normally be used to provide a path to a cookie file that contains data to be sent as an HTTP Cookie header. However, we set its value to an empty string, which makes cURL cookie-aware without reading any file: it will store the cookies it receives and re-send them upon subsequent requests. Hence, we keep state between requests on the same cURL handle intact. Since we use the VERBOSE option as well, the code will show you exactly what happens during the requests. This approach can be used to simulate, for example, login sessions or other flows throughout a web application that need multiple HTTP requests, without having to bother with maintaining cookie (and also session cookie) state.
  Performing multiple HTTP requests on one cURL handle also has the convenient side effect that the TCP connection to the host will be reused when targeting the same host multiple times, which can give you a performance boost. In production code you will want to set more options, e.g. for handling timeouts and providing some level of fault tolerance. Here’s a more extensive version of the previous example.





import pycurl
c = pycurl.Curl()
c.setopt(c.URL, 'http://myappserver.com/ses1')
c.setopt(c.CONNECTTIMEOUT, 5)
c.setopt(c.TIMEOUT, 8)
c.setopt(c.COOKIEFILE, '')
c.setopt(c.FAILONERROR, True)
c.setopt(c.HTTPHEADER, ['Accept: text/html', 'Accept-Charset: UTF-8'])
try:
    c.perform()
    c.setopt(c.URL, 'http://myappserver.com/ses2')
    c.setopt(c.POSTFIELDS, 'foo=bar&bar=foo')
    c.perform()
except pycurl.error, error:
    errno, errstr = error
    print 'An error occurred: ', errstr
  In this example we explicitly specify timeouts (which you should always do in real world situations), set some custom HTTP headers, and set the FAILONERROR cURL option to let cURL fail when an HTTP status code greater than or equal to 400 is returned. PycURL will raise an exception when this is the case, which allows you to deal with such situations gracefully.
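  The except clause in the listing is Python 2 syntax. On Python 3 you would write `except pycurl.error as exc:` and read the error number and message from exc.args. The sketch below shows just that unpacking step; describe_curl_error is my own helper name, and a plain Exception with an illustrative message stands in for pycurl.error so the snippet runs without PycURL. Error number 22 is libcurl's CURLE_HTTP_RETURNED_ERROR, which is what FAILONERROR produces.

```python
def describe_curl_error(exc):
    """Format an exception whose args are (error number, error message)."""
    code, message = exc.args
    return "cURL error %d: %s" % (code, message)


# Stand-in for a pycurl.error raised by perform(); the message text
# is illustrative and may differ between libcurl versions.
simulated = Exception(22, "The requested URL returned error: 404")
summary = describe_curl_error(simulated)
print(summary)

# With real PycURL on Python 3 the call site would look like:
#     try:
#         c.perform()
#     except pycurl.error as exc:
#         print(describe_curl_error(exc))
```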
  This should be enough information to get you started on using cURL through Python. There’s a whole world of functionality inside cURL that you can use; for example, one of its very powerful features is the ability to execute multiple requests in parallel which can (when using PycURL) be done by using the CurlMulti object (http://pycurl.sourceforge.net/doc/curlmultiobject.html). I hope you’ll agree that using PycURL is fairly simple, really powerful and pretty fast.


  posted in Programming, Python by objectified



4 Comments to "HTTP with Python – PycURL by Example"


  •   Oren wrote:
      I think you will like Requests. http://docs.python-requests.org/en/latest/index.html
    Link | October 19th, 2011 at 1:48 am

  •   objectified wrote:
      @Oren: Yes I know Requests, I knew this one would come up. It’s certainly more pythonic than PycURL, but I’m not sure if it’s more powerful or even that much easier to use. Like I said in the post, HTTP is hard to get right and I guess I put some trust in a library that has a track record like cURL has.
    Link | October 19th, 2011 at 4:06 am

  •   nicolas wrote:
      If you like requests and curl you should checkout human_curl on github. Supports more use cases than requests, same syntax, relies on curl for http awesomeness.
    Link | October 20th, 2011 at 8:51 am

  •   objectified wrote:
      @nicolas: human_curl looks absolutely great, thanks for the tip!
    Link | October 20th, 2011 at 9:47 am

