pengjunling posted on 2015-12-15 09:50:48

A simple workflow for fetching a web page and submitting a POST in Python


#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Python 2 script: uses urllib2 and the third-party python-ntlm package.

import urllib
import urllib2

from ntlm import HTTPNtlmAuthHandler

URL = "https://passport.baidu.com/v2/?login&tpl=mn&u=http%3A%2F%2Fwww.baidu.com%2F"
USER = ""       # fill in before running
PASSWD = ""     # fill in before running
POSTDATA = {}   # form fields to submit, as a dict
HEADERS = {'User-Agent':'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1;Trident/4.0;SLCC2;.NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; aff-kingsoft-ciba; .NET4.0C; .NET4.0E)'}

# Build the authentication handler
passman = urllib2.HTTPPasswordMgrWithDefaultRealm()
passman.add_password(None, URL, USER, PASSWD)
# NTLM is short for NT LAN Manager, the standard security protocol
# in early versions of Windows NT.
auth_NTLM = HTTPNtlmAuthHandler.HTTPNtlmAuthHandler(passman)

# Open the page
opener = urllib2.build_opener(auth_NTLM)
urllib2.install_opener(opener)
response = urllib2.urlopen(URL)
print response.read()

# Submit the POST request (a non-None data argument makes urllib2 use POST)
request = urllib2.Request(URL, urllib.urlencode(POSTDATA), HEADERS)
response2 = opener.open(request)
retCode = response2.getcode()
print response2.read(), retCode
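
The script above is written for Python 2. As a rough sketch only, assuming Python 3 (where urllib2 and urllib.urlencode moved into urllib.request and urllib.parse), the POST step might look like the following. The helper name build_post_request, the example.com URL and the form fields are hypothetical and not from the original post; the request is only built here, not sent, and NTLM authentication is left out.

```python
import urllib.parse
import urllib.request


def build_post_request(url, fields, headers):
    """Return a urllib.request.Request that will be sent as a POST.

    Hypothetical helper for illustration; not part of the original script.
    """
    # urlencode() returns a str; Python 3 needs the request body as bytes.
    body = urllib.parse.urlencode(fields).encode("utf-8")
    # Supplying data (anything other than None) makes urllib use POST.
    return urllib.request.Request(url, data=body, headers=headers)


# Placeholder values, assumed for this example only.
request = build_post_request(
    "http://example.com/login",
    {"username": "alice", "password": "secret"},
    {"User-Agent": "Mozilla/5.0"},
)
print(request.get_method())
print(request.data)

# To actually send it (needs network access):
# with urllib.request.urlopen(request) as response:
#     print(response.getcode(), response.read())
```

get_method() reports POST here because a data argument was supplied; without it the same Request would go out as a GET, which is the difference between the "open the page" and "submit" steps in the original script.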