221rrere posted on 2016-6-27 09:19:25

Getting a web page's body text with Python BeautifulSoup


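The get_text method of the BeautifulSoup library returns all the text in a page with the markup stripped, which gives a quick way to get at the page's body text: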

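#!/usr/bin/env python
# coding=utf-8

# Extract the body text from an HTML page

import requests
from bs4 import BeautifulSoup

url = 'http://www.baidu.com'
response = requests.get(url)

# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(response.text, 'html.parser')
print(soup.get_text())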
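
Depending on the bs4 version, the output of get_text may still contain the contents of script and style tags, along with many blank lines. Below is a minimal sketch of one way to clean that up. The helper name extract_visible_text and the sample HTML string are made up for illustration and are not part of the original post; the sample stands in for a downloaded page so the snippet runs without network access.

```python
from bs4 import BeautifulSoup


def extract_visible_text(html):
    """Return the text of an HTML document with script/style content removed."""
    soup = BeautifulSoup(html, "html.parser")
    # Remove elements whose contents are not rendered as page text.
    for tag in soup(["script", "style", "noscript"]):
        tag.decompose()
    # Separate text nodes with newlines, then trim each line and drop empty ones.
    lines = (line.strip() for line in soup.get_text(separator="\n").splitlines())
    return "\n".join(line for line in lines if line)


# Hypothetical sample page used in place of a real HTTP response.
sample = """<html><head><title>Demo</title>
<style>p { color: red; }</style></head>
<body><script>var x = 1;</script>
<h1>Heading</h1><p>First paragraph.</p></body></html>"""

print(extract_visible_text(sample))
```

With a real page, the same helper could be called on the text of the response returned by requests.get. This still returns every visible string (navigation, footer and so on), so picking out only the main article would need further filtering.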


