The Elegant BeautifulSoup, a Handy Tool for Python Web Scraping

ct38 · Posted on 2018-8-4 10:25:06

>>> from bs4 import BeautifulSoup
>>> html_doc = """
... <html><head><title>The Dormouse's story</title></head>
... <body>
... <p class="title"><b>The Dormouse's story</b></p>
...
... <p class="story">Once upon a time there were three little sisters; and their names were
... <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
... <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
... <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
... and they lived at the bottom of a well.</p>
...
... <p class="story">...</p>
... """
>>> soup = BeautifulSoup(html_doc)
>>> soup.head()
[<title>The Dormouse's story</title>]
>>> soup.title
<title>The Dormouse's story</title>
>>> soup.title.string
u"The Dormouse's story"
>>> soup.body.b
<b>The Dormouse's story</b>
>>> soup.body.a
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
>>> soup.get_text()
u"... The Dormouse's story\n... \n... The Dormouse's story\n... \n... Once upon a time there were three little sisters; and their names were\n... Elsie,\n... Lacie and\n... Tillie;\n... and they lived at the bottom of a well.\n... \n... ...\n... "
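The raw get_text() output above keeps every newline from the markup. As a minimal sketch, assuming Python 3 and the standard-library "html.parser" backend (the post specifies neither), the `separator` and `strip` arguments of get_text() can tidy that up. The shortened `snippet` string and the variable names here are illustrative only:

```python
from bs4 import BeautifulSoup

# Shortened, illustrative markup (not the full html_doc from the post).
snippet = """
<p class="story">Once upon a time
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
</p>
"""

# Assumption: the built-in "html.parser" backend is used.
soup = BeautifulSoup(snippet, "html.parser")

# strip=True trims each text fragment and drops whitespace-only ones;
# separator is placed between the fragments that remain.
clean_text = soup.get_text(separator=" ", strip=True)
print(clean_text)

# stripped_strings yields the same trimmed fragments one at a time.
fragments = list(soup.stripped_strings)
print(fragments)
```

Joining with a single space keeps the result on one line, which is usually easier to log or store than the newline-laden default.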
　　
>>> soup.find_all('a')
[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
>>> for key in soup.find_all('a'):
...     print key.get('class'), key.get('href')
...
['sister'] http://example.com/elsie
['sister'] http://example.com/lacie
['sister'] http://example.com/tillie
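
The session above uses Python 2 syntax (the print statement, u"" strings). Here is a rough Python 3 sketch of the same link loop, assuming the built-in "html.parser" backend. The helper name `extract_links` and the trimmed sample markup are hypothetical and do not come from the original post:

```python
from bs4 import BeautifulSoup

# Trimmed sample markup containing the three links from the post.
html_doc = """
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
</p>
"""


def extract_links(markup):
    """Return (classes, href) pairs for every <a> tag in the markup.

    Hypothetical helper, named here only for illustration.
    """
    # Naming the parser explicitly avoids depending on whichever
    # parser happens to be installed.
    soup = BeautifulSoup(markup, "html.parser")
    # Tag.get() returns None instead of raising when an attribute is missing.
    return [(a.get("class"), a.get("href")) for a in soup.find_all("a")]


links = extract_links(html_doc)
for classes, href in links:
    print(classes, href)
```

Collecting the pairs into a list before printing keeps the parsing step separate from the output, so the same helper could feed a file or a database instead.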
