Parsing HTML with BeautifulSoup in Python

1. Install beautifulsoup4

 

pip install beautifulsoup4
pip install lxml
pip install html5lib

lxml and html5lib are parsers; BeautifulSoup uses one of them to turn the HTML text into a tree it can search.
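To make the role of the parser concrete, here is a minimal sketch that feeds the same snippet to several parsers by name. The snippet, the helper name parse_with, and the use of Python's built-in "html.parser" are assumptions added for illustration; they are not part of the original walkthrough. A parser that is not installed is reported rather than treated as an error.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical snippet used only for this comparison
html = "<p>Hello, <b>world</b></p>"

def parse_with(parser_name):
    """Return the text extracted with the named parser, or None if it is not installed."""
    try:
        soup = BeautifulSoup(html, parser_name)
    except FeatureNotFound:
        # BeautifulSoup raises this when the requested parser is missing
        return None
    return soup.get_text()

# "html.parser" ships with Python; lxml and html5lib come from pip
results = {name: parse_with(name) for name in ("html.parser", "lxml", "html5lib")}
for name, text in results.items():
    print(name, "->", text if text is not None else "not installed")
```

The second argument to BeautifulSoup is just the parser's name as a string, so switching parsers is a one-word change.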

 

 

2. The HTML

 

 

<!-- This is the example.html file. -->

<html><head><title>The Website Title</title></head>
<body>
<p>Download my <strong>Python</strong> book from <a href="http://inventwithpython.com">my website</a>.</p>
<p class="slogan">Learn Python the easy way!</p>
<p>By <span id="author">Al Sweigart</span></p>
</body></html>

 

Save the HTML above as example.html.
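If you would rather create the file from Python than from a text editor, something like the following should work. It is only a sketch: the variable names are made up, and it assumes you want the file in the current working directory.

```python
from pathlib import Path

# The example page from above, kept as a plain string
EXAMPLE_HTML = """<!-- This is the example.html file. -->

<html><head><title>The Website Title</title></head>
<body>
<p>Download my <strong>Python</strong> book from <a href="http://inventwithpython.com">my website</a>.</p>
<p class="slogan">Learn Python the easy way!</p>
<p>By <span id="author">Al Sweigart</span></p>
</body></html>
"""

# Write the page into the current working directory as example.html
path = Path("example.html")
path.write_text(EXAMPLE_HTML, encoding="utf-8")
print("wrote", path.resolve())
```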

3. Start parsing

 

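import bs4

# Open the example file and parse it with the html5lib parser;
# the with block closes the file once it has been read
with open('example.html') as exampleFile:
    exampleSoup = bs4.BeautifulSoup(exampleFile.read(), 'html5lib')

# select() returns a list of every element that matches the CSS selector
elems = exampleSoup.select('#author')
type(elems)  # in the interactive shell this shows a list-like result
print(elems[0].getText())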

Output: Al Sweigart
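One thing to watch: indexing with [0] raises an IndexError when nothing matches. A small sketch of a safer lookup using select_one(), which returns the first match or None. The helper name text_of, the inline snippet, and the "#editor" selector are assumptions for illustration, and the built-in "html.parser" is used here only so the sketch runs without extra installs.

```python
from bs4 import BeautifulSoup

# One line of the example page, inlined so the sketch is self-contained
html = '<p>By <span id="author">Al Sweigart</span></p>'
soup = BeautifulSoup(html, "html.parser")  # built-in parser (assumption)

def text_of(soup, selector, default=""):
    """Return the text of the first element matching selector, or default if none."""
    elem = soup.select_one(selector)  # first match, or None when nothing matches
    return elem.get_text() if elem is not None else default

print(text_of(soup, "#author"))
print(text_of(soup, "#editor", default="(not found)"))
```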

BeautifulSoup finds elements with the select() method, which takes CSS selectors much like jQuery does.

 

soup.select('div') - all <div> elements

soup.select('#author') - the element whose id is author

soup.select('.notice') - all elements whose class is notice
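Here is how those selector forms behave on the example page. This is a sketch: the page body is inlined as a string, and the built-in "html.parser" stands in for html5lib so nothing extra needs to be installed. Since the page has no <div> and no notice class, the sketch uses 'p' and '.slogan' for the tag and class cases.

```python
from bs4 import BeautifulSoup

html = """<html><head><title>The Website Title</title></head>
<body>
<p>Download my <strong>Python</strong> book from <a href="http://inventwithpython.com">my website</a>.</p>
<p class="slogan">Learn Python the easy way!</p>
<p>By <span id="author">Al Sweigart</span></p>
</body></html>"""

soup = BeautifulSoup(html, "html.parser")  # built-in parser (assumption)

paragraphs = soup.select("p")        # by tag name
author = soup.select("#author")      # by id
slogan = soup.select(".slogan")      # by class
notices = soup.select(".notice")     # no such class on this page
links = soup.select("p a")           # <a> elements nested inside a <p>

print(len(paragraphs))               # 3
print(author[0].get_text())          # Al Sweigart
print(slogan[0].get_text())          # Learn Python the easy way!
print(len(notices))                  # 0
print(links[0].get("href"))          # http://inventwithpython.com
```

A selector that matches nothing gives back an empty list, not an error, so check the length before indexing.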

 

Reference: Automate the Boring Stuff with Python (the book by Al Sweigart)

 

http://www.waitingfy.com/archives/1818
