July

Parsing HTML with BeautifulSoup in Python

Contents

1. Install BeautifulSoup4
2. HTML
3. Start parsing

1. Install BeautifulSoup4

pip install beautifulsoup4
pip install lxml
pip install html5lib

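lxml and html5lib are parsers. BeautifulSoup hands the HTML to one of them, chosen by the second argument to bs4.BeautifulSoup().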
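To see which parsers are usable on a given machine, something like the following sketch may help. The helper name `available_parsers` and the `SAMPLE` string are made up for this illustration; `html.parser` is Python's built-in parser, which BeautifulSoup can use without any extra install.

```python
import bs4

# Tiny sample document; hypothetical, used only for this check.
SAMPLE = "<p>Hello <b>parser</b></p>"


def available_parsers(candidates=("html.parser", "lxml", "html5lib")):
    """Return the parser names that BeautifulSoup can load on this machine."""
    found = []
    for name in candidates:
        try:
            bs4.BeautifulSoup(SAMPLE, name)
        except bs4.FeatureNotFound:
            # Raised when the requested parser library is not installed.
            continue
        found.append(name)
    return found


print(available_parsers())
```

If lxml or html5lib is missing from the printed list, the matching pip install command above has probably not been run in the current environment.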

2. HTML

<!-- This is the example.html file. -->

<html><head><title>The Website Title</title></head>
<body>
<p>Download my <strong>Python</strong> book from <a href="http://inventwithpython.com">my website</a>.</p>
<p class="slogan">Learn Python the easy way!</p>
<p>By <span id="author">Al Sweigart</span></p>
</body></html>

Save the HTML above to a file named example.html.
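If you would rather create the file from Python than from a text editor, a minimal sketch could look like this. The function name `save_example` is hypothetical, and the file is assumed to go in the current working directory.

```python
from pathlib import Path

# The same markup as shown above, kept as a string.
EXAMPLE_HTML = """<!-- This is the example.html file. -->

<html><head><title>The Website Title</title></head>
<body>
<p>Download my <strong>Python</strong> book from <a href="http://inventwithpython.com">my website</a>.</p>
<p class="slogan">Learn Python the easy way!</p>
<p>By <span id="author">Al Sweigart</span></p>
</body></html>
"""


def save_example(path="example.html"):
    """Write the example markup to path and return the Path object."""
    target = Path(path)
    target.write_text(EXAMPLE_HTML, encoding="utf-8")
    return target


saved = save_example()
print(saved.resolve())
```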

3. Start parsing

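import bs4

# Open the example.html file saved in the previous step; the with block closes it afterwards.
with open('example.html', encoding='utf-8') as exampleFile:
    exampleSoup = bs4.BeautifulSoup(exampleFile.read(), 'html5lib')

# select() returns a list-like collection of the matching elements.
elems = exampleSoup.select('#author')
type(elems)  # only shows a value in an interactive session
print(elems[0].getText())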

The last line prints Al Sweigart.
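Indexing with elems[0] raises an IndexError when nothing matches the selector. One way to guard against that, sketched here with a hypothetical helper `text_of` and an inline HTML string instead of the file, is `select_one()`, which returns None when there is no match. The built-in `html.parser` is used only so the snippet runs without extra installs.

```python
import bs4

# Inline fragment taken from the example page, so no file is needed here.
HTML = '<p>By <span id="author">Al Sweigart</span></p>'


def text_of(soup, selector, default=None):
    """Return the text of the first element matching selector, or default."""
    elem = soup.select_one(selector)  # None when nothing matches
    return elem.get_text() if elem is not None else default


soup = bs4.BeautifulSoup(HTML, "html.parser")
print(text_of(soup, "#author"))
print(text_of(soup, "#missing", default="(no match)"))
```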

BeautifulSoup finds elements with the select() method, which takes CSS selectors much like jQuery does.

soup.select('div'): all <div> elements

soup.select('#author'): the element whose id is author

soup.select('.notice'): all elements whose class is notice
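The selectors above can be tried against the example page. The sketch below embeds the markup as a string and uses the built-in `html.parser`, both choices made only to keep it self-contained. The example page has no `<div>` and no `notice` class, so `.slogan` stands in for the class selector.

```python
import bs4

HTML = """<html><head><title>The Website Title</title></head>
<body>
<p>Download my <strong>Python</strong> book from <a href="http://inventwithpython.com">my website</a>.</p>
<p class="slogan">Learn Python the easy way!</p>
<p>By <span id="author">Al Sweigart</span></p>
</body></html>"""

soup = bs4.BeautifulSoup(HTML, "html.parser")

paragraphs = soup.select("p")     # tag selector: every <p> element
slogans = soup.select(".slogan")  # class selector
authors = soup.select("#author")  # id selector
divs = soup.select("div")         # no <div> in this page, so nothing matches

print(len(paragraphs))            # 3
print(slogans[0].get_text())      # Learn Python the easy way!
print(authors[0].get_text())      # Al Sweigart
print(len(divs))                  # 0
```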

Reference: 《Python 编程快速上手》, the Chinese edition of "Automate the Boring Stuff with Python"

http://www.waitingfy.com/archives/1818
