May

Python XPath: basic syntax

Contents

1. Example
2. Example 2
3. Example 3
4. Example 4

1. Example

item['price'] = response.xpath('//span[@class="p-price"]/span[2]/text()').extract_first()

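1. A leading // matches the element anywhere in the document, so you do not have to spell out its parent elements.
2. [@class="p-price"] matches on a specific attribute value, here a class attribute equal to p-price.
3. Each / steps down one level in the tree.
4. [2] selects the second element; XPath positions start at 1, not 0.
5. text() returns the text inside a tag.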

2. Example 2

item['color'] = response.xpath('//div[@id="choose-attr-1"]/div[@class="dd"]/div[contains(@class, \'item\')]/@data-value').extract()

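1. If the element carries two classes, [@class="..."] with only one of them will not match, because the comparison is against the whole attribute string; use contains() instead.
2. Attributes are selected with @name; for example, use @data-value to read the element's data-value attribute.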

3. Example 3

# Inside a spider callback that receives response
items = response.xpath('//div[@id="plist"]/ul/li[@class="gl-item"]')
for product in items:
    item = JdsplashItem()
    # Relative XPath: the leading . scopes the search to this product
    item['price'] = product.xpath('.//strong[@class="J_price"]/i/text()').extract_first()
    # Equivalent CSS selector:
    # item['price'] = product.css('.J_price i::text').extract_first()
    item['img_url'] = product.css('.p-img img::attr("src")').extract_first()
    yield item

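1. If there is a ul between the div and the li, it has to be written out as well. You cannot match directly with //div[@id="plist"]/li[@class="gl-item"], because / only descends one level and XPath resolves the path level by level.
2. Inside the loop, to match starting from the current parent element, use .// rather than //; a bare // searches the whole document.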

4. Example 4

strYear = item.xpath('string(.//div[@class="bd"]/p)').extract_first()

string() returns all the text inside a tag, including text in nested tags, with the markup stripped.
http://www.waitingfy.com/archives/3752
