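1. 301 error
301 is a redirect. Scrapy's media pipelines (FilesPipeline and ImagesPipeline) do not follow redirects by default, so a redirected media download is treated as a failure. Adding the following to settings fixes it; the default value is False.
MEDIA_ALLOW_REDIRECTS = True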
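For context, here is a minimal sketch of where that line might sit in settings.py. It assumes the project downloads files through the built-in FilesPipeline; the pipeline priority and the "downloads" directory are placeholder values for illustration, not taken from my project.

```python
# settings.py (sketch; the pipeline choice and store path are assumptions)

# Enable Scrapy's built-in files pipeline.
ITEM_PIPELINES = {
    "scrapy.pipelines.files.FilesPipeline": 1,
}

# Directory where downloaded files are stored (placeholder path).
FILES_STORE = "downloads"

# Let the media pipeline follow 301/302 redirects instead of
# treating them as failed downloads. The default is False.
MEDIA_ALLOW_REDIRECTS = True
```

If you use ImagesPipeline instead, the same MEDIA_ALLOW_REDIRECTS setting applies.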
2. 403 error
403 means access is forbidden. In my case the target site checked the Referer header and returned 403 when it was empty. The fix is to set Referer on the request inside process_request of the downloader middleware. Replace xxxxx below with the target site's URL.
import random  # at the top of middlewares.py

def process_request(self, request, spider):
    # Called for each request that goes through the downloader
    # middleware.
    # Must either:
    # - return None: continue processing this request
    # - or return a Response object
    # - or return a Request object
    # - or raise IgnoreRequest: process_exception() methods of
    #   installed downloader middleware will be called
    agent = random.choice(agents)  # agents: list of User-Agent strings defined elsewhere
    request.headers["User-Agent"] = agent
    #request.meta["proxy"] = proxyServer
    #request.headers["Proxy-Authorization"] = proxyAuth
    request.headers['Referer'] = 'xxxxx'  # replace xxxxx with the target URL
    return None
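As a self-contained variant of the method above, the sketch below wraps it in a complete middleware class. The class name RefererMiddleware, the USER_AGENTS list and the DEFAULT_REFERER value are all hypothetical placeholders. Unlike my snippet, it only fills in Referer when the request does not already carry one, which is a design choice rather than a requirement.

```python
import random

# Hypothetical values: use your own User-Agent pool and target site URL.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]
DEFAULT_REFERER = "https://example.com/"


class RefererMiddleware:
    """Downloader middleware sketch: random User-Agent plus a fallback Referer."""

    def process_request(self, request, spider):
        # Pick a random User-Agent for every outgoing request.
        request.headers["User-Agent"] = random.choice(USER_AGENTS)

        # Only set Referer when the request has none, so a Referer set
        # explicitly by the spider is left untouched.
        if "Referer" not in request.headers:
            request.headers["Referer"] = DEFAULT_REFERER

        # Returning None tells Scrapy to continue processing the request.
        return None
```

To enable it, register the class under DOWNLOADER_MIDDLEWARES in settings, for example {"myproject.middlewares.RefererMiddleware": 543}, where the module path and the priority 543 are assumptions about your project layout.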
http://www.waitingfy.com/archives/3290
For the User-Agent and proxy settings, see the previous article, "Scrapy middleware: setting a random User-Agent and proxy".