How to Set Request Headers in Python Web Crawlers

2025-01-15 Technical Tutorial

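Not sure how to set request headers in a Python crawler? The problem is not hard to solve. This article walks through the approach for several common tools, and hopefully you will take something useful away from it.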

1. Setting request headers with requests:

    import requests

    url = "http://www.targetweb.com"
    headers = {
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
        'Cache-Control': 'max-age=0',
        'Connection': 'keep-alive',
        'Referer': 'http://www.baidu.com/',
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.104 Safari/537.36 Core/1.53.4882.400 QQBrowser/9.7.13059.400',
    }
    res = requests.get(url, headers=headers)

    # Image downloads need the response as a byte stream; request it like this.
    # Note that headers must be passed as a keyword argument:
    # res = requests.get(url, stream=True, headers=headers)
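To see which headers would go out before touching the network, one option is to build the request on a Session and inspect the prepared result. The following is a minimal sketch rather than the article's own method: `preview_headers` and `download_image` are hypothetical helper names, and the timeout and chunk size are arbitrary choices made for illustration.

```python
import requests

URL = "http://www.targetweb.com"  # placeholder target from the article
HEADERS = {
    "Cache-Control": "max-age=0",
    "Referer": "http://www.baidu.com/",
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/53.0.2785.104 Safari/537.36"
    ),
}


def preview_headers(url, headers):
    """Return the headers requests would send, without sending anything."""
    session = requests.Session()
    # Session-level headers apply to every request made through the session.
    session.headers.update(headers)
    prepared = session.prepare_request(requests.Request("GET", url))
    return dict(prepared.headers)


def download_image(url, path, headers, chunk_size=8192):
    """Stream a binary response to disk instead of loading it into memory."""
    with requests.get(url, headers=headers, stream=True, timeout=10) as res:
        res.raise_for_status()
        with open(path, "wb") as fh:
            for chunk in res.iter_content(chunk_size=chunk_size):
                fh.write(chunk)


if __name__ == "__main__":
    # Prints the merged headers: the custom ones plus the requests defaults.
    for name, value in preview_headers(URL, HEADERS).items():
        print(f"{name}: {value}")
```

Using a Session here means the headers are set once and reused, which tends to be convenient when a crawler issues many requests to the same site.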

2. Setting request headers with Selenium + Chrome:

    from selenium import webdriver

    options = webdriver.ChromeOptions()
    # Set the browser language to Chinese
    options.add_argument('lang=zh_CN.UTF-8')
    # Set the User-Agent header
    options.add_argument('user-agent=Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.104 Safari/537.36 Core/1.53.4882.400 QQBrowser/9.7.13059.400')
    # Recent Selenium releases take options=...; the older chrome_options= keyword is no longer accepted
    browser = webdriver.Chrome(options=options)

    url = "http://www.targetweb.com"
    browser.get(url)
    browser.quit()

3. Setting request headers with Selenium + PhantomJS:

    # Note: PhantomJS support was removed in Selenium 4, so this snippet needs Selenium 3.x
    from selenium import webdriver
    from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

    # Copy the default PhantomJS capabilities and override the User-Agent
    des_cap = dict(DesiredCapabilities.PHANTOMJS)
    des_cap["phantomjs.page.settings.userAgent"] = (
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.104 Safari/537.36 Core/1.53.4882.400 QQBrowser/9.7.13059.400"
    )
    browser = webdriver.PhantomJS(desired_capabilities=des_cap)

    url = "http://www.targetweb.com"
    browser.get(url)
    browser.quit()

4. Setting request headers in the Scrapy framework:

Add the following to settings.py:

    DEFAULT_REQUEST_HEADERS = {
        'accept': 'image/webp,*/*;q=0.8',
        'accept-language': 'zh-CN,zh;q=0.8',
        'referer': 'https://www.baidu.com/',
        'user-agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.104 Safari/537.36 Core/1.53.4882.400 QQBrowser/9.7.13059.400',
    }

5. Setting request headers with asynchronous aiohttp:

    import asyncio

    import aiohttp

    url = "http://www.targetweb.com"
    headers = {
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
        'Cache-Control': 'max-age=0',
        'Connection': 'keep-alive',
        'Referer': 'http://www.baidu.com/',
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.104 Safari/537.36 Core/1.53.4882.400 QQBrowser/9.7.13059.400',
    }

    async def main():
        # "async with" is only valid inside a coroutine
        async with aiohttp.ClientSession(headers=headers) as session:
            async with session.get(url) as resp:
                print(resp.status)
                print(await resp.text())

    asyncio.run(main())
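Headers given to `ClientSession` act as defaults, and headers passed to an individual `session.get` call override them for that request. One way to observe this without contacting a real site is a throwaway local server that echoes back whatever headers it receives. The sketch below assumes loopback networking is available; `EchoHeaders`, `fetch_echo`, and `demo` are hypothetical names used only for illustration.

```python
import asyncio
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import aiohttp


class EchoHeaders(BaseHTTPRequestHandler):
    """Reply to any GET with the received headers as JSON (keys lowercased)."""

    def do_GET(self):
        received = {name.lower(): value for name, value in self.headers.items()}
        body = json.dumps(received).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


async def fetch_echo(url, session_headers, request_headers=None):
    """Send one GET and return the headers the server says it received."""
    async with aiohttp.ClientSession(headers=session_headers) as session:
        async with session.get(url, headers=request_headers) as resp:
            return await resp.json()


def demo():
    # Port 0 lets the operating system pick a free port.
    server = HTTPServer(("127.0.0.1", 0), EchoHeaders)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{server.server_port}/"
    try:
        return asyncio.run(
            fetch_echo(
                url,
                {"User-Agent": "demo-agent/1.0", "Referer": "http://www.baidu.com/"},
                {"Referer": "http://www.targetweb.com/"},  # per-request override
            )
        )
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

In this run the server should report the session-level User-Agent together with the per-request Referer, which shows how the two levels combine.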

Thank you for reading. Hopefully this overview of setting request headers in Python crawlers proves helpful.