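How to Set Request Headers in a Python Crawler
Not sure how to set request headers in a Python crawler? It is not hard. The sections below show how to do it with each of the commonly used tools.
1. Setting request headers with requests: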
import requests

url = "http://www.targetweb.com"
headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Cache-Control': 'max-age=0',
    'Connection': 'keep-alive',
    'Referer': 'http://www.baidu.com/',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.104 Safari/537.36 Core/1.53.4882.400 QQBrowser/9.7.13059.400',
}
res = requests.get(url, headers=headers)
# Image downloads need the response as a byte stream; request it like this:
# res = requests.get(url, stream=True, headers=headers)
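When several requests share the same headers, one option is to attach them to a requests.Session once instead of passing headers= on every call. The sketch below assumes that pattern. make_session and preview_headers are hypothetical helper names, not part of requests, and preview_headers only prepares the request locally so the merged headers can be inspected without any network traffic.

```python
import requests

# Same header values as in the example above.
DEFAULT_HEADERS = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Cache-Control': 'max-age=0',
    'Connection': 'keep-alive',
    'Referer': 'http://www.baidu.com/',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/53.0.2785.104 Safari/537.36 '
                  'Core/1.53.4882.400 QQBrowser/9.7.13059.400',
}


def make_session(extra_headers=None):
    """Hypothetical helper: build a Session that sends DEFAULT_HEADERS by default."""
    session = requests.Session()
    session.headers.update(DEFAULT_HEADERS)
    if extra_headers:
        session.headers.update(extra_headers)
    return session


def preview_headers(session, url, headers=None):
    """Hypothetical helper: return the headers a GET would carry, without sending it."""
    request = requests.Request("GET", url, headers=headers)
    prepared = session.prepare_request(request)  # merges session and per-request headers
    return prepared.headers


if __name__ == "__main__":
    session = make_session()
    # Per-request headers override the session defaults for that call only.
    merged = preview_headers(
        session,
        "http://www.targetweb.com",
        headers={"Referer": "http://www.targetweb.com/"},
    )
    for name, value in merged.items():
        print(f"{name}: {value}")
```

In real use, session.get(url) sends the session headers automatically, and session.get(url, headers={...}) overrides individual values for a single request.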
2. Setting request headers with Selenium + Chrome:
from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument('--lang=zh-CN')  # use Chinese as the browser language
# Set the User-Agent header; the value is passed without extra quotes
options.add_argument('--user-agent=Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.104 Safari/537.36 Core/1.53.4882.400 QQBrowser/9.7.13059.400')
# Pass options=; the chrome_options= keyword is deprecated in Selenium 4
browser = webdriver.Chrome(options=options)
url = "http://www.targetweb.com"
browser.get(url)
browser.quit()
3. Setting request headers with Selenium + PhantomJS:
# Requires Selenium 3.x: webdriver.PhantomJS was removed in Selenium 4
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

des_cap = dict(DesiredCapabilities.PHANTOMJS)
# PhantomJS reads the User-Agent from this capability key
des_cap["phantomjs.page.settings.userAgent"] = (
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/53.0.2785.104 Safari/537.36 Core/1.53.4882.400 QQBrowser/9.7.13059.400"
)
browser = webdriver.PhantomJS(desired_capabilities=des_cap)
url = "http://www.targetweb.com"
browser.get(url)
browser.quit()
4. Setting request headers in the Scrapy framework:
Add the following to settings.py:
DEFAULT_REQUEST_HEADERS = {
    'accept': 'image/webp,*/*;q=0.8',
    'accept-language': 'zh-CN,zh;q=0.8',
    'referer': 'https://www.baidu.com/',
    'user-agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.104 Safari/537.36 Core/1.53.4882.400 QQBrowser/9.7.13059.400',
}
5. Setting request headers with aiohttp (asynchronous Python):
import asyncio

import aiohttp

url = "http://www.targetweb.com"
headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Cache-Control': 'max-age=0',
    'Connection': 'keep-alive',
    'Referer': 'http://www.baidu.com/',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.104 Safari/537.36 Core/1.53.4882.400 QQBrowser/9.7.13059.400',
}


async def main():
    # Headers given to ClientSession are sent with every request made through it
    async with aiohttp.ClientSession(headers=headers) as session:
        async with session.get(url) as resp:
            print(resp.status)
            print(await resp.text())


# "async with" is only valid inside a coroutine, so run it through an event loop
asyncio.run(main())