How to Request a Keyword Search Page with a Python Requests Crawler

2024-10-30 Technical Tutorial

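This article shows how to request a keyword search page with a Python Requests crawler. It starts by fetching and saving a single page, then adds a User-Agent header and a query parameter to fetch the results page for a keyword.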

Requirement: crawl the page data of the Sogou home page

    import requests

    if __name__ == '__main__':
        # Step 1: specify the URL to fetch
        url = 'https://123.sogou.com/'
        # Step 2: send the request; get() returns a response object
        response = requests.get(url=url)
        # Step 3: read the response data; .text returns it as a string
        page_text = response.text
        print(page_text)
        # Step 4: persist the page to disk
        with open('./sogou.html', 'w', encoding='utf-8') as fp:
            fp.write(page_text)
        print("Crawl finished")
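The four steps above assume the request succeeds and that the page decodes cleanly. A slightly more defensive variant could look like the sketch below. The helper names fetch_html and decode_html and the 10-second timeout are assumptions made for illustration; they are not part of the original script.

```python
import requests


def decode_html(response):
    """Return the response body as text, guessing the charset when none is declared."""
    # When a text response declares no charset, requests falls back to ISO-8859-1,
    # which often garbles Chinese pages; in that case use the encoding detected
    # from the body itself.
    declared = response.encoding
    if declared is None or declared.lower() == 'iso-8859-1':
        response.encoding = response.apparent_encoding
    return response.text


def fetch_html(url, timeout=10):
    """Fetch a page and return its text (hypothetical helper)."""
    # timeout keeps the script from hanging forever on an unresponsive server
    response = requests.get(url, timeout=timeout)
    # raise requests.HTTPError for 4xx/5xx instead of saving an error page
    response.raise_for_status()
    return decode_html(response)
```

Calling fetch_html('https://123.sogou.com/') would then stand in for steps 2 and 3, and the returned string can be written to sogou.html exactly as in step 4.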

Requesting a keyword page with User-Agent (UA) spoofing

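    import requests

    if __name__ == '__main__':
        # UA spoofing: put the browser's User-Agent string into a headers dict
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.9 Safari/537.36'
        }
        url = 'https://www.sogou.com/sie?'
        # Query parameters carried by the URL go into a dict
        kw = input('enter a word: ')
        param = {'query': kw}
        # requests appends the encoded parameters to the URL when sending:
        # headers carries the spoofed UA, params carries the keyword
        response = requests.get(url=url, params=param, headers=headers)
        page_text = response.text  # response body as text
        fileName = kw + '.html'  # save the result as an HTML file
        with open(fileName, 'w+', encoding='utf-8') as fp:
            fp.write(page_text)  # write the page to the file
        print(fileName, "saved successfully!")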
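To see what the params and headers arguments actually do, it can help to build the request without sending it. The sketch below uses requests.Request(...).prepare(), which returns the final URL and headers that would go over the wire. The function name build_search_request is an assumption for illustration; the URL and User-Agent string are the ones used in the script above.

```python
import requests

# Same User-Agent string as in the script above
USER_AGENT = ('Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/98.0.4758.9 Safari/537.36')


def build_search_request(keyword, url='https://www.sogou.com/sie?'):
    """Prepare (but do not send) the keyword search request (hypothetical helper)."""
    request = requests.Request(
        'GET',
        url,
        params={'query': keyword},           # encoded and appended to the URL
        headers={'User-Agent': USER_AGENT},  # spoofed browser identity
    )
    return request.prepare()


prepared = build_search_request('python')
print(prepared.url)                    # final URL including the query string
print(prepared.headers['User-Agent'])  # header that would be sent
```

Non-ASCII keywords are percent-encoded as UTF-8 in the final URL, so the keyword does not need to be encoded by hand before it is put into the params dict.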
