[Python] Backing Up an ITPUB Blog
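ITPUB has been redesigned, so here is the updated code for backing up a blog. The approach is the same as in the previous version: walk the list pages, pull each article's link and title out with a regular expression, then download each article and save its body as an HTML file.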
# -*- coding: utf-8 -*-
# Python 2 script (it uses urllib2 and the print statement).
import re
import urllib2

import requests
from bs4 import BeautifulSoup as bsp

# Enter the number of pages you want to download here, i.e. your total page count.
# Note that range(1, 30) covers pages 1 through 29.
for page in range(1, 30):
    url = 'http://blog.itpub.net/29096438/list/%d/' % page  # loop over the list pages
    text = urllib2.urlopen(url).read()
    # Whitespace inside the tags is restored here; adjust it if the page markup differs.
    pattern = r'<a target=_blank href="/29096438/viewspace-[0-9]*/" class="w750"><p class="title">.*</p></a>'
    regex = re.compile(pattern)
    # Find every article anchor with the regular expression; each match still carries the title.
    urlList = re.findall(regex, text)
    for t in urlList:
        # The third '='-separated chunk holds the href; strip the quotes and the trailing 'class'.
        i = t.split('=')[2].replace('class', '').replace('"', '').strip()
        newi = re.sub('/29096438', 'http://blog.itpub.net/29096438', i).decode('utf-8')
        # Build the file name from the title and remove spaces from the whole name.
        # A title that itself contains '=' would break this split.
        fname2 = (t.split('=')[-1].split('>')[1].split('<')[0] + '.html').replace(' ', '')
        # print newi, fname2
        try:
            r = requests.get(newi, headers={'User-Agent': 'Mozilla/5.0 (Linux; U; Android 2.3.6; en-us; Nexus S Build/GRK39F) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1'})
            soup = bsp(r.content, "html.parser")
            cont = soup.find('div', {'class': 'preview-main'})
            with open(fname2, 'w') as f:
                f.write(str(cont))
            print fname2, r, 'backup succeeded'
        except Exception:
            pass  # skip articles that fail to download or save

# The User-Agent above should be picked at random from the list below;
# I was too lazy to write the random choice.
agents = [
    'Mozilla/5.0 (Linux; U; Android 2.3.6; en-us; Nexus S Build/GRK39F) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1',
    'Avant Browser/1.2.789rel1 (http://www.avantbrowser.com)',
    'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/532.5 (KHTML, like Gecko) Chrome/4.0.249.0 Safari/532.5',
    'Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US) AppleWebKit/532.9 (KHTML, like Gecko) Chrome/5.0.310.0 Safari/532.9',
    'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/534.7 (KHTML, like Gecko) Chrome/7.0.514.0 Safari/534.7',
    'Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US) AppleWebKit/534.14 (KHTML, like Gecko) Chrome/9.0.601.0 Safari/534.14',
    'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.14 (KHTML, like Gecko) Chrome/10.0.601.0 Safari/534.14',
    'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.20 (KHTML, like Gecko) Chrome/11.0.672.2 Safari/534.20',
    'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534.27 (KHTML, like Gecko) Chrome/12.0.712.0 Safari/534.27',
    'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/13.0.782.24 Safari/535.1',
    'Mozilla/5.0 (Windows NT 6.0) AppleWebKit/535.2 (KHTML, like Gecko) Chrome/15.0.874.120 Safari/535.2',
    'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.7 (KHTML, like Gecko) Chrome/16.0.912.36 Safari/535.7',
    'Mozilla/5.0 (Windows; U; Windows NT 6.0 x64; en-US; rv:1.9pre) Gecko/2008072421 Minefield/3.0.2pre',
    'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.10) Gecko/2009042316 Firefox/3.0.10',
    'Mozilla/5.0 (Windows; U; Windows NT 6.0; en-GB; rv:1.9.0.11) Gecko/2009060215 Firefox/3.0.11 (.NET CLR 3.5.30729)',
    'Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6 GTB5',
    'Mozilla/5.0 (Windows; U; Windows NT 5.1; tr; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8 (.NET CLR 3.5.30729; .NET4.0E)',
    'Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1',
    'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:2.0.1) Gecko/20100101 Firefox/4.0.1',
    'Mozilla/5.0 (Windows NT 5.1; rv:5.0) Gecko/20100101 Firefox/5.0',
    'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0a2) Gecko/20110622 Firefox/6.0a2',
    'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:7.0.1) Gecko/20100101 Firefox/7.0.1',
    'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0b4pre) Gecko/20100815 Minefield/4.0b4pre',
    'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)',
    'Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Win 9x 4.90)',
    'Mozilla/5.0 (Windows; U; Windows XP) Gecko MultiZilla/1.6.1.0a',
    'Mozilla/2.02E (Win95; U)',
    'Mozilla/3.01Gold (Win95; I)',
    'Mozilla/4.8 [en] (Windows NT 5.1; U)',
    'Mozilla/5.0 (Windows; U; Win98; en-US; rv:1.4) Gecko Netscape/7.1 (ax)',
    'Mozilla/5.0 (hp-tablet; Linux; hpwOS/3.0.2; U; de-DE) AppleWebKit/534.6 (KHTML, like Gecko) wOSBrowser/234.40.1 Safari/534.6 TouchPad/1.0',
]
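The script leaves the random User-Agent selection unwritten and extracts each link and title with a chain of split calls. Below is a minimal sketch of how both steps could look, written for Python 3 rather than the Python 2 of the script above. The names AGENTS, LINK_RE, BASE_URL, random_headers and extract_articles are made up for illustration, the sample anchor uses a hypothetical article ID, and the whitespace inside the tags is an assumption about the page markup. Unlike the original pattern, this one uses capture groups, `[0-9]+` and a non-greedy `.*?`, and it also strips path separators from titles, which the original script does not do.

```python
import random
import re

# Two entries copied from the script's agents list; the full list works the same way.
AGENTS = [
    "Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1",
    "Mozilla/5.0 (Windows NT 5.1; rv:5.0) Gecko/20100101 Firefox/5.0",
]

# Same anchor shape as the script's pattern, with capture groups for href and title.
# Assumption: the tag whitespace matches what the list pages actually serve.
LINK_RE = re.compile(
    r'<a target=_blank href="(/29096438/viewspace-[0-9]+/)" class="w750">'
    r'<p class="title">(.*?)</p></a>'
)

BASE_URL = "http://blog.itpub.net"


def random_headers(agents=AGENTS):
    """Return request headers with a User-Agent picked at random."""
    return {"User-Agent": random.choice(agents)}


def extract_articles(html):
    """Return (absolute_url, file_name) pairs for every article anchor in html."""
    articles = []
    for href, title in LINK_RE.findall(html):
        # Drop whitespace and path separators so the title is usable as a file name.
        safe_title = re.sub(r"[\s/\\]+", "", title)
        articles.append((BASE_URL + href, safe_title + ".html"))
    return articles


# Hypothetical list-page fragment, used only to exercise the parser.
sample = (
    '<a target=_blank href="/29096438/viewspace-2100001/" class="w750">'
    '<p class="title">My first post</p></a>'
)
print(extract_articles(sample))
print(random_headers())
```

In the download loop, the result of random_headers() could be passed as the headers argument of requests.get, so each request goes out with a different User-Agent from the list.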
The backup results are as follows: