PHP Crawler in Practice (Scraping Meipai Videos)
1. Fetching the page
URL: http://www.meipai.com/medias/hot
public function getContentByFilegetcontents($url)
{
    $content = file_get_contents($url);
    return $content;
}
This returns the HTML of the whole page. The next step is to extract the key information from it: each video's URL, title, and thumbnail image.
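The `file_get_contents` call above runs with PHP's default stream settings, so a slow or failing request can stall the script or only emit a warning. Below is a minimal sketch of a more defensive fetch. The helper name `fetchPage`, the 10-second timeout, and the User-Agent string are illustrative assumptions and are not part of the original crawler.

```php
<?php
/**
 * Fetch a URL and return its body, or null on failure.
 * fetchPage, the default timeout and the User-Agent value are
 * assumptions made for this sketch.
 */
function fetchPage($url, $timeout = 10)
{
    $context = stream_context_create(array(
        'http' => array(
            'method'     => 'GET',
            'timeout'    => $timeout, // seconds to wait before giving up
            'user_agent' => 'Mozilla/5.0 (compatible; CutecrawlerSketch/1.0)',
        ),
    ));

    // @ suppresses the PHP warning; failure is reported via the return value.
    $content = @file_get_contents($url, false, $context);

    return $content === false ? null : $content;
}
```

A caller would check for `null` before passing the result on to the extraction step, for example `$html = fetchPage("http://www.meipai.com/medias/hot");`.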
2. Extraction
The markup for each video is contained in a list item like the following:
<li class="pr no-select loading J_media_list_item" itemscope itemtype="http://schema.org/VideoObject">
  <img src="https://cache.yisu.com/upload/information/20200310/52/108717.jpg!thumb320" width="300" height="300" class="db pa pai" alt="手撕包菜。包菜撕片装洗净备用。热锅入油五花肉下锅煸炒出油,多余的油盛出。放酱油肉上色,盛出。之前的油倒锅内,放蒜辣椒炒香,下包菜继续翻炒,倒适量酱油老抽五香粉。再下之前炒好的五花肉翻炒,放适量盐,出锅前放鸡精淋入适量香醋即可非常香啊,超级下饭。喜欢的点赞奥#美食##家常菜#" itemprop="thumbnail">
  <div id="w517161790" class="content-l-video content-l-media-wrap pr cp" data-id="517161790" data-video="http://mvvideo1.meitudata.com/5734040ae2dec950.mp4">
    <div class="layer-black pa"></div>
    <a hidefocus href="/media/517161790" target="_blank" class="content-l-p pa" title="手撕包菜。包菜撕片装洗净备用。热锅入油五花肉下锅煸炒出油,多余的油盛出。放酱油肉上色,盛出。之前的油倒锅内,放蒜辣椒炒香,下包菜继续翻炒,倒适量酱油老抽五香粉。再下之前炒好的五花肉翻炒,放适量盐,出锅前放鸡精淋入适量香醋即可非常香啊,超级下饭。喜欢的点赞奥#美食##家常菜#">
      <meta itemprop="url" content="/media/517161790">
      <i class="icon icon-item-play"></i>
      <strong class="js-convert-emoji" itemprop="description">哈喇嘎子流成河</strong>
    </a>
  </div>
  <div class="pr" itemscope itemtype="http://schema.org/AggregateRating">
    <a hidefocus href="/user/62299474" class="dbl h58">
      <img src="https://cache.yisu.com/upload/information/20200310/52/108718.jpg!thumb60" width="28" height="28" class="avatar m10" title="小优Lucky" alt="小优Lucky">
    </a>
    <p class="content-name pa">
      <a hidefocus href="/user/62299474" class="content-name-a js-convert-emoji" title="小优Lucky" itemprop="author">小优Lucky</a>
    </p>
    <div class="content-like pa" data-id="517161790">
      <i class="icon icon-like"></i>
      <span itemprop="ratingCount">3060</span>
    </div>
    <a hidefocus href="/media/517161790" data-sc="1" target="_blank" class="conten-command pa" data-id="517161790">
      <i class="icon icon-command"></i>
      <span itemprop="reviewCount">100</span>
    </a>
  </div>
</li>
We pull the fields out with regular expressions:
public function extracturl($page)
{
    $matches = array();
    $voide = array();
    $mainurl = "";
    $list = array();
    $j = 0;
    // Match every <li> block that holds one video.
    $pat = "/<li class=\"pr no-select loading J_media_list_item\".*?>.*?<\/li>/ism";
    preg_match_all($pat, $page, $matches, PREG_PATTERN_ORDER);
    for ($i = 0; $i < count($matches[0]); $i++) {
        // Video URL: the value of the data-video attribute.
        $pat1 = "/data-video=\"(.*?)\"/ism";
        preg_match_all($pat1, $matches[0][$i], $voide, PREG_PATTERN_ORDER);
        $myvoide = $voide[1][0];
        // Thumbnail: the first src attribute inside the block.
        $pat2 = "/src=\"(.*?)\"/ism";
        preg_match_all($pat2, $matches[0][$i], $img, PREG_PATTERN_ORDER);
        $myimg = $img[1][0];
        // Title: the text inside <strong class="js-convert-emoji">.
        $pat3 = "/<strong class=\"js-convert-emoji\".*?>(.*?)<\/strong>/ism";
        preg_match_all($pat3, $matches[0][$i], $title, PREG_PATTERN_ORDER);
        $mytitle = $title[1][0];
        $list[$j++] = array('voide' => $myvoide, 'title' => $mytitle, 'img' => $myimg);
    }
    return $list;
}
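Patterns like these depend on the exact attribute order and spacing of the markup, so a small change on the page can break them. As an alternative sketch (a different technique from the regex approach used in this article), the same three fields can be read with PHP's DOM extension. The function name `extractWithDom` is hypothetical, and the XPath expressions assume the markup shown earlier.

```php
<?php
/**
 * Extract video URL, title and thumbnail from the list page using DOMXPath.
 * extractWithDom is a hypothetical name; the DOM extension must be enabled.
 */
function extractWithDom($page)
{
    $list = array();
    if ($page === '') {
        return $list;
    }

    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    // The XML prolog is a common workaround that makes loadHTML read UTF-8 input.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $page);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Select <li> elements whose class list contains J_media_list_item.
    $items = $xpath->query(
        '//li[contains(concat(" ", normalize-space(@class), " "), " J_media_list_item ")]'
    );

    foreach ($items as $li) {
        $video = $xpath->query('.//*[@data-video]', $li)->item(0);
        $img   = $xpath->query('.//img[@src]', $li)->item(0);
        $title = $xpath->query('.//strong[contains(@class, "js-convert-emoji")]', $li)->item(0);

        if ($video === null) {
            continue; // skip list items that carry no video URL
        }

        // The 'voide' key is kept so the result has the same shape as extracturl().
        $list[] = array(
            'voide' => $video->getAttribute('data-video'),
            'title' => $title !== null ? trim($title->textContent) : '',
            'img'   => $img !== null ? $img->getAttribute('src') : '',
        );
    }

    return $list;
}
```

Unlike the regex version, this sketch skips an item with no `data-video` attribute rather than reading an undefined array index.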
Full code
<?php
class Cutecrawler
{
    // Download the page and return its HTML.
    public function getContentByFilegetcontents($url)
    {
        $content = file_get_contents($url);
        return $content;
    }

    // Extract the video URL, title and thumbnail of every list item.
    public function extracturl($page)
    {
        $matches = array();
        $voide = array();
        $mainurl = "";
        $list = array();
        $j = 0;
        $pat = "/<li class=\"pr no-select loading J_media_list_item\".*?>.*?<\/li>/ism";
        preg_match_all($pat, $page, $matches, PREG_PATTERN_ORDER);
        for ($i = 0; $i < count($matches[0]); $i++) {
            $pat1 = "/data-video=\"(.*?)\"/ism";
            preg_match_all($pat1, $matches[0][$i], $voide, PREG_PATTERN_ORDER);
            $myvoide = $voide[1][0];
            $pat2 = "/src=\"(.*?)\"/ism";
            preg_match_all($pat2, $matches[0][$i], $img, PREG_PATTERN_ORDER);
            $myimg = $img[1][0];
            $pat3 = "/<strong class=\"js-convert-emoji\".*?>(.*?)<\/strong>/ism";
            preg_match_all($pat3, $matches[0][$i], $title, PREG_PATTERN_ORDER);
            $mytitle = $title[1][0];
            $list[$j++] = array('voide' => $myvoide, 'title' => $mytitle, 'img' => $myimg);
        }
        return $list;
    }
}

$url = "http://www.meipai.com/medias/hot";
$crawler = new Cutecrawler();
$content = $crawler->getContentByFilegetcontents($url);
$c = $crawler->extracturl($content);
var_dump($c);
?>
Final result (only the first 4 of the 24 entries are shown):
array(24) {
  [0]=>
  array(3) {
    ["voide"]=>
    string(51) "http://mvvideo2.meitudata.com/5737fd5caeb838981.mp4"
    ["title"]=>
    string(27) "老师那些年常说的话"
    ["img"]=>
    string(58) "https://cache.yisu.com/upload/information/20200310/52/108720.jpg!thumb320"
  }
  [1]=>
  array(3) {
    ["voide"]=>
    string(50) "http://mvvideo2.meitudata.com/5737fceabf873602.mp4"
    ["title"]=>
    string(21) "女友突然冷落你"
    ["img"]=>
    string(58) "http://mvimg2.meitudata.com/5736d25d0aa5d8991.jpg!thumb320"
  }
  [2]=>
  array(3) {
    ["voide"]=>
    string(51) "http://mvvideo2.meitudata.com/5737f300131e18596.mp4"
    ["title"]=>
    string(27) "女明星之间的内心戏"
    ["img"]=>
    string(58) "https://cache.yisu.com/upload/information/20200310/52/108722.jpg!thumb320"
  }
  [3]=>
  array(3) {
    ["voide"]=>
    string(51) "http://mvvideo2.meitudata.com/5737eb9d0bfc92046.mp4"
    ["title"]=>
    string(24) "真替老师感到悲剧"
    ["img"]=>
    string(57) "https://cache.yisu.com/upload/information/20200310/52/108723.jpg!thumb320"
  }
Next... you can store the results in a database.
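One way to store the extracted list in a database is PDO with a prepared statement. This is a minimal sketch rather than the author's implementation: the `videos` table, its column names, the `saveVideos` function name, and the choice of SQLite are all assumptions made for illustration, and the PDO SQLite driver must be enabled.

```php
<?php
/**
 * Store crawled items in a database through PDO.
 * The table name, column names and saveVideos itself are hypothetical.
 * Returns the sum of the row counts reported by the driver.
 */
function saveVideos(PDO $pdo, array $list)
{
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $pdo->exec(
        'CREATE TABLE IF NOT EXISTS videos (
            video_url TEXT PRIMARY KEY,
            title     TEXT,
            img_url   TEXT
        )'
    );

    // INSERT OR IGNORE is SQLite syntax: rows whose video_url already exists are skipped.
    $stmt = $pdo->prepare(
        'INSERT OR IGNORE INTO videos (video_url, title, img_url) VALUES (:video, :title, :img)'
    );

    $inserted = 0;
    $pdo->beginTransaction();
    foreach ($list as $item) {
        // The keys match the arrays returned by extracturl().
        $stmt->execute(array(
            ':video' => $item['voide'],
            ':title' => $item['title'],
            ':img'   => $item['img'],
        ));
        $inserted += $stmt->rowCount();
    }
    $pdo->commit();

    return $inserted;
}

// Usage with an in-memory SQLite database and one made-up item.
$pdo = new PDO('sqlite::memory:');
$count = saveVideos($pdo, array(
    array('voide' => 'http://example.com/a.mp4', 'title' => 'demo', 'img' => 'http://example.com/a.jpg'),
));
```

Using the video URL as the primary key means the crawler can be run repeatedly without storing the same video twice. Other databases need their own equivalent of `INSERT OR IGNORE`.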