Complete Python Web Scraper Code, Free to Use

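Writing a Python web scraper can be hard for beginners. While practicing early on, you can start directly from a template, which saves time and effort.

The code below uses Python to scrape data from a website and save it to an Excel file in the same directory.

Here is the code: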

import re
import urllib.error
import urllib.request

import xlwt
from bs4 import BeautifulSoup


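def main():
    # Base URL of the site to scrape.
    baseurl = "http://jshk.com.cn"

    # Scrape the pages, then write the rows to an Excel file in the current directory.
    datelist = getDate(baseurl)
    savepath = "jshk.xls"
    saveDate(datelist, savepath)

    # askURL("http://jshk.com.cn/")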

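# Regular expressions used to pull fields out of each entry.
# Both patterns are empty or truncated in the source, and the getDate()
# function called in main() is not shown; fill them in for the target page.
findlink = re.compile(r'')
findimg = re.compile(r'')


def askURL(url):
    # Send a browser-like User-Agent so the server treats the request as a normal visit.
    head = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.97 Safari/537.36"
    }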
    request = urllib.request.Request(url, headers=head)
    html = ""
    try:
        response = urllib.request.urlopen(request)
        html = response.read().decode("utf-8")
        # print(html)
    except urllib.error.URLError as e:
        # An HTTPError carries a status code; a plain URLError carries a reason.
        if hasattr(e, "code"):
            print(e.code)
        if hasattr(e, "reason"):
            print(e.reason)

    return html

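def saveDate(datalist, savepath):
    workbook = xlwt.Workbook(encoding='utf-8')
    worksheet = workbook.add_sheet('Movies', cell_overwrite_ok=True)
    col = ("Movie details", "Image", "Title", "Rating", "Number of reviews", "Overview")
    # Header row: write every column name, not just the first five.
    for i, name in enumerate(col):
        worksheet.write(0, i, name)
    # One row per scraped record; iterate over what was actually collected
    # instead of assuming exactly 250 records.
    for i, data in enumerate(datalist):
        print("Record %d" % (i + 1))
        for j, value in enumerate(data):
            worksheet.write(i + 1, j, value)

    workbook.save(savepath)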



if __name__ == '__main__':
    main()
    print("Scraping finished")

You can copy and paste it directly.

To scrape a different website, change the URL and the matching HTML structure (the "item" in the code).
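
As a rough illustration of where "item" comes in, the sketch below parses a small hard-coded HTML snippet instead of a live page. The function name `parse_items`, the sample markup, the two regular expressions, and the `div` container with class `item` are all assumptions made for this example. They are not the patterns from the code above and would need to be adapted to the real page.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical patterns; adjust them to the markup of the page you scrape.
LINK_RE = re.compile(r'<a href="(.*?)"')
IMG_RE = re.compile(r'<img[^>]*src="(.*?)"')

# Made-up sample markup standing in for a downloaded page.
SAMPLE_HTML = """
<div class="item"><a href="http://example.com/1"><img src="http://example.com/1.jpg"></a></div>
<div class="item"><a href="http://example.com/2"><img src="http://example.com/2.jpg"></a></div>
"""


def parse_items(html, container="div", css_class="item"):
    """Return one [link, image] row per matching container element."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for item in soup.find_all(container, class_=css_class):
        text = str(item)  # the regexes run on the element's HTML source
        link = LINK_RE.search(text)
        img = IMG_RE.search(text)
        rows.append([
            link.group(1) if link else "",
            img.group(1) if img else "",
        ])
    return rows


rows = parse_items(SAMPLE_HTML)
print(rows)
```

For a different site, the usual changes would be the `container` and `css_class` arguments and the two patterns. A field that is not found becomes an empty string, so every row keeps the same number of columns.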
