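Introduction

The code is for learning purposes only. If you use code from this site for commercial or other profit-making purposes, this site bears no responsibility for the consequences.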

wallhaven

Scraper code

Notes

  • Dependencies: requests, lxml
  • Change: the download directory path, to a directory of your own
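The download path is the only setting that has to be changed by hand. As a minimal sketch of one way to handle it, the helpers below create the directory if it is missing and build a file name from a running index. The names `ensure_download_dir` and `build_file_path` are hypothetical and not part of the original script, and keeping the extension from the image URL (instead of always writing `.jpg`) is an assumption about what is wanted.

```python
from pathlib import Path
from urllib.parse import urlparse


def ensure_download_dir(download_dir):
    """Create the download directory if it is missing and return it as a Path."""
    directory = Path(download_dir)
    directory.mkdir(parents=True, exist_ok=True)
    return directory


def build_file_path(download_dir, index, image_url):
    """Build <download_dir>/<index><ext>, reusing the extension found in the image URL."""
    # Fall back to .jpg (the extension the script always uses) when the URL has none.
    suffix = Path(urlparse(image_url).path).suffix or ".jpg"
    return Path(download_dir) / f"{index}{suffix}"
```

With helpers like these, the hard-coded path string in the script could be replaced by a call such as `build_file_path(directory, self.a, img)`.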
'''*************************************************
Copyright (Python), 2020-,Literature Tech. Co., Ltd.
source:    None
Author:    Written by Literature
Version:   1.0
Date:      2020.07.17
Description:  
Others:   None
Function List:  main
History:  The first edition 2020.05.28
*************************************************'''
import requests
from lxml import etree

class Spider:
    def __init__(self):
        self.toplist_image = []  # detail-page URLs collected from the toplist pages
        self.a = 0  # running counter, so that every file gets a unique name
        self.file_name = ""  # path of the file currently being written
        self.headers = {
            "Connection": "keep-alive",
            "Upgrade-Insecure-Requests": "1",
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36",
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
            "Accept-Encoding": "gzip,deflate",
            "Accept-Language": "zh-CN,zh;q=0.8"
        }  # browser-like request headers, to avoid being rejected as a bot
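    def req(self):
        """Collect the detail-page URL of every wallpaper on the toplist."""
        print("Fetching wallpapers...")
        for i in range(1, 200):  # pages to crawl: 1 through 199
            url = f"https://wallhaven.cc/toplist?page={i}"  # the toplist is paginated through the page query parameter
            result = requests.get(url, headers=self.headers).content  # send the request with the browser-like headers
            html = etree.HTML(result)
            title = html.xpath('//a[@class = "preview"]/@href')  # XPath returns the list of detail-page URLs on this page
            for url1 in title:
                self.toplist_image.append(url1)  # store each detail-page URL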


    def download(self):
        for i in self.toplist_image:  # visit every detail page
            res = requests.get(i, headers=self.headers).content  # fetch the detail page
            html = etree.HTML(res)
            title = html.xpath('//div[@class = "scrollbox"]/img/@src')  # list of full-size image URLs on the page
            self.file_name = "/your/download/path/" + f"{self.a+1}.jpg"  # local path and name for the image
            self.a += 1
            print(f"Downloading wallpaper {self.a}.jpg")
            for img in title:  # each full-size image URL found on the page
                resa = requests.get(img, headers=self.headers)  # fetch the image itself
                with open(self.file_name, mode="wb") as file:
                    file.write(resa.content)  # the with block closes the file on exit
s = Spider()
s.req()
s.download()
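The script sends one request per toplist page plus further requests for every wallpaper, and none of them is guarded, so a single failed connection raises an exception and ends the run. A rough sketch of one way to soften that is a small retry wrapper. The name `fetch_with_retry` and the `fetch` callable it takes are assumptions made for this illustration and do not appear in the original code.

```python
import time


def fetch_with_retry(fetch, url, retries=3, delay=1.0):
    """Call fetch(url) up to `retries` times, pausing `delay` seconds between attempts.

    `fetch` is any callable that returns the response body and raises an
    exception on failure (a hypothetical interface chosen for this sketch).
    """
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return fetch(url)
        except Exception as error:  # broad on purpose: the sketch knows nothing about the fetcher
            last_error = error
            if attempt < retries:
                time.sleep(delay)  # wait before the next attempt
    raise RuntimeError(f"giving up on {url} after {retries} attempts") from last_error
```

In the script, `fetch` could be a small function that calls `requests.get(url, headers=self.headers, timeout=10)`, then `raise_for_status()` on the response, and returns `response.content`. The timeout value here is only an example.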

Result of running the code





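Article: "Scraping images from wallhaven.cc with Python"
Contact:
Article link: https://wxiou.cn/index.php/archives/102/
Unless otherwise noted, all articles are original work by Literature. When reposting, please credit this article as the source and include its link.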
Last modification:July 24th, 2020 at 06:55 pm