Python crawler in practice: scraping images from Home of Pictures

2020-11-06 01:17:00 itread01


The text and images in this article come from the Internet and are for learning and discussion only. They are not for commercial use, and copyright belongs to the original authors. If there is any problem, please contact us promptly so we can handle it.

How do you implement a crawler in Python?

  • Simulate a browser
  • Send the request and get the page content
  • Extract the information we want from the source data (data filtering)
  • Store the filtered data
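
The steps listed above can be sketched end to end with only the standard library. This is a minimal illustration, not the article's code: `ImgCollector`, `crawl`, `fetch_text`, and `fetch_bytes` are hypothetical names, and the two fetch callables stand in for real HTTP requests so the flow can be tried offline.

```python
from html.parser import HTMLParser
from pathlib import Path


class ImgCollector(HTMLParser):
    """Collect (src, alt) pairs from every <img> tag that has a src."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attr = dict(attrs)
            if attr.get("src"):
                self.images.append((attr["src"], attr.get("alt") or ""))


def crawl(url, fetch_text, fetch_bytes, out_dir):
    """Request a page, extract its images, and store them in out_dir."""
    html = fetch_text(url)                    # request the page source
    parser = ImgCollector()
    parser.feed(html)                         # extract and filter the data
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)    # make sure the folder exists
    saved = []
    for index, (src, alt) in enumerate(parser.images):
        target = out / f"{index}.jpg"
        target.write_bytes(fetch_bytes(src))  # store the filtered data
        saved.append((target.name, alt))
    return saved
```

In a real run, `fetch_text` could wrap `requests.get(url, headers=headers).text` and `fetch_bytes` could wrap `requests.get(src, headers=headers).content`.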

What tools do you need to build the crawler?

  • Python 3.6
  • PyCharm Professional

Target site

Home of pictures



Crawler code

Import the tools

Python's built-in standard library

import ssl


System library, used to create the storage folder automatically

import os


Download module from the standard library

import urllib.request


Networking library (third-party package)

import requests


HTML parser and selector (third-party package)

from bs4 import BeautifulSoup


Skip certificate verification by default for HTTPS requests made through urllib

ssl._create_default_https_context = ssl._create_unverified_context
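
The assignment above relies on private `ssl` functions and turns off verification for every HTTPS request that `urllib` makes in the process. A narrower alternative, sketched here as an assumption rather than the article's approach, builds one unverified context and passes it only to the calls that need it. `make_unverified_context` and `open_without_verification` are hypothetical helper names.

```python
import ssl
import urllib.request


def make_unverified_context():
    """Return an SSL context that skips certificate and hostname checks."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False       # must be disabled before verify_mode
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def open_without_verification(url, timeout=10):
    """Open one URL with verification disabled for this call only."""
    return urllib.request.urlopen(
        url, context=make_unverified_context(), timeout=timeout
    )
```

Skipping verification exposes the download to tampering, so it is best limited to sites whose certificates are known to be misconfigured.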


Simulate a browser

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36',
}
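
The browser disguise only works when the string is sent under the `User-Agent` key, so the headers must be a key/value mapping. As a small sketch of how such a mapping ends up on a request, the following uses the standard library's `urllib.request.Request`; `build_request` is a hypothetical helper, and `requests.get(url, headers=headers)` accepts the same kind of mapping.

```python
import urllib.request

USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36"
)


def build_request(url):
    """Create a request object that identifies itself as a desktop browser."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
```

Note that `urllib` normalizes header names, so the stored key reads `User-agent`.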


Automatically create a folder

if not os.path.exists('./ Illustration material /'):
    os.mkdir('./ Illustration material /')
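
The check-then-create pattern above can also be written as a single call. A minimal sketch, assuming a hypothetical helper name `ensure_dir`: `os.makedirs` with `exist_ok=True` creates missing parent folders as well and does not raise if the folder already exists.

```python
import os


def ensure_dir(path):
    """Create the folder (and any missing parents) if it is not there yet."""
    os.makedirs(path, exist_ok=True)
    return path
```

Calling `ensure_dir` repeatedly is safe, which is convenient when the crawler is run more than once.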


Send the request

url = 'https://www.tupianzj.com/meinv/mm/meizitu/'
html = requests.get(url, headers=headers).text


Extract data from the raw page source

soup = BeautifulSoup(html, 'lxml')
images_data = soup.find('ul', class_='d1 ico3').find_all_next('li')
for image in images_data:
    image_url = image.find_all('img')
    for img in image_url:
        print(img['src'], img['alt'])
        # Download each image into the folder, named after its alt text
        urllib.request.urlretrieve(img['src'], './ Illustration material /' + img['alt'] + '.jpg')
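
Using the `alt` text directly as a file name can fail when it contains characters such as `/` or `?`, and not every image is a `.jpg`. The following is a hedged sketch of one way to build a safer name. `build_filename` is a hypothetical helper, not part of the original code.

```python
import os
import re
from urllib.parse import urlparse


def build_filename(src, alt, default_ext=".jpg"):
    """Derive a file name from the image's alt text and URL."""
    # Take the extension from the URL path, ignoring any query string
    ext = os.path.splitext(urlparse(src).path)[1].lower() or default_ext
    # Replace whitespace and characters that are invalid in file names
    stem = re.sub(r'[\\/:*?"<>|\s]+', "_", alt).strip("_") or "image"
    return stem + ext
```

The result can then be joined to the storage folder with `os.path.join` before being passed to `urllib.request.urlretrieve`.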





