
Python Crawler in Practice: Scraping Images from Home of Pictures

2020-11-06 01:17:00 【itread01】

Preface

The text and images in this article come from the Internet and are for study and exchange only, not for any commercial use. Copyright belongs to the original authors; if there is any problem, please contact us promptly so it can be handled.

How do you implement a crawler in Python?

Simulate a browser
Send requests to the website and fetch its content
Extract the information we want from the raw data (data filtering)
Store the filtered data
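The four steps above can be sketched as a small pipeline. This is a minimal sketch, not the article's code: it uses the standard library's `html.parser` instead of BeautifulSoup so it runs without third-party packages, and the `fetch` and `store` callables and the `ImgCollector` class are hypothetical names for illustration. Passing `fetch` in as a parameter lets the extraction step be tried on a saved HTML string without touching the network.

```python
from html.parser import HTMLParser
from typing import Callable, List, Tuple


class ImgCollector(HTMLParser):
    """Step 3: collect (src, alt) pairs from every <img> tag in the page."""

    def __init__(self) -> None:
        super().__init__()
        self.images: List[Tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag == 'img':
            a = dict(attrs)
            if a.get('src'):
                # alt may be missing or valueless (None), so fall back to ''
                self.images.append((a['src'], a.get('alt') or ''))


def crawl(url: str,
          fetch: Callable[[str], str],
          store: Callable[[str, str], None]) -> int:
    html = fetch(url)            # steps 1-2: request the page as a browser would
    parser = ImgCollector()
    parser.feed(html)            # step 3: extract the data we want
    for src, alt in parser.images:
        store(src, alt)          # step 4: store the filtered data
    return len(parser.images)


# Usage with a fake fetch function and an in-memory store
sample = '<ul><li><img src="/a.jpg" alt="A"></li><li><img src="/b.jpg" alt="B"></li></ul>'
saved = []
n = crawl('https://example.com/', lambda u: sample, lambda s, a: saved.append((s, a)))
print(n, saved)
```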

Tools needed to build the crawler

Python 3.6
PyCharm Professional Edition

Target site

Home of Pictures

https://www.tupianzj.com/

Crawler code

Import the tools

Python standard library: ssl, for HTTPS settings

import ssl

Standard library: os, used to create the storage folder automatically

import os

Standard library: urllib.request, used to download the images

import urllib.request

HTTP library (third-party package: requests)

import requests

HTML parser (third-party package: beautifulsoup4; the 'lxml' parser used below also requires the lxml package)

from bs4 import BeautifulSoup
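Three of the imports above are third-party packages. As an optional pre-flight check, a minimal sketch like the following reports which of them are not installed. The helper name `missing_packages` is hypothetical. Note that the pip package is `beautifulsoup4` while the import name is `bs4`.

```python
import importlib.util
from typing import Iterable, List


def missing_packages(names: Iterable[str]) -> List[str]:
    """Return the top-level import names that Python cannot find."""
    return [n for n in names if importlib.util.find_spec(n) is None]


missing = missing_packages(['requests', 'bs4', 'lxml'])
if missing:
    # pip names differ from import names: bs4 is installed as beautifulsoup4
    print('Missing:', missing, '- try: pip install requests beautifulsoup4 lxml')
```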

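Make urllib skip certificate verification for HTTPS sites by default (this affects urllib.request, which downloads the images; requests uses its own verify setting and is not affected)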

ssl._create_default_https_context = ssl._create_unverified_context
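The line above patches a private hook in the `ssl` module, so every urllib HTTPS request in the process skips certificate checks. A narrower alternative, sketched here under the assumption that only some requests need it, builds an unverified context with the public `ssl` API and passes it to a single `urlopen` call. The helper names `make_unverified_context` and `fetch_bytes` are hypothetical.

```python
import ssl
import urllib.request
from typing import Dict


def make_unverified_context() -> ssl.SSLContext:
    """Return an SSL context that skips certificate and hostname checks."""
    ctx = ssl.create_default_context()
    # check_hostname must be turned off before verify_mode can be CERT_NONE
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def fetch_bytes(url: str, headers: Dict[str, str], verify: bool = True) -> bytes:
    """Fetch a URL; certificate checks are skipped only when verify=False."""
    ctx = None if verify else make_unverified_context()
    req = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(req, context=ctx, timeout=10) as resp:
        return resp.read()
```

With requests, the per-call equivalent is `requests.get(url, verify=False)`.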

Simulate a browser by sending a browser User-Agent header

headers = {
    'User-Agent':
        'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36',
}
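The `headers` dict is passed to `requests.get`, but `urllib.request.urlretrieve`, which does the downloading, takes no headers argument and sends urllib's default `Python-urllib` User-Agent. This minimal sketch assumes you want the downloads to carry the same browser User-Agent. It installs a global opener that `urlopen` and `urlretrieve` then use. The helper `install_browser_opener` is a hypothetical name.

```python
import urllib.request
from typing import Dict

# Same User-Agent string as the headers dict in the article
headers = {
    'User-Agent':
        'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36',
}


def install_browser_opener(headers: Dict[str, str]) -> urllib.request.OpenerDirector:
    """Make module-level helpers such as urlretrieve send these headers."""
    opener = urllib.request.build_opener()
    # Replaces urllib's default [('User-agent', 'Python-urllib/3.x')]
    opener.addheaders = list(headers.items())
    urllib.request.install_opener(opener)
    return opener


opener = install_browser_opener(headers)

# A single request can also carry the headers directly;
# Request stores header names via str.capitalize(), hence 'User-agent'
req = urllib.request.Request('https://www.tupianzj.com/', headers=headers)
print(req.get_header('User-agent'))
```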

Automatically create the storage folder

save_dir = './Illustration material/'
# exist_ok=True makes this a no-op when the folder already exists
os.makedirs(save_dir, exist_ok=True)

Request the listing page

url = 'https://www.tupianzj.com/meinv/mm/meizitu/'
html = requests.get(url, headers=headers, timeout=10).text

Extract the image data from the raw page source, then download each image

soup = BeautifulSoup(html, 'lxml')
# find_all('li') searches only inside the <ul>; find_all_next('li') would also
# match every <li> that appears after it anywhere on the page
images_data = soup.find('ul', class_='d1 ico3').find_all('li')
for image in images_data:
    for img in image.find_all('img'):
        print(img['src'], img['alt'])
        # Download inside the loop so every image is saved, not only the last one
        try:
            urllib.request.urlretrieve(img['src'], save_dir + img['alt'] + '.jpg')
        except Exception as e:
            print('Download failed:', img['src'], e)
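The download step passes the `src` attribute straight to `urlretrieve` and names the file after the `alt` text plus `'.jpg'`. That can fail when `src` is relative or scheme-relative, or when the title contains characters that are not allowed in file names. This is a minimal sketch of a hypothetical `image_target` helper that resolves the URL against the page URL and sanitizes the name. It also assumes it is acceptable to keep the extension from the image URL and fall back to `.jpg` only when there is none.

```python
import os
import re
from typing import Tuple
from urllib.parse import urljoin, urlparse


def image_target(page_url: str, src: str, alt: str, save_dir: str) -> Tuple[str, str]:
    """Return (absolute image URL, local file path) for one <img> tag."""
    # Resolves relative ('/x.jpg') and scheme-relative ('//host/x.jpg') sources
    abs_url = urljoin(page_url, src)
    # Replace characters Windows and POSIX reject in file names
    name = re.sub(r'[\\/:*?"<>|]', '_', alt).strip() or 'untitled'
    ext = os.path.splitext(urlparse(abs_url).path)[1] or '.jpg'
    return abs_url, os.path.join(save_dir, name + ext)


page = 'https://www.tupianzj.com/meinv/mm/meizitu/'
print(image_target(page, '//img.example.com/a/b.png', 'a/b: c', 'out'))
print(image_target(page, '/uploads/x.jpg', '', 'out'))
```

Inside the download loop, the returned pair would replace the raw `src` and the hand-built file path in the `urlretrieve` call.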


Copyright notice
This article was written by [itread01]. Please include a link to the original when reposting. Thanks.
