Python crawler in practice: scraping images from Home of Pictures (tupianzj.com)
2020-11-06 01:17:00 【itread01】
Preface
The text and images in this article come from the Internet and are for learning and discussion only, not for any commercial use. Copyright belongs to the original author; if there is any problem, please contact us promptly so it can be handled.
How do you implement a crawler in Python?
- Simulate a browser
- Send the request and retrieve the page content
- Extract the information we want from the raw page data (data filtering)
- Store the filtered data
What tools are needed to build a crawler?
- Python 3.6
- PyCharm Professional
Target site
Home of pictures
https://www.tupianzj.com/
Crawler code
Import the tools
Standard library module for SSL settings
import ssl
Standard library module, used to create the storage folder automatically
import os
Standard library module, used to download the image files
import urllib.request
Third-party HTTP library
import requests
Third-party HTML parser, used to select elements in the page
from bs4 import BeautifulSoup
Make HTTPS requests skip certificate verification by default
ssl._create_default_https_context = ssl._create_unverified_context
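The assignment above swaps the default HTTPS context for the whole process, so every later urllib call skips certificate verification. A narrower option, sketched here as an assumption rather than the article's method, is to build a context explicitly and pass it per call. `make_context` and `open_url` are hypothetical helper names. Note that `urllib.request.urlretrieve` takes no `context` argument, so this sketch uses `urlopen` instead.

```python
import ssl
import urllib.request


def make_context(verify=True):
    """Return an SSL context; verification stays on unless verify=False."""
    ctx = ssl.create_default_context()
    if not verify:
        # check_hostname must be disabled before verify_mode can be CERT_NONE.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    return ctx


def open_url(url, verify=True):
    """Open url with a per-call context instead of patching the ssl module."""
    return urllib.request.urlopen(url, context=make_context(verify), timeout=10)
```

With this shape, only the calls that explicitly pass `verify=False` give up certificate checks; everything else keeps the default behaviour.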
Simulate a browser
headers = { 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36', }
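The `User-Agent` header makes the request look like it comes from a desktop Chrome browser. As a small sketch, the same header dict can also be attached to a standard-library request object, which is one way to send it during the download step as well. `build_request` and `HEADERS` are hypothetical names introduced here for illustration.

```python
import urllib.request

# Same User-Agent string as in the article, split across two lines.
HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36',
}


def build_request(url, headers=HEADERS):
    """Build (but do not send) a request that carries the browser headers."""
    return urllib.request.Request(url, headers=headers)


req = build_request('https://www.tupianzj.com/')
# urllib stores header names capitalized, hence 'User-agent'.
print(req.get_header('User-agent'))
```

Nothing is sent over the network here; the request object would be passed to `urllib.request.urlopen` to actually fetch the page.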
Create the storage folder automatically
if not os.path.exists('./ Illustration material /'):
    os.mkdir('./ Illustration material /')
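The existence check and the creation can also be folded into one call. This is a minimal sketch using `os.makedirs` with `exist_ok=True`; `ensure_dir` is a hypothetical helper name, and the demo writes only inside a temporary directory.

```python
import os
import tempfile


def ensure_dir(path):
    """Create path (and any missing parents); do nothing if it exists."""
    os.makedirs(path, exist_ok=True)
    return path


# Demo in a throwaway temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, 'images', 'illustrations')
    ensure_dir(target)
    ensure_dir(target)  # second call is a no-op, no exception is raised
    created = os.path.isdir(target)
```

Unlike `os.mkdir`, `os.makedirs` also creates missing parent folders, which helps if the storage path is later nested.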
Send the request
url = 'https://www.tupianzj.com/meinv/mm/meizitu/'
html = requests.get(url, headers=headers).text
Extract the data from the raw page source (the 'lxml' parser requires the lxml package to be installed)
soup = BeautifulSoup(html, 'lxml')
images_data = soup.find('ul', class_='d1 ico3').find_all_next('li')
for image in images_data:
    image_url = image.find_all('img')
    for _ in image_url:
        print(_['src'], _['alt'])
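To see the extraction step work without touching the network, here is a sketch that collects `src`/`alt` pairs from a small sample page. It swaps in the standard library's `html.parser` in place of BeautifulSoup so it runs with no third-party packages, and it gathers every `img` tag rather than only those after the `ul` with class `d1 ico3`. `SAMPLE_HTML`, the example.com URLs, `ImgCollector` and `extract_images` are all assumptions made for illustration.

```python
from html.parser import HTMLParser

# Hypothetical markup shaped like an image list; not copied from the real site.
SAMPLE_HTML = """
<ul class="d1 ico3">
  <li><a href="/a.html"><img src="https://example.com/a.jpg" alt="first"></a></li>
  <li><a href="/b.html"><img src="https://example.com/b.jpg" alt="second"></a></li>
  <li><a href="/c.html">no image here</a></li>
</ul>
"""


class ImgCollector(HTMLParser):
    """Collect (src, alt) pairs from every <img> tag that has both."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != 'img':
            return
        attr = dict(attrs)
        # Skip images missing either attribute instead of raising KeyError.
        if attr.get('src') and attr.get('alt'):
            self.images.append((attr['src'], attr['alt']))


def extract_images(html):
    """Return the list of (src, alt) pairs found in html."""
    parser = ImgCollector()
    parser.feed(html)
    parser.close()
    return parser.images


for src, alt in extract_images(SAMPLE_HTML):
    print(src, alt)
```

Checking for missing attributes matters in the BeautifulSoup version too: indexing a tag with `['alt']` raises `KeyError` when the attribute is absent.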
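Download (this block belongs inside the inner for loop of the extraction step, hence the indentation)
        try:
            urllib.request.urlretrieve(_['src'], './ Illustration material /' + _['alt'] + '.jpg')
        except Exception as exc:
            # Skip images that fail to download, but report the failure
            print('download failed:', _['src'], exc)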
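Two things can make the download call fail quietly: an `alt` text containing characters that are not valid in file names, and a `src` that is relative to the page rather than a full URL. The sketch below handles both under stated assumptions; `safe_filename` and `download_target` are hypothetical helpers, and the `/uploads/x.jpg` path is made up for the example.

```python
import os
import re
from urllib.parse import urljoin


def safe_filename(alt, ext='.jpg'):
    """Turn alt text into a file name that common filesystems accept."""
    # Collapse path separators, reserved characters and whitespace into '_'.
    cleaned = re.sub(r'[\\/:*?"<>|\s]+', '_', alt).strip('_')
    return (cleaned or 'image') + ext


def download_target(page_url, src, alt, folder):
    """Return (absolute image URL, local file path) for one image."""
    return urljoin(page_url, src), os.path.join(folder, safe_filename(alt))


page = 'https://www.tupianzj.com/meinv/mm/meizitu/'
image_url, path = download_target(page, '/uploads/x.jpg', 'a/b: c?', 'images')
print(image_url, path)
```

The returned pair could then be passed to `urllib.request.urlretrieve(image_url, path)` in place of the raw `src` and `alt` values.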