Python crawler in practice: scraping images from Home of Pictures (tupianzj.com)
2020-11-06 01:17:00 【itread01】
Preface
The text and images in this article come from the Internet and are intended only for learning and discussion, not for any commercial use. Copyright belongs to the original authors; if there is any problem, please contact us promptly so we can handle it.
How do you implement a crawler in Python?
- Simulate a browser
- Send a request and fetch the page content
- Extract the information we want from the page source (data filtering)
- Store the filtered data
Tools needed to build the crawler
- Python 3.6
- PyCharm Professional
Target site
Home of pictures
https://www.tupianzj.com/
Crawler code
Import the libraries
Standard library: SSL support
import ssl
Standard library: used to create the storage folder automatically
import os
Standard library: used to download the image files
import urllib.request
Third-party HTTP library
import requests
Third-party HTML parser (the 'lxml' parser used below must also be installed)
from bs4 import BeautifulSoup
Disable HTTPS certificate verification by default for requests made through urllib
ssl._create_default_https_context = ssl._create_unverified_context
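The assignment above changes the default for every HTTPS request that urllib makes in the process. A narrower alternative, sketched here on the assumption that only selected calls need to skip verification, is to build a dedicated context and pass it explicitly. The helper name `make_unverified_context` is hypothetical and not part of the original script.

```python
import ssl
import urllib.request


def make_unverified_context():
    """Build an SSL context that skips certificate and hostname checks."""
    context = ssl.create_default_context()
    # check_hostname must be disabled before verify_mode can be set to CERT_NONE
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    return context


context = make_unverified_context()
print(context.verify_mode == ssl.CERT_NONE)

# Usage (needs network access), leaving all other requests verified:
# urllib.request.urlopen('https://www.tupianzj.com/', context=context)
```

Skipping verification removes protection against tampered responses, so it is probably best limited to throwaway learning scripts like this one.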
Simulate a browser by sending a browser User-Agent header
headers = { 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36', }
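A headers dictionary only takes effect for calls that receive it. One way to avoid passing it every time, shown here as a sketch rather than as the article's method, is a `requests.Session` that carries the headers for all of its requests. The names `HEADERS` and `build_session` are hypothetical; the request is only prepared, not sent, so the example runs offline.

```python
import requests

HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36',
}


def build_session(headers):
    """Return a session that sends the given headers with every request."""
    session = requests.Session()
    session.headers.update(headers)
    return session


session = build_session(HEADERS)
# Prepare (but do not send) a request to inspect what would go on the wire
prepared = session.prepare_request(
    requests.Request('GET', 'https://www.tupianzj.com/'))
print(prepared.headers['User-Agent'])
```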
Create the storage folder automatically if it does not exist yet
if not os.path.exists('./ Illustration material /'):
    os.mkdir('./ Illustration material /')
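The check-then-create pattern can be collapsed into a single call. The sketch below uses `os.makedirs` with `exist_ok=True`, which also creates missing parent folders and does not fail when the folder is already there. The helper name `ensure_dir` is hypothetical, and a temporary directory is used so the example leaves nothing behind.

```python
import os
import tempfile


def ensure_dir(path):
    """Create path (and any missing parents); do nothing if it already exists."""
    os.makedirs(path, exist_ok=True)
    return path


with tempfile.TemporaryDirectory() as root:
    target = os.path.join(root, 'Illustration material')
    ensure_dir(target)
    ensure_dir(target)  # a second call is harmless
    created = os.path.isdir(target)

print(created)
```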
Send the request and read the page source
url = 'https://www.tupianzj.com/meinv/mm/meizitu/'
html = requests.get(url, headers=headers).text
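Extract the data from the raw page source and download each image
soup = BeautifulSoup(html, 'lxml')
# find_all_next returns every <li> that follows the list start in document order,
# including any that come after the <ul> itself
images_data = soup.find('ul', class_='d1 ico3').find_all_next('li')
for image in images_data:
    image_url = image.find_all('img')
    for img in image_url:
        print(img['src'], img['alt'])
        # Download: save each image under its alt text
        try:
            urllib.request.urlretrieve(img['src'], './ Illustration material /' + img['alt'] + '.jpg')
        except Exception as error:
            # Report the failure and continue with the next image
            print('Download failed:', img['src'], error)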
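The extraction and download steps above can also be split into small helpers, which makes them easier to test without touching the network. The sketch below is not the article's code and rests on several assumptions: `SAMPLE_HTML` is made-up markup that only imitates the `d1 ico3` list, the function names are hypothetical, and the built-in `html.parser` is swapped in for `lxml` to avoid the extra dependency. It also addresses two gaps in the loop above: alt text may contain characters that are not valid in file names, and `urlretrieve` does not send the custom `headers` dictionary, so the download helper uses `urllib.request.Request` instead.

```python
import re
import urllib.request
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical sample markup; the real page structure may differ.
SAMPLE_HTML = """
<ul class="d1 ico3">
  <li><a href="/a/1.html"><img src="/uploads/one.jpg" alt="Sunset: beach/01"></a></li>
  <li><a href="/a/2.html"><img src="https://img.example.com/two.jpg" alt="City night"></a></li>
  <li><a href="/a/3.html">no image here</a></li>
</ul>
"""


def extract_images(html, base_url):
    """Return (absolute_url, alt_text) pairs for each <img> inside the list."""
    soup = BeautifulSoup(html, 'html.parser')
    gallery = soup.find('ul', class_='d1 ico3')
    if gallery is None:  # layout changed or the request was blocked
        return []
    pairs = []
    for img in gallery.find_all('img'):
        src = img.get('src')
        if src:
            # Resolve relative paths against the page URL
            pairs.append((urljoin(base_url, src), img.get('alt', '')))
    return pairs


def safe_filename(title, fallback='image'):
    """Replace whitespace and characters that are invalid in file names."""
    cleaned = re.sub(r'[\\/:*?"<>|\s]+', '_', title).strip('_')
    return cleaned or fallback


def download(url, path, headers):
    """Fetch url with the given headers and write the bytes to path."""
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=10) as response, \
            open(path, 'wb') as out_file:
        out_file.write(response.read())


# Offline demonstration: list what would be downloaded and under which name
for image_url, alt in extract_images(SAMPLE_HTML, 'https://www.tupianzj.com/'):
    print(image_url, safe_filename(alt) + '.jpg')
```

In a real run, `download(image_url, target_path, headers)` would be called inside the loop, ideally wrapped in a `try`/`except` that reports failures instead of hiding them.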
Copyright notice
This article was written by [itread01]. Please include a link to the original when reposting. Thank you.