Python crawler in practice: crawling Home of Pictures
2020-11-06 01:17:00 【itread01】
Preface
The text and pictures in this article come from the Internet and are for learning and exchange only, with no commercial use. Copyright belongs to the original author. If there is any problem, please contact us promptly so it can be handled.
How do you implement a crawler in Python?
- Simulate a browser
- Send a request to the website and fetch the page source
- Extract the information we want from the page source (data filtering)
- Store the filtered data
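The extraction step in the list above can be tried offline before any request is sent. The following is a minimal sketch that uses only the standard library's html.parser (the crawler below uses BeautifulSoup instead). The names ImageCollector, extract_images and SAMPLE_HTML are illustrative, and the sample markup is a made-up stand-in for a real page.

```python
from html.parser import HTMLParser


class ImageCollector(HTMLParser):
    """Collect (src, alt) pairs from every <img> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr_map = dict(attrs)
        src = attr_map.get("src")
        if src:  # skip <img> tags that have no src attribute
            self.images.append((src, attr_map.get("alt") or ""))


def extract_images(html):
    """Filter the raw page source down to image URLs and titles."""
    collector = ImageCollector()
    collector.feed(html)
    return collector.images


# Hypothetical page source standing in for the response of a real request.
SAMPLE_HTML = """
<ul class="d1 ico3">
  <li><a href="/a.html"><img src="https://example.com/a.jpg" alt="first"></a></li>
  <li><a href="/b.html"><img src="https://example.com/b.jpg" alt="second"></a></li>
  <li><a href="/c.html">no image here</a></li>
</ul>
"""

images = extract_images(SAMPLE_HTML)
for src, alt in images:
    print(src, alt)
```

Keeping the extraction in its own function makes it possible to check the filtering logic against saved HTML without hitting the site on every run.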
What tools are needed to build the crawler?
- Python 3.6
- PyCharm Professional
Target site
Home of Pictures
https://www.tupianzj.com/
Crawler code
Import the tools
Python standard library
import ssl
Standard library, used to create the storage folder automatically
import os
Used to download files
import urllib.request
Third-party HTTP library
import requests
HTML parser and selector
from bs4 import BeautifulSoup
By default, skip certificate verification when requesting HTTPS sites
ssl._create_default_https_context = ssl._create_unverified_context
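The line above replaces a private hook in the ssl module, which switches off certificate verification for the whole process. As far as I can tell, that hook is honored by the standard library's HTTP stack (and therefore by urllib.request), while requests controls verification through its own verify parameter. A narrower alternative, sketched here as an assumption rather than the author's method, is to build one unverified context and attach it to a dedicated opener. The helper name make_unverified_opener is hypothetical.

```python
import ssl
import urllib.request


def make_unverified_opener():
    """Build an opener that skips certificate checks for its own requests only."""
    context = ssl.create_default_context()
    # check_hostname must be disabled before verify_mode can be set to CERT_NONE.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    handler = urllib.request.HTTPSHandler(context=context)
    return urllib.request.build_opener(handler), context


opener, context = make_unverified_opener()
# opener.open(url) would now fetch without verifying the certificate,
# while contexts created elsewhere in the process keep their defaults.
```

Disabling verification removes protection against tampered responses, so it is best limited to throwaway experiments like this one.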
Simulate a browser
headers = { 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36', }
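To see what the headers dictionary actually changes, the request can be built without sending it. This sketch uses urllib.request.Request from the standard library instead of requests, purely so that it runs offline; the variable names request and sent_user_agent are illustrative.

```python
import urllib.request

headers = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36"
    ),
}

# Building the Request object does not touch the network.
request = urllib.request.Request("https://www.tupianzj.com/", headers=headers)

# urllib normalizes header names with str.capitalize(), hence "User-agent".
sent_user_agent = request.get_header("User-agent")
print(sent_user_agent)
```

Without this header, Python HTTP clients identify themselves with a library-specific User-Agent, which some sites reject or serve differently.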
Automatically create a folder
if not os.path.exists('./ Illustration material /'):
    os.mkdir('./ Illustration material /')
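The check-then-create pattern above can also be written as a single call. A minimal sketch, assuming a temporary directory and a hypothetical folder name images so that it does not touch the real working directory:

```python
import os
import tempfile

# A temporary directory keeps this sketch away from the real working directory.
base_dir = tempfile.mkdtemp()
save_dir = os.path.join(base_dir, "images")  # hypothetical folder name

os.makedirs(save_dir, exist_ok=True)
os.makedirs(save_dir, exist_ok=True)  # the second call is a no-op, not an error

print(os.path.isdir(save_dir))
```

Unlike os.mkdir, os.makedirs also creates any missing parent folders, and exist_ok=True removes the need for a separate existence check.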
Send the request
url = 'https://www.tupianzj.com/meinv/mm/meizitu/'
html = requests.get(url, headers=headers).text
Extract the image data from the page source and download each image
soup = BeautifulSoup(html, 'lxml')
images_data = soup.find('ul', class_='d1 ico3').find_all_next('li')
for image in images_data:
    image_url = image.find_all('img')
    for _ in image_url:
        print(_['src'], _['alt'])
        # Download the image; report a failed download and move on to the next one
        try:
            urllib.request.urlretrieve(_['src'], './ Illustration material /' + _['alt'] + '.jpg')
        except Exception as error:
            print('Download failed:', _['src'], error)
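The download step builds the file name directly from the alt text and passes src to urlretrieve unchanged. Two things can go wrong there: alt may contain characters that are not valid in file names, and src may be a relative URL. The sketch below shows one way to guard against both. The helpers safe_filename and target_path, as well as the sample src, alt and folder values, are hypothetical and not part of the original crawler.

```python
import os
import re
from urllib.parse import urljoin, urlparse


def safe_filename(title, fallback="image"):
    """Replace characters that are invalid in Windows or POSIX file names."""
    cleaned = re.sub(r'[\\/:*?"<>|\s]+', "_", title).strip("_")
    return cleaned or fallback


def target_path(page_url, src, alt, folder):
    """Return (absolute image URL, local path) for one <img> tag."""
    # urljoin resolves relative and protocol-relative src values against the page URL.
    image_url = urljoin(page_url, src)
    # Keep the extension from the URL path, falling back to .jpg when there is none.
    extension = os.path.splitext(urlparse(image_url).path)[1] or ".jpg"
    return image_url, os.path.join(folder, safe_filename(alt) + extension)


image_url, path = target_path(
    "https://www.tupianzj.com/meinv/mm/meizitu/",
    "/uploads/sample.png",  # hypothetical src value
    'a/b: "title"?',        # hypothetical alt text with unsafe characters
    "images",               # hypothetical folder name
)
print(image_url, path)
```

The returned pair can then be handed to urllib.request.urlretrieve(image_url, path) in place of the raw src and alt values.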