A crawler career from scratch (I): crawling photos of young women ① (the website has since been disabled)
2022-07-03 09:18:00 【fishfuck】
Preface
Starting with this article, we will spend several posts crawling all the photos on one site (URL: https://imoemei.com/). We use this example to learn the basics of writing a crawler in Python.
Related articles
A crawler career from scratch (II): crawling photos of young women ②
A crawler career from scratch (III): crawling photos of young women ③
The page to be crawled


Analysis
1. Page source analysis
First, let's check the source code of the page


The picture URLs all sit inside a div whose class is entry-content. Our goal is therefore to read the src attribute of each img tag under the p tags in that div. Each src is the address of one picture, which we then download and save to the computer.
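The extraction rule described above can be tried offline before touching the real site. The following is only a minimal sketch: it uses the standard library's html.parser in place of BeautifulSoup so it runs without extra installs, and the names EntryImageParser, extract_image_urls and SAMPLE_HTML, as well as the example.com URLs, are made up for illustration.

```python
from html.parser import HTMLParser


class EntryImageParser(HTMLParser):
    """Collect the src of every img inside <div class="entry-content">."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # div nesting depth inside the target block
        self.srcs = []   # picture addresses found so far

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            classes = (attrs.get("class") or "").split()
            # Enter the target div, or go one level deeper inside it
            if self.depth > 0 or "entry-content" in classes:
                self.depth += 1
        elif tag == "img" and self.depth > 0 and attrs.get("src"):
            self.srcs.append(attrs["src"])

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1


def extract_image_urls(page_html):
    parser = EntryImageParser()
    parser.feed(page_html)
    return parser.srcs


# Hypothetical page fragment that mimics the structure described above
SAMPLE_HTML = """
<div class="header"><img src="https://example.com/logo.png"></div>
<div class="entry-content">
  <p><img src="https://example.com/a.jpg"></p>
  <div><p><img src="https://example.com/b.jpg"></p></div>
</div>
<div class="footer"><img src="https://example.com/ad.png"></div>
"""

print(extract_image_urls(SAMPLE_HTML))
```

Only the two pictures inside entry-content are collected; the logo and the footer image are skipped because they sit outside that div.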
2. Crawler approach
Use requests to fetch the whole page, parse it with BeautifulSoup, extract all the picture links, and finally save each picture to disk.
The crawler code
1. development environment
Development environment: Windows 10, Python 3.6.8
Tools: PyCharm
Third-party libraries: requests, BeautifulSoup (bs4), and html5lib as the parser. os is part of the standard library.
2. Code decomposition
(1). Import the libraries
import requests
import os
from bs4 import BeautifulSoup
(2). Get the address of each picture
target_url = "https://imoemei.com/zipai/6288.html"
r = requests.get(url=target_url)
html = BeautifulSoup(r.text, 'html5lib')
# All picture URLs sit inside the div with class "entry-content"
entry_content = html.find('div', class_='entry-content')
img_list = entry_content.find_all('img')
for img in img_list:
    # src holds the address of the picture
    img_url = img.get('src')
    # Download the raw bytes of the picture
    result = requests.get(img_url).content
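The loop above passes src straight to requests.get, which works only when src is already an absolute URL. The sketch below shows one way to guard against relative or missing src values with urllib.parse.urljoin. Whether the site actually used relative paths is not known from this article, so the sample paths and the helper name absolute_image_url are assumptions for illustration.

```python
from urllib.parse import urljoin

PAGE_URL = "https://imoemei.com/zipai/6288.html"


def absolute_image_url(page_url, src):
    """Resolve an img src against the page URL; return None if src is empty."""
    if not src:
        return None
    return urljoin(page_url, src.strip())


# Hypothetical src values covering the common cases
samples = [
    "https://example.com/full.jpg",   # already absolute
    "/uploads/1.jpg",                 # relative to the site root
    "img/2.jpg",                      # relative to the current page
    "//cdn.example.com/3.jpg",        # protocol-relative
    None,                             # img tag without a src attribute
]

for src in samples:
    print(src, "->", absolute_image_url(PAGE_URL, src))
```

In the crawler, the result would replace img_url, and pictures for which the helper returns None would be skipped.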
(3). Save the picture to the specified folder
num = 0
name = html.find('h1').text.strip()
path = 'pictures'
# Create the target folder if it does not exist yet
os.makedirs(path, exist_ok=True)
# The lines below run inside the for loop above, once per picture
with open(os.path.join(path, name + str(num) + '.jpg'), 'wb') as f:
    f.write(result)
num += 1
print('Downloading {}: picture {}'.format(name, num))
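The file name above is built directly from the page title, which can fail on Windows if the title contains characters such as / or ?. The following is a minimal sketch of a more defensive save step. The helpers safe_name and save_image are hypothetical and not part of the original code, and the demo writes placeholder bytes to a temporary folder instead of a real download.

```python
import os
import re
import tempfile


def safe_name(title):
    """Replace characters that Windows forbids in file names, plus whitespace."""
    cleaned = re.sub(r'[\\/:*?"<>|\s]+', "_", title.strip())
    return cleaned.strip("_") or "untitled"


def save_image(data, folder, title, index):
    """Write the picture bytes to folder/<title><index>.jpg and return the path."""
    os.makedirs(folder, exist_ok=True)
    file_path = os.path.join(folder, "{}{}.jpg".format(safe_name(title), index))
    # The with block closes the file even if the write fails
    with open(file_path, "wb") as f:
        f.write(data)
    return file_path


# Demo with placeholder bytes in a temporary folder
demo_folder = os.path.join(tempfile.mkdtemp(), "pictures")
saved = save_image(b"not a real jpeg", demo_folder, "a/b: c?", 0)
print(os.path.basename(saved))
```

In the crawler, save_image(result, path, name, num) would take the place of the open and write calls.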
3. The overall code
import os

import requests
from bs4 import BeautifulSoup

target_url = "https://imoemei.com/zipai/6288.html"
r = requests.get(url=target_url)
html = BeautifulSoup(r.text, 'html5lib')

# All picture URLs sit inside the div with class "entry-content"
entry_content = html.find('div', class_='entry-content')
img_list = entry_content.find_all('img')

num = 0
name = html.find('h1').text.strip()

# Create the output folder once, before the loop
path = 'pictures'
os.makedirs(path, exist_ok=True)

for img in img_list:
    img_url = img.get('src')
    result = requests.get(img_url).content
    # Write the picture bytes; the with block closes the file afterwards
    with open(os.path.join(path, name + str(num) + '.jpg'), 'wb') as f:
        f.write(result)
    num += 1
    print('Downloading {}: picture {}'.format(name, num))
Crawling results



As you can see, the crawl was successful.