Crawler career from scratch (I): crawl the photos of my little sister ① (the website has been disabled)
2022-07-03 09:18:00 【fishfuck】
Preface
Starting with this article, over the next few posts we will crawl all the girl photos from https://imoemei.com/. Through this example we will learn simple Python crawling.
Related articles:
Crawler career from scratch (II): crawl the photos of my little sister ②
Crawler career from scratch (III): crawl the photos of my little sister ③
The page to be crawled


Approach
1. Page source analysis
First, let's check the source code of the page:


We find that all the image URLs sit inside a div block with the class entry-content. Our goal, then, is to take the src attribute of each img under the p tags (that is the address of each picture) and save the images to disk.
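The extraction step can be sketched against a minimal HTML fragment. The fragment below is a hypothetical stand-in for the structure just described, not the site's actual markup:

```python
from bs4 import BeautifulSoup

# Hypothetical sample mimicking the entry-content structure described above.
html_text = """
<div class="entry-content">
  <p><img src="https://example.com/img/1.jpg"></p>
  <p><img src="https://example.com/img/2.jpg"></p>
</div>
"""

soup = BeautifulSoup(html_text, "html.parser")
entry_content = soup.find("div", class_="entry-content")
# Collect the src attribute of every img inside the block.
srcs = [img.get("src") for img in entry_content.find_all("img")]
print(srcs)
```

Running this prints the two image addresses, which is exactly the list the real crawler then downloads one by one.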
2. Crawler approach
Use requests to fetch the whole page, then use BeautifulSoup to parse it and extract all the image links, and finally save the images to disk.
The crawler code
1. Development environment
Environment: Windows 10, Python 3.6.8
IDE: PyCharm
Libraries: requests and BeautifulSoup (from bs4); os comes from the standard library
2. Code decomposition
(1). Import the libraries
import requests
import os
from bs4 import BeautifulSoup
(2). Get the address of each picture
target_url = "https://imoemei.com/zipai/6288.html"
r = requests.get(url=target_url)
html = BeautifulSoup(r.text, 'html5lib')
# All images live in the div with class "entry-content"
entry_content = html.find('div', class_='entry-content')
img_list = entry_content.find_all('img')
for img in img_list:
    img_url = img.get('src')
    result = requests.get(img_url).content
(3). Save the pictures to the specified folder
The counter and folder are set up once; the saving step itself runs inside the download loop (see the overall code):
num = 0
name = html.find('h1').text  # page title, used as the file-name prefix
path = 'picture'
if not os.path.exists(path):
    os.mkdir(path)
with open(path + '/' + name + str(num) + '.jpg', 'wb') as f:
    f.write(result)
num += 1
print('Downloading {} image {}'.format(name, num))
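Since the page title goes straight into the file name, characters that Windows forbids in paths (such as ? or :) would make open fail. A minimal helper that strips them first (the function name is my own, not from the original article):

```python
import re

def safe_filename(name: str) -> str:
    """Remove characters that are not allowed in Windows file names."""
    return re.sub(r'[\\/:*?"<>|]', '', name).strip()

print(safe_filename('my: title?'))  # -> 'my title'
```

Passing the page title through such a helper before building the path makes the crawler robust to punctuation in titles.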
3. The overall code
import requests
import os
from bs4 import BeautifulSoup

target_url = "https://imoemei.com/zipai/6288.html"
r = requests.get(url=target_url)
html = BeautifulSoup(r.text, 'html5lib')

# All images live in the div with class "entry-content"
entry_content = html.find('div', class_='entry-content')
img_list = entry_content.find_all('img')

num = 0
name = html.find('h1').text  # page title, used as the file-name prefix
path = 'picture'
if not os.path.exists(path):
    os.mkdir(path)

for img in img_list:
    img_url = img.get('src')
    result = requests.get(img_url).content
    with open(path + '/' + name + str(num) + '.jpg', 'wb') as f:
        f.write(result)
    num += 1
    print('Downloading {} image {}'.format(name, num))
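One assumption baked into the code above is that every src attribute holds an absolute URL; if the site ever serves relative paths, requests.get(img_url) would fail. urllib.parse.urljoin handles both cases, so resolving each src against the page URL is a cheap safeguard (a sketch, not part of the original article):

```python
from urllib.parse import urljoin

page_url = "https://imoemei.com/zipai/6288.html"

# urljoin leaves absolute URLs untouched and resolves relative ones
# against the page the link was found on.
absolute = urljoin(page_url, "https://cdn.example.com/a.jpg")
relative = urljoin(page_url, "/wp-content/uploads/a.jpg")
print(absolute)
print(relative)
```

In the crawler, replacing `img_url = img.get('src')` with `img_url = urljoin(target_url, img.get('src'))` applies this safeguard.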
Crawling results



As you can see, the crawl was successful.