A crawler career from scratch (1): crawling pretty-girl photos ① (the website is no longer available)
2022-07-03 09:18:00 【fishfuck】
Preface
Starting with this article, over the next few posts we will crawl all of the pretty-girl photos on https://imoemei.com/. Through this example we will learn the basics of simple Python crawling.
Related articles in this series:
A crawler career from scratch (2): crawling pretty-girl photos ②
A crawler career from scratch (3): crawling pretty-girl photos ③
The page to be crawled


Approach
1. Page source analysis
First, let's look at the source code of the page.


We find that all of the picture URLs live inside a div block with the class entry-content. Our goal is therefore to take the src attribute of the img tag inside each p tag, which is the address of one picture, and then save each picture to the computer.
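As a quick illustration, here is a minimal sketch of how BeautifulSoup picks those src values out of a structure like the one described above. The HTML fragment below is made up for the example; it is not the real page.

from bs4 import BeautifulSoup

# A made-up fragment mirroring the structure described above:
# a div with class "entry-content" holding <p><img src="..."></p> blocks.
sample_html = """
<div class="entry-content">
  <p><img src="https://example.com/img/001.jpg"></p>
  <p><img src="https://example.com/img/002.jpg"></p>
</div>
"""

soup = BeautifulSoup(sample_html, 'html.parser')
block = soup.find('div', class_='entry-content')
# the src attribute of every <img> inside the block is one picture address
print([img.get('src') for img in block.find_all('img')])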
2. Crawler approach
Fetch the whole page with requests, parse it with BeautifulSoup, pull out all of the picture links, and finally save the pictures to disk.
The crawler code
1. Development environment
Development environment: Windows 10, Python 3.6.8
Tools: PyCharm
Libraries: requests, BeautifulSoup (bs4), plus the standard-library os module
2. Code walkthrough
(1). Import the libraries
import requests
import os
from bs4 import BeautifulSoup
(2). Get the address of each picture
target_url = "https://imoemei.com/zipai/6288.html"
r = requests.get(url=target_url)                           # fetch the article page
html = BeautifulSoup(r.text, 'html5lib')                   # parse the HTML
entry_content = html.find('div', class_='entry-content')   # block that holds the pictures
img_list = entry_content.find_all('img')                   # every <img> tag in that block
for img in img_list:
    img_url = img.get('src')                               # address of one picture
    result = requests.get(img_url).content                 # download the binary image data
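If the site happens to reject plain requests, a common tweak is to send a browser-like User-Agent and check the response status before parsing anything. This is only a hedged variant of the fetch step above; the header value and the timeout are illustrative choices of mine, not part of the original post.

import requests

headers = {'User-Agent': 'Mozilla/5.0'}          # illustrative browser-like header
r = requests.get("https://imoemei.com/zipai/6288.html",
                 headers=headers, timeout=10)    # timeout so a hung request fails fast
r.raise_for_status()                             # raise an error on 403/404/... instead of parsing junk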
(3). Save the picture to the specified folder
num = 0                                          # counter used in the file name
name = html.find('h1').text                      # article title, used as the file-name prefix
path = 'picture'                                 # folder to save the pictures into
if not os.path.exists(path):
    os.mkdir(path)                               # create the folder if it does not exist
f = open(path + '/' + name + str(num) + '.jpg', 'wb')
f.write(result)                                  # result is the image data downloaded in the loop above
f.close()                                        # close the file handle
num += 1
print('Downloading {}: picture {}'.format(name, num))
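A slightly safer way to write the save step, shown here only as a sketch, is to wrap it in a small helper that builds the path with os.path.join and writes through a with block, so the file handle is closed even if the download raises an exception. The helper name save_image is my own and not from the original post.

import os
import requests

def save_image(img_url, folder, filename):
    """Download img_url and write it to folder/filename (a sketch)."""
    os.makedirs(folder, exist_ok=True)           # no error if the folder already exists
    data = requests.get(img_url, timeout=10).content
    with open(os.path.join(folder, filename), 'wb') as f:
        f.write(data)                            # file is closed automatically when the block exits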
3. The overall code
import requests
import os
from bs4 import BeautifulSoup

target_url = "https://imoemei.com/zipai/6288.html"
r = requests.get(url=target_url)                           # fetch the article page
html = BeautifulSoup(r.text, 'html5lib')                   # parse the HTML
entry_content = html.find('div', class_='entry-content')   # block that holds the pictures
img_list = entry_content.find_all('img')                   # every <img> tag in that block

num = 0
name = html.find('h1').text                                # article title, used as the file-name prefix
path = 'picture'                                           # folder to save the pictures into
if not os.path.exists(path):
    os.mkdir(path)                                         # create the folder once, before the loop

for img in img_list:
    img_url = img.get('src')                               # address of one picture
    result = requests.get(img_url).content                 # download the binary image data
    f = open(path + '/' + name + str(num) + '.jpg', 'wb')
    f.write(result)
    f.close()
    num += 1
    print('Downloading {}: picture {}'.format(name, num))
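For reference, here is a tidied-up sketch of the same flow that folds in the ideas above: a User-Agent header, a status check, a with block for the file, and enumerate for the counter. The header value and the folder name picture are illustrative assumptions, not requirements of the original site.

import os
import requests
from bs4 import BeautifulSoup

target_url = "https://imoemei.com/zipai/6288.html"
headers = {'User-Agent': 'Mozilla/5.0'}                    # illustrative browser-like header

r = requests.get(target_url, headers=headers, timeout=10)
r.raise_for_status()                                       # stop early if the page did not load

html = BeautifulSoup(r.text, 'html5lib')
name = html.find('h1').text.strip()                        # article title as the file-name prefix
entry_content = html.find('div', class_='entry-content')

path = 'picture'
os.makedirs(path, exist_ok=True)

for num, img in enumerate(entry_content.find_all('img'), start=1):
    img_url = img.get('src')
    if not img_url:                                        # skip <img> tags without a src
        continue
    data = requests.get(img_url, headers=headers, timeout=10).content
    with open(os.path.join(path, '{}{}.jpg'.format(name, num)), 'wb') as f:
        f.write(data)
    print('Downloading {}: picture {}'.format(name, num))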
Crawling results



As you can see, the crawl was quite successful.