A Crawler Career from Scratch (I): Crawling Photos of Little Sisters ① (the website has since been taken down)
2022-07-03 09:18:00 【fishfuck】
Preface
Starting with this article, we will spend several posts crawling all the pictures of little sisters on https://imoemei.com/. Along the way, we will use this example to learn how to write a simple Python crawler.
Related articles:
A Crawler Career from Scratch (II): Crawling Pictures of Little Sisters ②
A Crawler Career from Scratch (III): Crawling Pictures of Little Sisters ③
The page to be crawled
Approach
1. Page source analysis
First, let's look at the page's source code.
We find that all the image URLs sit inside a div with the class entry-content. Our goal, then, is to take the src attribute of each img tag inside the p tags there, which is the address of each picture, and save the pictures to the computer.
2. Crawler approach
Use requests to fetch the whole page, parse it with BeautifulSoup, extract all the image links, and finally save the images.
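To make the selection rule concrete (only img tags nested inside the entry-content div count, anything elsewhere on the page is ignored), here is a minimal sketch. It swaps in Python's standard-library html.parser instead of BeautifulSoup so it runs without third-party packages, and it collects images from every entry-content div. The sample markup and the names EntryImageParser, extract_image_urls and SAMPLE_HTML are hypothetical stand-ins for the real page, which is no longer online.

```python
from html.parser import HTMLParser

# Hypothetical markup mimicking the structure described above;
# the real page's HTML may have differed.
SAMPLE_HTML = """
<html><body>
  <h1>Sample title</h1>
  <div class="sidebar"><img src="https://example.com/logo.png"></div>
  <div class="entry-content">
    <p><img src="https://example.com/a.jpg"></p>
    <p><img src="https://example.com/b.jpg"></p>
  </div>
</body></html>
"""


class EntryImageParser(HTMLParser):
    """Collects the src of every <img> nested inside <div class="entry-content">."""

    def __init__(self):
        super().__init__()
        self.depth = 0  # > 0 while we are inside an entry-content div
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'div':
            if self.depth:
                self.depth += 1  # a div nested inside entry-content
            elif 'entry-content' in (attrs.get('class') or '').split():
                self.depth = 1
        elif tag == 'img' and self.depth and attrs.get('src'):
            self.srcs.append(attrs['src'])

    def handle_endtag(self, tag):
        if tag == 'div' and self.depth:
            self.depth -= 1


def extract_image_urls(html_text):
    parser = EntryImageParser()
    parser.feed(html_text)
    parser.close()
    return parser.srcs


print(extract_image_urls(SAMPLE_HTML))
# ['https://example.com/a.jpg', 'https://example.com/b.jpg']
```

The depth counter is what keeps the sidebar logo out: an img is only recorded while at least one entry-content div is open. In the article's own code, BeautifulSoup's find and find_all do this bookkeeping for you.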
The crawler code
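1. Development environment
Environment: Windows 10, Python 3.6.8
IDE: PyCharm
Libraries used: requests and BeautifulSoup (third-party; the 'html5lib' parser passed to BeautifulSoup is a separate third-party package as well) and os (standard library)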
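Because BeautifulSoup is imported as bs4 and the 'html5lib' parser must be installed separately (on PyPI the packages are requests, beautifulsoup4 and html5lib), a quick pre-flight check can save a confusing error later. This is a small sketch; the helper name missing_packages is hypothetical.

```python
import importlib.util

# Import names the crawler relies on. BeautifulSoup is imported as `bs4`,
# and 'html5lib' is the parser package BeautifulSoup is asked to use.
REQUIRED = ['requests', 'bs4', 'html5lib']


def missing_packages(names):
    """Return the import names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print('Missing packages:', ', '.join(missing))
else:
    print('All required packages are installed.')
```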
2. Code breakdown
(1). Import the libraries
import requests
import os
from bs4 import BeautifulSoup
(2). Get the address of each picture
target_url = "https://imoemei.com/zipai/6288.html"
r = requests.get(url=target_url)
html = BeautifulSoup(r.text, 'html5lib')
# all of the post's pictures sit inside <div class="entry-content">
entry_content = html.find('div', class_='entry-content')
img_list = entry_content.find_all('img')
for img in img_list:
    img_url = img.get('src')  # address of one picture
    result = requests.get(img_url).content  # raw bytes of the image
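One thing worth guarding against in this step: img.get('src') returns None when a tag has no src attribute, and requests.get() raises an error for a URL without a scheme, such as a root-relative "/path.jpg". Whether this site ever used relative image paths is unknown; the sketch below assumes it might, and resolves each src against the page URL with the standard library's urljoin. The helper name absolute_image_urls and the sample paths are hypothetical.

```python
from urllib.parse import urljoin


def absolute_image_urls(page_url, srcs):
    """Resolve each src against the page URL, skipping missing or empty values.

    Absolute URLs pass through unchanged; root-relative ("/x.jpg") and
    protocol-relative ("//host/x.jpg") values are completed from page_url.
    """
    return [urljoin(page_url, src) for src in srcs if src]


page = "https://imoemei.com/zipai/6288.html"
srcs = [
    "https://example.com/a.jpg",  # already absolute
    "/uploads/b.jpg",             # hypothetical root-relative path
    "//cdn.example.com/c.jpg",    # hypothetical protocol-relative URL
    None,                         # <img> without a src attribute
]
print(absolute_image_urls(page, srcs))
# ['https://example.com/a.jpg', 'https://imoemei.com/uploads/b.jpg', 'https://cdn.example.com/c.jpg']
```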
(3). Save the picture to the specified folder
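num = 0
name = html.find('h1').text.strip()  # post title, used as the file-name prefix
path = 'picture'
if not os.path.exists(path):
    os.mkdir(path)
# the lines below run once per picture, inside the for loop of step (2)
with open(path + '/' + name + str(num) + '.jpg', 'wb') as f:  # closes the file automatically
    f.write(result)
num += 1
print('Downloading {} picture No. {}'.format(name, num))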
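A slightly more defensive variant of the saving step, as a sketch: post titles can contain characters that Windows forbids in file names (the author worked on Windows 10), so the title is sanitized first, os.makedirs(..., exist_ok=True) replaces the exists-then-mkdir pair, and the index is zero-padded so files sort in order. The helper names safe_name and save_image and the name_000.jpg pattern are hypothetical choices, not the author's; like the original, it assumes every image is a .jpg.

```python
import os
import re
import tempfile

# Characters Windows does not allow in file names.
_ILLEGAL = re.compile(r'[\\/:*?"<>|]')


def safe_name(title):
    """Replace characters that are illegal in file names and trim whitespace."""
    return _ILLEGAL.sub('_', title).strip() or 'untitled'


def save_image(folder, title, index, data):
    """Write `data` to folder/<title>_<index>.jpg and return the file path."""
    os.makedirs(folder, exist_ok=True)  # no error if the folder already exists
    path = os.path.join(folder, '{}_{:03d}.jpg'.format(safe_name(title), index))
    with open(path, 'wb') as f:  # the file is closed even if write() raises
        f.write(data)
    return path


# Usage: write fake image bytes into a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    p = save_image(tmp, 'Title: part 1/2', 0, b'\xff\xd8fake')
    print(os.path.basename(p))  # Title_ part 1_2_000.jpg
```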
3. The complete code
import requests
import os
from bs4 import BeautifulSoup

target_url = "https://imoemei.com/zipai/6288.html"
r = requests.get(url=target_url)
html = BeautifulSoup(r.text, 'html5lib')
# all of the post's pictures sit inside <div class="entry-content">
entry_content = html.find('div', class_='entry-content')
img_list = entry_content.find_all('img')

num = 0
name = html.find('h1').text.strip()  # post title, used as the file-name prefix
path = 'picture'
# create the output folder once, before the loop
if not os.path.exists(path):
    os.mkdir(path)

for img in img_list:
    img_url = img.get('src')  # address of one picture
    result = requests.get(img_url).content  # raw bytes of the image
    # `with` closes the file even if the write fails
    with open(os.path.join(path, name + str(num) + '.jpg'), 'wb') as f:
        f.write(result)
    num += 1
    print('Downloading {} picture No. {}'.format(name, num))
Crawling results
As you can see, the crawl was a success.