What? You Can Read a Whole Comic for Free? Come Learn to Be a Money Saver
2022-06-29 11:05:00 【The devil will not cry】
Preface
Hi, everyone, this is the demon king~

I'm sure many of you read comics. Today, let's take a look at one comic website.
If you are a new user, this website gives you 15 days of VIP membership.

Once we have that, we can scrape all the comics we want to read and go through them at our own pace~
Enough talk, let's get straight to the code.

This code is provided by: Qingdeng Education - Self visiting teacher
Environment:
- Python 3.8
- PyCharm
Try to use the same versions~
Modules used:
- requests >>> pip install requests
- parsel >>> pip install parsel
How to install a third-party Python module:
- Press Win + R, type cmd, and click OK; then type the install command pip install <module name> (for example, pip install requests) and press Enter
- Or, in PyCharm, click Terminal and type the install command there
Common reasons for installation failure:
Failure one: pip is not recognized as an internal command
Fix: set the PATH environment variable so that it includes your Python installation
Failure two: lots of red error output (read timed out)
Fix: the network connection timed out; switch to a mirror source
for example: pip3 install -i https://pypi.doubanio.com/simple/ <module name>
Failure three: cmd shows that the module is already installed, or that the installation succeeded, but it still cannot be imported in PyCharm
Fix: you may have several Python versions installed (Anaconda or Python, one is enough); uninstall the extra one
Or the Python interpreter in your PyCharm project is not set
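To narrow down failure three, it can help to run a short script inside PyCharm that prints which interpreter is in use and whether the two modules are visible to it. This is only a minimal sketch; the helper name `check_modules` is made up for illustration.

```python
import importlib.util
import sys


def check_modules(names):
    """Return {module name: True/False} for whether each module can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # The interpreter path shows which Python installation is actually running.
    print("Interpreter:", sys.executable)
    for name, found in check_modules(["requests", "parsel"]).items():
        print(f"{name}: {'installed' if found else 'missing'}")
```

If the interpreter path printed here differs from the one where pip installed the modules, the project is probably pointing at a different Python installation.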

How do I configure the Python interpreter in PyCharm?
- Choose File >>> Settings >>> Project >>> Python Interpreter
- Click the gear icon and choose Add
- Add the path of your Python installation
How do I install plugins in PyCharm?
- Choose File >>> Settings >>> Plugins
- Click Marketplace and type the name of the plugin you want, for example "translation" for a translation plugin or "Chinese" for the Chinese language pack
- Select the plugin and click Install
- After installation, a prompt to restart PyCharm appears; click OK and restart for the plugin to take effect

Basic approach and workflow <general purpose>:
One. Data source analysis
- Clarify the requirements
- Use the browser's developer tools to capture network traffic and find where the comic data comes from
one comic image <url> ----> find where all the images of this chapter come from
Two. Code implementation steps
- Send a request to the url of the image data packet found above
- Get the data: the response returned by the server
- Parse the data: extract the url of every comic image
- Save the data: save the comic content to a local folder
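The parse and save steps can be sketched offline as follows. This is a rough illustration, not the author's code: the function names, the `fetch` parameter, and the sample data are assumptions, while the `data` / `page` / `image` layout follows the dictionary keys used in the article's own code. Zero-padded page numbers are an added choice so that files sort in reading order.

```python
import os
import tempfile


def extract_image_urls(payload):
    """Pull image URLs out of a dict shaped like {'data': {'page': [{'image': url}, ...]}}."""
    return [page["image"] for page in payload["data"]["page"]]


def save_images(urls, folder, prefix, fetch):
    """Download every URL with fetch(url) -> bytes and write numbered .jpg files."""
    os.makedirs(folder, exist_ok=True)
    paths = []
    for num, url in enumerate(urls, start=1):
        # Zero-padded numbers keep the pages in reading order when sorted by name.
        path = os.path.join(folder, f"{prefix}_{num:03d}.jpg")
        with open(path, mode="wb") as f:
            f.write(fetch(url))
        paths.append(path)
    return paths


# Stand-in data so the sketch runs without network access.
sample_payload = {"data": {"page": [{"image": "https://example.com/1.jpg"},
                                    {"image": "https://example.com/2.jpg"}]}}


def fake_fetch(url):
    """Hypothetical replacement for a real download; returns placeholder bytes."""
    return b"fake image bytes for " + url.encode()


with tempfile.TemporaryDirectory() as tmp:
    saved = save_images(extract_image_urls(sample_payload), tmp, "chapter1", fake_fetch)
    print([os.path.basename(p) for p in saved])
```

In a real run, `fetch` would be a small function that requests the url with the browser-like headers and returns the response's binary content.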
Collecting a whole comic
To collect several chapters -> find the data-packet urls of more chapters -> compare how the request url parameters change -> only the chapter ID changes
So getting every chapter ID is enough -> they can be found on the directory (list) page
- Send a request to the comic's directory page
- Get the data: the response returned by the server
- Parse the data: extract every chapter ID and the comic title
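The extraction step can be tried on a small HTML fragment before touching the real site. The sketch below swaps in the standard library's `html.parser` for parsel so that it runs without extra installs; the sample HTML and the class name `ChapterParser` are assumptions, and only the `data-chapterid` attribute and the CSS class names are taken from the article's selectors.

```python
from html.parser import HTMLParser


class ChapterParser(HTMLParser):
    """Collect (chapter_id, title) pairs from <a data-chapterid="..."> links."""

    def __init__(self):
        super().__init__()
        self.chapters = []
        self._current_id = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "data-chapterid" in attrs:
            self._current_id = attrs["data-chapterid"]
            self._text = []

    def handle_data(self, data):
        # Only collect text while inside a chapter link.
        if self._current_id is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_id is not None:
            self.chapters.append((self._current_id, "".join(self._text).strip()))
            self._current_id = None


# Hypothetical directory-page fragment, assumed to list the newest chapter first.
sample_html = """
<ul class="chapter__list-box">
  <li class="j-chapter-item"><a data-chapterid="1002"> Chapter 2 </a></li>
  <li class="j-chapter-item"><a data-chapterid="1001"> Chapter 1 </a></li>
</ul>
"""

parser = ChapterParser()
parser.feed(sample_html)
# Reverse the list so that the first chapter is processed first.
chapters = list(reversed(parser.chapters))
print(chapters)
```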

Code
Because of the platform's review rules, I removed parts of the site's urls; you can add them back yourself, it's easy.
I also wrote two words in pinyin (for example, manhua for "comic"); you can change them back to the original characters~
If you'd rather not, or can't, make the changes yourself, you can send me a private message and I'll send you the code~
(Or click the floating text on the left side of the homepage (article) to get it for free~ You may need to scroll down.)
Import modules
# Data request module
import requests
# Pretty-printing module
import pprint
# Data parsing module
import parsel
# File operation module
import os
Set the url
# url of the comic's directory page (removed by the author; fill it in yourself)
link = ''
Add request headers
# headers: disguise the request as coming from a browser
headers = {
    # user-agent: identifies the browser making the request
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.0.0 Safari/537.36',
}
Send the request
response_1 = requests.get(url=link, headers=headers)
Get the data
# print(response_1.text)
# Parse the data: pick the parsing method that best fits the shape of the data
selector = parsel.Selector(response_1.text)
lis = selector.css('.chapter__list-box .j-chapter-item')
Get the name
name = selector.css('.de-info__box .comic-title::text').get()
Create the folder automatically
filename = f'{name}\\'
if not os.path.exists(filename):
    os.mkdir(filename)
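A slightly more defensive variant of this step is sketched below. It is not part of the original code: the helper name `make_comic_folder`, the placeholder name `untitled`, and the replacement of characters that Windows forbids in file names are all assumptions added for illustration.

```python
import os
import re
import tempfile


def make_comic_folder(name, base_dir="."):
    """Create (if needed) and return a folder named after the comic title."""
    # The title selector may return None; fall back to a placeholder name.
    safe_name = re.sub(r'[\\/:*?"<>|]', "_", (name or "untitled").strip())
    folder = os.path.join(base_dir, safe_name)
    # exist_ok=True makes a separate os.path.exists() check unnecessary.
    os.makedirs(folder, exist_ok=True)
    return folder


with tempfile.TemporaryDirectory() as tmp:
    folder = make_comic_folder("My: Comic?", base_dir=tmp)
    print(os.path.basename(folder))  # My_ Comic_
```

Building paths with `os.path.join` instead of appending `\\` also keeps the code working on systems other than Windows.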
for li in list(reversed(lis)):
    chapter_id = li.css('a::attr(data-chapterid)').get()
    chapter_title = li.css('a::text').getall()[-1].strip()
    print(chapter_id, chapter_title)
Send a request: simulate a browser and request the url
Everything after the question mark in the url is a request parameter; the parameters can be passed separately as a dictionary
When Python code simulates a browser, it needs headers (request headers) -> copy and paste them from the developer tools
user-agent: identifies the browser making the request
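Splitting a copied url into a base address and a parameter dictionary does not have to be done by hand. The sketch below uses the standard library's `urllib.parse`; the address on example.com is hypothetical and only the layout of the query string mirrors the kind of url discussed here.

```python
from urllib.parse import parse_qsl, urlsplit

# Hypothetical address; only the query-string layout resembles the real one.
full_url = ("https://example.com/chapter/content/v1/"
            "?chapter_id=996914&comic_id=211471&format=1&quality=1&type=1")

parts = urlsplit(full_url)
# Everything before the question mark becomes the request url.
base_url = f"{parts.scheme}://{parts.netloc}{parts.path}"
# Everything after the question mark becomes a dictionary of parameters.
params = dict(parse_qsl(parts.query))

print(base_url)
print(params)
```

The resulting dictionary can then be passed to `requests.get` through its `params` argument, with `chapter_id` swapped for each chapter.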
How to convert the copied headers in bulk:
Select the text to replace, press Ctrl + R, enable regular expressions, enter the two patterns below (search, then replacement), and click Replace All
(.*?): (.*)
'$1': '$2',
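The same conversion can be done in Python itself, which may be handy outside PyCharm. In this sketch the header lines are sample values, not ones copied from the real site; note that Python's `re.sub` writes group references as `\1` and `\2` rather than `$1` and `$2`.

```python
import re

# Raw "name: value" lines as they might be copied from the developer tools (sample values).
raw_headers = """accept: application/json
referer: https://example.com/
user-agent: Mozilla/5.0"""

# Same pattern as the editor replace: group 1 is the name, group 2 is the value.
quoted = re.sub(r"(.*?): (.*)", r"'\1': '\2',", raw_headers)
print(quoted)

# Building the dictionary directly avoids the copy-and-paste step altogether.
headers = dict(line.split(": ", 1) for line in raw_headers.splitlines())
print(headers)
```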
Request url
-> copy and paste
    # https://comic..com/chapter/content/v1/?chapter_id=996914&comic_id=211471&format=1&quality=1&sign=c2f14c1bdb0505254416907f504b4e03&type=1&uid=55123713
    url = ''
Request parameters
-> copy and paste
    data = {
        'chapter_id': chapter_id,
        'comic_id': '211471',
        'format': '1',
        'quality': '1',
        'sign': 'c2f14c1bdb0505254416907f504b4e03',
        'type': '1',
        'uid': '55123713',
    }
Request headers
Used to disguise the Python code as a browser -> copy and paste
    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.0.0 Safari/537.36'
    }
Send the request
    response = requests.get(url=url, params=data, headers=headers)
    # <Response [200]> is the response object; status code 200 means the request succeeded
    print(response)
    # Get the data returned by the server
    # response.text returns text <type: str>; response.json() returns the parsed JSON <type: dict>
    print(response.json())
Parse the data
-> pick the parsing method that fits the data; here it is a dictionary, so extract the values by key
Given the content to the left of the colon [key], extract the content to the right of the colon [value] -> key-value pairs are separated by commas
    image_list = response.json()['data']['page']  # list
    num = 1
    for image in image_list:  # take the elements out of the list one by one
        img_url = image['image']
        print(img_url)
Save the data
-> each image url also needs its own request; response.content returns the binary data
        img_content = requests.get(url=img_url, headers=headers).content
        # To save images, video, audio, or other specific file formats <zip, ppt, ...>, use the binary content
        # mode='wb': w = write, b = binary, so wb writes in binary mode
        with open(filename + chapter_title + str(num) + '.jpg', mode='wb') as f:
            f.write(img_content)
        num += 1
Closing words
There is no fast track to success and no highway to happiness.
All success comes from tireless effort and hard running; all happiness comes from ordinary struggle and persistence.
(Inspirational quote)
This article is finished~ If you are interested, copy the code and try it.
Your support is my biggest motivation!! Remember the "Sanlian" (like, favorite, follow)~ You are welcome to read my previous articles~
