What? You can read an entire comic book for free? Come and learn to be a money saver
2022-06-29 11:05:00 【The devil will not cry】
Preface
Hi everyone, this is the Demon King ~

I believe many of you read comics. Today, let's take a look at this website.
New users of this website get 15 days of VIP for free.

With that, we can scrape all the comics we want to read and go through them at our own pace ~
Enough talk, let's start writing the code.

This code is provided by : Qingdeng Education - Self visiting teacher
Environment:
- Python 3.8
- PyCharm
Try to keep the versions consistent ~
Modules used:
- requests >>> pip install requests
- parsel >>> pip install parsel
How to install third-party Python modules:
- Press Win + R, type cmd and click OK, then enter the install command pip install <module name> (for example, pip install requests) and press Enter
- Or, in PyCharm, click Terminal and enter the install command there
Reasons an installation can fail:
Failure one: pip is not recognized as an internal command
Fix: set the environment variable so that pip can be found on PATH
Failure two: lots of red error output (read timed out)
Fix: the network connection timed out, so switch to a mirror source
For example: pip3 install -i https://pypi.doubanio.com/simple/ <module name>
Failure three: cmd shows that the module is already installed, or that the installation succeeded, but it still cannot be imported in PyCharm
Fix: you may have several Python versions installed (Anaconda and Python; one of them is enough), so uninstall one
Or the Python interpreter is not configured in PyCharm
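
When failure three shows up, it helps to check which interpreter is actually running and whether that interpreter can see the modules. A minimal sketch using only the standard library; the helper name `check_modules` is made up for this example:

```python
import importlib.util
import sys


def check_modules(names):
    """Return {module name: True/False} for whether this interpreter can find it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == '__main__':
    # The interpreter path shows which Python installation is in use
    print(sys.executable)
    for name, found in check_modules(['requests', 'parsel']).items():
        print(name, 'OK' if found else 'missing, run: pip install ' + name)
```

Run it once from cmd and once inside PyCharm: if the two interpreter paths differ, the module was probably installed into the other Python.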

How do you configure the Python interpreter in PyCharm?
- Choose File >>> Settings >>> Project >>> Python Interpreter
- Click the gear icon and choose Add
- Add the Python installation path
How do you install plug-ins in PyCharm?
- Choose File >>> Settings >>> Plugins
- Click Marketplace and type the name of the plug-in you want, for example: type translation for a translation plug-in, or Chinese for the Chinese language plug-in
- Select the plug-in and click Install
- After a successful installation, a prompt to restart PyCharm appears; click OK and restart for the change to take effect

Basic approach and workflow <general>:
1. Data source analysis
- Clarify the requirements
- Use the browser's developer tools to capture packets and find out where the manhua data comes from
One manhua image <url address> ----> find where all the images of this chapter come from
2. Code implementation steps
- Send a request to the url address of the image data packet found above
- Get the data: the response returned by the server
- Parse the data: extract the url addresses of all manhua images
- Save the data: save the manhua content to a local folder
Collecting a whole comic book
Collecting several chapters of manhua content --> find the data packet url addresses of more chapters --> compare how the request url parameters change --> only the chapter ID changes
So getting every manhua chapter ID is enough --> look for them on the directory page (list page)
- Send a request to the manhua directory page
- Get the data: the response returned by the server
- Parse the data: extract every manhua chapter ID and chapter title

Code
Because of the site's review mechanism, I removed parts of the website's address from the code (the link and url values are left blank). You can add them back yourself, it's easy.
Two more words are written in pinyin instead of the original text; you can change them back ~
If you're feeling lazy or can't work out the changes, you can also send me a private message and I'll send it to you ~
(Or check the home page (article): the floating text on the left is free to click ~ you may need to scroll down.)
Import the modules
# Data request module
import requests
# Pretty-printing module
import pprint
# Data parsing module
import parsel
# File operation module
import os
Determine the web address
# Address of the comic's directory page (left blank by the author, fill it in yourself)
link = ''
Add camouflage
# headers: request header camouflage
headers = {
    # user-agent: the user agent, which carries the browser's basic identity
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.0.0 Safari/537.36',
}
Send a request
response_1 = requests.get(url=link, headers=headers)
Get the data
# print(response_1.text)
# Parse the data: choose the parsing method that best fits the data you got
selector = parsel.Selector(response_1.text)
lis = selector.css('.chapter__list-box .j-chapter-item')
Get the name
name = selector.css('.de-info__box .comic-title::text').get()
Automatically create the folder
filename = f'{name}\\'
if not os.path.exists(filename):
    os.mkdir(filename)
# Walk through the chapter list in reverse page order
for li in reversed(lis):
    chapter_id = li.css('a::attr(data-chapterid)').get()
    chapter_title = li.css('a::text').getall()[-1].strip()
    print(chapter_id, chapter_title)
Send a request: simulate a browser and send a request to the url address
Everything after the question mark belongs to the url's request parameters, which can be passed separately as a dictionary
To simulate a browser with Python code, you need the headers (request headers) --> copy and paste them from the developer tools
user-agent: the user agent, which carries the browser's basic identity
How to do a quick batch replace:
Select the content to replace, press Ctrl + R, enable regular expressions, enter the two patterns below, and click Replace All
(.*?): (.*)
'$1': '$2',
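
The same search-and-replace can be done in Python with `re.sub`, where the groups are written `\1` and `\2` instead of `$1` and `$2`. A small sketch; the raw header text is a made-up sample of what the developer tools show, and both function names are assumptions for this example:

```python
import re

# Sample header lines as copied from the developer tools (made up for illustration)
RAW_HEADERS = '''accept: application/json
referer: https://example.com/
user-agent: Mozilla/5.0'''


def quote_header_lines(raw):
    """Turn each 'key: value' line into a "'key': 'value'," dictionary entry."""
    return re.sub(r'(.*?): (.*)', r"'\1': '\2',", raw)


def headers_to_dict(raw):
    """Build the headers dictionary directly, skipping the manual paste step."""
    pairs = (line.split(': ', 1) for line in raw.splitlines() if ': ' in line)
    return {key: value for key, value in pairs}


print(quote_header_lines(RAW_HEADERS))
print(headers_to_dict(RAW_HEADERS))
```

The second helper avoids the quoting step altogether, which is handy when a header value itself contains a quote character.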
Request url address
--> copy and paste
    # https://comic..com/chapter/content/v1/?chapter_id=996914&comic_id=211471&format=1&quality=1&sign=c2f14c1bdb0505254416907f504b4e03&type=1&uid=55123713
    # Fill in the part of the address before the question mark (left blank by the author)
    url = ''
Request parameters
--> copy and paste
    data = {
        'chapter_id': chapter_id,
        'comic_id': '211471',
        'format': '1',
        'quality': '1',
        'sign': 'c2f14c1bdb0505254416907f504b4e03',
        'type': '1',
        'uid': '55123713',
    }
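
To see how the part after the question mark maps onto such a dictionary, the standard library can split a url apart and put it back together. A sketch that reuses the query string from the commented-out address; the host example.com is a placeholder because the real address is left out of this article, and the chapter ID 996915 is a hypothetical value:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Placeholder host; the query string is the one shown in the article
SAMPLE_URL = ('https://example.com/chapter/content/v1/'
              '?chapter_id=996914&comic_id=211471&format=1&quality=1'
              '&sign=c2f14c1bdb0505254416907f504b4e03&type=1&uid=55123713')


def split_query(url):
    """Return (base url without the query, dict of request parameters)."""
    parts = urlsplit(url)
    base = f'{parts.scheme}://{parts.netloc}{parts.path}'
    return base, dict(parse_qsl(parts.query))


base, params = split_query(SAMPLE_URL)
print(base)
print(params)
# Swap in another (hypothetical) chapter ID and rebuild the full address
params['chapter_id'] = '996915'
print(base + '?' + urlencode(params))
```

Passing the dictionary through `params=` in `requests.get` does the same joining for you, so the url variable only needs the part before the question mark.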
Request headers
Used to disguise the Python code --> copy and paste
    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.0.0 Safari/537.36'
    }
Send a request
    response = requests.get(url=url, params=data, headers=headers)
    # <Response [200]> is the response object; status code 200 means the request succeeded
    print(response)
    # Get the data returned by the server
    # response.text returns text <data type: string>; response.json() returns JSON data <data type: dictionary>
    print(response.json())
Parse the data
--> Choose the parsing method that fits the data you got. Here it is a dictionary, so extract the content by key
Use the content to the left of the colon [key] to get the content to the right of the colon [value]; key-value pairs are separated by commas
    image_list = response.json()['data']['page']  # list
    num = 1
    for image in image_list:  # take the elements out of the list one by one
        img_url = image['image']
        print(img_url)
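
The nested lookup can be tried on a small sample without calling the server. The payload below is invented and only copies the data, page, image nesting that the lookup above relies on; the helper name `extract_image_urls` is likewise made up:

```python
import json

# Invented response body with the same nesting as the lookup above
SAMPLE_BODY = '''
{"data": {"page": [
    {"image": "https://example.com/img/001.jpg"},
    {"image": "https://example.com/img/002.jpg"}
]}}
'''


def extract_image_urls(payload):
    """Collect the 'image' value of every entry under data, then page."""
    pages = payload.get('data', {}).get('page', [])
    return [page['image'] for page in pages if 'image' in page]


payload = json.loads(SAMPLE_BODY)  # the same kind of dict that response.json() returns
for number, img_url in enumerate(extract_image_urls(payload), start=1):
    print(number, img_url)
```

Using `.get()` with defaults means a response without the expected keys yields an empty list instead of raising KeyError, and `enumerate(..., start=1)` replaces the manual num counter.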
Save the data
--> Send a request to each image url address as well, and read response.content to get the binary data
        img_content = requests.get(url=img_url, headers=headers).content
        # Images, videos, audio and other file formats <zip, ppt, ...> are saved as binary data (content)
        # mode='wb': w means write, b means binary, so wb writes in binary mode
        with open(filename + chapter_title + str(num) + '.jpg', mode='wb') as f:
            f.write(img_content)
        num += 1
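
Chapter titles can contain characters that Windows does not allow in file names, and plain 1, 2, ..., 10 numbering sorts out of order in a file browser. A possible variation on the save step, with made-up helper names and a zero-padded page number; this is not what the code above does:

```python
import re
from pathlib import Path


def safe_name(text):
    """Replace characters that are invalid in Windows file names with '_'."""
    return re.sub(r'[\\/:*?"<>|]', '_', text).strip()


def page_path(folder, chapter_title, num):
    """Build a path such as <folder>/<chapter title>_003.jpg."""
    return Path(folder) / f'{safe_name(chapter_title)}_{num:03d}.jpg'


def save_image(folder, chapter_title, num, img_content):
    """Write the binary image data and return the path that was written."""
    path = page_path(folder, chapter_title, num)
    path.parent.mkdir(parents=True, exist_ok=True)  # create the folder if needed
    path.write_bytes(img_content)  # same effect as open(..., mode='wb')
    return path


print(page_path('comic', 'Chapter 1: What?', 3))
```

In the loop above, `save_image(name, chapter_title, num, img_content)` would stand in for the `with open(...)` block.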
Closing words
There is no fast track to success and no highway to happiness.
All success comes from tireless effort and running; all happiness comes from ordinary struggle and persistence.
-- Inspirational quotes
That's the end of this article ~ If you're interested, copy the code and give it a try
Your support is my biggest motivation!! Remember to like, favorite, and follow ~ You're welcome to read my earlier articles ~
