
What? You can read an entire comic for free? Amazing! Let's learn to be money-savers together

2022-06-29 11:05:00 【The devil will not cry】

Preface

Hi, everyone, this is the Demon King~

I'm sure many of you read comics. Today, let's take a look at this website.

This website gives new users 15 days of VIP for free.


Once we have it, we can download all the comics we want to read and enjoy them at our leisure~

Without further ado, let's start writing the code.

Contents

Preface
- This code is provided by: Qingdeng Education - Self visiting teacher
- Environment used
- Modules used
- Basic approach and workflow (general)
  - 1. Data source analysis
  - 2. Code implementation steps
  - Collecting an entire comic
Code
Closing remarks

This code is provided by: Qingdeng Education - Self visiting teacher

Environment used:

Python 3.8
PyCharm

Try to use the same versions~

Modules used:

requests >>> pip install requests
parsel >>> pip install parsel

How to install a third-party Python module:

Press Win + R, type cmd, and click OK; then enter the installation command pip install <module name> (for example, pip install requests) and press Enter.
Or, in PyCharm, click Terminal and enter the installation command there.

Common reasons installation fails:

Failure 1: pip is not recognized as an internal command

Solution: set the environment variable (add Python's installation folder and its Scripts folder to PATH)

Failure 2: lots of red error output (read timed out)

Solution: the network connection timed out, so switch to a mirror source

For example: pip3 install -i https://pypi.doubanio.com/simple/ <module name>

Failure 3: cmd says the module is already installed (or installed successfully), but it still can't be imported in PyCharm

Solution: you may have several Python versions installed (install either Anaconda or Python, not both), so uninstall one of them

Or the Python interpreter in PyCharm has not been set up
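When a module installs fine in cmd but PyCharm can't import it, it helps to check which interpreter is actually running your code. Below is a small standard-library sketch (the helper name `is_installed` is made up for illustration): run it inside PyCharm and compare the printed path with the Python that pip installed into.

```python
import importlib.util
import sys


def is_installed(module_name):
    """Return True if this interpreter can find the module (without importing it)."""
    return importlib.util.find_spec(module_name) is not None


# Path of the interpreter that is running this script
print(sys.executable)

for name in ('requests', 'parsel'):
    status = 'installed' if is_installed(name) else 'NOT installed for this interpreter'
    print(name, status)
```

If the printed path is not the Python you ran pip with, either point PyCharm at that interpreter or install the module with the interpreter shown, e.g. `<that path> -m pip install requests`.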

How to configure the Python interpreter in PyCharm:

Choose File >>> Settings >>> Project >>> Python Interpreter
Click the gear icon and choose Add
Add the Python installation path

How to install plugins in PyCharm:

Choose File >>> Settings >>> Plugins
Click Marketplace and search for the plugin you want, e.g. enter translation for a translation plugin, or Chinese for the Chinese language pack
Select the plugin and click Install
After installation succeeds, a prompt to restart PyCharm pops up; click OK, and the plugin takes effect after the restart

Basic approach and workflow (general):

1. Data source analysis

Clarify the requirements
Use the developer tools to capture and analyze network packets and find where the manhua data comes from

Start from the URL of a single manhua image, then trace where all of this chapter's manhua content comes from

2. Code implementation steps

Send a request: send a request to the URL of the image data packet found above
Get data: get the response data returned by the server
Parse data: extract the URLs of all the manhua images
Save data: save the manhua content to a local folder
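The four steps above can be sketched as a single function. This is a minimal illustration, not the author's code: `download_chapter`, `get_json`, and `get_bytes` are hypothetical names, and the `['data']['page'][i]['image']` layout is taken from the code later in this article. The two fetch functions are passed in so the flow can be tried without a network.

```python
import os
import tempfile
from typing import Callable


def download_chapter(get_json: Callable[[str, dict], dict],
                     get_bytes: Callable[[str], bytes],
                     api_url: str, params: dict, folder: str) -> int:
    """Run steps 1-4 for one chapter and return how many images were saved."""
    # Steps 1 and 2: send the request and get the response data (a dict)
    payload = get_json(api_url, params)
    # Step 3: parse the data and pull out every image URL
    img_urls = [page['image'] for page in payload['data']['page']]
    # Step 4: save each image as binary data in the local folder
    os.makedirs(folder, exist_ok=True)
    for num, img_url in enumerate(img_urls, start=1):
        with open(os.path.join(folder, f'{num}.jpg'), mode='wb') as f:
            f.write(get_bytes(img_url))
    return len(img_urls)


# Offline demo with fake fetchers (example.com URLs are placeholders)
def fake_get_json(url, params):
    return {'data': {'page': [{'image': 'https://example.com/1.jpg'},
                              {'image': 'https://example.com/2.jpg'}]}}


def fake_get_bytes(url):
    return b'fake image bytes'


demo_dir = tempfile.mkdtemp()
saved = download_chapter(fake_get_json, fake_get_bytes,
                         'https://example.com/api', {'chapter_id': '1'}, demo_dir)
print(saved)  # 2
```

With requests, the real fetchers could be `lambda url, params: requests.get(url, params=params, headers=headers).json()` and `lambda url: requests.get(url, headers=headers).content`.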

Collecting an entire comic

To collect multiple chapters of manhua content, find the data-packet URLs of several chapters and compare how their request parameters change: only the chapter ID changes.

So we just need all the manhua chapter IDs, which we can find by analyzing the table-of-contents page (the list page).

Send a request: send a request to the manhua table-of-contents page
Get data: get the response data returned by the server
Parse data: extract all the manhua chapter IDs and chapter titles
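The list-page steps above are done with parsel in the code below. As a dependency-free illustration of the same idea, here is a sketch that swaps in the standard library's `html.parser` instead of parsel; the HTML snippet is a made-up stand-in that only mimics the `data-chapterid` attribute and class names used later, not the real page.

```python
from html.parser import HTMLParser


class ChapterListParser(HTMLParser):
    """Collect (chapter_id, title) pairs from <a data-chapterid="..."> links."""

    def __init__(self):
        super().__init__()
        self.chapters = []
        self._current_id = None   # ID of the <a> we are inside, if any
        self._texts = []          # non-empty text pieces seen inside that <a>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'a' and 'data-chapterid' in attrs:
            self._current_id = attrs['data-chapterid']
            self._texts = []

    def handle_data(self, data):
        if self._current_id is not None and data.strip():
            self._texts.append(data.strip())

    def handle_endtag(self, tag):
        if tag == 'a' and self._current_id is not None:
            # As with getall()[-1] in the tutorial, take the last text piece as the title
            title = self._texts[-1] if self._texts else ''
            self.chapters.append((self._current_id, title))
            self._current_id = None


# Made-up sample of a list page that shows the newest chapter first
html = '''
<ul class="chapter__list-box">
  <li class="j-chapter-item"><a data-chapterid="1002"><span>VIP</span> Chapter 2</a></li>
  <li class="j-chapter-item"><a data-chapterid="1001">Chapter 1</a></li>
</ul>
'''

parser = ChapterListParser()
parser.feed(html)
chapters = list(reversed(parser.chapters))  # oldest chapter first
print(chapters)  # [('1001', 'Chapter 1'), ('1002', 'Chapter 2')]
```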

Code

Because of the platform's review mechanism, I removed parts of the website addresses; you cuties can add them back yourselves, it's easy.

I also replaced two words with pinyin (manhua and shipin); you can change them back to the original words~

If you're feeling a little lazy or can't make the changes yourself, cuties, send me a private message and I'll send it to you~


Import the modules

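# Data request module: sends HTTP requests
import requests
# Pretty-print module: formats output for easier reading
import pprint
# Data parsing module: extracts content from HTML with CSS/XPath selectors
import parsel
# File operation module: checks paths and creates folders
import os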

Set the target URL (the comic's table-of-contents page; the address was removed, so fill it in yourself)

link = ''

Add request headers to disguise the script as a browser

# headers: request headers that disguise the script as a browser
headers = {
    # user-agent: the user agent, which represents the browser's basic identity
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.0.0 Safari/537.36',
}

Send a request

response_1 = requests.get(url=link, headers=headers)

Get the data

# print(response_1.text)
# Parse the data: choose the parsing method that suits the returned data (HTML here, so CSS selectors)
selector = parsel.Selector(response_1.text)
# One element per chapter entry in the chapter list
lis = selector.css('.chapter__list-box .j-chapter-item')

Get the comic's title

name = selector.css('.de-info__box .comic-title::text').get()

Automatically create a folder named after the comic

# Folder path with a trailing separator; '/' also works on Windows
filename = f'{name}/'
if not os.path.exists(filename):
    os.mkdir(filename)

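# Reverse the list so chapters are processed in ascending order (assuming the page lists the newest chapter first)
for li in list(reversed(lis)):
    # The chapter ID is stored in the link's data-chapterid attribute
    chapter_id = li.css('a::attr(data-chapterid)').get()
    # The last text node of the link is the chapter title
    chapter_title = li.css('a::text').getall()[-1].strip()
    print(chapter_id, chapter_title)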

Send a request: simulate a browser and send a request to the URL

Everything after the question mark is a request parameter of this URL, and you can pass these parameters separately as a dictionary.
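To see that correspondence concretely, the standard library can split a copied URL into its base address and a parameter dictionary. A minimal sketch; the example.com URL and the changed chapter ID are placeholders, not the site's real values.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Placeholder URL in the same shape as the one copied from the developer tools
copied = 'https://example.com/chapter/content/v1/?chapter_id=996914&comic_id=211471&format=1'

parts = urlsplit(copied)
base_url = f'{parts.scheme}://{parts.netloc}{parts.path}'  # everything before '?'
params = dict(parse_qsl(parts.query))                      # everything after '?'
print(base_url)  # https://example.com/chapter/content/v1/
print(params)    # {'chapter_id': '996914', 'comic_id': '211471', 'format': '1'}

# Swap in another (hypothetical) chapter ID and re-encode; this is essentially
# what requests.get(base_url, params=params) builds for you
params['chapter_id'] = '996915'
print(base_url + '?' + urlencode(params))
```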

To make Python code simulate a browser, you need the headers request header; you can copy and paste it from the developer tools.

user-agent: the user agent, which represents the browser's basic identity

How to quickly convert the copied headers in batches:

Select the content to convert, press Ctrl + R, enable regular expressions (the .* button), enter the expressions below, and click Replace All.

Find:    (.*?): (.*)
Replace: '$1': '$2',
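The same Find/Replace can be reproduced in Python with `re.sub`, which is handy for checking the pattern; note that Python writes the groups as `\1` and `\2` instead of `$1` and `$2`. The header lines below are sample values.

```python
import re

# Raw headers as copied from the developer tools (sample values)
raw_headers = """accept: application/json
user-agent: Mozilla/5.0 (Windows NT 10.0; WOW64)"""

# Same pattern as the editor's Find/Replace, with Python-style group references
converted = re.sub(r'(.*?): (.*)', r"'\1': '\2',", raw_headers)
print(converted)

# Alternatively, skip the text conversion and build the dict directly
headers = dict(line.split(': ', 1) for line in raw_headers.splitlines())
print(headers)
```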

Request URL (copy and paste it from the developer tools)

    # https://comic..com/chapter/content/v1/?chapter_id=996914&comic_id=211471&format=1&quality=1&sign=c2f14c1bdb0505254416907f504b4e03&type=1&uid=55123713
    url = ''

Request parameters (copy and paste them from the developer tools)

    data = {
        # chapter_id is the only parameter that changes from chapter to chapter
        'chapter_id': chapter_id,
        'comic_id': '211471',
        'format': '1',
        'quality': '1',
        'sign': 'c2f14c1bdb0505254416907f504b4e03',
        'type': '1',
        'uid': '55123713',
    }

Request headers (to disguise the Python code as a browser; copy and paste them)

    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.0.0 Safari/537.36'
    }

Send a request

    response = requests.get(url=url, params=data, headers=headers)
    # <Response [200]> is the response object; status code 200 means the request succeeded
    print(response)
    # Get data: the response data returned by the server
    # response.text gets text data (type: str); response.json() gets JSON data (type: dict)
    print(response.json())

Parse the data

Choose the parsing method that suits the data you received. Here the data is a dictionary, so extract the content by key.

Use the content to the left of the colon (the key) to extract the content to the right of the colon (the value); key-value pairs are separated by commas.
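For a concrete picture, here is a made-up response body with the same nesting that the next line of code relies on (key `data`, then key `page`, then a list of entries with an `image` key); the URLs are placeholders.

```python
import json

# Made-up response body with the nesting the tutorial's code expects
body = '''{"data": {"page": [{"image": "https://example.com/p1.jpg"},
                         {"image": "https://example.com/p2.jpg"}]}}'''

payload = json.loads(body)            # what response.json() returns: a dict
image_list = payload['data']['page']  # key 'data', then key 'page': a list
img_urls = [image['image'] for image in image_list]
print(img_urls)  # ['https://example.com/p1.jpg', 'https://example.com/p2.jpg']

# .get() returns None instead of raising KeyError when a key is missing
print(payload['data'].get('title'))  # None
```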

    image_list = response.json()['data']['page']  # a list of page entries

    num = 1
    for image in image_list:  # take the elements out of the list (a container of items) one by one
        img_url = image['image']
        print(img_url)

Save the data

We also need to send a request to each image URL and get its content; response.content returns binary data.

        img_content = requests.get(url=img_url, headers=headers).content
        # Save the data: images, shipin (video), audio, and specific file formats (zip, ppt, ...) are saved from binary content
        # mode: w means write, b means binary; wb writes in binary mode
        with open(filename + chapter_title + str(num) + '.jpg', mode='wb') as f:
            f.write(img_content)
        num += 1
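Chapter titles can contain characters such as `?` or `:` that Windows does not allow in file names, and plain `str(num)` can make `10.jpg` sort before `2.jpg` in tools that sort names alphabetically. A possible refinement, sketched with hypothetical helper names (`safe_name`, `image_path`), replaces the forbidden characters and zero-pads the page number.

```python
import os
import re


def safe_name(text):
    """Replace characters that Windows forbids in file names with '_'."""
    return re.sub(r'[\\/:*?"<>|]', '_', text).strip()


def image_path(folder, chapter_title, num):
    """Build a path like folder/Title_007.jpg; zero-padding keeps pages in order."""
    return os.path.join(folder, f'{safe_name(chapter_title)}_{num:03d}.jpg')


print(safe_name('Chapter 1: Start?'))  # Chapter 1_ Start_
print(image_path('MyComic', 'Chapter 1: Start', 7))
```

In the loop, the `open(...)` call could then use `image_path(filename, chapter_title, num)` in place of the string concatenation.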