Crawler 01: Basic principles of web crawlers
2022-07-05 18:26:00 【Twinkling stars】
1. What is a crawler
A crawler is an automated program that requests websites and extracts data from them.
2. Basic workflow of a crawler
- Initiate the request
- Get the response content
- Parse the content
- Save the data
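
The four steps above can be sketched as three small functions. This is a minimal illustration rather than a complete crawler: it assumes the standard library's urllib and html.parser so that it has no dependencies, and the names TitleParser, fetch, parse and save are hypothetical.

```python
import json
from html.parser import HTMLParser
from pathlib import Path
from urllib.request import urlopen


class TitleParser(HTMLParser):
    """Collects the text found inside <title> elements."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def fetch(url):
    """Steps 1 and 2: initiate the request and get the response content."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def parse(html):
    """Step 3: parse the content (here, only the page title)."""
    parser = TitleParser()
    parser.feed(html)
    return {"title": parser.title.strip()}


def save(record, path):
    """Step 4: save the data as a JSON file."""
    Path(path).write_text(json.dumps(record, ensure_ascii=False), encoding="utf-8")
```

Calling save(parse(fetch(url)), "page.json") with any reachable URL runs the whole flow. Real crawlers usually extract far more than the title, but the division into fetch, parse and save stays the same.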

3. What are Request and Response

3.1 What a Request contains

Request method: mainly GET and POST
* Viewing the request method

* Types of request method
HTTP/1.0
1. GET: mainly for retrieving data
Usually passes parameters to the back end in order to fetch a set of data
2. POST: mainly for submitting data to the server
Typically used for logging in: the client sends some information and the server returns a short result
3. PUT: also sends information to the server, but in order to store it
Typically used for registration: the client sends some information and the server saves it
4. HEAD: retrieves only the response headers, without the body
HTTP/1.1
5. DELETE: deletes a resource
Typically used to delete a comment or a microblog post
6. CONNECT: establishes a tunnel through a proxy [not commonly used]
7. PATCH: sends information to the server in order to modify part of an existing resource
Typically used to update user profile data
8. OPTIONS: asks which request methods and options the server supports, provided the server allows it
* The difference between GET and POST

Request URL
URL: Uniform Resource Locator, the address of the requested resource
Request headers
Configuration information for the request, such as User-Agent and Cookie
Request body
A GET request usually has no request body.
A POST request usually carries its data in the body as form data, for example login information.
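
To make the parts of a request concrete, the sketch below builds one GET and one POST request and prints their method, URL, headers and body without sending anything. It uses the standard library's urllib.request.Request; the parameters, credentials and User-Agent value are made up for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical parameters and credentials, for illustration only.
params = {"q": "crawler", "page": 1}
form = {"username": "alice", "password": "secret"}
headers = {"User-Agent": "Mozilla/5.0"}

# GET: the parameters travel in the URL's query string; there is no body.
get_request = Request(
    "http://httpbin.org/get?" + urlencode(params),
    headers=headers,
    method="GET",
)

# POST: the form data is URL-encoded and sent as the request body.
post_request = Request(
    "http://httpbin.org/post",
    data=urlencode(form).encode("utf-8"),
    headers=headers,
    method="POST",
)

# Inspect each request; nothing is sent over the network here.
for request in (get_request, post_request):
    print(request.get_method(), request.full_url)
    print("  headers:", dict(request.header_items()))
    print("  body:", request.data)
```

The GET request exposes its parameters in the URL, while the POST request keeps them in the body, which is why login forms are normally submitted with POST.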
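3.2 What a Response contains

Response status
Status code: 200 means success, 3xx (for example 301 or 302) means a redirect, and 404 means the resource was not found
Response headers
Response body
The content itself, as shown in the Preview tab of the browser's developer tools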
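
Status codes fall into classes according to their first digit. The following sketch, which assumes a hypothetical helper named describe, looks up the standard reason phrase with the standard library's http.HTTPStatus and labels the class of each code.

```python
from http import HTTPStatus


def describe(status_code):
    """Return the status code with its reason phrase and class."""
    classes = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    kind = classes.get(status_code // 100, "unknown")
    try:
        phrase = HTTPStatus(status_code).phrase
    except ValueError:
        # The code is not registered in the standard library's table.
        phrase = "Unknown"
    return f"{status_code} {phrase} ({kind})"


for code in (200, 301, 302, 404, 500):
    print(describe(code))
```

A crawler typically checks the status code before parsing, so that a redirect or an error page is not mistaken for real content.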
4 An introductory example
import requests
r = requests.get('https://www.baidu.com/')
print(type(r))
print(r.status_code)
print(type(r.text))
print(r.text)
print(r.cookies)
Output :
<class 'requests.models.Response'>
200
<class 'str'>
<html>
<head>
<script>
location.replace(location.href.replace("https://","http://"));
</script>
</head>
<body>
<noscript><meta http-equiv="refresh" content="0;url=http://www.baidu.com/"></noscript>
</body>
</html>
<RequestsCookieJar[<Cookie BIDUPSID=992C3B26F4C4D09505C5E959D5FBC005 for .baidu.com/>, <Cookie
PSTM=1472227535 for .baidu.com/>, <Cookie __bsi=15304754498609545148_00_40_N_N_2_0303_C02F_N_N_N_0
for .www.baidu.com/>, <Cookie BD_NOT_HTTPS=1 for www.baidu.com/>]>
The code prints, in order, the type of the Response object, the status code, the type of the response body, the body itself, and the cookies.
The output shows that the return type is requests.models.Response, the response body is a string (str), and the cookies are a RequestsCookieJar.
Here the get method performs a GET request. More conveniently, every other request type can also be issued in a single line, for example:
r = requests.post('http://httpbin.org/post')
r = requests.put('http://httpbin.org/put')
r = requests.delete('http://httpbin.org/delete')
r = requests.head('http://httpbin.org/get')
r = requests.options('http://httpbin.org/get')
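5 What kinds of data can be scraped

6 How to parse the data

7 Why what I scrape differs from what I see in the browser
The response we fetch is only the initial page source. The browser goes on to run the JavaScript in it, which loads more data through Ajax and similar requests, so those requests have to be analyzed as well.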
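
Data loaded through Ajax often arrives as JSON rather than HTML, so once the request has been located it can be parsed directly. The sketch below assumes a made-up response body; the field names and the helper extract_titles are hypothetical and stand in for whatever the real endpoint returns.

```python
import json

# Hypothetical body of an Ajax response; the field names are invented
# for illustration and will differ on a real site.
ajax_body = (
    '{"page": 1, "items": ['
    '{"id": 1, "title": "first"}, '
    '{"id": 2, "title": "second"}]}'
)


def extract_titles(body):
    """Parse a JSON response body and return the title of every item."""
    payload = json.loads(body)
    return [item["title"] for item in payload.get("items", [])]


print(extract_titles(ajax_body))
```

Parsing the JSON directly is usually simpler than extracting the same values from rendered HTML, provided the Ajax request can be reproduced.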
8 How to handle JavaScript rendering

9 How to save the data
