Web Crawler 01: Basic Principles of Web Crawlers
2022-07-05 18:26:00 【Twinkling stars】
1. What is a web crawler
An automated program that sends requests to websites and extracts data from the responses.
2. Basic workflow of a crawler
- Send the request
- Get the response content
- Parse the content
- Save the data
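The four steps above can be sketched as follows. This is a minimal illustration, not a complete crawler: it uses only the Python standard library, extracts nothing but the page title, and the names `TitleParser`, `fetch`, `parse`, `save`, and `crawl` are hypothetical helpers introduced here.

```python
import json
from html.parser import HTMLParser
from urllib.request import urlopen


class TitleParser(HTMLParser):
    """Collects the text found inside <title> elements."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def fetch(url):
    """Steps 1 and 2: send the request and read the response body."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def parse(html):
    """Step 3: extract the data we care about (here, only the page title)."""
    parser = TitleParser()
    parser.feed(html)
    return {"title": parser.title.strip()}


def save(record, path):
    """Step 4: store the extracted data as JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(record, f, ensure_ascii=False)


def crawl(url, path, fetcher=fetch):
    """Run the four steps in order; `fetcher` can be swapped for offline use."""
    record = parse(fetcher(url))
    save(record, path)
    return record
```

With network access, a call such as `crawl('https://www.baidu.com/', 'baidu.json')` would run the same steps against a live site. Passing a custom `fetcher` makes it possible to try the parsing and saving steps on a local HTML string.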
3. What are Request and Response
3.1 What a Request contains
Request method: mainly GET and POST
* Viewing the request method
* Types of request methods
HTTP/1.0
1. GET: retrieves data
Usually passes parameters to the back end to fetch a set of data
2. POST: submits data to the server
Typically used for login: you send the server some information and it returns a simple result
3. PUT: sends data for the server to store
Often used for registration: you send the server some information and it saves it
4. HEAD: retrieves only the response headers, without the body
HTTP/1.1
5. DELETE: deletes a resource
For example, deleting a comment or a microblog post
6. CONNECT: establishes a tunnel through a proxy [rarely used]
7. PATCH: sends data that partially modifies a resource (defined separately in RFC 5789)
Often used to update user profile data
8. OPTIONS: queries which methods and options the server supports; the server has to allow it
* Difference between GET and POST
Request URL
URL: Uniform Resource Locator
Request headers
Configuration information for the request
Request body
A GET request usually has no request body.
A POST request carries its data in the body as form data, for example login information.
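The parts of a request listed above (method, URL, headers, body) can be inspected without sending anything. The sketch below uses `urllib.request.Request` from the standard library. The query parameters, form fields, and `User-Agent` value are made up for illustration, and the httpbin.org URLs are only used as targets, never contacted.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical header and field values, used only for illustration.
BASE_URL = "http://httpbin.org"
HEADERS = {"User-Agent": "example-crawler/0.1"}

# GET: the parameters are encoded into the URL; there is no request body.
query = urlencode({"page": 1, "keyword": "python"})
get_request = Request(f"{BASE_URL}/get?{query}", headers=HEADERS)

# POST: the parameters are encoded as form data and placed in the request body.
form = urlencode({"username": "alice", "password": "secret"}).encode("utf-8")
post_request = Request(f"{BASE_URL}/post", data=form, headers=HEADERS)

# Nothing is sent here; we only look at what each request would contain.
for req in (get_request, post_request):
    print(req.get_method(), req.full_url)
    print("  headers:", dict(req.header_items()))
    print("  body:", req.data)
```

The GET request keeps its parameters in the URL and has a body of `None`, while the POST request keeps a clean URL and carries the form fields in the body.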
3.2 What a Response contains
Response status
Status code: 200 means success, 3xx means a redirect, 404 means not found
Response headers
Response body
The content shown under Preview
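A crawler usually checks the status code before it parses the body. As a small sketch, the hypothetical helper `describe_status` below maps a code to its standard reason phrase with `http.HTTPStatus` and to a coarse category by numeric range.

```python
from http import HTTPStatus


def describe_status(code):
    """Return a short description of an HTTP status code."""
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # The code is not one of the registered status codes.
        phrase = "Unknown"
    if 200 <= code < 300:
        category = "success"
    elif 300 <= code < 400:
        category = "redirect"
    elif 400 <= code < 500:
        category = "client error"
    elif 500 <= code < 600:
        category = "server error"
    else:
        category = "other"
    return f"{code} {phrase} ({category})"


for code in (200, 301, 404):
    print(describe_status(code))
```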
4 Introductory example
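import requests

# Send a GET request to the Baidu home page
r = requests.get('https://www.baidu.com/')
print(type(r))         # type of the response object
print(r.status_code)   # HTTP status code
print(type(r.text))    # type of the response body
print(r.text)          # response body as text
print(r.cookies)       # cookies set by the server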
Output:
<class 'requests.models.Response'>
200
<class 'str'>
<html>
<head>
<script>
location.replace(location.href.replace("https://","http://"));
</script>
</head>
<body>
<noscript><meta http-equiv="refresh" content="0;url=http://www.baidu.com/"></noscript>
</body>
</html>
<RequestsCookieJar[<Cookie BIDUPSID=992C3B26F4C4D09505C5E959D5FBC005 for .baidu.com/>, <Cookie
PSTM=1472227535 for .baidu.com/>, <Cookie __bsi=15304754498609545148_00_40_N_N_2_0303_C02F_N_N_N_0
for .www.baidu.com/>, <Cookie BD_NOT_HTTPS=1 for www.baidu.com/>]>
The code prints, in order, the type of the Response object, the status code, the type of the response body, the body itself, and the cookies.
The output shows that the return type is requests.models.Response, the response body is a string (str), and the cookies are a RequestsCookieJar.
The get method performs a GET request. Other request types are just as convenient, each taking a single line:
r = requests.post('http://httpbin.org/post')
r = requests.put('http://httpbin.org/put')
r = requests.delete('http://httpbin.org/delete')
r = requests.head('http://httpbin.org/get')
r = requests.options('http://httpbin.org/get')
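To see what each of these one-line calls would send without touching the network, the requests can be built and prepared but not sent. This is a sketch that assumes the `requests` library is installed; the `CALLS` list simply repeats the method and URL pairs from the lines above.

```python
import requests

# Method and URL pairs taken from the one-line calls above.
CALLS = [
    ("POST", "http://httpbin.org/post"),
    ("PUT", "http://httpbin.org/put"),
    ("DELETE", "http://httpbin.org/delete"),
    ("HEAD", "http://httpbin.org/get"),
    ("OPTIONS", "http://httpbin.org/get"),
]

# Request(...).prepare() builds a PreparedRequest without sending it.
prepared = [requests.Request(method, url).prepare() for method, url in CALLS]

for req in prepared:
    print(req.method, req.url)
```

A prepared request can later be sent with `requests.Session().send(req)`, and `requests.request(method, url)` offers the same generic form in a single call.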
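5 What kind of data can be crawled
6 How to parse the data
7 Why the crawled content differs from what the browser shows
The response we fetch often contains JavaScript rather than the finished page. The browser runs that script and loads the data through Ajax and similar requests, so the crawler has to analyze those requests as well.
8 How to handle JavaScript rendering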
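The difference can be illustrated offline. In the sketch below, both the HTML and the JSON are invented samples: `raw_html` stands for what a plain request might return, and `ajax_body` stands for what the Ajax endpoint called by the page script might return.

```python
import json
import re

# Hypothetical HTML as returned by a plain request: the list is empty because
# the browser would fill it in later with JavaScript.
raw_html = """
<html><body>
<ul id="items"></ul>
<script>fetch("/api/items").then(r => r.json()).then(render);</script>
</body></html>
"""

# Hypothetical JSON returned by the Ajax endpoint that the script calls.
ajax_body = '{"items": [{"id": 1, "name": "first"}, {"id": 2, "name": "second"}]}'

# The static HTML holds no list entries ...
static_items = re.findall(r"<li>(.*?)</li>", raw_html)
print(static_items)

# ... but the Ajax response holds the data in structured form.
names = [item["name"] for item in json.loads(ajax_body)["items"]]
print(names)
```

In practice the Ajax URL is found by watching the network requests the page makes in the browser developer tools, and the crawler then requests that URL directly.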
9 How to save data