Requests library simple method usage notes
2022-07-29 08:33:00 【Mr match】
1 urllib
A basic understanding of this part is enough.
A simple demo
from urllib import request

response = request.urlopen('http://www.baidu.com')
print(response.read().decode('utf8'))  # read the response body and decode it
Four modules
- urllib.request - opens and reads URLs.
- urllib.error - contains the exceptions raised by urllib.request.
- urllib.parse - parses URLs.
- urllib.robotparser - parses robots.txt files.
The following mainly introduces a few methods that are commonly used or easy to be unfamiliar with.
1.1 urllib.request.Request
request.Request(
    url,                    # required
    data=None,              # must be bytes (a byte stream)
    headers={},
    origin_req_host=None,
    unverifiable=False,
    method=None,
)
demo
from urllib import request, parse

url = 'http://httpbin.org/post'
headers = {
    'User-Agent': 'Mozilla/5.0 (Linux; Android 8.0.0; SM-G955U Build/R16NW) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Mobile Safari/537.36 Edg/103.0.5060.66',
    'Host': 'httpbin.org'
}
form = {
    'name': 'test_user'
}
data = bytes(parse.urlencode(form), encoding='utf8')  # URL-encode the dict first, then convert it to bytes
req = request.Request(url=url, data=data, headers=headers, method='POST')
response = request.urlopen(req)
print(response.read().decode('utf-8'))
# output
"""
{
  "args": {},
  "data": "",
  "files": {},
  "form": {                      # the form data we submitted shows up here
    "name": "test_user"
  },
  "headers": {
    "Accept-Encoding": "identity",
    "Content-Length": "14",
    "Content-Type": "application/x-www-form-urlencoded",
    "Host": "httpbin.org",
    "User-Agent": "Mozilla/5.0 (Linux; Android 8.0.0; SM-G955U Build/R16NW) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Mobile Safari/537.36 Edg/103.0.5060.66",
    "X-Amzn-Trace-Id": "Root=1-62c7e429-2dfdef7313887be20022ab31"
  },
  "json": null,
  "origin": "60.168.149.12",
  "url": "http://httpbin.org/post"
}
"""
Headers can also be added with add_header():
req.add_header('User-Agent','XXXXX')
1.2 Handler class
Handlers are mainly used for more advanced operations (cookies, proxy handling, etc.).
- BaseHandler class: the base class that the concrete handlers derive from
To be supplemented when necessary.
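Since this part is left as a placeholder, here is a minimal sketch (my own addition, not from the original notes) of how a concrete handler is plugged into an opener; HTTPCookieProcessor is used as the example, and ProxyHandler works the same way.
# A concrete handler (HTTPCookieProcessor) combined with build_opener to capture cookies
from http import cookiejar
from urllib import request

jar = cookiejar.CookieJar()
opener = request.build_opener(request.HTTPCookieProcessor(jar))
response = opener.open('http://www.baidu.com')
for cookie in jar:
    print(cookie.name, '=', cookie.value)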
1.3 Exception handling: urllib.error
The urllib.error module defines the exception classes for exceptions raised by urllib.request; the base exception class is URLError.
It contains two main exception classes: URLError and HTTPError.
# URLError has only one attribute: reason
from urllib import request, error

try:
    response = request.urlopen('https://baidumatches999.com')
except error.URLError as e:
    print(e.reason)
HTTPError is a subclass of URLError and has three attributes:
- code: the status code
- reason
- headers
from urllib import request, error

try:
    response = request.urlopen('https://baidu.com/test.htm')
except error.HTTPError as e:
    print(e.reason, e.code, e.headers, sep='\n')
""" Not Found 404 Content-Length: 206 Content-Type: text/html; charset=iso-8859-1 Date: Fri, 08 Jul 2022 08:24:56 GMT Server: Apache Connection: close """
1.4 urllib.parse
A URL is divided into six parts: scheme (protocol), netloc (domain), path (access path), params (parameters), query (query string), and fragment (anchor):
scheme://netloc/path;params?query#fragment
The module is mainly used for processing URLs: splitting, joining, and so on.
urlparse(): identifies and splits a URL into the components above
- urlstring: required, the URL to be parsed
- scheme: the protocol to assume when the URL itself does not contain one
- allow_fragments: whether to split off the fragment (if False, the fragment is kept as part of the preceding component)
Other related functions:
- urlparse(): split a URL into the six components above
- urlsplit(): similar to urlparse(), but does not split out params separately
- urlunsplit(): combine the components back into a complete URL
- urljoin(): merge a base URL with a relative link
- parse_qsl(): parse a query string into a list of (key, value) tuples
- quote(): convert content into URL-encoded (percent-encoded) format
(A combined sketch of urlparse(), urljoin(), parse_qsl() and unquote() appears after the quote/unquote examples below.)
# parse
from urllib.parse import quote
url="http://www.baidu.com/"+quote(' Hello ')
url
""" 'http://www.baidu.com/%E4%BD%A0%E5%A5%BD' """
- unquote(): decodes URL-encoded content (the inverse of quote())
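To make the function list above concrete, here is a small combined sketch (my own addition, using a made-up example URL) based on the standard urllib.parse API:
from urllib.parse import urlparse, urljoin, parse_qsl, unquote

# split a URL into scheme, netloc, path, params, query, fragment
result = urlparse('http://www.baidu.com/index.html;user?id=5#comment')
print(result.scheme, result.netloc, result.path, result.params, result.query, result.fragment)

# urljoin resolves a relative link against a base URL
print(urljoin('http://www.baidu.com/index.html', 'FAQ.html'))

# parse_qsl turns a query string into (key, value) tuples
print(parse_qsl('id=5&name=test_user'))

# unquote reverses quote()
print(unquote('http://www.baidu.com/%E4%BD%A0%E5%A5%BD'))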
1.5 Robots protocol
The robots protocol, also known as the crawler protocol or crawler rules, means that a website can create a robots.txt file to tell search engines which pages may be crawled and which may not; a search engine reads robots.txt to decide whether a page is allowed to be crawled. However, the robots protocol is not a firewall and has no enforcement power: a search engine could ignore robots.txt entirely and still grab a snapshot of the page.
The robotparser module is used to parse robots.txt.
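A minimal sketch (my own addition) of the standard urllib.robotparser API, using baidu.com as in the earlier demos; the path checked is arbitrary:
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url('http://www.baidu.com/robots.txt')
rp.read()  # download and parse robots.txt
# check whether the given user agent may fetch this URL
print(rp.can_fetch('*', 'http://www.baidu.com/baidu'))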
2 The requests module
requests is more convenient for many of these operations; the examples below mirror the urllib examples so the two libraries can be compared.
2.1 Basic usage
The most frequently used method is get():
import requests
url='XXXX'
r=requests.get(url)
post(), put(), delete(), etc. can likewise be used to send the corresponding requests, as sketched below.
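A quick sketch (my own addition) of the other verb helpers against httpbin.org:
import requests

r = requests.post('http://httpbin.org/post', data={'name': 'test_user'})
r = requests.put('http://httpbin.org/put', data={'name': 'test_user'})
r = requests.delete('http://httpbin.org/delete')
print(r.status_code)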
- json
import requests
r=requests.get('http://httpbin.org/get')
print(r.text)
print(r.json())  # parse the JSON body into a Python dict
""" Output { "args": {}, "headers": { "Accept": "*/*", "Accept-Encoding": "gzip, deflate", "Host": "httpbin.org", "User-Agent": "python-requests/2.22.0", "X-Amzn-Trace-Id": "Root=1-62c80370-7ed4a0a10c7b27106031d457" }, "origin": "60.168.149.12", "url": "http://httpbin.org/get" } {'args': {}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Host': 'httpbin.org', 'User-Agent': 'python-requests/2.22.0', 'X-Amzn-Trace-Id': 'Root=1-62c80370-7ed4a0a10c7b27106031d457'}, 'origin': '60.168.149.12', 'url': 'http://httpbin.org/get'} """
2.2 Request headers
For many websites, you may not get the content if you do not add request headers.
Take the Zhihu Explore page as an example:
import requests
r=requests.get('https://www.zhihu.com/explore')
print(r.text)
""" Output <html> <head><title>403 Forbidden</title></head> <body bgcolor="white"> <center><h1>403 Forbidden</h1></center> <hr><center>openresty</center> </body> </html> """
After adding the request header, the content can be retrieved normally.
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Linux; Android 8.0.0; SM-G955U Build/R16NW) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Mobile Safari/537.36 Edg/103.0.5060.66'
}
r = requests.get('https://www.zhihu.com/explore', headers=headers)
print(r.text)
Of course, there are other header fields worth mastering, such as Referer and Cookie; they are sometimes useful against anti-crawling measures (see the sketch below).
See the earlier post: [Crawlers] Web basics — response headers, request headers, HTTP & HTTPS, status codes.
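As a hedged illustration (my own addition), Referer and Cookie are set the same way as User-Agent, simply as extra keys in the headers dict; the values below are placeholders:
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 ...',          # placeholder UA string
    'Referer': 'https://www.zhihu.com/',      # the page the request claims to come from
    'Cookie': 'key1=value1; key2=value2',     # placeholder cookie string
}
r = requests.get('https://www.zhihu.com/explore', headers=headers)
print(r.status_code)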
2.3 post
def post(url, data=None, json=None, **kwargs):
r"""Sends a POST request. :param url: URL for the new :class:`Request` object. :param data: (optional) Dictionary, list of tuples, bytes, or file-like object to send in the body of the :class:`Request`. :param json: (optional) json data to send in the body of the :class:`Request`. :param \*\*kwargs: Optional arguments that ``request`` takes. :return: :class:`Response <Response>` object :rtype: requests.Response """
No separate demo is listed here: data is submitted by passing it through the data parameter. The parameters are similar to those of request(); some attributes of request() described below can be used here as well, for example files, through which we can upload files:
# Upload a file
import requests

files = {'file': open("./data/test.png", 'rb')}
r = requests.post('http://httpbin.org/post', files=files)
print(r.text)
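The docstring quoted in the next section mentions that each entry in files can also be a tuple carrying a filename and content type; a small sketch (my own addition) of that form:
import requests

# ('filename', fileobj, 'content_type') -- the 3-tuple form described in the docstring
files = {'file': ('test.png', open('./data/test.png', 'rb'), 'image/png')}
r = requests.post('http://httpbin.org/post', files=files)
print(r.status_code)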
2.4 requests.request
We can look at the request() function to see which parameters are available.
def request(method, url, **kwargs):
"""Constructs and sends a :class:`Request <Request>`. :param method: method for the new :class:`Request` object. :param url: URL for the new :class:`Request` object. :param params: (optional) Dictionary, list of tuples or bytes to send in the query string for the :class:`Request`. :param data: (optional) Dictionary, list of tuples, bytes, or file-like object to send in the body of the :class:`Request`. :param json: (optional) A JSON serializable Python object to send in the body of the :class:`Request`. :param headers: (optional) Dictionary of HTTP Headers to send with the :class:`Request`. :param cookies: (optional) Dict or CookieJar object to send with the :class:`Request`. :param files: (optional) Dictionary of ``'name': file-like-objects`` (or ``{'name': file-tuple}``) for multipart encoding upload. ``file-tuple`` can be a 2-tuple ``('filename', fileobj)``, 3-tuple ``('filename', fileobj, 'content_type')`` or a 4-tuple ``('filename', fileobj, 'content_type', custom_headers)``, where ``'content-type'`` is a string defining the content type of the given file and ``custom_headers`` a dict-like object containing additional headers to add for the file. :param auth: (optional) Auth tuple to enable Basic/Digest/Custom HTTP Auth. :param timeout: (optional) How many seconds to wait for the server to send data before giving up, as a float, or a :ref:`(connect timeout, read timeout) <timeouts>` tuple. :type timeout: float or tuple :param allow_redirects: (optional) Boolean. Enable/disable GET/OPTIONS/POST/PUT/PATCH/DELETE/HEAD redirection. Defaults to ``True``. :type allow_redirects: bool :param proxies: (optional) Dictionary mapping protocol to the URL of the proxy. :param verify: (optional) Either a boolean, in which case it controls whether we verify the server's TLS certificate, or a string, in which case it must be a path to a CA bundle to use. Defaults to ``True``. :param stream: (optional) if ``False``, the response content will be immediately downloaded. :param cert: (optional) if String, path to ssl client cert file (.pem). If Tuple, ('cert', 'key') pair. :return: :class:`Response <Response>` object :rtype: requests.Response Usage:: >>> import requests >>> req = requests.request('GET', 'https://httpbin.org/get') <Response [200]> """
# By using the 'with' statement we are sure the session is closed, thus we
# avoid leaving sockets open which can trigger a ResourceWarning in some
# cases, and look like a memory leak in others.
with sessions.Session() as session:
return session.request(method=method, url=url, **kwargs)
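A quick sketch (my own addition) exercising a few of the documented parameters:
import requests

# method and url are positional; the rest are keyword arguments from the docstring above
r = requests.request(
    'GET',
    'https://httpbin.org/get',
    params={'name': 'test_user'},
    headers={'User-Agent': 'my-crawler'},  # placeholder UA
    timeout=5,                             # seconds to wait before giving up
    allow_redirects=True,
)
print(r.status_code, r.url)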
2.5 cookie
Cookies are data that some websites store on the user's local terminal in order to identify the user and track the session; they can be used to maintain session state.
When the client requests the server for the first time, the response carries a Set-Cookie header and the client browser stores the cookie information.
On subsequent visits, the browser sends the cookie back with the request and the server uses it to determine the session state.
import requests

r = requests.get('http://www.baidu.com')
cookies = r.cookies
for key, value in cookies.items():
    print(key + ':' + value)
"""
BAIDUID:974A
BIDUPSID:974A9F
PSTM:1657
"""
2.6 Session maintenance
The Session object is introduced to maintain the same session. (The difference is essentially how many "browsers" are involved.)
- Demo1: equivalent to accessing the two URLs from two different browsers
import requests
requests.get('http://httpbin.org/cookies/set/number/1234567')
r=requests.get('http://httpbin.org/cookies')
print(r.text)
"""output { "cookies": {} } """
You can also try visiting these two URLs in a browser to better understand the behavior.
- Demo2:Session object
import requests
sess=requests.session()
sess.get('http://httpbin.org/cookies/set/number/1234567')
r=sess.get('http://httpbin.org/cookies')
print(r.text)
"""Output { "cookies": { "number": "1234567" } } """
2.7 Prepared Request object
For the Request object, just read the source code directly; with a Request object, each request can be treated as an independent object. A small demo follows.
from requests import Request, Session

url = 'http://www.baidu.com'
headers = {}
s = Session()
req = Request('GET', url)  # headers=headers can also be passed here
prepare = s.prepare_request(req)
r = s.send(prepare)
print(r.text)
You can try to work out the difference between prepare() and prepare_request() and supplement this part.
The Advanced Usage page of the Requests 2.28.1 documentation covers this, but I did not fully understand it; to be supplemented after further use.
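As a hedged note (my own addition): req.prepare() builds a PreparedRequest from the Request alone, while session.prepare_request(req) also merges in the session's state such as cookies and default headers. A minimal sketch, using a hypothetical X-Demo header to show the difference:
from requests import Request, Session

s = Session()
s.headers.update({'X-Demo': 'session-level'})  # hypothetical session-level header

req = Request('GET', 'http://httpbin.org/get')

standalone = req.prepare()        # prepared without the session's state
merged = s.prepare_request(req)   # session cookies/headers merged into the request

print('X-Demo' in standalone.headers)  # False
print('X-Demo' in merged.headers)      # True
r = s.send(merged)
print(r.status_code)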