Requests library simple method usage notes
2022-07-29 08:33:00 【Mr match】
1 urllib
A brief overview is enough for this module.

A simple demo:

from urllib import request
response = request.urlopen('http://www.baidu.com')
print(response.read().decode('utf8'))
Four submodules:

- urllib.request - open and read URLs.
- urllib.error - contains the exceptions raised by urllib.request.
- urllib.parse - parse URLs.
- urllib.robotparser - parse robots.txt files.

Below we mainly introduce the methods that are less familiar or most commonly used.
1.1 urllib.request.Request
request.Request(
    url,                   # required
    data=None,             # must be bytes (a byte stream) if provided
    headers={},
    origin_req_host=None,
    unverifiable=False,
    method=None,
)
demo

from urllib import request, parse

url = 'http://httpbin.org/post'
headers = {
    'User-Agent': 'Mozilla/5.0 (Linux; Android 8.0.0; SM-G955U Build/R16NW) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Mobile Safari/537.36 Edg/103.0.5060.66',
    'Host': 'httpbin.org'
}
data_dict = {
    'name': 'test_user'
}
data = bytes(parse.urlencode(data_dict), encoding='utf8')  # first URL-encode the dict, then convert it to bytes
req = request.Request(url=url, data=data, headers=headers, method='POST')
response = request.urlopen(req)
print(response.read().decode('utf-8'))
# output
"""
{
  "args": {},
  "data": "",
  "files": {},
  "form": {              # the content we uploaded shows up here
    "name": "test_user"
  },
  "headers": {
    "Accept-Encoding": "identity",
    "Content-Length": "14",
    "Content-Type": "application/x-www-form-urlencoded",
    "Host": "httpbin.org",
    "User-Agent": "Mozilla/5.0 (Linux; Android 8.0.0; SM-G955U Build/R16NW) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Mobile Safari/537.36 Edg/103.0.5060.66",
    "X-Amzn-Trace-Id": "Root=1-62c7e429-2dfdef7313887be20022ab31"
  },
  "json": null,
  "origin": "60.168.149.12",
  "url": "http://httpbin.org/post"
}
"""
Headers can also be added via add_header():

req.add_header('User-Agent','XXXXX')
1.2 Handler classes

Handlers are mainly used for other, more advanced operations (cookies, proxy handling, etc.).

- BaseHandler class: the base class of all handlers (to be supplemented when needed)
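As a minimal sketch of the idea (the proxy dict is left empty here; a real proxy URL would be an assumption), an opener combining cookie and proxy handlers can be built like this:

```python
from urllib import request
from http.cookiejar import CookieJar

# HTTPCookieProcessor stores cookies in a CookieJar and re-sends them on
# later requests; ProxyHandler would route traffic through the given proxies.
jar = CookieJar()
opener = request.build_opener(
    request.HTTPCookieProcessor(jar),
    request.ProxyHandler({}),  # empty = no proxy; e.g. {'http': 'http://127.0.0.1:8080'}
)
# opener.open('http://www.baidu.com') would now collect cookies into jar
```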
1.3 Exception handling: urllib.error

The urllib.error module defines the exception classes for errors raised by urllib.request; the base exception class is URLError.

urllib.error contains two exception classes: URLError and HTTPError.
# URLError has only one attribute: reason
from urllib import request, error

try:
    response = request.urlopen('https://baidumatches999.com')
except error.URLError as e:
    print(e.reason)
HTTPError is a subclass of URLError and has three attributes:

- code: status code
- reason
- headers
from urllib import request, error

try:
    response = request.urlopen('https://baidu.com/test.htm')
except error.HTTPError as e:
    print(e.reason, e.code, e.headers, sep='\n')
"""
Not Found
404
Content-Length: 206
Content-Type: text/html; charset=iso-8859-1
Date: Fri, 08 Jul 2022 08:24:56 GMT
Server: Apache
Connection: close
"""
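Because HTTPError is a subclass of URLError, a handler that catches both must list HTTPError first; a small sketch (the function name is ours):

```python
from urllib import request, error

def fetch(url):
    try:
        with request.urlopen(url, timeout=5) as resp:
            return resp.read()
    except error.HTTPError as e:    # must come before URLError: HTTPError subclasses it
        return 'HTTP error %d: %s' % (e.code, e.reason)
    except error.URLError as e:     # catches DNS failures, refused connections, etc.
        return 'URL error: %s' % e.reason
```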
1.4 urllib.parse

A URL divides into six parts: scheme (protocol), netloc (domain), path, params (parameters), query (query string), and fragment (anchor):

scheme://netloc/path;params?query#fragment

This module is mainly used to process URLs: splitting, joining, and so on.
urlparse(): identify and split a URL into its parts.

- urlstring: required, the URL to be parsed
- scheme: the protocol to assume when the URL itself carries none
- allow_fragments: whether to recognize the fragment part
Other related functions:

- urlparse(): split a URL into its six parts
- urlsplit(): like urlparse() but without separating params (five parts)
- urlunsplit(): the inverse of urlsplit(), assembles a URL from its parts
- urljoin(): merge a base URL with a relative link
- parse_qsl(): parse a query string into a list of (key, value) tuples
- quote(): convert content into URL-encoded (percent-encoded) format
# parse
from urllib.parse import quote

url = "http://www.baidu.com/" + quote('你好')
url
"""
'http://www.baidu.com/%E4%BD%A0%E5%A5%BD'
"""
- unquote(): decode URL-encoded content back to the original text
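A quick sketch of how these functions behave (the example URL is made up for illustration):

```python
from urllib.parse import urlparse, urljoin, unquote

# urlparse splits a URL into its six named parts
parts = urlparse('http://www.baidu.com/index.html;user?id=5#comment')
print(parts.scheme, parts.netloc, parts.path, parts.params, parts.query, parts.fragment)
# http www.baidu.com /index.html user id=5 comment

# urljoin resolves a relative link against a base URL
print(urljoin('http://www.baidu.com/a/b.html', 'c.html'))
# http://www.baidu.com/a/c.html

# unquote reverses quote()
print(unquote('http://www.baidu.com/%E4%BD%A0%E5%A5%BD'))
```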
1.5 The Robots protocol

The robots protocol (also called the crawler protocol or crawler rules) lets a website declare, via a robots.txt file, which pages may be crawled and which may not; search engines read robots.txt to decide whether a page is allowed to be crawled. However, the robots protocol is not a firewall: it has no enforcement power, and a crawler can ignore robots.txt entirely and grab snapshots of the pages anyway.

The robotparser module is used to parse robots.txt.
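A minimal sketch of robotparser; to avoid a network call the rules are fed in directly here (normally you would use set_url() plus read(), and the example.com rules are made up):

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
# Normally: rp.set_url('http://example.com/robots.txt'); rp.read()
rp.parse([
    'User-agent: *',
    'Disallow: /private/',
])
print(rp.can_fetch('*', 'http://example.com/index.html'))  # True
print(rp.can_fetch('*', 'http://example.com/private/x'))   # False
```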
2 The Requests module

requests makes many operations more convenient than urllib. The examples below compare the two libraries on the same tasks.
2.1 Basic usage

The most frequently used method is get():

import requests

url = 'XXXX'
r = requests.get(url)

post(), put() and delete() issue the corresponding requests in the same way.
- json

import requests

r = requests.get('http://httpbin.org/get')
print(r.text)
print(r.json())  # parse the response body as JSON
"""
Output:
{
  "args": {},
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.22.0",
    "X-Amzn-Trace-Id": "Root=1-62c80370-7ed4a0a10c7b27106031d457"
  },
  "origin": "60.168.149.12",
  "url": "http://httpbin.org/get"
}
{'args': {}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Host': 'httpbin.org', 'User-Agent': 'python-requests/2.22.0', 'X-Amzn-Trace-Id': 'Root=1-62c80370-7ed4a0a10c7b27106031d457'}, 'origin': '60.168.149.12', 'url': 'http://httpbin.org/get'}
"""
2.2 Request headers

Many websites will not return content unless we add a request header. Take Zhihu's explore page as an example:
import requests
r=requests.get('https://www.zhihu.com/explore')
print(r.text)
"""
Output:
<html>
<head><title>403 Forbidden</title></head>
<body bgcolor="white">
<center><h1>403 Forbidden</h1></center>
<hr><center>openresty</center>
</body>
</html>
"""
After adding the request header, the content can be fetched normally.
import requests
headers={
'User-Agent':'Mozilla/5.0 (Linux; Android 8.0.0; SM-G955U Build/R16NW) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Mobile Safari/537.36 Edg/103.0.5060.66'
}
r=requests.get('https://www.zhihu.com/explore',headers=headers)
print(r.text)
Other request-header fields are also worth mastering; for example, Referer and Cookie are sometimes useful for getting past anti-scraping measures.

See the earlier post: [Crawlers] Web basics - response headers, request headers, http & https, status codes
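As a sketch, such fields go into the same headers dict; every value below is a placeholder, not a working credential:

```python
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 ...',        # placeholder UA string
    'Referer': 'https://www.zhihu.com/',    # page we supposedly navigated from
    'Cookie': 'session_id=XXXX',            # placeholder cookie string
}
# r = requests.get('https://www.zhihu.com/explore', headers=headers)  # network call, not run here
```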
2.3 post

def post(url, data=None, json=None, **kwargs):
    r"""Sends a POST request.

    :param url: URL for the new :class:`Request` object.
    :param data: (optional) Dictionary, list of tuples, bytes, or file-like
        object to send in the body of the :class:`Request`.
    :param json: (optional) json data to send in the body of the :class:`Request`.
    :param \*\*kwargs: Optional arguments that ``request`` takes.
    :return: :class:`Response <Response>` object
    :rtype: requests.Response
    """
We won't list another demo here; data is submitted simply by passing it in. The parameters are similar to those of request() below, and some of request()'s other parameters can be used here too, such as files (which lets us upload files).
# upload a file
import requests

files = {
    'file': open("./data/test.png", 'rb')
}
r = requests.post('http://httpbin.org/post', files=files)
print(r.text)
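The request body that post() builds from a data dict can also be inspected offline with a prepared request, without sending anything over the network:

```python
import requests

# Build the request object but do not send it
req = requests.Request('POST', 'http://httpbin.org/post', data={'name': 'test_user'})
p = req.prepare()

print(p.body)                     # name=test_user
print(p.headers['Content-Type'])  # application/x-www-form-urlencoded
```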
2.4 requests.request

Let's look at the request() function to see which parameters it supports.
def request(method, url, **kwargs):
    """Constructs and sends a :class:`Request <Request>`.

    :param method: method for the new :class:`Request` object.
    :param url: URL for the new :class:`Request` object.
    :param params: (optional) Dictionary, list of tuples or bytes to send
        in the query string for the :class:`Request`.
    :param data: (optional) Dictionary, list of tuples, bytes, or file-like
        object to send in the body of the :class:`Request`.
    :param json: (optional) A JSON serializable Python object to send in the body of the :class:`Request`.
    :param headers: (optional) Dictionary of HTTP Headers to send with the :class:`Request`.
    :param cookies: (optional) Dict or CookieJar object to send with the :class:`Request`.
    :param files: (optional) Dictionary of ``'name': file-like-objects`` (or ``{'name': file-tuple}``)
        for multipart encoding upload. ``file-tuple`` can be a 2-tuple ``('filename', fileobj)``,
        3-tuple ``('filename', fileobj, 'content_type')`` or a 4-tuple
        ``('filename', fileobj, 'content_type', custom_headers)``, where ``'content-type'`` is a string
        defining the content type of the given file and ``custom_headers`` a dict-like object
        containing additional headers to add for the file.
    :param auth: (optional) Auth tuple to enable Basic/Digest/Custom HTTP Auth.
    :param timeout: (optional) How many seconds to wait for the server to send data
        before giving up, as a float, or a :ref:`(connect timeout, read timeout) <timeouts>` tuple.
    :type timeout: float or tuple
    :param allow_redirects: (optional) Boolean. Enable/disable GET/OPTIONS/POST/PUT/PATCH/DELETE/HEAD redirection. Defaults to ``True``.
    :type allow_redirects: bool
    :param proxies: (optional) Dictionary mapping protocol to the URL of the proxy.
    :param verify: (optional) Either a boolean, in which case it controls whether we verify
        the server's TLS certificate, or a string, in which case it must be a path
        to a CA bundle to use. Defaults to ``True``.
    :param stream: (optional) if ``False``, the response content will be immediately downloaded.
    :param cert: (optional) if String, path to ssl client cert file (.pem). If Tuple, ('cert', 'key') pair.
    :return: :class:`Response <Response>` object
    :rtype: requests.Response

    Usage::

      >>> import requests
      >>> req = requests.request('GET', 'https://httpbin.org/get')
      >>> req
      <Response [200]>
    """

    # By using the 'with' statement we are sure the session is closed, thus we
    # avoid leaving sockets open which can trigger a ResourceWarning in some
    # cases, and look like a memory leak in others.
    with sessions.Session() as session:
        return session.request(method=method, url=url, **kwargs)
2.5 cookies

Cookies are data that some websites store on the user's local machine in order to identify the user and track the session; they can be used to maintain session state.

When the client requests the server for the first time, the response carries a Set-Cookie field, and the client browser stores the cookie information.

On subsequent visits, the browser submits the cookie along with the request, and the server uses it to determine the session state.
import requests

r = requests.get('http://www.baidu.com')
cookies = r.cookies  # a RequestsCookieJar
for key, value in cookies.items():
    print(key + ':' + value)

"""
BAIDUID:974A
BIDUPSID:974A9F
PSTM:1657
"""
2.6 Session maintenance

Introducing the Session object, which maintains the same session across requests. (The difference is like using one browser versus several separate ones.)

- Demo1: equivalent to accessing through two different browsers
import requests
requests.get('http://httpbin.org/cookies/set/number/1234567')
r=requests.get('http://httpbin.org/cookies')
print(r.text)
"""
Output:
{
  "cookies": {}
}
"""
You can also try visiting these two URLs in a browser to understand this better.
- Demo2:Session object
import requests
sess=requests.session()
sess.get('http://httpbin.org/cookies/set/number/1234567')
r=sess.get('http://httpbin.org/cookies')
print(r.text)
"""
Output:
{
  "cookies": {
    "number": "1234567"
  }
}
"""
2.7 The Prepared Request object

For this part, just read the source code directly. With the Request object, a request can be treated as an independent object. A small demo:
from requests import Request, Session

url = 'http://www.baidu.com'
headers = {}

s = Session()
req = Request('GET', url)  # headers could also be passed here
prepared = s.prepare_request(req)
r = s.send(prepared)
print(r.text)
As a supplement, you can try to work out the difference between Request.prepare() and Session.prepare_request().

The Advanced Usage chapter of the Requests 2.28.1 documentation covers this, but not very clearly; to be supplemented after further use.
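The practical difference can be shown offline: Request.prepare() uses only the request's own settings, while Session.prepare_request() merges in session-level state (the X-Env header below is a made-up example of such state):

```python
import requests

s = requests.Session()
s.headers.update({'X-Env': 'demo'})  # session-level header (hypothetical)

req = requests.Request('GET', 'http://www.baidu.com', headers={'X-Req': '1'})

p1 = req.prepare()            # only the request's own settings
p2 = s.prepare_request(req)   # request settings merged with the session's

print('X-Env' in p1.headers)  # False
print('X-Env' in p2.headers)  # True
```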