Chapter 2: Proxies and Cookies in the urllib Library
2022-07-07 00:46:00 【Iron pot Dun goose】
2.6 Advanced usage
More advanced operations, such as handling cookies or configuring a proxy, need a more powerful tool: the Handler. Handlers can be thought of as processors of various kinds: some handle login authentication, some handle cookies, and some handle proxy settings. With them you can do almost anything an HTTP request involves. The BaseHandler class in the urllib.request module is the parent class of all other handlers and provides the most basic methods, such as default_open() and protocol_request().
Commonly used Handler subclasses include:
- HTTPDefaultErrorHandler: handles HTTP response errors; errors are raised as HTTPError exceptions.
- HTTPRedirectHandler: handles redirects.
- HTTPCookieProcessor: handles cookies.
- ProxyHandler: sets up a proxy; by default no proxy is used unless one is configured in the environment (for example through the http_proxy variable).
- HTTPPasswordMgr: manages passwords by maintaining a table of user names and passwords.
- HTTPBasicAuthHandler: manages authentication; if opening a link requires authentication, this handler can take care of it.
The other Handler classes are described in the official documentation page for urllib.request ("Extensible library for opening URLs", Python 3.10.2 documentation).
Another commonly used class is OpenerDirector, usually just called an Opener. urlopen() can be seen as an Opener that urllib provides for us. The Request class and urlopen() used earlier are the library's wrappers for the most common request patterns. They can complete basic requests, but more advanced operations need configuration one layer further down, and that is where the Opener comes in.
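To see how handlers and an Opener fit together, it can help to look at what a default opener contains. The sketch below assumes only the standard library: calling build_opener() with no arguments returns an OpenerDirector with urllib's default handlers already installed, and its handlers attribute lists them. No request is sent.

```python
import urllib.request

# build_opener() with no arguments returns an OpenerDirector that is
# pre-loaded with urllib's default handlers.
default_opener = urllib.request.build_opener()

# Collect the class name of every installed handler.
handler_names = [type(h).__name__ for h in default_opener.handlers]

for name in handler_names:
    print(name)
```

The exact list may differ between Python versions and environments, so treat the printed names as informational rather than fixed.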
Again, let's take Baidu as the example:
import urllib.request
url = 'http://www.baidu.com'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1'
}
request = urllib.request.Request(url=url, headers=headers)
# Create the handler object
handler = urllib.request.HTTPHandler()
# Build the opener object from the handler
opener = urllib.request.build_opener(handler)
# Call the open() method to send the request
response = opener.open(request)
content = response.read().decode('utf-8')
print(content)
The three steps above (creating the handler, building the opener, and calling open()) are equivalent to calling urllib.request.urlopen(request) directly.
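The same handler, opener, open() pattern applies to the authentication classes listed earlier. Here is a minimal sketch, assuming a site protected by HTTP Basic authentication. The URL, user name, and password are hypothetical placeholders, and HTTPPasswordMgrWithDefaultRealm is the standard-library variant of HTTPPasswordMgr that lets you pass None as the realm.

```python
import urllib.request

# Hypothetical values for illustration only.
protected_url = 'http://example.com/protected/'
username = 'user'
password = 'secret'

# The password manager keeps the table of URLs, user names and passwords.
# Passing None as the realm makes the entry apply to any realm at that URL.
password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, protected_url, username, password)

# Same pattern as before: handler -> opener -> open().
auth_handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
auth_opener = urllib.request.build_opener(auth_handler)

# The request itself is left commented out because the URL is a placeholder:
# response = auth_opener.open(protected_url)
# print(response.status)
```

When the server answers with a 401 challenge, the handler looks up the credentials for that URL and retries the request with them.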
2.6.1 Cookies
2.6.1.1 Weibo Cookie
import urllib.request
url = 'https://weibo.cn/5567237565/info'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1'
}
request = urllib.request.Request(url=url, headers=headers)
response = urllib.request.urlopen(request)
content = response.read().decode('utf-8')
print(content)
Running this code raises an error:
Traceback (most recent call last):
  File "D:\pythonSpider\demo1.py", line 18, in <module>
    content = response.read().decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb5 in position 672: invalid start byte
This happens because the URL above does not lead to the user profile page but to the login page, and the login page is not encoded in UTF-8.
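One way to avoid hard-coding the encoding is to read it from the response's Content-Type header. This is a rough sketch, assuming the server declares a charset there. The helper name decode_body is made up for illustration, and gb2312 is used only as an example, since the page's real encoding is not stated above.

```python
from email.message import Message


def decode_body(raw_bytes, headers, fallback='utf-8'):
    """Decode a response body using the charset from its Content-Type header."""
    charset = headers.get_content_charset() or fallback
    # errors='replace' keeps the script running if a few bytes are still invalid.
    return raw_bytes.decode(charset, errors='replace')


# Offline demonstration with a hand-built header object (an assumption made
# for this sketch; with urllib you would pass response.headers instead).
demo_headers = Message()
demo_headers['Content-Type'] = 'text/html; charset=gb2312'
demo_text = decode_body('登录'.encode('gb2312'), demo_headers)
print(demo_text)
```

With a real response this would be called as decode_body(response.read(), response.headers).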
The page source we got back never reaches the profile page, because the user name, password, and other login information in it are empty. Put simply, the request headers are missing information. The deciding factor is the Cookie request header: cookies carry the login information, so with the cookie of a logged-in session we can open almost any page that requires login. Just remember to set the decoding back to utf-8 once the cookie is in place.
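In code, that means adding a Cookie entry to the headers dictionary. The sketch below assumes you copy the Cookie value from your own logged-in browser session. The value shown is a placeholder, not a real cookie, so the actual request is left commented out.

```python
import urllib.request

url = 'https://weibo.cn/5567237565/info'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1',
    # Placeholder: paste the Cookie value copied from your own logged-in
    # browser session here. This string is NOT a real cookie.
    'Cookie': 'PLACEHOLDER_NAME=PLACEHOLDER_VALUE',
}
cookie_request = urllib.request.Request(url=url, headers=headers)

# Sending is left commented out because the placeholder cookie will not log in:
# response = urllib.request.urlopen(cookie_request)
# print(response.read().decode('utf-8'))
```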
2.6.1.2 Obtaining Cookies
CookieJar and its subclasses:
- CookieJar: an object that manages HTTP cookie values, stores the cookies generated by HTTP requests, and adds cookies to outgoing HTTP requests. All cookies are kept in memory, so they are lost once the CookieJar instance is garbage-collected.
- FileCookieJar: derived from CookieJar. It creates an instance that retrieves cookie information and stores cookies in a file. filename is the name of the file in which the cookies are stored. When delayload is True, file access is deferred: the file is read, or data is written to it, only when needed.
- MozillaCookieJar: derived from FileCookieJar. It creates a FileCookieJar instance compatible with the Mozilla browser's cookies.txt format.
- LWPCookieJar: derived from FileCookieJar. It creates a FileCookieJar instance compatible with the libwww-perl Set-Cookie3 file format.
First declare a CookieJar object, then use HTTPCookieProcessor to construct a handler from it, then use build_opener() to build an opener, and finally call open().
import http.cookiejar
import urllib.request
cookie = http.cookiejar.CookieJar()
handler = urllib.request.HTTPCookieProcessor(cookie)
opener = urllib.request.build_opener(handler)
response = opener.open('https://www.baidu.com')
for item in cookie:
    print(item.name+'='+item.value)
This prints the name and value of each cookie:
BAIDUID=4C8D462AF5782C41A1C879BB0C58E1EA:FG=1
BIDUPSID=4C8D462AF5782C41ED2A362BEB0BE5C7
PSTM=1644124352
BD_NOT_HTTPS=1
Cookies can also be written to a file, for example in text format, which requires one of the FileCookieJar subclasses. The three steps stay the same:
- Construct the handler object: handler = urllib.request.HTTPCookieProcessor(cookie)
- Construct the opener object: opener = urllib.request.build_opener(handler)
- Call open(): response = opener.open('https://www.baidu.com')
Official documentation for urllib.request: https://docs.python.org/3/library/urllib.request.html#httpcookieprocessor-objects
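As an illustration of the file-based variant, here is a minimal sketch using MozillaCookieJar. The file path and the hand-made cookie are assumptions added so the example runs offline. In a real run the jar would be filled by the commented-out opener.open() call instead.

```python
import http.cookiejar
import os
import tempfile
import urllib.request

# Hypothetical file location for this sketch.
cookie_file = os.path.join(tempfile.mkdtemp(), 'cookies.txt')

# The same three steps as before, only with a file-backed jar.
jar = http.cookiejar.MozillaCookieJar(cookie_file)
handler = urllib.request.HTTPCookieProcessor(jar)
opener = urllib.request.build_opener(handler)
# A real run would fill the jar by sending a request, for example:
# opener.open('https://www.baidu.com')

# To stay offline, add one made-up cookie by hand instead.
jar.set_cookie(http.cookiejar.Cookie(
    version=0, name='demo_name', value='demo_value',
    port=None, port_specified=False,
    domain='example.com', domain_specified=False, domain_initial_dot=False,
    path='/', path_specified=True,
    secure=False, expires=None, discard=True,
    comment=None, comment_url=None, rest={},
))

# Session cookies (discard=True) are skipped unless ignore_discard is set.
jar.save(ignore_discard=True, ignore_expires=True)

# Read the file back into a fresh jar.
reloaded = http.cookiejar.MozillaCookieJar(cookie_file)
reloaded.load(ignore_discard=True, ignore_expires=True)
for item in reloaded:
    print(item.name + '=' + item.value)
```

Swapping MozillaCookieJar for LWPCookieJar should be enough to get the Set-Cookie3 format, since both expose the same save() and load() methods.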
2.6.2 Proxy server
1. Common uses of a proxy:
- Get around access restrictions placed on your own IP, for example to visit foreign sites.
- Access the internal resources of certain organizations or groups.
- Improve access speed. A proxy server usually has a large disk cache, and information from outside is also saved to that cache as it passes through, which lets the proxy answer repeated requests from the cache. The browser sends its request to the proxy server; when the proxy receives the request, it retrieves the information the browser needs and passes it on to the end user's browser.
- Hide your real IP to protect yourself from attacks.
2. Setting up a proxy:
- Create a Request object
- Create a ProxyHandler object
- Use the handler to build an opener object
- Use the opener.open() method to send the request
import urllib.request
url = 'http://ip.hao86.com/'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1'
}
request = urllib.request.Request(url=url, headers=headers)
# Proxy address, keyed by protocol
proxy = {
    'http': '39.155.253.58:8060'
}
# Create the handler object
handler = urllib.request.ProxyHandler(proxies=proxy)
# Build the opener object from the handler
opener = urllib.request.build_opener(handler)
# Call the open() method to send the request
response = opener.open(request)
content = response.read().decode('utf-8')
print(content)
If the request fails, this proxy IP no longer works. You can pay for one, which doesn't cost much. If you'd rather not spend money, you can look for a working one on Kuaidaili's free list: https://www.kuaidaili.com/free/
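Because free proxies fail often, it may be worth wrapping the request so that a dead proxy does not crash the script. The following is a sketch under that assumption. The helper names build_proxy_opener and fetch_via_proxy and the 5-second timeout are illustrative choices, not part of urllib.

```python
import http.client
import urllib.request


def build_proxy_opener(proxy_address):
    """Return an opener that routes http requests through the given proxy."""
    handler = urllib.request.ProxyHandler(proxies={'http': proxy_address})
    return urllib.request.build_opener(handler)


def fetch_via_proxy(url, proxy_address, timeout=5):
    """Return the decoded page, or None if the proxy or the site fails."""
    opener = build_proxy_opener(proxy_address)
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.read().decode('utf-8', errors='replace')
    except (OSError, http.client.HTTPException) as exc:
        # urllib.error.URLError is a subclass of OSError, so refused
        # connections and timeouts both end up here.
        print('Proxy', proxy_address, 'failed:', exc)
        return None


# Example call (needs network access and a working proxy):
# content = fetch_via_proxy('http://ip.hao86.com/', '39.155.253.58:8060')
```

Returning None instead of raising makes it easy to loop over a list of candidate proxies and keep the first one that answers.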