Chapter 2: Proxies and Cookies in the urllib Library
2022-07-07 00:46:00 【Iron pot Dun goose】
2.6 Advanced usage
More advanced operations (handling cookies, configuring a proxy) require a more powerful tool: the Handler. Handlers can be thought of as processors of various kinds: some handle login authentication, some handle cookies, and some handle proxy settings. With them you can do almost anything an HTTP request involves. The BaseHandler class in the urllib.request module is the parent class of all other Handlers and provides the most basic methods, such as default_open() and protocol_request().
HTTPDefaultErrorHandler: handles HTTP response errors; an error is raised as an HTTPError exception.
HTTPRedirectHandler: handles redirects.
HTTPCookieProcessor: handles cookies.
ProxyHandler: sets a proxy. No proxy is specified by default (when no mapping is passed, urllib reads the proxy settings from the environment).
HTTPPasswordMgr: manages passwords by maintaining a table of user names and passwords.
HTTPBasicAuthHandler: manages authentication. If opening a link requires authentication, this handler can be used to supply it.
Other Handler classes are described in the official documentation: urllib.request: Extensible library for opening URLs (Python 3.10.2 documentation)
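As a rough illustration of how HTTPPasswordMgr and HTTPBasicAuthHandler fit together, here is a minimal sketch. The URL, user name and password are hypothetical placeholders, and HTTPPasswordMgrWithDefaultRealm (a standard-library variant of HTTPPasswordMgr) is used on the assumption that the realm is not known in advance.

```python
import urllib.request


def build_auth_opener(top_level_url, username, password):
    """Return an opener that can answer HTTP Basic Auth challenges."""
    # This password manager accepts realm=None, so the credentials apply
    # to any realm under top_level_url.
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    password_mgr.add_password(None, top_level_url, username, password)
    handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
    return urllib.request.build_opener(handler)


# Hypothetical URL and credentials, for illustration only.
opener = build_auth_opener('http://localhost:5000/', 'username', 'password')
# opener.open('http://localhost:5000/') would resend the request with
# credentials if the server answers with a 401 challenge.
```

Nothing is sent over the network until opener.open() is called, so the opener can be built once and reused for several requests.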
Another commonly used class is OpenerDirector, also called the Opener. urlopen() can be seen as an Opener that urllib provides ready-made. The Request class and urlopen() used earlier are the library's wrappers around the most common request patterns and are enough for basic requests, but more advanced operations need configuration one layer deeper, and that is where the Opener comes in.
Using Baidu as the example again:
import urllib.request
url = 'http://www.baidu.com'
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1'
}
request = urllib.request.Request(url=url, headers=headers)
# Create the handler object
handler = urllib.request.HTTPHandler()
# Build the opener object from the handler
opener = urllib.request.build_opener(handler)
# Send the request with the opener's open() method
response = opener.open(request)
content = response.read().decode('utf-8')
print(content)
The last part of the code above (creating the handler, building the opener and calling opener.open()) is equivalent to calling urllib.request.urlopen(request).
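One way to see why a hand-built opener behaves like urlopen() is to list the handlers that build_opener() installs even when nothing is passed to it. This is only an inspection sketch; the exact list depends on the Python version and the environment.

```python
import urllib.request

# build_opener() with no arguments still installs urllib's default handlers.
opener = urllib.request.build_opener()
handler_names = [type(h).__name__ for h in opener.handlers]
print(handler_names)
```

The printed names typically include HTTPHandler, HTTPRedirectHandler and HTTPDefaultErrorHandler, all of which derive from BaseHandler.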
2.6.1 Cookies
2.6.1.1 Weibo Cookie
import urllib.request
url = 'https://weibo.cn/5567237565/info'
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1'
}
request = urllib.request.Request(url=url, headers=headers)
response = urllib.request.urlopen(request)
content = response.read().decode('utf-8')
print(content)
Traceback (most recent call last):
  File "D:\pythonSpider\demo1.py", line 18, in <module>
    content = response.read().decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb5 in position 672: invalid start byte
This happens because the URL above does not return the user profile page; it returns the login page, and the login page is not encoded as UTF-8.
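A decoding error like this can be avoided by not hard-coding the charset. The sketch below tries the charset declared by the server first and then a list of fallbacks. The helper name decode_body and the gb18030 fallback are assumptions made for illustration; the original post does not say which encoding the login page actually uses.

```python
def decode_body(raw, declared_charset=None, fallbacks=('utf-8', 'gb18030')):
    """Decode raw response bytes, trying the declared charset first."""
    candidates = ([declared_charset] if declared_charset else []) + list(fallbacks)
    for charset in candidates:
        try:
            return raw.decode(charset)
        except (UnicodeDecodeError, LookupError):
            # Wrong or unknown charset: try the next candidate.
            continue
    # Last resort: decode as UTF-8 and replace undecodable bytes.
    return raw.decode('utf-8', errors='replace')


# With a real response one could call:
#   decode_body(response.read(), response.headers.get_content_charset())
sample = '登录'.encode('gbk')  # bytes that are not valid UTF-8
print(decode_body(sample))
```

Here the sample bytes fail to decode as UTF-8 and are picked up by the gb18030 fallback instead of raising UnicodeDecodeError.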
The page source we received cannot reach the profile page because the user name, password and other login information are missing from the request. Put simply, the request headers are incomplete, and the deciding one is the Cookie header. The cookie carries the login information, so with a logged-in cookie we can visit almost any page that requires login. If you changed the decoding to read the login page, remember to switch it back to utf-8.
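In practice this means copying the Cookie header from a logged-in browser session into the request headers. The following is a minimal sketch; the cookie string is a placeholder, not a real Weibo cookie, and the helper name build_logged_in_request is made up for illustration.

```python
import urllib.request


def build_logged_in_request(url, cookie_string):
    """Build a Request that carries a browser's Cookie header."""
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1',
        # Copied from a logged-in browser session (placeholder value here).
        'Cookie': cookie_string,
    }
    return urllib.request.Request(url=url, headers=headers)


request = build_logged_in_request(
    'https://weibo.cn/5567237565/info',
    'name1=value1; name2=value2',
)
# With a valid cookie, the request can then be sent as before:
# response = urllib.request.urlopen(request)
# content = response.read().decode('utf-8')
```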
2.6.1.2 Obtaining Cookies
CookieJar and its subclasses:
- CookieJar: an object that manages HTTP cookie values, stores the cookies generated by HTTP requests, and adds cookies to outgoing HTTP requests. All cookies are kept in memory, so they are lost once the CookieJar instance is garbage-collected.
- FileCookieJar: derived from CookieJar. Creates a FileCookieJar instance that retrieves cookie information and stores cookies in a file. filename is the name of the file in which the cookies are stored. When delayload is True, file access is deferred: the file is read or written only when needed.
- MozillaCookieJar: derived from FileCookieJar. Creates a FileCookieJar instance compatible with the Mozilla browser's cookies.txt format.
- LWPCookieJar: derived from FileCookieJar. Creates a FileCookieJar instance compatible with the libwww-perl Set-Cookie3 file format.
First declare a CookieJar object, then use HTTPCookieProcessor to construct a handler from it, then use build_opener() to build an opener and call its open() method.
import http.cookiejar, urllib.request
cookie = http.cookiejar.CookieJar()
handler = urllib.request.HTTPCookieProcessor(cookie)
opener = urllib.request.build_opener(handler)
response = opener.open('https://www.baidu.com')
for item in cookie:
    print(item.name+'='+item.value)
This prints the name and value of each cookie:
BAIDUID=4C8D462AF5782C41A1C879BB0C58E1EA:FG=1
BIDUPSID=4C8D462AF5782C41ED2A362BEB0BE5C7
PSTM=1644124352
BD_NOT_HTTPS=1
Cookies can also be written to a file, for example in text format, which requires one of the other subclasses. The three steps stay the same:
Construct the handler object: handler = urllib.request.HTTPCookieProcessor(cookie)
Construct the opener object: opener = urllib.request.build_opener(handler)
Call open(): response = opener.open('https://www.baidu.com')
Official documentation for urllib.request: https://docs.python.org/3/library/urllib.request.html#httpcookieprocessor-objects
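For the file-based variant, the same three steps can be sketched with MozillaCookieJar. The helper name build_file_cookie_opener and the file name cookies.txt are assumptions for illustration; the network call is left in comments so the block can run offline.

```python
import http.cookiejar
import urllib.request


def build_file_cookie_opener(filename):
    """Return (opener, jar); cookies collected by opener can be saved to filename."""
    jar = http.cookiejar.MozillaCookieJar(filename)
    handler = urllib.request.HTTPCookieProcessor(jar)
    opener = urllib.request.build_opener(handler)
    return opener, jar


opener, jar = build_file_cookie_opener('cookies.txt')
# Usage (needs network access):
#   opener.open('https://www.baidu.com')
#   # ignore_discard / ignore_expires also keep session and expired cookies.
#   jar.save(ignore_discard=True, ignore_expires=True)
# A later run can reload the file with:
#   jar.load(ignore_discard=True, ignore_expires=True)
```

Swapping MozillaCookieJar for LWPCookieJar should be enough to write the Set-Cookie3 format instead, since both share the FileCookieJar save() and load() interface.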
2.6.2 Proxy server
1. Common uses of a proxy:
- Get around access restrictions placed on your own IP and visit foreign sites.
- Access the internal resources of certain organizations or groups.
- Improve access speed.
  - A proxy server usually has a large disk cache. When outside information passes through it, the proxy also saves a copy to the cache. The browser sends its request to the proxy server; when the proxy receives the request, it retrieves the information the browser needs and passes it on to the user's browser.
- Hide your real IP to protect yourself from attacks.
2. Configuring a proxy:
Create a Request object.
Create a ProxyHandler object.
Use the handler to create an opener object.
Use the opener's open() method to send the request.
import urllib.request
url = 'http://ip.hao86.com/'
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1'
}
request = urllib.request.Request(url=url, headers=headers)
# Create the handler object from a mapping of protocol to proxy address
proxy = {
'http': '39.155.253.58:8060'
}
handler = urllib.request.ProxyHandler(proxies=proxy)
# Build the opener object from the handler
opener = urllib.request.build_opener(handler)
# Send the request with the opener's open() method
response = opener.open(request)
content = response.read().decode('utf-8')
print(content)
If the request fails, this proxy IP no longer works. You can buy one; it doesn't cost much. If you'd rather not pay, you can look for a working one on Kuaidaili's free list: https://www.kuaidaili.com/free/
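Because free proxies stop working often, it can help to wrap the request so that a dead proxy is reported instead of crashing the script. The sketch below is one possible way to do that; the helper names build_proxy_opener and fetch_via_proxy and the 5-second timeout are assumptions, not part of urllib.

```python
import urllib.error
import urllib.request


def build_proxy_opener(proxy_address):
    """Return an opener that sends http requests through proxy_address."""
    handler = urllib.request.ProxyHandler({'http': proxy_address})
    return urllib.request.build_opener(handler)


def fetch_via_proxy(url, proxy_address, timeout=5):
    """Return the decoded page, or None if the request through the proxy fails."""
    opener = build_proxy_opener(proxy_address)
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.read().decode('utf-8', errors='replace')
    except (urllib.error.URLError, OSError) as exc:
        # Covers refused connections, timeouts and HTTP error statuses.
        print(f'proxy {proxy_address} failed: {exc}')
        return None


# Address taken from the example above; it may well be dead by now.
opener = build_proxy_opener('39.155.253.58:8060')
```

A caller could loop over several candidate addresses and keep the first one for which fetch_via_proxy() returns a page rather than None.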