Web crawler practice
2022-07-05 21:49:00 【Computer Trainee】
Web crawler practice: using the urllib library
Hello! Welcome to the Computer Trainee blog.
If you want to learn more about this topic, you can follow the blogger, Computer Trainee, to ask questions and discuss problems.
Basic use of the urllib library
# -*- coding: utf-8 -*-
# ------------ urllib basic use ----------------
import urllib.request
# Target URL
url = 'http://www.baidu.com'
# Send a request the way a browser would; urlopen returns a response object
response = urllib.request.urlopen(url)
# read() returns bytes; decode them with the charset the page declares in its
# <meta http-equiv="Content-Type" content="text/html;charset=utf-8"> tag
content = response.read().decode('utf-8')
# Print the content
print(content)
# The response body is a stream that can be consumed only once, so after the
# read() above the four read calls below return empty results.
# To try one of them, call urlopen(url) again first.
# 1. Read the entire body as bytes
content1 = response.read()
# 2. Read the first 5 bytes
content2 = response.read(5)
# 3. Read one line
content3 = response.readline()
# 4. Read all remaining lines as a list
content4 = response.readlines()
# 5. Read the response headers
content5 = response.getheaders()
# Read the status code; 200 means the request succeeded
code = response.getcode()
# Read the URL that was fetched
urll = response.geturl()
# ------------------------------------------------
The code above covers basic use. Run it yourself and check the results; if you have questions, leave a comment.
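The read methods above all pull from the same response stream, so each call continues where the previous one stopped. The following is a minimal sketch of that behavior, not part of the original post: it assumes a `data:` URL as a stand-in for a real page so that it runs without network access, and the names `read_in_pieces` and `DEMO_URL` are hypothetical.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: a data: URL stands in for a real site so the sketch runs offline.
DEMO_URL = "data:text/plain;charset=utf-8," + quote("line one\nline two\n")


def read_in_pieces(url):
    """Read one response in several steps to show that the stream is consumed."""
    with urlopen(url) as response:
        # Take the charset from the response headers, falling back to utf-8.
        charset = response.headers.get_content_charset() or "utf-8"
        first = response.read(5)       # the first 5 bytes
        line = response.readline()     # the rest of the current line
        rest = response.readlines()    # every remaining line, as a list
        leftover = response.read()     # nothing is left at this point
    return {
        "charset": charset,
        "first": first,
        "line": line,
        "rest": rest,
        "leftover": leftover,
    }


if __name__ == "__main__":
    for name, value in read_in_pieces(DEMO_URL).items():
        print(name, value)
```

With a real site, the same function can be called with an `http://` address; only the stand-in URL is specific to this sketch.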
Downloading Baidu pictures and videos with urllib
- Find the link address of the video or picture
For a picture, right-click it and copy the image address straight into the program.
For a video, follow the steps shown in the figure below (each step is a left click):
find src, which holds the link address of the video, and copy it into the program.
- Download with urlretrieve
urlretrieve is the download function. Usage: urlretrieve(link address of the video or picture, 'save path')
# ---------------- Download pictures and videos ----------------------
from urllib.request import urlopen, urlretrieve
url = 'http://www.baidu.com'
# Copy image address
url_img = 'https://gimg2.baidu.com/image_search/src=http%3A%2F%2Fimg.jj20.com%2Fup%2Fallimg%2F4k%2Fs%2F02%2F2109242129504953-0-lp.jpg&refer=http%3A%2F%2Fimg.jj20.com&app=2002&size=f9999,10000&q=a80&n=0&g=0n&fmt=jpeg?sec=1646899077&t=6585c66ba1ae162ac19a4665f8120aea'
# urlretrieve(file download address, 'file save path'); the target folder must already exist
urlretrieve(url_img, 'F://python-introduction/crawler-practice/girl.jpg')
# Inspect the page and copy the video address; see the author's crawler notes for the detailed steps
url_video = 'https://vd4.bdstatic.com/mda-kkefq6gkpfrcniwa/sc/cae_h264_nowatermark/1605409952/mda-kkefq6gkpfrcniwa.mp4?v_from_s=hkapp-haokan-nanjing&auth_key=1644309721-0-0-3296c254900172089ec79be4faf7c27e&bcevod_channel=searchbox_feed&pd=1&pt=3&logid=0721124788&vid=5479348676568395032&abtest=100534_1&klogid=0721124788'
urlretrieve(url_video, 'F://python-introduction/crawler-practice/beauty.mp4')
# -------------------------------------------------
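urlretrieve opens the save path for writing and fails if the folder does not exist. One possible way to guard against that is a small wrapper that creates the folder first. This is a sketch under assumptions, not the author's code: the helper name `download` and its parameters are hypothetical.

```python
import os
from urllib.request import urlretrieve


def download(url, save_dir, filename):
    """Save the resource at `url` as save_dir/filename and return the saved path.

    Hypothetical helper: it creates save_dir when it is missing, because
    urlretrieve itself does not create folders.
    """
    os.makedirs(save_dir, exist_ok=True)
    target = os.path.join(save_dir, filename)
    # urlretrieve returns a (local filename, response headers) pair.
    saved_path, _headers = urlretrieve(url, target)
    return saved_path
```

A call such as `download(url_img, 'downloads', 'girl.jpg')` would then save the picture into a `downloads` folder, where `downloads` is only an example name.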
Customizing the request header: getting past User-Agent (UA) anti-crawling checks
- Right-click the page, choose Inspect, and refresh the page
Open the Network tab, click the request for the site in the list, scroll down to User-Agent in its headers, and copy the value into the program.
# ----------------- Request object customization ---------------------
from urllib.request import urlopen, Request
url = 'http://www.baidu.com/'
# Set request header
header = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.93 Safari/537.36'
}
# Build a request object that carries the browser's User-Agent header
request = Request(url=url, headers=header)
# Send the request and get the response
response = urlopen(request)
# Read the response body; the page's Content-Type is text/html;charset=utf-8
content = response.read().decode('utf-8')
print(content)
# ---------------------------------------------------
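Without a custom header, urllib identifies itself with a User-Agent of the form Python-urllib/3.x, which is easy for a server to recognize as a script. The sketch below inspects both values without sending any request. The helper names `default_user_agent` and `build_request` are hypothetical, and the browser string is the one used above.

```python
from urllib.request import Request, build_opener

BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/96.0.4664.93 Safari/537.36"
)


def default_user_agent():
    """Return the User-Agent urllib sends when no header is customized."""
    # A fresh opener starts with one default header: ('User-agent', 'Python-urllib/3.x').
    return dict(build_opener().addheaders)["User-agent"]


def build_request(url, user_agent):
    """Build a Request that carries a browser-like User-Agent (nothing is sent yet)."""
    return Request(url=url, headers={"User-Agent": user_agent})


if __name__ == "__main__":
    request = build_request("http://www.baidu.com/", BROWSER_UA)
    print("default:", default_user_agent())
    # Request stores header names capitalized, so the lookup key is 'User-agent'.
    print("custom: ", request.get_header("User-agent"))
```

Printing the two values side by side makes it clear what the server sees in each case before any request goes out.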
That covers the hands-on introduction to web crawlers. If you have any questions, you can contact the author to discuss them.
Follow the author for more hands-on programming walkthroughs.