Crawler practice
2022-07-05 21:49:00 【Computer Trainee】
Crawler practice: using the urllib library
Hello! Welcome to the Computer Trainee blog.
If you want to learn more about this topic, you can follow the blogger, Computer Trainee, to exchange ideas and discuss problems.
Basic use of the urllib library
# -*- coding=utf-8 -*-
# ------------ Basic use of urllib ----------------
import urllib.request
# The URL to request
url = 'http://www.baidu.com'
# Send a request to the server and get back a response object
response = urllib.request.urlopen(url)
# read() returns the body as bytes; decode() turns it into str.
# utf-8 matches the page's <meta http-equiv="Content-Type" content="text/html;charset=utf-8">
content = response.read().decode('utf-8')
# Print the content
print(content)
# The body is a stream that can be consumed only once: after the read() above,
# the read calls below return empty results. Try each one on a fresh response.
# 1. Read the whole body as bytes
content1 = response.read()
# 2. Read the first 5 bytes
content2 = response.read(5)
# 3. Read one line
content3 = response.readline()
# 4. Read all remaining lines into a list
content4 = response.readlines()
# 5. Read the response headers
content5 = response.getheaders()
# Read the status code; 200 means success
code = response.getcode()
# Read the URL that was actually fetched
urll = response.geturl()
# ------------------------------------------------
The code above covers basic usage. Run it and inspect the results. If you have questions, leave a comment.
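The code above calls several read methods on the same response, and the body of a response is a stream that is consumed as it is read. A minimal sketch of that behaviour, assuming `io.BytesIO` as an offline stand-in for the response object (both are byte streams offering `read`, `readline`, and `readlines`); the sample HTML bytes are made up for illustration:

```python
import io

# Stand-in for an HTTP response body (assumption: sample bytes, not real Baidu output)
body = io.BytesIO(b"<!DOCTYPE html>\n<html>\n<head></head>\n</html>\n")

first5 = body.read(5)               # the first 5 bytes
rest_of_line = body.readline()      # continues from byte 6 to the end of the line
remaining_lines = body.readlines()  # every line that has not been read yet
leftover = body.read()              # nothing is left, so this is empty

print(first5)
print(rest_of_line)
print(remaining_lines)
print(leftover)
```

Each call picks up where the previous one stopped, so to compare the methods on a real page it is simplest to call `urlopen` again before each one.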
Downloading Baidu images and videos with urllib
- Find the link address of the video or image
For an image, right-click it, copy the image address, and paste it into the program.
For a video, the link address is found as shown in the figure below (the steps use left clicks):
Find src; its value is the link address of the video. Copy it into the program.
- Download with urlretrieve
urlretrieve is the download function. Usage: urlretrieve(link address of the video or image, 'save path')
# ---------------- Downloading images and videos ----------------------
from urllib.request import urlretrieve
# Paste the copied image address here
url_img = 'https://gimg2.baidu.com/image_search/src=http%3A%2F%2Fimg.jj20.com%2Fup%2Fallimg%2F4k%2Fs%2F02%2F2109242129504953-0-lp.jpg&refer=http%3A%2F%2Fimg.jj20.com&app=2002&size=f9999,10000&q=a80&n=0&g=0n&fmt=jpeg?sec=1646899077&t=6585c66ba1ae162ac19a4665f8120aea'
# Arguments: the address to download (url_img), then the path to save the file to
urlretrieve(url_img, 'F://python- introduction / Reptile practice /girl.jpg')
# Inspect the page and copy the address of the video; see the crawler notes for the detailed steps
url_video = 'https://vd4.bdstatic.com/mda-kkefq6gkpfrcniwa/sc/cae_h264_nowatermark/1605409952/mda-kkefq6gkpfrcniwa.mp4?v_from_s=hkapp-haokan-nanjing&auth_key=1644309721-0-0-3296c254900172089ec79be4faf7c27e&bcevod_channel=searchbox_feed&pd=1&pt=3&logid=0721124788&vid=5479348676568395032&abtest=100534_1&klogid=0721124788'
urlretrieve(url_video, 'F://python- introduction / Reptile practice / beauty .mp4')
# -------------------------------------------------
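Typing the save path by hand for every file gets tedious. A minimal sketch of one way to wrap urlretrieve, assuming two hypothetical helpers that are not part of the original post: `filename_from_url`, which takes the file name from the last segment of the URL path, and `download`, which creates the target folder before saving. The example.com address is a placeholder.

```python
import os
import posixpath
from urllib.parse import unquote, urlsplit
from urllib.request import urlretrieve


def filename_from_url(url, default="download.bin"):
    """Return the last segment of the URL path, ignoring the query string."""
    name = posixpath.basename(unquote(urlsplit(url).path))
    return name or default


def download(url, dest_dir):
    """Save url into dest_dir and return the path of the local file."""
    os.makedirs(dest_dir, exist_ok=True)  # urlretrieve does not create folders
    target = os.path.join(dest_dir, filename_from_url(url))
    urlretrieve(url, target)
    return target


# Placeholder address for illustration only
print(filename_from_url("https://example.com/media/clip.mp4?auth_key=abc&pd=1"))
```

This works well for addresses that end in a plain file name, such as the .mp4 link above. For addresses like the Baidu image link, where the path itself embeds another encoded URL, the derived name is unwieldy and passing an explicit save path, as the original code does, is the safer choice.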
Customizing the request header: getting past User-Agent (UA) anti-crawling checks
- Right-click the page, choose Inspect, and refresh the page
Click Network, click the request for the site in the list, scroll down to User-Agent, and copy it into the program.
# ----------------- Customizing the Request object ---------------------
from urllib.request import urlopen, Request
url = 'http://www.baidu.com/'
# Set request header
header = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.93 Safari/537.36'
}
# Build a request object that carries the browser's User-Agent
request = Request(url=url, headers=header)
# Send the request and get the response
response = urlopen(request)
# Read the response body; the response header says Content-Type: text/html;charset=utf-8
content = response.read().decode('utf-8')
print(content)
# ---------------------------------------------------
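Before sending anything, it can help to confirm that the header really is attached to the request, and to handle the case where the server still refuses. A minimal sketch, assuming two hypothetical helpers, `build_request` and `fetch_text`, that are not part of the original post; the User-Agent string is the one copied above, and the 10-second timeout is an arbitrary choice.

```python
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen

USER_AGENT = ("Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/96.0.4664.93 Safari/537.36")


def build_request(url, user_agent=USER_AGENT):
    """Create a Request that carries a browser-like User-Agent header."""
    return Request(url=url, headers={"User-Agent": user_agent})


def fetch_text(url, encoding="utf-8", timeout=10):
    """Return the decoded page, or None if the request fails."""
    try:
        with urlopen(build_request(url), timeout=timeout) as response:
            return response.read().decode(encoding)
    except HTTPError as err:   # the server answered with an error status
        print(f"Server returned status {err.code} for {url}")
    except URLError as err:    # the server could not be reached
        print(f"Could not reach {url}: {err.reason}")
    return None


# Inspect the request without touching the network.
# Request stores header names capitalized, hence "User-agent".
request = build_request("http://www.baidu.com/")
print(request.get_header("User-agent"))
```

`HTTPError` is caught before `URLError` because it is a subclass of it; with the order reversed, the status code branch would never run.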
That covers a hands-on introduction to web crawlers. If you have any questions, you can contact the author.
Follow the author to see more hands-on programming examples.