Web Scraping Basics (1): requests (1)
2022-07-28 14:33:00 【WHJ226】
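The requests library makes blocking (synchronous) network requests: once a request has been sent, the program waits for the response before it moves on to the next statement.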
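To make the blocking behavior concrete, here is a minimal sketch that times a request against a deliberately slow local server. The handler class, the 0.5 second delay, and the use of a throwaway server on 127.0.0.1 are assumptions made for this demo only; they are not part of the original walkthrough.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

DELAY_SECONDS = 0.5  # hypothetical delay, chosen only for this demo


class SlowHandler(BaseHTTPRequestHandler):
    """Local test handler that waits before answering."""

    def do_GET(self):
        time.sleep(DELAY_SECONDS)
        body = b"done"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


# Port 0 lets the operating system pick a free port.
server = HTTPServer(("127.0.0.1", 0), SlowHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/"

session = requests.Session()
session.trust_env = False  # ignore proxy environment variables for the local server

start = time.perf_counter()
response = session.get(url, timeout=5)  # execution blocks here until the response arrives
elapsed = time.perf_counter() - start

print("Status code:", response.status_code)
print(f"Waited about {elapsed:.2f} s before the next line ran")

server.shutdown()
server.server_close()
```

Passing a timeout is a reasonable habit with blocking calls, since without one a request can wait indefinitely on an unresponsive server.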
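1. Installing the requests module
The development environment used here is PyCharm 2022.1.1.
1.1 pip install requests
Click Terminal.

Type pip install requests and press Enter. The module was already installed on my machine, so pip reports "Requirement already satisfied".
1.2 Installing through PyCharm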




When the installation finishes, PyCharm shows a message indicating that it was successful.
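Whichever installation route is used, a quick way to confirm that the module is visible to the interpreter is to look it up and print its version. This is only a sketch of one possible check, not a step from the original walkthrough:

```python
import importlib.util

# find_spec returns None when the package cannot be found by this interpreter.
spec = importlib.util.find_spec("requests")

if spec is None:
    print("requests is not installed for this interpreter")
else:
    import requests

    print("requests version:", requests.__version__)
```

If the first message appears inside PyCharm even though pip reported success, the project is probably using a different interpreter from the one pip installed into.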
2. requests in practice
Taking Sogou as an example:
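import requests  # import the module
url = 'https://www.sogou.com/'  # URL to request
response = requests.get(url)  # send a GET request; returns a Response object
response.encoding = 'utf-8'  # encoding used to decode response.text
print('Response content:', response.content)  # raw response body as bytes
print('Response text:', response.text)  # response body decoded to str
print('Response headers:', response.headers)  # headers returned by the server (not the request headers)
print('Request:', response.request)  # the PreparedRequest that was sent; its repr shows the method
print('Encoding:', response.encoding)  # encoding used for response.text
print('Request URL:', response.url)  # URL of the response
print('Cookies:', response.cookies)  # cookies set by the server
print('Status code:', response.status_code)  # 200 usually means success; 404 means the page was not found
print('Response type:', type(response))  # type of the response object
print('Type of content:', type(response.content))
print('Type of text:', type(response.text))
The output is as follows: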
Response content: b'<!DOCTYPE html><html lang="cn"><head>...<title>\xe6\x90\x9c\xe7\x8b\x97\xe6\x90\x9c\xe7\xb4\xa2\xe5\xbc\x95\xe6\x93\x8e - \xe4\xb8\x8a\xe7\xbd\x91\xe4\xbb\x8e\xe6\x90\x9c\xe7\x8b\x97\xe5\xbc\x80\xe5\xa7\x8b</title>...</body></html><!--zly-->'
Response text: <!DOCTYPE html><html lang="cn"><head>...<title>搜狗搜索引擎 - 上网从搜狗开始</title>...</body></html><!--zly-->
请求头为: {'Server': 'nginx', 'Date': 'Tue, 31 May 2022 03:15:08 GMT', 'Content-Type': 'text/html; charset=utf-8', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'Vary': 'Accept-Encoding', 'Set-Cookie': 'ABTEST=7|1653966908|v17; expires=Thu, 30-Jun-22 03:15:08 GMT; path=/, IPLOC=CN5000; expires=Wed, 31-May-23 03:15:08 GMT; domain=.sogou.com; path=/, SUID=82F4937B364A910A000000006295883C; expires=Mon, 26-May-2042 03:15:08 GMT; domain=.sogou.com; path=/, black_passportid=; path=/; expires=Thu, 01 Jan 1970 00:00:00 GMT; domain=.sogou.com', 'P3P': 'CP="CURa ADMa DEVa PSAo PSDo OUR BUS UNI PUR INT DEM STA PRE COM NAV OTC NOI DSP COR", CP="CURa ADMa DEVa PSAo PSDo OUR BUS UNI PUR INT DEM STA PRE COM NAV OTC NOI DSP COR", CP="CURa ADMa DEVa PSAo PSDo OUR BUS UNI PUR INT DEM STA PRE COM NAV OTC NOI DSP COR"', 'Pragma': 'No-cache', 'Cache-Control': 'max-age=0', 'Expires': 'Tue, 31 May 2022 03:15:08 GMT', 'UUID': 'a09a4fc6-1144-4ddb-a028-df355ff57969', 'Content-Encoding': 'gzip'}
请求方式为: <PreparedRequest [GET]>
编码方式为: utf-8
请求网址url为: https://www.sogou.com/
cookies为: <RequestsCookieJar[<Cookie IPLOC=CN5000 for .sogou.com/>, <Cookie SUID=82F4937B364A910A000000006295883C for .sogou.com/>, <Cookie ABTEST=7|1653966908|v17 for www.sogou.com/>]>
状态码为: 200
响应类型为: <class 'requests.models.Response'>
内容响应类型为: <class 'bytes'>
文本响应类型为: <class 'str'>
2.1 Finding the request method
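Taking Sogou as the example again:
Right-click an empty area of the page and choose "Inspect", or press F12.

The developer tools open. (1) Click Network, (2) refresh the page, (3) select the first entry under Name, www.sogou.com.
You can then see the URL, the request method, the request headers, and other details.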
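The same details can also be read from Python. The sketch below is only an illustration and is not part of the original walkthrough: it prepares a GET request without sending it, so it needs no network access, and prints the method, URL, and headers that requests would send by default. Note that on a response object, response.headers holds the headers returned by the server, while response.request.headers holds the headers that were actually sent.

```python
import requests

# Build the request but do not send it, so nothing goes over the network.
session = requests.Session()
prepared = session.prepare_request(requests.Request("GET", "https://www.sogou.com/"))

print("Method:", prepared.method)
print("URL:", prepared.url)
# Headers requests would send by default; the User-Agent identifies the
# script as python-requests rather than as a browser.
for name, value in prepared.headers.items():
    print(f"{name}: {value}")
```

The default User-Agent starts with python-requests/, which is an easy signal for a site to tell a script apart from a browser.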


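2.2 Adding request headers
Adding request headers disguises the script as a browser, which gets past a simple anti-scraping check.
Taking Sogou as the example, type "成龙" (Jackie Chan) into the search box, run the search, and press F12 to open the page below:

Take the URL and the request method (GET) from there and write the scraper.
Without any request headers:
# Without request headers:
import requests
url = "https://www.sogou.com/web?query=%E6%88%90%E9%BE%99&_ast=1653967846&_asf=www.sogou.com&w=01029901&p=40040100&dp=1&cid=&s_from=result_up&sut=674&sst0=1653967851220&lkt=0%2C0%2C0&sugsuv=1653292431916060&sugtime=1653967851220"  # full search URL copied from the browser
response = requests.get(url)
print(response.text)  # print the page source
The output is: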
<!DOCTYPE HTML>
<html>
<head>
<meta charset="utf-8">
<link rel="shortcut icon" href="//www.sogou.com/images/logo/new/favicon.ico?v=4" type="image/x-icon">
<title>搜狗搜索</title>
<link rel="stylesheet" href="static/css/anti.min.css?v=1"/>
<script src="//dlweb.sogoucdn.com/common/lib/jquery/jquery-1.11.0.min.js"></script>
<script src="static/js/antispider.min.js?v=3"></script>
<script>
var domain = getDomain();
window.imgCode = -1;
(function() {
function checkSNUID() {
var cookieArr = document.cookie.split('; '),
count = 0;
for(var i = 0, len = cookieArr.length; i < len; i++) {
if (cookieArr[i].indexOf('SNUID=') > -1) {
count++;
}
}
return count > 1;
}
if(checkSNUID()) {
var date = new Date(), expires;
date.setTime(date.getTime() -100000);
expires = date.toGMTString();
document.cookie = 'SNUID=1;path=/;expires=' + expires;
document.cookie = 'SNUID=1;path=/;expires=' + expires + ';domain=.www.sogo.com';
document.cookie = 'SNUID=1;path=/;expires=' + expires + ';domain=.weixin.sogo.com';
document.cookie = 'SNUID=1;path=/;expires=' + expires + ';domain=.sogo.com';
document.cookie = 'SNUID=1;path=/;expires=' + expires + ';domain=.www.sogou.com';
document.cookie = 'SNUID=1;path=/;expires=' + expires + ';domain=.weixin.sogou.com';
document.cookie = 'SNUID=1;path=/;expires=' + expires + ';domain=.sogou.com';
document.cookie = 'SNUID=1;path=/;expires=' + expires + ';domain=.snapshot.sogoucdn.com';
/*document.cookie = 'SNUID=1;path=/;expires=' + expires + ';domain=.zhinan.sogou.com';
document.cookie = 'SNUID=1;path=/;expires=' + expires + ';domain=.gouwu.sogou.com';
document.cookie = 'SNUID=1;path=/;expires=' + expires + ';domain=.ishop.sogou.com';*/
sendLog('delSNUID');
}
if(getCookie('seccodeRight') === 'success') {
sendLog('verifyLoop');
setCookie('seccodeRight', 1, getUTCString(-1), location.hostname, '/');
}
if(getCookie('refresh')) {
sendLog('refresh');
}
})();
function setImgCode(code) {
try {
var t = new Date().getTime() - imgRequestTime.getTime();
sendLog('imgCost',"cost="+t);
} catch (e) {
}
window.imgCode = code;
}
sendLog('index');
function changeImg2() {
if(window.event) {
window.event.returnValue=false
}
}
var suuid = "9321d62d-f547-4a1e-a150-c9527e7c82de";var auuid = "c918ed45-9536-4d74-b27b-6ca583856d4a"; </script>
</head>
<body>
<div class="header">
<div class="logo">
<a href="/">
<img width="180" height="60" src="static/images/logo_180x60.png" srcset="static/images/[email protected] 2x">
</a>
</div>
<div class="other"><span class="s1">您的访问出错了</span><span class="s2"><a href="/">返回首页>></a></span></div>
</div>
<div class="content-box">
<p class="ip-time-p">IP:123.147.244.130<br>访问时间:2022.05.31 14:37:58<br>SourceVerifyCode:c9527e7c82de<br>From:www.sogou.com</p>
<p class="p2">用户您好,我们的系统检测到您网络中存在异常访问请求。<br>此验证码用于确认这些请求是您的正常行为而不是自动程序发出的,需要您协助验证。</p>
<p class="p3"><label for="seccodeInput">验证码:</label></p>
<form name="authform" method="POST" id="seccodeForm" action="/">
<p class="p4">
<input type=text name="c" value="" placeholder="请输入验证码" id="seccodeInput" autocomplete="off">
<input type="hidden" name="tc" id="tc" value="">
<input type="hidden" name="r" id="from" value="%2Fweb%3Fquery%3D%E6%88%90%E9%BE%99%26_ast%3D1653967846%26_asf%3Dwww.sogou.com%26w%3D01029901%26p%3D40040100%26dp%3D1%26cid%3D%26s_from%3Dresult_up%26sut%3D674%26sst0%3D1653967851220%26lkt%3D0%2C0%2C0%26sugsuv%3D1653292431916060%26sugtime%3D1653967851220" >
<input type="hidden" name="p" id="product" value="web_gd" >
<input type="hidden" name="m" value="f9ab5bf7a9587003b95025fada8f5ce5" > <span class="s1">
<script>imgRequestTime=new Date();</script>
<a onclick="changeImg2();" href="javascript:void(0)">
<img id="seccodeImage" onload="setImgCode(1)" onerror="setImgCode(0)" src="util/seccode.php?tc=1653979078" width="100" height="40" alt="请输入图中的验证码" title="请输入图中的验证码">
</a>
</span>
<a href="javascript:void(0);" id="change-img" onclick="changeImg2();" style="padding-left:50px;">换一张</a>
<span class="s2" id="error-tips" style="display: none;"></span>
</p>
</form>
<p class="p5">
<a href="javascript:void(0);" id="submit">提交</a>
<span>提交后没解决问题?欢迎<a href="http://fankui.help.sogou.com/index.php/web/web/index?type=10&anti_time=1653979078&domain=www.sogou.com" target="_blank">反馈</a>。</span>
<!--span>提交后没解决问题?欢迎<a href="http://fankui.help.sogou.com/index.php/web/web/index?type=10&anti_time=1653979078&domain=www.sogou.com&verifycode=c9527e7c82de" target="_blank">反馈</a>。</span-->
</p>
</div>
<div id="ft"><a href="http://fuwu.sogou.com/" target="_blank">企业推广</a><a href="http://corp.sogou.com/" target="_blank">关于搜狗</a><a href="/docs/terms.htm?v=1" target="_blank">免责声明</a><a href="http://fankui.help.sogou.com/index.php/web/web/index?type=10&anti_time=1653979078&domain=www.sogou.com" target="_blank">意见反馈</a><br> © 2022<span id="footer-year"></span> Sogou Inc. - <a href="http://www.miibeian.gov.cn" target="_blank" class="g">京ICP证050897号</a> - 京公网安备1100<span class="ba">00000025号</span></div>
<script src="static/js/index.min.js?v=0.1.5"></script>
</body>
</html>
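<!--zly-->
Something is clearly wrong: the response contains nothing about "成龙". Instead, Sogou returned its verification (CAPTCHA) page, which says that abnormal requests were detected from this network.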
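A scraper can check for this situation instead of relying on reading the output by eye. The following is a minimal sketch, assuming that the markers "antispider" and "seccodeInput" (both taken from the blocked response shown above) keep identifying the verification page; the function name and the sample strings are hypothetical.

```python
def looks_like_verification_page(html: str) -> bool:
    """Return True if the HTML looks like Sogou's verification page.

    The markers come from the blocked response shown above. They are a
    heuristic and may stop working if Sogou changes that page.
    """
    markers = ("antispider", "seccodeInput")
    return any(marker in html for marker in markers)


# Hypothetical sample snippets, used only to exercise the helper offline.
blocked_sample = '<script src="static/js/antispider.min.js?v=3"></script>'
normal_sample = "<title>成龙 - 搜狗搜索</title>"

print(looks_like_verification_page(blocked_sample))  # True
print(looks_like_verification_page(normal_sample))   # False
```

In a real run, the helper would be called on response.text right after requests.get.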
Solution:
1. Find the request headers in the browser:

2. Add the request headers to the request:
import requests
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.41 Safari/537.36'
}  # request headers that make the script look like a browser
url = "https://www.sogou.com/web?query=%E6%88%90%E9%BE%99&_ast=1653967846&_asf=www.sogou.com&w=01029901&p=40040100&dp=1&cid=&s_from=result_up&sut=674&sst0=1653967851220&lkt=0%2C0%2C0&sugsuv=1653292431916060&sugtime=1653967851220"  # full search URL copied from the browser
response = requests.get(url=url, headers=headers)
print(response.text)  # print the page source
The output is:
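The output is too long to reproduce here, so a screenshot is shown instead:


The page source was retrieved successfully.
The URL is very long, though. How can it be shortened?
# Original URL:
url = 'https://www.sogou.com/web?query=%E6%88%90%E9%BE%99&_ast=1653985908&_asf=www.sogou.com&w=01029901&p=40040108&dp=1&cid=&s_from=result_up&sut=916&sst0=1653985949798&lkt=0%2C0%2C0&sugsuv=1653292431916060&sugtime=1653985949798'
# Shortened URL:
url = 'https://www.sogou.com/web?query=%E6%88%90%E9%BE%99'
# Or:
url = 'https://www.sogou.com/web?query=成龙'
Entering the shortened URL in the browser's address bar loads the same page, so the extra parameters can be dropped.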
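The two shortened forms are the same address: %E6%88%90%E9%BE%99 is simply the percent-encoded UTF-8 form of 成龙. A small sketch using only the standard library's urllib.parse (not used in the original post) shows the parameters of the long URL and the round trip between the two spellings:

```python
from urllib.parse import parse_qs, quote, unquote, urlsplit

long_url = (
    "https://www.sogou.com/web?query=%E6%88%90%E9%BE%99&_ast=1653985908"
    "&_asf=www.sogou.com&w=01029901&p=40040108&dp=1&cid=&s_from=result_up"
    "&sut=916&sst0=1653985949798&lkt=0%2C0%2C0&sugsuv=1653292431916060"
    "&sugtime=1653985949798"
)

# parse_qs decodes percent-encoding and groups values by parameter name.
params = parse_qs(urlsplit(long_url).query)
for name, values in params.items():
    print(name, values)

print(quote("成龙"))                   # %E6%88%90%E9%BE%99
print(unquote("%E6%88%90%E9%BE%99"))   # 成龙
```

Only the query parameter carries the search term; the other parameters look like tracking and timing values, which is consistent with the page loading unchanged without them.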

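What if we want to search for someone else?
First, look at how the original page is requested:

The response content is controlled by the parameter "query: 成龙", so the code becomes:
import requests
query = input('Enter a celebrity name: ')
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.41 Safari/537.36'
}  # request headers that make the script look like a browser
url = f"https://www.sogou.com/web?query={query}"  # the f-string inserts the value of query into the URL
response = requests.get(url=url, headers=headers)  # gets past the simple anti-scraping check
print(response)
print(response.text)  # print the page source
The output is:

It worked.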
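An alternative to building the URL by hand is to pass the search term through the params argument, which lets requests do the percent-encoding. The sketch below is an illustration rather than the original author's code: the helper name build_search_request is hypothetical, and the request is only prepared, not sent, so it runs without network access.

```python
import requests

HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/101.0.4951.41 Safari/537.36"
}


def build_search_request(query: str) -> requests.PreparedRequest:
    """Prepare (but do not send) a Sogou search request for the given query."""
    request = requests.Request(
        "GET",
        "https://www.sogou.com/web",
        params={"query": query},  # requests percent-encodes the value
        headers=HEADERS,
    )
    return request.prepare()


prepared = build_search_request("成龙")
print(prepared.url)  # https://www.sogou.com/web?query=%E6%88%90%E9%BE%99
```

To actually fetch the page, the same arguments can be given directly to requests.get(url, params=..., headers=...). Using params also keeps input containing characters such as & or # from altering the structure of the URL.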
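3. Bonus: downloading an image
Hands-on practice:
First, get the image's download URL:

There is no second step. Finally:
import requests
url = 'https://i01piccdn.sogoucdn.com/a2df911ea958c157'
response = requests.get(url)
with open('liuyifei.jpg', 'wb') as f:  # create liuyifei.jpg in the current directory and open it for binary writing
    f.write(response.content)  # write the raw image bytes
print("Download complete!")
Running the script saves the image as liuyifei.jpg in the current directory.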
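The script above writes whatever the server returns, so an error page or a verification page would be saved under a .jpg name as well. A slightly more defensive version is sketched below. It assumes the image is a JPEG, and the function names and the timeout value are hypothetical choices, not part of the original post.

```python
import requests


def is_jpeg(data: bytes) -> bool:
    """JPEG files start with the bytes FF D8 FF."""
    return data[:3] == b"\xff\xd8\xff"


def download_image(url: str, path: str, timeout: float = 10.0) -> None:
    """Download url to path, failing loudly on HTTP errors or non-JPEG data."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx instead of saving an error page
    if not is_jpeg(response.content):
        raise ValueError("response does not look like a JPEG image")
    with open(path, "wb") as f:
        f.write(response.content)


# Offline check of the helper; download_image itself needs network access.
print(is_jpeg(b"\xff\xd8\xff\xe0" + b"\x00" * 4))  # True
print(is_jpeg(b"<!DOCTYPE HTML>"))                 # False
```

With network access, download_image('https://i01piccdn.sogoucdn.com/a2df911ea958c157', 'liuyifei.jpg') would replace the last four lines of the original script.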
