Getting Baidu Index data with seleniumwire
2022-07-29 00:50:00 【Xiaoming - code entity】
In 《How to Download Baidu Index Data with Python》 I showed how to fetch Baidu Index data through its API. This year, however, Baidu Index added a new verification step. Take the following code, for example:
import requests
import json
from datetime import date, timedelta
headers = {
    "Connection": "keep-alive",
    "Accept": "application/json, text/plain, */*",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36",
    "Sec-Fetch-Site": "same-origin",
    "Sec-Fetch-Mode": "cors",
    "Sec-Fetch-Dest": "empty",
    "Referer": "https://index.baidu.com/v2/main/index.html",
    "Accept-Language": "zh-CN,zh;q=0.9",
    "Cookie": cookie,  # cookie: the Cookie string of a logged-in Baidu session
}
words = '[[{"name":"python","wordType":1}],[{"name":"java","wordType":1}]]'
start, end = "2022-06-28", "2022-07-27"
url = f'https://index.baidu.com/api/SearchApi/index?area=0&word={words}&startDate={start}&endDate={end}'
res = requests.get(url, headers=headers)
res.json()
{'status': 10018,
'data': '',
'logid': 2631899650,
'message': ' Hello! , Baidu Index detected suspected xx Access behavior , If you have no similar behavior , It may be because you use the public network or visit too often ,\n You can use email [email protected] Contact us '}
Baidu Index returned no data and reported abnormal access instead. A quick inspection shows that the request headers now include a Cipher-Text parameter. Anyone skilled at JavaScript reverse engineering can analyze the site's JS directly, generate that parameter correctly, and pass the verification.
Today, though, I will demonstrate a much simpler and more practical way to get Baidu Index data: use seleniumwire to capture the data and decrypt it.
For an introduction to seleniumwire, see my previous article: 《Connecting selenium to a Proxy and Accessing the Developer Tools Network Panel with seleniumwire》
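The refusal shown above can be detected in code rather than by eye. The following is a minimal sketch, not part of the original article: `extract_data` and `BaiduIndexError` are hypothetical names, and treating a `status` of 0 as success is an assumption, since the article only shows the failing 10018 case.

```python
class BaiduIndexError(RuntimeError):
    """Raised when the API answers with a non-success status (hypothetical helper)."""


def extract_data(payload, ok_status=0):
    """Return payload['data'], or raise if the status signals a refusal."""
    # ok_status=0 is an assumption; the article only shows status 10018.
    status = payload.get("status")
    if status != ok_status:
        message = str(payload.get("message", "")).strip()
        raise BaiduIndexError(f"status {status}: {message}")
    return payload["data"]


# A trimmed copy of the rejected response from the article
blocked = {"status": 10018, "data": "", "logid": 2631899650, "message": "access refused"}
try:
    extract_data(blocked)
except BaiduIndexError as exc:
    print("request rejected:", exc)
```

Failing loudly at this point keeps an empty `data` field from being mistaken for a keyword with no search volume.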
Logging in to Baidu Index automatically
Logging in by hand every time selenium opens the Baidu Index site is tedious. If we cache the cookies in a local file, every later run can log in to Baidu automatically.
Code that saves the cookies:
import json
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
browser = webdriver.Chrome()
browser.get("https://index.baidu.com/v2/index.html")
# Click the user label to open the login dialog
browser.find_element(By.CSS_SELECTOR, "span.username-text").click()
print("Waiting for login...")
# While logged out the label reads "登录" ("Log in"); poll until it changes
while browser.find_element(By.CSS_SELECTOR, "span.username-text").text == "登录":
    time.sleep(3)
print("Logged in, saving cookies...")
with open('cookie.txt', 'w', encoding='u8') as f:
    json.dump(browser.get_cookies(), f)
browser.close()
print("Cookies saved, the browser has been closed.")
Running the code above opens the login page automatically. Once you have logged in manually, it saves the cookies to a local file and closes the browser.
After that, opening Baidu Index as follows logs you in automatically:
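import json
from seleniumwire import webdriver
browser = webdriver.Chrome()
with open('cookie.txt', 'r', encoding='u8') as f:
    cookies = json.load(f)
# Cookies can only be added for the domain that is currently open, so load the page first
browser.get('https://index.baidu.com/v2/index.html')
for cookie in cookies:
    browser.add_cookie(cookie)
# Reload so the page picks up the restored login state
browser.get('https://index.baidu.com/v2/index.html')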
Reference: 《Five Levels of Extracting Chrome Browser Cookies》
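The save and restore steps can also be wrapped in two small helpers. This is only a sketch: `save_cookies`, `load_cookies`, and `FakeDriver` are hypothetical names introduced here, and `FakeDriver` is a stand-in that mimics just the `get_cookies` and `add_cookie` methods used above so the example runs without a browser.

```python
import json
import tempfile
from pathlib import Path


def save_cookies(driver, path="cookie.txt"):
    """Write the driver's cookies to a JSON file and return how many were saved."""
    cookies = driver.get_cookies()
    Path(path).write_text(json.dumps(cookies), encoding="utf-8")
    return len(cookies)


def load_cookies(driver, path="cookie.txt"):
    """Add every cookie from the JSON file to the driver and return the count."""
    cookies = json.loads(Path(path).read_text(encoding="utf-8"))
    for cookie in cookies:
        driver.add_cookie(cookie)
    return len(cookies)


class FakeDriver:
    """Stand-in for a WebDriver, used only to exercise the helpers."""

    def __init__(self, cookies=None):
        self._cookies = list(cookies or [])

    def get_cookies(self):
        return list(self._cookies)

    def add_cookie(self, cookie):
        self._cookies.append(cookie)


# Round trip through a temporary file with made-up cookie values
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "cookie.txt"
    source = FakeDriver([{"name": "example", "value": "abc", "domain": ".baidu.com"}])
    saved = save_cookies(source, path)
    target = FakeDriver()
    loaded = load_cookies(target, path)
    print(saved, loaded, target.get_cookies() == source.get_cookies())
```

With a real browser, `load_cookies` should be called after the first `browser.get(...)`, as in the article's code, because cookies can only be added for the domain that is currently open.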
Searching and fetching the data
Have the browser search for a specific keyword, for example Python:
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
wait = WebDriverWait(browser, 30)
edit = wait.until(EC.presence_of_element_located(
(By.CSS_SELECTOR, "#search-input-form > input.search-input")))
print("Captured requests before clearing:", len(browser.requests))
del browser.requests  # Clear the captured request history
edit.send_keys(Keys.CONTROL+'a')
edit.send_keys(Keys.DELETE)
edit.send_keys("Python")
submit = browser.find_element(By.CSS_SELECTOR, "span.search-input-cancle")
submit.click()
print("Captured requests after clearing and searching:", len(browser.requests))
Captured requests before clearing: 87
Captured requests after clearing and searching: 3
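The search request may still be in flight right after the click, so it can help to wait until its response has actually been captured. The helper below is a hedged sketch: `wait_for_response` is a hypothetical name, and the demo uses stand-in objects instead of a real browser. It relies only on the `url` and `response` attributes of captured requests, where `response` stays `None` until a response arrives.

```python
import time
from types import SimpleNamespace


def wait_for_response(get_requests, rule, timeout=10.0, interval=0.2):
    """Poll captured requests until one whose URL contains `rule` has a response."""
    deadline = time.monotonic() + timeout
    while True:
        # Newest requests first, so the latest matching search wins
        for request in reversed(get_requests()):
            if rule in request.url and request.response is not None:
                return request
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no response captured for {rule!r} within {timeout}s")
        time.sleep(interval)


# Demo with a stand-in request whose response "arrives" on the third poll
pending = SimpleNamespace(url="https://index.baidu.com/api/SearchApi/index?word=x", response=None)
polls = {"count": 0}


def fake_requests():
    polls["count"] += 1
    if polls["count"] == 3:
        pending.response = SimpleNamespace(body=b"{}")
    return [pending]


found = wait_for_response(fake_requests, "api/SearchApi/index", timeout=2, interval=0.01)
print(found.url)
```

With the real driver the call would look like `wait_for_response(lambda: browser.requests, "api/SearchApi/index")`. seleniumwire also provides a `wait_for_request` method on the driver, which may cover the same need.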
Once the search has run, we can read the data from the requests that seleniumwire captured:
import gzip
import zlib
import brotli
import json
def auto_decompress(res):
    """Decompress the response body in place according to its Content-Encoding."""
    content_encoding = res.headers["content-encoding"]
    if content_encoding == "gzip":
        res.body = gzip.decompress(res.body)
    elif content_encoding == "deflate":
        res.body = zlib.decompress(res.body)
    elif content_encoding == "br":
        res.body = brotli.decompress(res.body)
def fetch_data(rule, encoding="u8", is_json=True):
    """Return the body of the most recent captured request whose URL contains rule."""
    result = ""
    for request in reversed(browser.requests):
        # Skip matching requests that have not received a response yet
        if rule in request.url and request.response is not None:
            res = request.response
            auto_decompress(res)
            result = res.body.decode(encoding)
            if is_json:
                result = json.loads(result)
            return result
    return result
def decrypt(ptbk, index_data):
    """Decode index_data: each character in the first half of ptbk maps to the character at the same position in the second half."""
    n = len(ptbk)//2
    a = dict(zip(ptbk[:n], ptbk[n:]))
    return "".join([a[s] for s in index_data])
ptbk = fetch_data("Interface/ptbk")['data']
data = fetch_data("api/SearchApi/index")['data']
for user_index in data['userIndexes']:
    name = user_index['word'][0]['name']
    index_data = user_index['all']['data']
    r = decrypt(ptbk, index_data)
    print(name, r)
python 21077,21093,21186,19643,14612,13961,21733,21411,21085,21284,18591,13211,12753,27225,20302,19772,20156,17647,12018,11745,19535,19300,20075,20136,18153,12956,12406,17098,16259,18707
A comparison of the results shows that the data was retrieved correctly. So seleniumwire lets us obtain Baidu Index data: to get a specific date range or province, just use selenium to simulate the corresponding manual query, then read the result from the requests captured in the background.
To parse the data for the individual clients, see the code in the earlier 《How to Download Baidu Index Data with Python》.
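The decrypted result is a comma-separated string, so a natural next step is to attach a date to each value. The sketch below repeats the article's `decrypt` and adds a hypothetical `to_daily_series` helper. It assumes one value per day, which is consistent with the 30 values printed above but is not guaranteed for every date range, and it assumes an empty field means a missing value. The key used here is made up for illustration and is not a real ptbk value.

```python
from datetime import date, timedelta


def decrypt(ptbk, index_data):
    # Same substitution as in the article: the first half of ptbk maps to the second half
    n = len(ptbk) // 2
    table = dict(zip(ptbk[:n], ptbk[n:]))
    return "".join(table[ch] for ch in index_data)


def to_daily_series(decoded, start):
    """Pair a comma-separated value string with consecutive days from `start`."""
    first = date.fromisoformat(start)
    # Assumption: an empty field means no data for that day
    values = [int(v) if v else None for v in decoded.split(",")]
    return {(first + timedelta(days=i)).isoformat(): v for i, v in enumerate(values)}


toy_key = "qwertyuiopa0123456789,"  # made-up key: q->0, w->1, ..., p->9, a->","
decoded = decrypt(toy_key, "wqatp")
print(decoded)
print(to_daily_series(decoded, "2022-06-28"))
```

The start date has to be supplied by the caller; it should match the first day of the range that was actually queried on the page.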