English translation too hard? In a fit of rage I wrote two translation scripts with a web crawler
2022-07-07 07:23:00 【Hall owner a Niu】
Personal profile
- About the author: Hello everyone, I'm Daniel
- Personal home page: Hall owner a Niu
- Support me: like, bookmark, and leave a comment.
- Series column: Python web crawlers
- Motto: My life so far has been all failures, but that doesn't stop me from moving forward!

Preface
Here it comes! Here it comes! As a programmer, I can't stand being unable to translate English sentences, so a script has to go on the schedule!!!
Baidu Translate (easy)
Analysis
Open Baidu Translate, press F12, and switch to the All tab of the Network panel. As you type the text you want to translate, a request named sug shows up there. That request's URL is our endpoint, and its form parameter is kw.

Code
import requests

# The sug endpoint found in the Network panel
post_url = 'https://fanyi.baidu.com/sug'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36'
}
word = input('Enter the text to translate (any language): ')
data = {
    'kw': word
}
response = requests.post(url=post_url, data=data, headers=headers)
dic_obj = response.json()  # parse the JSON response into a dict
print(dic_obj['data'][0]['v'])
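The last line of the script raises an error if the data list comes back empty. A minimal sketch of a more defensive lookup follows; the helper name first_suggestion and the sample payload are made up for illustration, and only the data list and the v key are taken from the script above.

```python
def first_suggestion(payload):
    """Return the 'v' field of the first entry in payload['data'], or None."""
    entries = payload.get('data') or []
    if not entries:
        return None
    return entries[0].get('v')


# Made-up sample shaped like the dict the script above indexes into
sample = {'data': [{'v': 'n. dog'}]}
print(first_suggestion(sample))        # the first suggestion
print(first_suggestion({'data': []}))  # None when nothing was suggested
```

In the script, this would replace the final print with print(first_suggestion(dic_obj)).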
Result


Youdao Translate (hard)
Analysis (JS reverse engineering)
Press F12 to open the developer tools. Under XHR in the Network panel (where Ajax requests are listed), find the request shown in the figure below.
Then look at its parameters:

Comparing the two figures, i is evidently the sentence we want to translate. The parameters underlined in green differ between the two requests, so those are the ones we have to handle. lts is clearly a 13-digit timestamp. salt means exactly what it says: it is one digit longer than lts and its first 13 digits are identical, so it should be a salted timestamp (appending extra digits or characters to a value before hashing or encrypting it is called salting). Both parameters could be simulated directly in Python, but to avoid unnecessary trouble, and for readers unsure how to do that, we will locate the JS statements that produce them and simply execute that JS from Python.
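For readers who do want to simulate the two parameters in Python, here is a minimal sketch. It assumes, consistent with the JS expressions used later in this post, that lts is the current time in milliseconds and that salt is lts with one random digit from 0 to 9 appended. The function name make_lts_and_salt is made up for illustration.

```python
import random
import time


def make_lts_and_salt():
    """Build the lts/salt pair: a millisecond timestamp and the same value plus one digit."""
    lts = str(int(time.time() * 1000))      # 13-digit millisecond timestamp
    salt = lts + str(random.randint(0, 9))  # append one random digit, 0 through 9
    return lts, salt


lts, salt = make_lts_and_salt()
print(lts, salt)
```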
sign is visibly 32 characters long, so it is probably produced by some hashing or encryption algorithm; the usual suspects are MD5 and RSA. Let's do a global search and reverse the JS:

The search turns up an old friend, MD5, along with the code that generates the parameters. In the JS shown in the figure, r is the timestamp, i is the salted timestamp, and sign is the MD5 of the string in parentheses. To find out where e comes from, set a breakpoint and debug.
As you can see, e is the text we want to translate. Now every parameter is accounted for. The simplest route would be to compute sign with the md5 function in Python's hashlib module, but we won't do that here; let's raise the difficulty and practice JS reverse engineering. I extracted the MD5 JS code into a file and put it on a network drive. You can download it yourself and use it in the code.
Link: https://pan.baidu.com/s/1aV1tEo35Oyw4TUExhJoXUA
Extraction code: waan
Meanwhile, to get past anti-crawling checks, we send not only User-Agent but also Cookie and Referer.
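For comparison, the hashlib route mentioned above can be sketched as follows. This is an illustration under assumptions, not the site's confirmed algorithm: it assumes the hashed string is the client id, the text, the salt, and a fixed secret concatenated in that order. The name make_sign and the value PLACEHOLDER_SECRET are hypothetical; the real secret has to be read from the JS found in the global search.

```python
import hashlib


def make_sign(client, text, salt, secret):
    """MD5 hex digest of client + text + salt + secret (concatenation order is an assumption)."""
    raw = client + text + salt + secret
    return hashlib.md5(raw.encode('utf-8')).hexdigest()


# PLACEHOLDER_SECRET is hypothetical; substitute the constant found in the site's JS.
sign = make_sign('fanyideskweb', 'hello', '16266621991925', 'PLACEHOLDER_SECRET')
print(len(sign))  # 32, matching the length observed for sign
```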
Code
import json

import execjs  # PyExecJS: runs JavaScript from Python
import jsonpath
import requests


class Youdao:
    def __init__(self, msg):
        # Request URL
        self.url = 'https://fanyi.youdao.com/translate_o?smartresult=dict&smartresult=rule'
        # Request headers: User-Agent plus Cookie and Referer to get past anti-crawling checks.
        # Copy the Cookie value from your own browser session.
        self.headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
            'Cookie': 'OUTFOX_SEARCH_USER_ID=[email protected]; OUTFOX_SEARCH_USER_ID_NCOO=39238000.072458096; JSESSIONID=aaak-QLUNaabh_wFWK8Qx; ___rl__test__cookies=1626662199192',
            'Referer': 'https://fanyi.youdao.com/'
        }
        self.msg = msg
        self.Formdata = None

    def js_Formdata(self):
        # Timestamp in milliseconds, as a string
        r = execjs.eval('"" + (new Date).getTime()')
        # Salted timestamp: the timestamp plus one random digit
        i = r + str(execjs.eval('parseInt(10 * Math.random(), 10)'))
        with open('./youdao.js', 'r', encoding='utf-8') as f:
            ctx = execjs.compile(f.read())
        # Call getsign in youdao.js with the text to translate and the salted timestamp
        sign = ctx.call('getsign', self.msg, i)
        self.Formdata = {
            'i': self.msg,
            'from': 'AUTO',
            'to': 'AUTO',
            'smartresult': 'dict',
            'client': 'fanyideskweb',
            'salt': i,
            'sign': sign,
            'lts': r,
            'bv': 'f46e446c6db49492797b7d03ea1e82da',
            'doctype': 'json',
            'version': '2.1',
            'keyfrom': 'fanyi.web',
            'action': 'FY_BY_REALTlME',
        }

    def response(self):
        resp = requests.post(url=self.url, data=self.Formdata, headers=self.headers).text
        data = json.loads(resp)  # parse the JSON response into a dict
        # Use jsonpath to extract the data
        if "translateResult" in data:
            k = jsonpath.jsonpath(data, '$..translateResult')[0][0][0]['tgt']
            print(k)
        if "smartResult" in data:
            print("Other translations:")
            lst = jsonpath.jsonpath(data, '$..entries')[0]
            for k in lst[1:]:
                k = k.replace("\r\n", "")
                print(k)

    def main(self):
        # Build the form data
        self.js_Formdata()
        # print(self.Formdata)
        # Send the request and handle the response
        self.response()


if __name__ == '__main__':
    msg = input('Enter the word or sentence to translate: ')
    youdao = Youdao(msg)
    youdao.main()
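The extraction step in response can also be written with plain dictionary access, which makes it easy to check without sending a request. This is only a sketch: the helper name extract_translations and the sample data are made up, and it assumes the entries list sits directly under smartResult, which the jsonpath expression in the script does not confirm.

```python
def extract_translations(data):
    """Return (main_translation, alternatives) from a decoded response dict."""
    main = None
    if 'translateResult' in data:
        main = data['translateResult'][0][0]['tgt']
    alternatives = []
    if 'smartResult' in data:
        # Assumption: 'entries' is a direct child of 'smartResult'
        entries = data['smartResult'].get('entries', [])
        # Skip the first entry and strip line breaks, as the script does
        alternatives = [e.replace('\r\n', '') for e in entries[1:]]
    return main, alternatives


# Made-up sample shaped like the fields the script reads
sample = {
    'translateResult': [[{'tgt': 'hello'}]],
    'smartResult': {'entries': ['', 'int. hello\r\n', 'n. greeting\r\n']},
}
print(extract_translations(sample))
```

Returning the values instead of printing them keeps the parsing separate from the network call, so either part can be changed when the site's response format changes.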
Result


Conclusion
If you liked this post, please like, bookmark, and leave a comment!!!