English translation too difficult? In a fit of frustration, I wrote two translation scripts with a crawler
2022-07-07 07:23:00 【Hall owner a Niu】
Personal profile
- About the author: Hello everyone, I'm Daniel
- Personal home page: Hall owner a Niu
- Support me: like, bookmark, and leave a comment.
- Series column: Python web crawler
- Motto: So far my whole life has been one failure after another, but that doesn't stop me from moving forward!

Preface
Here it comes! Here it comes! As a programmer, I can't stand being unable to translate English sentences, so it's time to write a script!
Baidu Translate (simple)
Analysis
Open Baidu Translate, press F12, and switch to the "All" filter of the Network tab. As you type the text you want to translate, a request named sug appears in the list. This is our interface: its URL is the one we will post to, and its parameter is kw.

Code
import requests

post_url = 'https://fanyi.baidu.com/sug'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36'
}
word = input('Enter the text you want to translate (any language): ')
data = {
    'kw': word  # the sug interface takes the text in the kw parameter
}
response = requests.post(url=post_url, data=data, headers=headers)
dic_obj = response.json()  # parse the JSON response into a dict
if dic_obj.get('data'):
    print(dic_obj['data'][0]['v'])  # translation of the first suggestion
else:
    print('No translation found')
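The script above prints only the first suggestion's v field, but data is a list of suggestions. A minimal sketch for printing all of them, assuming each item also carries a k field with the matched word (only v is confirmed by the script); the sample dict is hypothetical, shaped like what the script reads, not a captured response:

```python
def all_suggestions(dic_obj):
    """Return (k, v) pairs from a parsed Baidu sug response.

    Assumption: each item in dic_obj['data'] is a dict with a 'k' key
    (matched word, hypothetical here) and a 'v' key (its translation).
    Missing 'data' or 'k' is tolerated.
    """
    return [(item.get('k', ''), item['v']) for item in dic_obj.get('data', [])]


# Hypothetical sample, not real API output
sample = {'errno': 0, 'data': [{'k': 'hello', 'v': '(translation of hello)'},
                               {'k': 'hello world', 'v': '(translation of hello world)'}]}
for k, v in all_suggestions(sample):
    print(f'{k}: {v}')
```

In the real script you would call all_suggestions(dic_obj) instead of indexing [0].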
Result


Youdao Translate version (difficult)
Analysis (JS reverse engineering)
Press F12 to open developer mode, then in the XHR filter of the Network tab (where AJAX requests show up) find the interface shown in the figure below.
Then look at its parameters:

Comparing the two figures shows that i is the sentence we want to translate. The parameters underlined in green change from one request to the next, so we need to generate them ourselves. lts is clearly a 13-digit timestamp. salt (literally the English word "salt") is one digit longer than lts and shares its first 13 digits, so it is presumably a salted timestamp (appending extra digits or characters to a value before encrypting it is called adding salt). We could simulate both parameters separately in Python, but to avoid unnecessary trouble, and for readers who don't know how, we locate the JS statements that generate them and simply execute that JS from Python.
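For comparison, here is a minimal pure-Python sketch of the same two values, assuming they are produced exactly by the JS expressions the Youdao script later runs through execjs: `"" + (new Date).getTime()` for lts and `parseInt(10 * Math.random(), 10)` appended to it for salt. The function name make_lts_and_salt is hypothetical.

```python
import random
import time


def make_lts_and_salt():
    """Hypothetical Python equivalent of the JS used for lts and salt.

    lts  -> '' + (new Date).getTime()               : 13-digit millisecond timestamp
    salt -> lts + parseInt(10 * Math.random(), 10)  : lts plus one random digit 0-9
    """
    lts = str(int(time.time() * 1000))        # milliseconds since the epoch, as a string
    salt = lts + str(random.randint(0, 9))    # randint is inclusive on both ends
    return lts, salt


lts, salt = make_lts_and_salt()
print(lts, salt)
```

Running the JS through execjs, as the post does, avoids having to check that the Python version matches; this sketch only shows that nothing more than a timestamp and one random digit is involved.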
sign, on the other hand, is 32 characters long at a glance, so it was presumably generated by some hashing or encryption algorithm; the most common candidates are MD5 and RSA. Let's do a global search and reverse the JS:
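A 32-character value made of hex digits has exactly the shape of an MD5 hex digest (128 bits = 32 hex characters), which is why MD5 is the first suspect. A small heuristic check, shown as a sketch: it tests the shape only and cannot prove a value was produced by MD5. The helper looks_like_md5 is hypothetical.

```python
import hashlib
import re

# 32 hexadecimal characters: the length of an MD5 hex digest (128 bits)
MD5_HEX = re.compile(r'[0-9a-f]{32}')


def looks_like_md5(value):
    """Heuristic only: True if value has the shape of an MD5 hex digest."""
    return bool(MD5_HEX.fullmatch(value.lower()))


digest = hashlib.md5('hello'.encode('utf-8')).hexdigest()
print(digest, len(digest), looks_like_md5(digest))
```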

After searching, we find an old friend, MD5, together with the code that generates the parameters. In the JS shown in the figure, r is the timestamp, i is the salted timestamp, and sign is the MD5 hash of the string inside the parentheses. To find out where e comes from, set a breakpoint and debug.
You can see that e is the text we want to translate, so now all the parameters are clear. The simplest approach would be to compute sign with the MD5 algorithm from Python's hashlib module, but we won't do that here: to raise the difficulty and practice JS reverse engineering, I extracted the MD5 JS file and put it on a cloud drive. Download it yourself and use it in the code.
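For reference, the hashlib route skipped above would look roughly like this. It is a sketch under assumptions: the string in parentheses is taken to be the client name (fanyideskweb, which also appears in the form data), then the text e, then the salt i, then a secret constant. Both the order and SECRET_KEY are placeholders to be replaced with what the breakpoint actually shows; neither is confirmed by this post.

```python
import hashlib

# Placeholder: the real constant appears in the reversed JS and is NOT reproduced here.
SECRET_KEY = '<secret-from-js>'


def make_sign(text, salt, client='fanyideskweb', secret=SECRET_KEY):
    """Hypothetical sign: MD5 hex digest of client + text + salt + secret.

    The concatenation order is an assumption; check it against the JS.
    """
    raw = client + text + salt + secret
    return hashlib.md5(raw.encode('utf-8')).hexdigest()


print(make_sign('hello', '16266621991925'))
```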
Link: https://pan.baidu.com/s/1aV1tEo35Oyw4TUExhJoXUA
Extraction code: waan
Also, to get past anti-crawling measures, we send not just a User-Agent but also a Cookie and a Referer.
Code
import json

import execjs  # runs JavaScript statements from Python (needs a JS runtime such as Node.js)
import jsonpath
import requests


class Youdao():
    def __init__(self, msg):
        # Interface URL
        self.url = 'https://fanyi.youdao.com/translate_o?smartresult=dict&smartresult=rule'
        # Headers: User-Agent plus Cookie and Referer to get past anti-crawling checks.
        # Copy the Cookie from your own browser session; these values expire.
        self.headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
            'Cookie': 'OUTFOX_SEARCH_USER_ID=[email protected]; OUTFOX_SEARCH_USER_ID_NCOO=39238000.072458096; JSESSIONID=aaak-QLUNaabh_wFWK8Qx; ___rl__test__cookies=1626662199192',
            'Referer': 'https://fanyi.youdao.com/'
        }
        self.msg = msg
        self.Formdata = None

    def js_Formdata(self):
        # Timestamp (lts): 13-digit millisecond timestamp
        r = execjs.eval('"" + (new Date).getTime()')
        # Salted timestamp (salt): timestamp plus one random digit
        i = r + str(execjs.eval('parseInt(10 * Math.random(), 10)'))
        with open('./youdao.js', 'r', encoding='utf-8') as f:
            ctx = execjs.compile(f.read())
        # Call getsign() in youdao.js with the text to translate and the salted timestamp
        sign = ctx.call('getsign', self.msg, i)
        self.Formdata = {
            'i': self.msg,
            'from': 'AUTO',
            'to': 'AUTO',
            'smartresult': 'dict',
            'client': 'fanyideskweb',
            'salt': i,
            'sign': sign,
            'lts': r,
            'bv': 'f46e446c6db49492797b7d03ea1e82da',
            'doctype': 'json',
            'version': '2.1',
            'keyfrom': 'fanyi.web',
            'action': 'FY_BY_REALTlME',
        }

    def response(self):
        resp = requests.post(url=self.url, data=self.Formdata, headers=self.headers).text
        data = json.loads(resp)  # parse the JSON text into a dict
        # Extract the data with jsonpath
        if "translateResult" in data:
            k = jsonpath.jsonpath(data, '$..translateResult')[0][0][0]['tgt']
            print(k)
        if "smartResult" in data:
            print("Other translations:")
            lst = jsonpath.jsonpath(data, '$..entries')[0]
            for k in lst[1:]:
                print(k.replace("\r\n", ""))

    def main(self):
        # Build the form data
        self.js_Formdata()
        # print(self.Formdata)
        # Send the request and print the response
        self.response()


if __name__ == '__main__':
    msg = input('Enter the word or sentence you want to translate: ')
    youdao = Youdao(msg)
    youdao.main()
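jsonpath's `$..` operator searches at any depth. If the fields sit at the top level, as assumed here, plain dictionary access does the same job and drops the jsonpath dependency. A minimal sketch, assuming data['translateResult'] is a list of lists of dicts with a 'tgt' key and data['smartResult']['entries'] is a list of strings (the shapes the script above indexes); parse_youdao and the sample response are hypothetical:

```python
def parse_youdao(data):
    """Return (main translation or None, list of extra entries).

    Assumes the same shapes the jsonpath queries in the script rely on,
    with both fields located at the top level / directly under smartResult.
    """
    main = None
    if 'translateResult' in data:
        main = data['translateResult'][0][0]['tgt']
    others = []
    if 'smartResult' in data:
        entries = data['smartResult'].get('entries', [])
        # Skip the first entry and strip line breaks, as the script does
        others = [e.replace('\r\n', '') for e in entries[1:]]
    return main, others


# Hypothetical response, shaped like the fields the script reads
sample = {
    'translateResult': [[{'src': 'apple', 'tgt': '(main translation)'}]],
    'smartResult': {'entries': ['', '(entry 1)\r\n', '(entry 2)\r\n']},
}
main, others = parse_youdao(sample)
print(main)
for line in others:
    print(line)
```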
Result


Conclusion
If you found this post useful, please like, bookmark, and leave a comment!