English translation too hard? In a fit of frustration I wrote two translation scripts with a web crawler
2022-07-07 07:23:00 【Hall owner a Niu】
Personal profile
- About the author: Hello everyone, I'm Daniel
- Personal home page: Hall owner a Niu
- Support me: like, bookmark, and leave a comment.
- Series column: Python web crawler
- Motto: So far my whole life has been about failure, but that doesn't stop me from moving forward!
Preface
Here it comes! Here it comes! As a programmer, I can't stand being unable to translate English sentences, so a script is a must!!!
Baidu Translate (simple)
Analysis
Open Baidu Translate, press F12, and switch to the All view of the Network tab. As you type the text you want to translate, a request named sug appears there. The URL of that request is our interface, and its parameter is kw.
Code
import requests
post_url='https://fanyi.baidu.com/sug'
headers={
'User-Agent':'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36'
}
word = input('Enter the text you want to translate (any language): ')
data = {
'kw': word
}
response = requests.post(url=post_url,data=data,headers=headers)
dic_obj = response.json() # parse the JSON response into a dictionary
print(dic_obj['data'][0]['v'])
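The script above indexes straight into dic_obj['data'][0]['v'], which raises an error when the interface returns no suggestions. The following is a minimal sketch of a more defensive lookup. The helper name first_suggestion and the sample payload are made up for illustration; only the data / v shape is taken from the code above.

```python
def first_suggestion(payload):
    """Return the 'v' field of the first suggestion, or None if there is none."""
    # 'data' is expected to be a list of dicts, as indexed in the script above.
    items = payload.get("data") or []
    if not items:
        return None
    return items[0].get("v")


# Hand-written sample in the same shape as dic_obj (not a real server response).
sample = {"data": [{"k": "hello", "v": "int. 你好; 喂"}]}
print(first_suggestion(sample))
print(first_suggestion({"data": []}))
```

In the script this would replace the last line with print(first_suggestion(dic_obj)).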
Result
Youdao Translate (hard)
Analysis (JS reverse engineering)
Press F12 to open the developer tools and, under XHR in the Network tab (where Ajax requests are listed), find the interface shown in the figure below.
Then we look at the parameters:
Comparing the two requests shows that i is the sentence we want to translate, while the parameters underlined in green change from one request to the next, so we have to generate them ourselves. lts is clearly a 13-digit timestamp. salt means salt in English; it is one digit longer than lts and its first 13 digits are identical, so it should be a salted timestamp (appending extra digits or characters to a value before encrypting it is called salting). Both parameters could be simulated in plain Python, but to avoid unnecessary trouble, and for readers who may not know how, we take the JS statements behind them and execute that JS from Python to generate them.
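For readers who prefer the pure-Python route mentioned above, here is a minimal sketch of generating the two values without executing any JS. It assumes lts is the current time in milliseconds and salt is lts with one random digit (0 to 9) appended, which is what the observations above suggest. The function name make_lts_and_salt is made up for illustration.

```python
import random
import time


def make_lts_and_salt():
    """Return (lts, salt) as strings, mimicking the observed request fields."""
    # Millisecond timestamp: 13 digits for present-day dates.
    lts = str(int(time.time() * 1000))
    # Assumed salting rule: append a single random digit to the timestamp.
    salt = lts + str(random.randint(0, 9))
    return lts, salt


lts, salt = make_lts_and_salt()
print(lts, salt)
```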
sign is visibly 32 characters long, so it should be produced by some algorithm; the most common candidates are an MD5 hash and RSA encryption. Let's do a global search and reverse the JS:
The search turns up an old friend, md5, together with the code that generates the parameters. In the JS shown in the figure, r is the timestamp, i is the salted timestamp, and sign is the md5 of the string in parentheses. Where e comes from can be found by setting a breakpoint and debugging.
You can see that e is the text we want to translate, so all the parameters are now clear. The simplest approach would be to call md5 from Python's hashlib module to get sign, but here we skip that to raise the difficulty and practice JS reverse engineering. I extracted the md5 JS into a file and put it on a network disk; you can download it yourself and use it in the code.
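The hashlib shortcut mentioned above could look roughly like the sketch below. The exact string that the site hashes is only visible in the screenshot, so the concatenation order (client, then text, then salt, then a secret string) and the secret value are assumptions here, and PLACEHOLDER_SECRET is a stand-in that must be replaced with whatever string appears in the site's JS. The function name make_sign is made up for illustration.

```python
import hashlib


def make_sign(text, salt, client="fanyideskweb", secret="PLACEHOLDER_SECRET"):
    """Return the 32-character hex MD5 of client + text + salt + secret."""
    # Assumed layout of the hashed string; check it against the site's JS.
    raw = client + text + salt + secret
    return hashlib.md5(raw.encode("utf-8")).hexdigest()


# Example call with a made-up salt value.
print(make_sign("hello", "16266621991925"))
```

Whatever the real input string is, the result is always 32 hexadecimal characters, which matches the length observed for sign.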
Link: https://pan.baidu.com/s/1aV1tEo35Oyw4TUExhJoXUA
Extraction code: waan
Also, to get past the anti-crawler checks, we send not only User-Agent but also Cookie and Referer.
Code
import requests
import execjs  # module for executing JS statements
import json
import jsonpath


class Youdao():
    def __init__(self, msg):
        # url
        self.url = 'https://fanyi.youdao.com/translate_o?smartresult=dict&smartresult=rule'
        # headers
        self.headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
            'Cookie': 'OUTFOX_SEARCH_USER_ID=[email protected]; OUTFOX_SEARCH_USER_ID_NCOO=39238000.072458096; JSESSIONID=aaak-QLUNaabh_wFWK8Qx; ___rl__test__cookies=1626662199192',
            'Referer': 'https://fanyi.youdao.com/'
        }
        self.msg = msg
        self.Formdata = None

    def js_Formdata(self):
        # timestamp (string of milliseconds)
        r = execjs.eval('"" + (new Date).getTime()')
        # salted timestamp: the timestamp plus one random digit
        i = r + str(execjs.eval('parseInt(10 * Math.random(), 10)'))
        with open('./youdao.js', 'r', encoding='utf-8') as f:
            ctx = execjs.compile(f.read())
        # call the getsign function in youdao.js, passing the text to
        # translate and the salted timestamp
        sign = ctx.call('getsign', self.msg, i)
        self.Formdata = {
            'i': self.msg,
            'from': 'AUTO',
            'to': 'AUTO',
            'smartresult': 'dict',
            'client': 'fanyideskweb',
            'salt': i,
            'sign': sign,
            'lts': r,
            'bv': 'f46e446c6db49492797b7d03ea1e82da',
            'doctype': 'json',
            'version': '2.1',
            'keyfrom': 'fanyi.web',
            'action': 'FY_BY_REALTlME',
        }

    def response(self):
        resp = requests.post(url=self.url, data=self.Formdata, headers=self.headers).text
        data = json.loads(resp)  # parse the JSON text into a dictionary
        # use jsonpath to extract the data
        if "translateResult" in data:
            k = jsonpath.jsonpath(data, '$..translateResult')[0][0][0]['tgt']
            print(k)
        if "smartResult" in data:
            # print the header only when there are extra entries to show
            print("Other translations:")
            lst = jsonpath.jsonpath(data, '$..entries')[0]
            for k in lst[1:]:
                k = k.replace("\r\n", "")
                print(k)

    def main(self):
        # build the form data
        self.js_Formdata()
        # print(self.Formdata)
        # send the request and handle the response
        self.response()


if __name__ == '__main__':
    msg = input('Enter the word or sentence you want to translate: ')
    youdao = Youdao(msg)
    youdao.main()
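The extraction step in response() can be tried offline, without jsonpath or a network call. The sketch below does so under two assumptions: the response has the shape that the code above indexes into, and entries sits directly under smartResult. The sample dictionary is hand-written for illustration and is not a real server response, and extract_translations is a hypothetical helper name.

```python
def extract_translations(data):
    """Return (main_translation, extra_entries) from a parsed response dict."""
    main = None
    rows = data.get("translateResult") or []
    if rows and rows[0]:
        # Same element the script reaches via [0][0][0]['tgt'] after jsonpath.
        main = rows[0][0].get("tgt")
    # Assumption: 'entries' is a list of strings directly under 'smartResult'.
    entries = (data.get("smartResult") or {}).get("entries") or []
    # Skip the first entry and strip line breaks, as the script does.
    extras = [e.replace("\r\n", "") for e in entries[1:]]
    return main, extras


# Made-up sample data in the assumed shape.
sample = {
    "translateResult": [[{"src": "hello", "tgt": "你好"}]],
    "smartResult": {"entries": ["", "n. 表示问候\r\n", "int. 喂\r\n"]},
}
print(extract_translations(sample))
```

Returning the values instead of printing them keeps the parsing separate from the request, so it can be checked with saved responses.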
Result
Conclusion
If you think this post is well written, please like, bookmark, and comment!!!