English translation too difficult? In a rage, I wrote two translation scripts with a web crawler

2022-07-07 07:23:00 Hall owner a Niu

Personal profile

  • About the author: Hello everyone, I'm Daniel.
  • Personal homepage: Hall owner a Niu
  • Support me: like, bookmark, and leave a comment.
  • Series column: Python Web Crawler
  • Motto: So far, life has been one failure after another, but that doesn't stop me from moving forward!


Preface

Here it comes! Here it comes! As a programmer, I can't stand being unable to translate English sentences, so it's time to write a script!!!

Baidu Translate (Easy)

Analysis

Open Baidu Translate, press F12, and switch to the All tab of the Network panel. As you type the text you want to translate, a request named sug appears in the list. Its URL is the interface we need, and the text is sent in the kw parameter.

Code

import requests

post_url = 'https://fanyi.baidu.com/sug'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36'
}
word = input('Enter the text to translate (any language): ')
data = {
    'kw': word  # the sug interface takes the text as the kw form field
}
response = requests.post(url=post_url, data=data, headers=headers)
response.raise_for_status()  # stop early on HTTP errors
dic_obj = response.json()  # parse the JSON response into a dict
if dic_obj.get('data'):
    print(dic_obj['data'][0]['v'])  # 'v' holds the translation of the first suggestion
else:
    print('No translation returned.')
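The script above prints only the first suggestion. If you want every suggestion, a small helper can walk the whole data list. This is a sketch: only the 'v' field is confirmed by the script above, the 'k' field (the matched keyword) is an assumption, and the sample dict is made up for illustration rather than a captured response. The name format_suggestions is hypothetical.

```python
def format_suggestions(dic_obj):
    """Turn a sug-style response dict into 'keyword: translation' lines.

    Assumes dic_obj looks like {'data': [{'k': ..., 'v': ...}, ...]};
    'v' is the field the script above prints, 'k' is an assumption.
    """
    lines = []
    for item in dic_obj.get('data', []):
        keyword = item.get('k', '')
        translation = item.get('v', '')
        # Fall back to the bare translation if no keyword is present
        lines.append(f'{keyword}: {translation}' if keyword else translation)
    return lines


# Hypothetical response, shaped like the one the script parses
sample = {'data': [{'k': 'hello', 'v': '你好'}, {'k': 'hello world', 'v': '你好世界'}]}
for line in format_suggestions(sample):
    print(line)
```

Using .get() keeps the helper from crashing when the interface returns an empty or unexpected payload.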

Result


Youdao Translate version (Hard)

Analysis (JS reverse engineering)

Press F12 to open the developer tools. In the Network panel, filter by XHR (where AJAX requests show up) and find the translation request (translate_o).
Next, look at the request's form parameters.
Comparing the parameters of two requests shows that i is the sentence we want to translate, and the parameters that differ between the two requests are the ones we need to handle. lts is clearly a 13-digit timestamp (in milliseconds). salt, as its name suggests, is one digit longer than lts and shares its first 13 digits, so it is a salted timestamp (appending extra digits or characters to a value before hashing it is called salting). Both values could be simulated directly in Python, but to avoid extra trouble (and to help readers who aren't sure how), we'll locate the JS statements that generate them and run those from Python instead.
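For reference, simulating the two values directly in Python could look like the sketch below. It mirrors the JS expressions the script uses later ("" + (new Date).getTime() for lts, plus one parseInt(10 * Math.random(), 10) digit for salt); the function name make_lts_and_salt is hypothetical.

```python
import random
import time


def make_lts_and_salt():
    """Return (lts, salt) as strings, mimicking the page's JavaScript.

    lts  -> 13-digit millisecond timestamp, like "" + (new Date).getTime()
    salt -> lts followed by one random digit 0-9, like parseInt(10 * Math.random(), 10)
    """
    lts = str(int(time.time() * 1000))
    salt = lts + str(random.randint(0, 9))  # randint is inclusive: 0..9
    return lts, salt


lts, salt = make_lts_and_salt()
print(lts, salt)
```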

sign, on the other hand, is 32 characters long, so it is probably produced by some encryption algorithm. The most common candidates are MD5 and RSA, and 32 characters is exactly the length of an MD5 hex digest. Let's do a global search through the JS and reverse it.
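The length heuristic is easy to check: an MD5 hex digest is always 32 lowercase hex characters, no matter how long the input is. The helper below (looks_like_md5 is a hypothetical name) is a quick shape test for a captured sign value; a match only says the format fits, it cannot prove MD5 was used.

```python
import hashlib
import re

MD5_HEX = re.compile(r'[0-9a-f]{32}')


def looks_like_md5(value):
    """True if value has the shape of an MD5 hex digest (32 lowercase hex chars)."""
    return MD5_HEX.fullmatch(value) is not None


# An MD5 hex digest has 32 characters for short and long inputs alike
for text in ('', 'hello', 'a much longer sentence to translate'):
    digest = hashlib.md5(text.encode('utf-8')).hexdigest()
    print(len(digest), looks_like_md5(digest))
```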
The search turns up an old friend, MD5, along with the code that generates the parameters. In that JS, r is the timestamp, i is the salted timestamp, and sign is the MD5 hash of the string inside the parentheses. To find out where e comes from, set a breakpoint and debug.
The breakpoint shows that e is the text we want to translate, so every parameter is now accounted for. The simplest approach would be to compute sign with the MD5 function from Python's hashlib module, but here we deliberately make things harder to practice JS reverse engineering. I extracted the MD5 JS code into a file and put it on a Baidu Netdisk share; download it yourself and use it in the code as youdao.js.

Link: https://pan.baidu.com/s/1aV1tEo35Oyw4TUExhJoXUA
Extraction code: waan
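For comparison, the hashlib route mentioned above would look roughly like this. It is a sketch, not Youdao's confirmed formula: the concatenation order (client + text + salt + secret) and the SECRET constant are placeholders that you must match to the md5(...) call you found in the site's JS, since that string is not reproduced here and may change over time. make_sign is a hypothetical name.

```python
import hashlib

# Placeholder: copy the real constant from the md5(...) call in the site's JS
SECRET = 'REPLACE_WITH_VALUE_FROM_JS'
CLIENT = 'fanyideskweb'  # same value as the 'client' form field


def make_sign(text, salt, secret=SECRET, client=CLIENT):
    """Hypothetical sign: MD5 hex digest of client + text + salt + secret.

    The order of the pieces is an assumption; match it to the JS you found.
    """
    raw = client + text + salt + secret
    return hashlib.md5(raw.encode('utf-8')).hexdigest()


print(make_sign('hello', '16266621991925'))
```

Computing sign this way removes the need for PyExecJS and a JS runtime, at the cost of having to re-read the JS whenever the site changes its formula.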

Also, to get past anti-crawling checks, we send not only a User-Agent but also a Cookie and a Referer header.

Code

import requests
import execjs  # PyExecJS: runs JavaScript from Python (needs a JS runtime such as Node.js)
import json
import jsonpath

class Youdao():
    def __init__(self,msg):
        # url
        self.url = 'https://fanyi.youdao.com/translate_o?smartresult=dict&smartresult=rule'
        # headers: User-Agent, Cookie and Referer copied from the browser to get past anti-crawling checks
        self.headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
            # Cookies are session-specific: paste the value from your own browser session
            'Cookie': 'OUTFOX_SEARCH_USER_ID=<your value>;OUTFOX_SEARCH_USER_ID_NCOO=39238000.072458096;JSESSIONID=aaak-QLUNaabh_wFWK8Qx;___rl__test__cookies=1626662199192',
            'Referer': 'https://fanyi.youdao.com/'
        }
        self.msg = msg
        self.Formdata = None

    def js_Formdata(self):
        # lts: 13-digit millisecond timestamp, built with the same JS expression the page uses
        r = execjs.eval('"" + (new Date).getTime()')
        # salt: the timestamp followed by one random digit (0-9)
        i = r + str(execjs.eval('parseInt(10 * Math.random(), 10)'))
        # Load the extracted MD5 script and call its getsign function
        # with the text to translate and the salted timestamp
        with open('./youdao.js', 'r', encoding='utf-8') as f:
            ctx = execjs.compile(f.read())
        sign = ctx.call('getsign', self.msg, i)
        self.Formdata = {
            'i': self.msg,
            'from': 'AUTO',
            'to': 'AUTO',
            'smartresult': 'dict',
            'client': 'fanyideskweb',
            'salt': i,
            'sign': sign,
            'lts': r,
            'bv': 'f46e446c6db49492797b7d03ea1e82da',
            'doctype': 'json',
            'version': '2.1',
            'keyfrom': 'fanyi.web',
            'action': 'FY_BY_REALTlME',  # copied verbatim from the browser's request
        }

    def response(self):
        resp = requests.post(url=self.url, data=self.Formdata, headers=self.headers)
        resp.raise_for_status()  # stop early on HTTP errors
        data = json.loads(resp.text)  # parse the JSON response into a dict

        # Use jsonpath to pull out the main translation
        if "translateResult" in data:
            k = jsonpath.jsonpath(data, '$..translateResult')[0][0][0]['tgt']
            print(k)

        # Dictionary entries, if any (jsonpath returns False when nothing matches)
        if "smartResult" in data:
            entries = jsonpath.jsonpath(data, '$..entries')
            if entries:
                print("Other translations:")
                for k in entries[0][1:]:  # the first entry is skipped
                    print(k.replace("\r\n", ""))

    def main(self):
        # Build the form data
        self.js_Formdata()
        # print(self.Formdata)  # uncomment to inspect the form data
        # Send the request and print the result
        self.response()

if __name__ == '__main__':
    msg = input('Enter the word or sentence to translate: ')
    youdao = Youdao(msg)
    youdao.main()
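If you'd rather not depend on jsonpath, the same fields can be read with plain dict and list indexing. This sketch assumes the layout implied by the jsonpath queries in the script (translateResult nested as [[{'tgt': ...}]] and the dictionary entries under smartResult['entries']); the sample response is invented for illustration, and extract_translations is a hypothetical name.

```python
def extract_translations(data):
    """Return (best_translation, other_translations) from a Youdao-style dict.

    Layout assumed from the jsonpath queries in the script:
    data['translateResult'][0][0]['tgt'] and data['smartResult']['entries'].
    """
    best = None
    result = data.get('translateResult')
    if result and result[0]:
        best = result[0][0].get('tgt')

    entries = data.get('smartResult', {}).get('entries', [])
    # Skip the first entry, as the script does, and strip line breaks
    others = [e.replace('\r\n', '') for e in entries[1:]]
    return best, others


sample = {
    'translateResult': [[{'src': 'hello', 'tgt': '你好'}]],
    'smartResult': {'type': 1, 'entries': ['', 'int. 喂；哈罗\r\n', 'n. 表示问候\r\n']},
}
best, others = extract_translations(sample)
print(best)
for line in others:
    print(line)
```

Returning values instead of printing them also makes the parsing easy to test offline, without hitting the live interface.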

Result


Conclusion

If you found this post helpful, please like, bookmark, and comment!!!

Copyright notice
This article was written by [Hall owner a Niu]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/02/202202130703499308.html