Using Hugging Face models to translate English
2022-07-24 11:54:00 【This Livermore is not too cold】
The Baidu Translate API is a paid service, so this post uses open-source Hugging Face models to translate English instead.
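Both checkpoints used in this post come from the Helsinki-NLP OPUS-MT collection on the Hugging Face Hub, whose model ids are built from a source and a target language code. A minimal sketch of that naming pattern follows; `opus_mt_model_name` is a hypothetical helper, not part of the original script, and not every language pair is actually published, so check the Hub before relying on a generated id.

```python
def opus_mt_model_name(src: str, tgt: str) -> str:
    """Build an OPUS-MT model id for a language pair.

    Hypothetical helper for illustration only: it formats the id and does
    not check that the model exists on the Hub.
    """
    return "Helsinki-NLP/opus-mt-{}-{}".format(src, tgt)


if __name__ == "__main__":
    # The two directions used in this post.
    print(opus_mt_model_name("en", "zh"))
    print(opus_mt_model_name("zh", "en"))
```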
from concurrent.futures import ThreadPoolExecutor

from tqdm import tqdm
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, pipeline


def get_en_to_zh_model():
    """Load the English-to-Chinese model and wrap it in a translation pipeline."""
    # AutoModelWithLMHead is deprecated; AutoModelForSeq2SeqLM is the
    # auto class for encoder-decoder translation models.
    model = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-en-zh")
    tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-zh")
    translation = pipeline("translation_en_to_zh", model=model, tokenizer=tokenizer)
    return translation


def en_to_ch(text):
    """Translate English into Chinese."""
    # Relies on the module-level `translation` pipeline created in the __main__ block.
    # text = "Student accommodation centres, resorts"
    translated_text = translation(text, max_length=1024)[0]['translation_text']
    return translated_text


def ch_to_en():
    """Translate Chinese into English."""
    model = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-zh-en")
    tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-zh-en")
    translation = pipeline("translation_zh_to_en", model=model, tokenizer=tokenizer)
    # Sample Chinese input: "Student accommodation centre, holiday home"
    text = "学生住宿中心,度假屋"
    translated_text = translation(text, max_length=40)[0]['translation_text']
    return translated_text


def get_translate_list_single(ori_txt):
    """Translate a file line by line in a single thread."""
    with open(ori_txt, 'r', encoding='utf-8') as fp:
        contents = fp.readlines()
    translate_list = []
    for sample in tqdm(contents):
        # Strip the trailing line ending before translating and formatting.
        sample = sample.rstrip('\r\n')
        print(sample)
        translated_text = en_to_ch(sample)
        print(translated_text)
        translate_list.append("{}***{}\n".format(sample, translated_text))
    with open('/cloud/cloud_disk/users/huh/nlp/base_catree_Text_Categorization/script/fu.txt', 'w', encoding='utf-8') as fp:
        fp.writelines(translate_list)


def translate_english_to_chinese(tmp_sentence):
    """Translate one English line into Chinese; runs as a thread-pool task."""
    en_zh_list = []
    # Strip the trailing line ending before translating and formatting.
    tmp_sentence = tmp_sentence.rstrip('\r\n')
    translated_text = en_to_ch(tmp_sentence)
    print(translated_text)
    en_zh_list.append("{} *** {}\n".format(tmp_sentence, translated_text))
    return en_zh_list


def get_translate_list_multi(ori_txt, end_txt):
    """Translate a file with a thread pool, keeping the input order."""
    with open(ori_txt, 'r', encoding='utf-8') as fp:
        contents = fp.readlines()
    # Leaving the with-block waits for every task and shuts the pool down.
    with ThreadPoolExecutor(max_workers=10) as executor:
        en_zh_list = [executor.submit(translate_english_to_chinese, tmp_sentence) for tmp_sentence in contents]
    end_list = []
    for sample in en_zh_list:
        # Each task returns a one-item list whose entry already ends with a newline.
        end_list.append(sample.result()[0])
    with open(end_txt, 'w', encoding='utf-8') as f:
        f.writelines(end_list)


if __name__ == '__main__':
    ori_txt = '/cloud/cloud_disk/users/huh/nlp/base_catree_Text_Categorization/script/cope_dataset/translate_english_to_chinese/question.txt'
    end_txt = '/cloud/cloud_disk/users/huh/nlp/base_catree_Text_Categorization/script/fu.txt'
    translation = get_en_to_zh_model()
    # Single thread
    # get_translate_list_single(ori_txt)
    get_translate_list_multi(ori_txt, end_txt)
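
As an alternative to the thread pool, the lines can be handed to the translator in batches. The sketch below is an illustration, not part of the original script: `translate_in_batches` and its `translate_fn` parameter are hypothetical names, and `translate_fn` is assumed to take a list of strings and return one translated string per input, in the same order. Passing the translator in as a plain callable keeps the batching logic runnable without downloading a model.

```python
from typing import Callable, Iterable, List


def translate_in_batches(
    lines: Iterable[str],
    translate_fn: Callable[[List[str]], List[str]],
    batch_size: int = 16,
) -> List[str]:
    """Return 'source *** translation' records, preserving input order.

    translate_fn is a hypothetical stand-in for the model call: it takes a
    list of source strings and returns one translation per string.
    """
    # Drop line endings and skip blank lines before translating.
    sources = [line.strip() for line in lines if line.strip()]
    records = []
    for start in range(0, len(sources), batch_size):
        batch = sources[start:start + batch_size]
        translations = translate_fn(batch)
        for source, target in zip(batch, translations):
            records.append("{} *** {}\n".format(source, target))
    return records


if __name__ == "__main__":
    # Fake translator (upper-casing) so the sketch runs without a model.
    def fake_translate(batch):
        return [text.upper() for text in batch]

    print(translate_in_batches(["hello\n", "\n", "resorts\n"], fake_translate, batch_size=2))
```

With the pipeline from the script, `translate_fn` could plausibly be `lambda batch: [r["translation_text"] for r in translation(batch, max_length=1024)]`, on the assumption that the pipeline is called with a list of strings and returns one result dict per input. Whether this is faster than ten threads depends on the hardware, so it is worth timing both on a sample of the file.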