Text data augmentation methods
2022-07-06 09:11:00 【一曲无痕奈何】
Key topic: text data augmentation
random insertion
random deletion
random swap
Reference paper: EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks
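The three operations listed above can be sketched in a few lines of Python. This is a minimal illustration and not the reference implementation from the EDA paper: the function names, the `p` and `n` parameters, and the toy `SYNONYMS` table are assumptions made for this example. Random insertion here adds a synonym of a randomly chosen word; a real setup would likely draw synonyms from a thesaurus such as WordNet.

```python
import random


def random_deletion(words, p=0.1, rng=random):
    """Drop each word with probability p; always keep at least one word."""
    if len(words) <= 1:
        return list(words)
    kept = [w for w in words if rng.random() > p]
    return kept if kept else [rng.choice(words)]


def random_swap(words, n=1, rng=random):
    """Swap the words at two distinct random positions, n times."""
    words = list(words)
    if len(words) < 2:
        return words
    for _ in range(n):
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words


def random_insertion(words, synonyms, n=1, rng=random):
    """Insert a synonym of a random word at a random position, n times."""
    words = list(words)
    candidates = [w for w in words if synonyms.get(w)]
    if not candidates:
        return words
    for _ in range(n):
        source = rng.choice(candidates)
        new_word = rng.choice(synonyms[source])
        words.insert(rng.randint(0, len(words)), new_word)
    return words


# Hypothetical toy synonym table, used only to keep the sketch self-contained.
SYNONYMS = {"hungry": ["eager"], "foolish": ["silly", "naive"]}

rng = random.Random(0)  # fixed seed so repeated runs give the same samples
tokens = "stay hungry stay foolish".split()
print(random_deletion(tokens, p=0.2, rng=rng))
print(random_swap(tokens, n=1, rng=rng))
print(random_insertion(tokens, SYNONYMS, n=1, rng=rng))
```

Each function returns a new list and leaves the input untouched, so several augmented variants can be generated from the same tokenized sentence.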
Back Translation
Example: English --> Chinese --> English
# Requires: pip install google_trans_new
from google_trans_new import google_translator
translator = google_translator()
sentence = ['stay hungry, stay foolish. -- spoken / said by Steve Jobs']
# English --> Chinese
translation_cn = translator.translate(sentence, lang_tgt='zh-cn')
translation_cn
# Chinese --> English
translation_en = translator.translate(translation_cn, lang_tgt='en')
translation_en
Translating through a randomly chosen language
import random
import google_trans_new
languages = list(google_trans_new.LANGUAGES.keys())
len(languages)  # number of supported languages: 108
object_lang = random.choice(languages)
object_lang
# Forward translation (English --> randomly chosen language)
translations = translator.translate(sentence, lang_tgt=object_lang)
translations
# Back translation (randomly chosen language --> English)
back_trans = translator.translate(translations, lang_tgt='en')
back_trans
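The round trip above can be wrapped in a small helper that produces several paraphrases at once. The following is a sketch under stated assumptions: `back_translate`, `augment`, and `fake_translate` are hypothetical names introduced here, and the translator is passed in as a plain callable `translate(text, target)` so the example runs offline. With the library used in this post, that callable could be `lambda text, target: translator.translate(text, lang_tgt=target)`.

```python
import random


def back_translate(text, translate, pivot, source="en"):
    """Translate text into the pivot language, then back into the source language."""
    forward = translate(text, pivot)
    return translate(forward, source)


def augment(text, translate, pivot_langs, k=3, source="en", rng=random):
    """Return up to k distinct round-trip variants of text via random pivot languages."""
    pivots = [p for p in pivot_langs if p != source]
    chosen = rng.sample(pivots, min(k, len(pivots)))
    results = []
    for pivot in chosen:
        candidate = back_translate(text, translate, pivot, source)
        # Skip round trips that changed nothing or repeat an earlier variant.
        if candidate != text and candidate not in results:
            results.append(candidate)
    return results


def fake_translate(text, target):
    """Stand-in translator (assumption): tags the text instead of translating it."""
    return text if target == "en" else f"[{target}] {text}"


variants = augment("stay hungry, stay foolish.", fake_translate,
                   ["en", "zh-cn", "fr", "de"], k=2, rng=random.Random(0))
print(variants)
```

Filtering out unchanged and duplicate results matters in practice, because different pivot languages often map back to the same English sentence.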