Text data augmentation methods
2022-07-06 09:11:00 【一曲无痕奈何】
Key topic: text data augmentation
random insertion
random deletion
random swap
Reference paper: EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks
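The three operations listed above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the paper's reference code: the function names, the `p` and `n` parameters, and the tiny `SYNONYMS` table are assumptions made for this example (the EDA paper draws synonyms from WordNet, and also describes a fourth operation, synonym replacement, which is not shown here).

```python
import random

# Hypothetical toy synonym table, used only so the sketch runs offline.
SYNONYMS = {
    "quick": ["fast", "speedy"],
    "happy": ["glad", "cheerful"],
}

def random_deletion(words, p, rng):
    """Drop each word with probability p, always keeping at least one word."""
    if len(words) <= 1:
        return list(words)
    kept = [w for w in words if rng.random() > p]
    return kept if kept else [rng.choice(words)]

def random_swap(words, n, rng):
    """Swap the words at two randomly chosen positions, n times."""
    out = list(words)
    if len(out) < 2:
        return out
    for _ in range(n):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out

def random_insertion(words, n, rng, synonyms=SYNONYMS):
    """Insert a synonym of a randomly chosen word at a random position, n times."""
    out = list(words)
    candidates = [w for w in words if w in synonyms]
    if not candidates:
        return out  # nothing to insert when no word has a known synonym
    for _ in range(n):
        word = rng.choice(candidates)
        out.insert(rng.randint(0, len(out)), rng.choice(synonyms[word]))
    return out

rng = random.Random(0)  # seeded so repeated runs give the same samples
words = "the quick dog is happy".split()
print(random_deletion(words, 0.2, rng))
print(random_swap(words, 1, rng))
print(random_insertion(words, 1, rng))
```

Each function returns a new list and leaves the input untouched, so several augmented variants can be generated from the same sentence.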
Back Translation
Example: English --> Chinese --> English
# Requires: pip install google_trans_new
from google_trans_new import google_translator
translator = google_translator()
# Pass a plain string; a list would be converted to its string form, brackets and quotes included
sentence = 'stay hungry, stay foolish. -- spoken / said by Steve Jobs'
# English --> Chinese
translation_cn = translator.translate(sentence, lang_tgt='zh-cn')
translation_cn
# Chinese --> English
translation_en = translator.translate(translation_cn, lang_tgt='en')
translation_en
Translate through a randomly chosen language
import random
import google_trans_new
languages = list(google_trans_new.LANGUAGES.keys())
len(languages) # number of supported languages: 108
object_lang = random.choice(languages)
object_lang
# Forward translation into the randomly chosen language
translations = translator.translate(sentence, lang_tgt=object_lang)
translations
# Back translation into English
back_trans = translator.translate(translations, lang_tgt='en')
back_trans
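The forward and backward steps above can be wrapped in a small reusable helper. The sketch below is one possible arrangement, not part of google_trans_new: `back_translate`, `augment`, and `fake_translate` are hypothetical names introduced here, and the translator is passed in as a callable so the example runs without network access.

```python
import random

def back_translate(text, translate, pivot, source="en"):
    """Translate text into the pivot language, then back into the source language."""
    forward = translate(text, pivot)
    return translate(forward, source)

def augment(text, translate, pivots, rng, source="en"):
    """Back-translate through one randomly chosen pivot; return (pivot, new_text)."""
    pivot = rng.choice([p for p in pivots if p != source])
    return pivot, back_translate(text, translate, pivot, source)

# Stand-in translator (an assumption for this sketch): it only tags the text
# with the target language instead of calling a real translation service.
def fake_translate(text, lang_tgt):
    return f"[{lang_tgt}] {text}"

pivot, result = augment("stay hungry, stay foolish", fake_translate,
                        ["en", "zh-cn", "fr"], random.Random(0))
print(pivot, result)
```

With the translator object created earlier, the callable could be `lambda text, lang: translator.translate(text, lang_tgt=lang)`, and `pivots` could be the `languages` list. Excluding the source language from the random choice avoids a round trip that changes nothing.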