Multi-class classification of imbalanced text using AWS SageMaker BlazingText
2020-11-06 01:22:00 【InfoQ】
Background
Text classification is a natural language processing task in which a computer maps a piece of text to one or more predefined categories or topics. In practice, however, datasets often suffer from class imbalance, which seriously degrades the final classification results. Class imbalance means that, in a given dataset, some categories have many samples while others have few, and the ratio between the sample counts of the large and small categories is large.
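As a rough illustration of the definition above, the degree of imbalance can be summarized as the ratio between the largest and smallest class sizes. The following is a minimal sketch; the function name `imbalance_ratio` and the toy labels are hypothetical and are not taken from the article's dataset.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Return (counts, ratio), where ratio = largest class size / smallest class size."""
    counts = Counter(labels)
    return counts, max(counts.values()) / min(counts.values())


# Hypothetical toy labels for illustration only.
labels = ["Company"] * 90 + ["Artist"] * 9 + ["Film"] * 1
counts, ratio = imbalance_ratio(labels)
print(counts.most_common())
print(f"imbalance ratio: {ratio:.0f}:1")
```

A ratio close to 1 indicates a balanced dataset; the larger the ratio, the more the minority classes are likely to need oversampling.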
BlazingText is a built-in AWS SageMaker algorithm that provides highly optimized implementations of Word2vec and text classification. This article uses SageMaker BlazingText for multi-class text classification. To address class imbalance, two methods, back translation and EDA, are used to oversample the minority classes. The back translation method calls the AWS Translate service to translate each text into another language and then back again, while EDA processes the text mainly through synonym replacement, random insertion, random swap, and random deletion. The article also uses AWS SageMaker automatic hyperparameter tuning to find the best-performing hyperparameters for the BlazingText text classification algorithm.
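The two augmentation methods described above can be sketched roughly as follows. This is an illustrative sketch, not the article's implementation: the `SYNONYMS` table, all function names, and the pivot language `zh` are assumptions. `back_translate` expects a client exposing `translate_text`, such as the one returned by `boto3.client("translate")`; it is passed in as a parameter so the sketch runs without AWS credentials.

```python
import random

# Hypothetical synonym table; a real setup would use a thesaurus such as WordNet.
SYNONYMS = {"quick": ["fast", "rapid"], "film": ["movie"]}


def synonym_replace(words, n, rng):
    """Replace up to n words that have an entry in SYNONYMS."""
    out = list(words)
    candidates = [i for i, w in enumerate(out) if w in SYNONYMS]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        out[i] = rng.choice(SYNONYMS[out[i]])
    return out


def random_insert(words, n, rng):
    """Insert a synonym of a randomly chosen word at a random position, n times."""
    out = list(words)
    for _ in range(n):
        candidates = [w for w in out if w in SYNONYMS]
        if not candidates:
            break
        synonym = rng.choice(SYNONYMS[rng.choice(candidates)])
        out.insert(rng.randrange(len(out) + 1), synonym)
    return out


def random_swap(words, n, rng):
    """Swap two randomly chosen positions, n times."""
    out = list(words)
    if len(out) < 2:
        return out
    for _ in range(n):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_delete(words, p, rng):
    """Drop each word with probability p, always keeping at least one word."""
    words = list(words)
    if len(words) <= 1:
        return words
    kept = [w for w in words if rng.random() >= p]
    return kept or [rng.choice(words)]


def back_translate(text, client, source="en", pivot="zh"):
    """Translate source -> pivot -> source; the pivot language is an assumption."""
    forward = client.translate_text(
        Text=text, SourceLanguageCode=source, TargetLanguageCode=pivot
    )
    back = client.translate_text(
        Text=forward["TranslatedText"],
        SourceLanguageCode=pivot,
        TargetLanguageCode=source,
    )
    return back["TranslatedText"]


rng = random.Random(0)
words = "the quick film won an award".split()
print(" ".join(synonym_replace(words, 1, rng)))
print(" ".join(random_swap(words, 1, rng)))
print(" ".join(random_delete(words, 0.2, rng)))
```

With real credentials, `back_translate(text, boto3.client("translate"))` would send two requests to AWS Translate per text, so cost and throttling are worth considering before augmenting a large minority class.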
This article works with imbalanced text data covering 14 categories, generated by processing the public DBpedia dataset. It runs a baseline experiment with no imbalance handling, as well as oversampling experiments using the two methods, back translation and EDA.
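The oversampling experiments can be pictured with a small sketch like the one below. It assumes, purely for illustration, that each minority class is augmented until it matches the size of the largest class; the article does not state its target counts. The helper names `oversample` and `to_blazingtext_line` are hypothetical. The output lines follow the `__label__<label> <text>` layout that BlazingText uses for supervised training input, one sample per line.

```python
import random
from collections import defaultdict


def oversample(samples, augment, rng):
    """Augment minority classes until each matches the largest class.

    samples: list of (label, text) pairs; augment: callable mapping text -> new text.
    Balancing to the largest class is an assumption made for this sketch.
    """
    by_label = defaultdict(list)
    for label, text in samples:
        by_label[label].append(text)
    target = max(len(texts) for texts in by_label.values())
    balanced = list(samples)
    for label, texts in by_label.items():
        for _ in range(target - len(texts)):
            balanced.append((label, augment(rng.choice(texts))))
    return balanced


def to_blazingtext_line(label, text):
    """Format one sample as '__label__<label> <text>' on a single line."""
    return f"__label__{label} {' '.join(text.split())}"


# Hypothetical toy data; the identity augment stands in for back translation or EDA.
samples = [("Company", "Acme makes anvils")] * 3 + [("Film", "A short silent film")]
balanced = oversample(samples, augment=lambda t: t, rng=random.Random(0))
lines = [to_blazingtext_line(label, text) for label, text in balanced]
print("\n".join(lines))
```

In a baseline run the same formatting step would be applied directly to `samples`, skipping `oversample`, so that the two experiments differ only in the training data.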
Link to the original text: [https://www.infoq.cn/article/xbSAYuJcQrm048GHl5dJ]. Reproduction without the author's permission is prohibited.
Copyright notice
This article was created by [InfoQ]. Please include a link to the original when reposting. Thank you.