Sogou news - dataset
2022-08-03 13:03:00 【51CTO】
2,909,551 news articles from 5 categories of the SogouCA and SogouCS news corpora. Each category contains 90,000 training samples and 12,000 test samples. The Chinese characters have been converted into Pinyin.
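As a quick sanity check on the sizes above, the per-category counts imply 450,000 training samples and 60,000 test samples in total. The sketch below computes those totals and counts rows per class in a CSV file. The file layout (class label in the first column) and the helper names are assumptions made for illustration; check them against the copy you actually download.

```python
import csv
import io
from collections import Counter

# Figures taken from the dataset description above.
NUM_CATEGORIES = 5
TRAIN_PER_CATEGORY = 90_000
TEST_PER_CATEGORY = 12_000


def expected_split_sizes():
    """Return the total (train, test) sizes implied by the per-category counts."""
    return (NUM_CATEGORIES * TRAIN_PER_CATEGORY,
            NUM_CATEGORIES * TEST_PER_CATEGORY)


def count_per_class(csv_file):
    """Count rows per class label.

    Assumption: the class label is the first column of each CSV row.
    """
    reader = csv.reader(csv_file)
    return Counter(row[0] for row in reader if row)


# Tiny in-memory stand-in for a real file (hypothetical rows).
sample = io.StringIO(
    '"1","title a","text a"\n'
    '"2","title b","text b"\n'
    '"1","title c","text c"\n'
)
counts = count_per_class(sample)
train_total, test_total = expected_split_sizes()
```

With a real file, pass an open file handle (for example `open(path, newline="", encoding="utf-8")`) to `count_per_class` and compare each class count with the expected per-category figure.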
This article offers an empirical exploration on the use of character-level convolutional networks (ConvNets) for text classification. We constructed several large-scale datasets to show that character-level convolutional networks could achieve state-of-the-art or competitive results. Comparisons are offered against traditional models such as bag of words, n-grams and their TFIDF variants, and deep learning models such as word-based ConvNets and recurrent neural networks.
Translation:
This paper conducts an empirical study on the application of character-level convolutional networks (ConvNets) in text classification. We construct several large-scale datasets to demonstrate that character-level convolutional networks can achieve state-of-the-art or competitive results. They are compared against traditional models such as bag of words, n-grams and their TFIDF variants, and against deep learning models such as word-based ConvNets and recurrent neural networks.
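To make "character-level" concrete, here is a minimal sketch of how Pinyin text could be turned into a sequence of one-hot vectors before being fed to a ConvNet. The alphabet, the `max_length` value, and the function name `quantize` are assumptions chosen for illustration; they are not the paper's exact settings.

```python
import string

# Assumed alphabet: lowercase letters, digits, and ASCII punctuation.
# The alphabet used in the paper may differ.
ALPHABET = string.ascii_lowercase + string.digits + string.punctuation
CHAR_TO_INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def quantize(text, max_length=16):
    """One-hot encode text character by character.

    Returns a list of max_length rows, each of length len(ALPHABET).
    Characters outside the alphabet (including spaces) and padding
    positions become all-zero rows.
    """
    rows = []
    for ch in text.lower()[:max_length]:
        row = [0] * len(ALPHABET)
        idx = CHAR_TO_INDEX.get(ch)
        if idx is not None:
            row[idx] = 1
        rows.append(row)
    # Pad short inputs with all-zero rows up to the fixed length.
    while len(rows) < max_length:
        rows.append([0] * len(ALPHABET))
    return rows


# Illustrative Pinyin input (hypothetical sample, not taken from the dataset).
encoded = quantize("sou gou xin wen")
```

Because the articles are already converted to Pinyin, an alphabet of Latin characters like this one is enough to represent them, which is what allows a character-level model to be applied to Chinese news text.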
You can download the dataset from the official website. I have also shared a copy on Baidu Netdisk: follow my official account and reply "2020082502" to get the download link.
As long as I have time, I try to write articles and share them with everyone.