Sogou news - dataset
2022-08-03 13:03:00 【51CTO】
The dataset is drawn from the 2,909,551 news articles of the SogouCA and SogouCS news corpora and covers 5 categories. Each category contains 90,000 training samples and 12,000 test samples. The Chinese characters have been converted into Pinyin.
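To make the per-category figures concrete, here is a minimal sketch that computes the overall split sizes and counts samples per label in a CSV file. The column layout (label first, then title and body text) and the sample rows are assumptions made up for illustration; check the files you actually download before relying on them.

```python
import csv
import io
from collections import Counter

# Figures taken from the description above.
NUM_CATEGORIES = 5
TRAIN_PER_CATEGORY = 90_000
TEST_PER_CATEGORY = 12_000


def split_sizes(num_categories=NUM_CATEGORIES,
                train_per=TRAIN_PER_CATEGORY,
                test_per=TEST_PER_CATEGORY):
    """Return (total training samples, total test samples)."""
    return num_categories * train_per, num_categories * test_per


def count_labels(csv_file):
    """Count rows per class label.

    Assumption: the first column of every row is the class label.
    """
    reader = csv.reader(csv_file)
    return Counter(row[0] for row in reader if row)


# Made-up rows in the assumed layout: label, title, body (all in Pinyin).
SAMPLE = (
    '"1","biao ti yi","zheng wen yi"\n'
    '"2","biao ti er","zheng wen er"\n'
    '"1","biao ti san","zheng wen san"\n'
)

train_total, test_total = split_sizes()
label_counts = count_labels(io.StringIO(SAMPLE))

print(train_total, test_total)
print(label_counts)
```

With a real file, you would pass an open file handle (opened with `newline=""` and `encoding="utf-8"`) to `count_labels` instead of the in-memory sample.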
This article offers an empirical exploration on the use of character-level convolutional networks (ConvNets) for text classification. We constructed several large-scale datasets to show that character-level convolutional networks could achieve state-of-the-art or competitive results. Comparisons are offered against traditional models such as bag of words, n-grams and their TFIDF variants, and deep learning models such as word-based ConvNets and recurrent neural networks.
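As a rough illustration of what "character-level" input means, the sketch below turns a Pinyin string into a fixed-length sequence of one-hot vectors, one per character. The alphabet, the sequence length, and the function name `quantize` are assumptions chosen for this example, not the settings used in the paper.

```python
# Hypothetical alphabet for illustration: lowercase letters, digits, space.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789 "
CHAR_TO_INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def quantize(text, max_length=16):
    """Encode text as max_length one-hot rows over ALPHABET.

    Characters outside the alphabet and padding positions become
    all-zero rows; text longer than max_length is truncated.
    """
    rows = []
    for ch in text.lower()[:max_length]:
        row = [0] * len(ALPHABET)
        idx = CHAR_TO_INDEX.get(ch)
        if idx is not None:
            row[idx] = 1
        rows.append(row)
    # Pad short inputs with all-zero rows up to max_length.
    while len(rows) < max_length:
        rows.append([0] * len(ALPHABET))
    return rows


encoded = quantize("sou gou", max_length=10)
print(len(encoded), len(encoded[0]))
```

A convolutional network would then slide its filters along the first axis of this matrix, which is why no word segmentation is needed at the input.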
You can download the dataset from its official website; I have also shared a copy on Baidu Netdisk. To get the download link, follow my official account and reply "2020082502".
As long as I have time, I try to write articles and share them with everyone.