Sogou news - dataset
2022-08-03 13:03:00 【51CTO】
The dataset is drawn from the 2,909,551 news articles of the SogouCA and SogouCS news corpora and covers 5 categories. Each category contains 90,000 training samples and 12,000 test samples. The Chinese characters have been converted into Pinyin.
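As a rough sketch of what these numbers imply, the snippet below computes the total split sizes and parses one row. The file layout is an assumption, not something stated in this post: it assumes a 3-column CSV (class index, title, content) with newlines inside the content written as a literal backslash followed by "n". The sample row is made up for illustration, and `split_sizes` and `read_rows` are hypothetical helper names.

```python
import csv
import io

# Figures taken from the dataset description above.
NUM_CLASSES = 5
TRAIN_PER_CLASS = 90_000
TEST_PER_CLASS = 12_000


def split_sizes(num_classes=NUM_CLASSES,
                train_per_class=TRAIN_PER_CLASS,
                test_per_class=TEST_PER_CLASS):
    """Return (total training samples, total test samples)."""
    return num_classes * train_per_class, num_classes * test_per_class


def read_rows(handle):
    """Yield (label, title, content) tuples.

    Assumed layout (not confirmed by this post): class index, title, content,
    with newlines in the content stored as a literal backslash + 'n'.
    """
    for row in csv.reader(handle):
        label, title, content = row
        yield int(label), title, content.replace("\\n", "\n")


# Made-up Pinyin row standing in for a line of the real training file.
sample = io.StringIO('"1","jin1 ri4 xin1 wen2","di4 yi1 duan4\\ndi4 er4 duan4"\n')
rows = list(read_rows(sample))
train_total, test_total = split_sizes()
print(train_total, test_total)
print(rows[0])
```

To read the real file, pass an open file handle to `read_rows` in place of the in-memory sample; the path depends on where you unpack the download.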
This article offers an empirical exploration on the use of character-level convolutional networks (ConvNets) for text classification. We constructed several large-scale datasets to show that character-level convolutional networks could achieve state-of-the-art or competitive results. Comparisons are offered against traditional models such as bag of words, n-grams and their TFIDF variants, and deep learning models such as word-based ConvNets and recurrent neural networks.
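To make "character-level" concrete, here is a minimal sketch of one common way to prepare text for such a network: each character becomes a one-hot vector over a fixed alphabet, and the sequence is truncated or zero-padded to a fixed length. The alphabet, the length of 16, and the name `quantize` are illustrative assumptions; this post does not specify the encoding actually used.

```python
# Assumed alphabet for illustration only: lowercase letters, digits, space.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789 "


def quantize(text, alphabet=ALPHABET, max_len=16):
    """One-hot encode `text` character by character.

    Returns a list of `max_len` rows, each of length len(alphabet).
    Characters outside the alphabet, and padding positions, are all-zero rows.
    """
    index = {ch: i for i, ch in enumerate(alphabet)}
    matrix = []
    for ch in text.lower()[:max_len]:
        row = [0] * len(alphabet)
        pos = index.get(ch)
        if pos is not None:
            row[pos] = 1
        matrix.append(row)
    # Zero-pad short inputs so every sample has the same shape.
    while len(matrix) < max_len:
        matrix.append([0] * len(alphabet))
    return matrix


encoded = quantize("xin1 wen2")  # made-up Pinyin input
print(len(encoded), len(encoded[0]))
```

Because the Sogou text is already romanized into Pinyin, a small Latin alphabet like this one is enough to cover it, which is presumably why the conversion was done.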
The dataset can be downloaded from the official website, and I have also shared a copy on Baidu Netdisk. Follow my official account and reply "2020082502" to get the download link.
As long as I have time, I try to write articles and share them with everyone.