Free machine learning dataset website (6300+ datasets)
2022-06-26 13:45:00 【The star light blog in 2021 cloud computing top3】
Today I'd like to share a free website for finding machine learning datasets:
Machine Learning Datasets | Papers With Code
This is good news for students who have ideas but no data. The website is simple to use and covers most of the commonly used datasets, including image, review (text), and point cloud collections.

CIFAR-10
Introduced by Krizhevsky et al. in Learning Multiple Layers of Features from Tiny Images. The CIFAR-10 dataset (Canadian Institute for Advanced Research, 10 classes) is a subset of the Tiny Images dataset and consists of 60,000 32x32 color images. Each image is labeled with one of 10 mutually exclusive classes: airplane, automobile (but not truck or pickup truck), bird, cat, deer, dog, frog, horse, ship, and truck (but not pickup truck). There are 6,000 images per class: 5,000 for training and 1,000 for testing.
The criteria for deciding whether an image belongs to a class were as follows:
- The class name should be high on the list of likely answers to the question "What is in this picture?"
- The image should be photo-realistic. Labelers were instructed to reject line drawings.
- The image should contain only one prominent instance of the object the class refers to. The object may be partially occluded or seen from an unusual viewpoint, as long as its identity is still clear to the labeler.
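To make the CIFAR-10 layout above concrete, here is a minimal sketch for reading the "CIFAR-10 python version" release. It assumes that release's usual format: pickled batches (`data_batch_1` to `data_batch_5` for training plus `test_batch`, 10,000 images each) whose `b"data"` entry holds one 3,072-value row per image, stored as the red plane, then green, then blue. The helper names (`load_batch`, `pixel`, `to_hwc`) are illustrative and not part of any library.

```python
import pickle

# Class index -> name, in the order used by the CIFAR-10 python release (batches.meta).
CIFAR10_CLASSES = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]

IMAGE_SIDE = 32
CHANNEL_SIZE = IMAGE_SIDE * IMAGE_SIDE  # 1,024 values per color channel

# Split sizes from the description: 6,000 images per class = 5,000 train + 1,000 test.
TRAIN_PER_CLASS, TEST_PER_CLASS = 5000, 1000
TRAIN_SIZE = TRAIN_PER_CLASS * len(CIFAR10_CLASSES)  # 50,000 (five batches of 10,000)
TEST_SIZE = TEST_PER_CLASS * len(CIFAR10_CLASSES)    # 10,000 (test_batch)


def load_batch(path):
    """Load one pickled batch and return (rows, labels).

    The batches were pickled under Python 2, so keys come back as bytes.
    """
    with open(path, "rb") as f:
        batch = pickle.load(f, encoding="bytes")
    return batch[b"data"], batch[b"labels"]


def pixel(row, y, x):
    """Return the (R, G, B) value at (y, x) from one 3,072-value image row.

    Each channel plane is 32x32 in row-major order: R plane, then G, then B.
    """
    offset = y * IMAGE_SIDE + x
    return (row[offset], row[CHANNEL_SIZE + offset], row[2 * CHANNEL_SIZE + offset])


def to_hwc(row):
    """Convert a flat row into a nested [y][x] -> (R, G, B) list (height, width, channel)."""
    return [[pixel(row, y, x) for x in range(IMAGE_SIDE)] for y in range(IMAGE_SIDE)]
```

`load_batch` needs NumPy installed, because the pickled `b"data"` entry is a NumPy array; `pixel` and `to_hwc` work on either a row of that array or a plain Python list.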

Cityscapes
Introduced by Cordts et al. in The Cityscapes Dataset for Semantic Urban Scene Understanding. Cityscapes is a large-scale database focused on the semantic understanding of urban street scenes. It provides semantic, instance-wise, and dense pixel annotations for 30 classes grouped into 8 categories (flat surfaces, humans, vehicles, constructions, objects, nature, sky, and void). The dataset consists of around 5,000 finely annotated images and 20,000 coarsely annotated ones. Data was captured in 50 cities over several months, during daytime and in good weather conditions. It was originally recorded as video, so frames were manually selected to have a large number of dynamic objects, varying scene layouts, and varying backgrounds.
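The category/class hierarchy described above can be written down as a lookup table. This sketch follows the 30-class grouping listed on the Cityscapes website; the official label definitions also carry numeric ids and training ids that are not reproduced here, and the helper names are hypothetical.

```python
# Category -> fine classes, following the 30-class list on the Cityscapes website.
CITYSCAPES_CATEGORIES = {
    "flat": ["road", "sidewalk", "parking", "rail track"],
    "human": ["person", "rider"],
    "vehicle": ["car", "truck", "bus", "on rails", "motorcycle",
                "bicycle", "caravan", "trailer"],
    "construction": ["building", "wall", "fence", "guard rail", "bridge", "tunnel"],
    "object": ["pole", "pole group", "traffic sign", "traffic light"],
    "nature": ["vegetation", "terrain"],
    "sky": ["sky"],
    "void": ["ground", "dynamic", "static"],
}

# Reverse index: fine class name -> category name.
CLASS_TO_CATEGORY = {
    cls: cat for cat, classes in CITYSCAPES_CATEGORIES.items() for cls in classes
}


def category_of(class_name):
    """Return the coarse category of a fine class (raises KeyError if unknown)."""
    return CLASS_TO_CATEGORY[class_name]


def collapse_to_categories(label_names):
    """Map a sequence of per-pixel class names to their 8-way categories."""
    return [category_of(name) for name in label_names]
```

Collapsing fine labels this way is one simple route to a coarser 8-category evaluation.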
Resources: A survey of deep learning techniques applied to semantic segmentation
Penn Treebank
Introduced by Mitchell P. Marcus et al. in Building a Large Annotated Corpus of English: The Penn Treebank. The English Penn Treebank (PTB) corpus, and in particular the section corresponding to the Wall Street Journal (WSJ) articles, is one of the best-known and most widely used corpora for evaluating sequence labeling models. The task consists of annotating each word with its part-of-speech tag. In the most common split of this corpus, sections 0 to 18 are used for training (38,219 sentences, 912,344 tokens), sections 19 to 21 for validation (5,527 sentences, 131,768 tokens), and sections 22 to 24 for testing (5,462 sentences, 129,654 tokens). The corpus is also commonly used for character-level and word-level language modeling.
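The standard WSJ split above can be encoded as a small lookup. This is a sketch that assumes the usual `wsj_SSNN.mrg` file naming, where `SS` is the two-digit section number; the function names are hypothetical, and the counts are the ones quoted in the text.

```python
# Standard WSJ section split for PTB part-of-speech tagging.
SPLITS = {
    "train": range(0, 19),  # sections 00-18
    "dev": range(19, 22),   # sections 19-21
    "test": range(22, 25),  # sections 22-24
}

# Sentence and token counts quoted in the dataset description.
STATS = {
    "train": {"sentences": 38_219, "tokens": 912_344},
    "dev": {"sentences": 5_527, "tokens": 131_768},
    "test": {"sentences": 5_462, "tokens": 129_654},
}


def section_of_file(filename):
    """Extract the section from a name like 'wsj_2301.mrg' (assumed wsj_SSNN naming)."""
    return int(filename[4:6])


def split_for_section(section):
    """Return the split ('train', 'dev', or 'test') for a WSJ section number."""
    for name, sections in SPLITS.items():
        if section in sections:
            return name
    raise ValueError(f"WSJ section {section} is outside 0-24")


def totals():
    """Sum sentence and token counts over the three splits."""
    return {
        key: sum(split[key] for split in STATS.values())
        for key in ("sentences", "tokens")
    }
```

With these numbers, the three splits together hold 49,208 sentences and 1,173,766 tokens.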
Resources: Seq2Biseq: Bidirectional Output-wise Recurrent Neural Networks for Sequence Modelling
IMDb Movie Reviews
Introduced by Andrew L. Maas et al. in Learning Word Vectors for Sentiment Analysis. The IMDb Movie Reviews dataset is a binary sentiment analysis dataset consisting of 50,000 reviews from the Internet Movie Database (IMDb), each labeled as positive or negative. The dataset contains an equal number of positive and negative reviews, and only highly polarized reviews are included: negative reviews have a score ≤ 4 out of 10, and positive reviews have a score ≥ 7 out of 10. No more than 30 reviews are included per movie. The dataset also contains additional unlabeled data.
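The labeling rules above (score thresholds plus the 30-reviews-per-movie cap) can be expressed as a short filter. This is an illustration of the stated rules, not the authors' actual collection script: the `(movie_id, score, text)` input format and all function names are hypothetical, and class balancing is not enforced. `rating_from_filename` assumes the `<id>_<rating>.txt` naming used in the distributed `aclImdb` archive.

```python
from collections import Counter

NEGATIVE_MAX = 4    # scores <= 4 (out of 10) count as negative
POSITIVE_MIN = 7    # scores >= 7 (out of 10) count as positive
MAX_PER_MOVIE = 30  # at most 30 reviews kept per movie


def polarity(score):
    """Return 'neg', 'pos', or None for neutral scores (5 or 6), which are excluded."""
    if score <= NEGATIVE_MAX:
        return "neg"
    if score >= POSITIVE_MIN:
        return "pos"
    return None


def rating_from_filename(name):
    """Parse the rating from an aclImdb-style file name such as '200_8.txt'."""
    stem = name.rsplit(".", 1)[0]
    return int(stem.split("_")[1])


def build_labeled_set(reviews):
    """Filter (movie_id, score, text) tuples into (text, label) pairs.

    Keeps only polarized reviews and at most MAX_PER_MOVIE kept reviews per
    movie, taken in input order.
    """
    per_movie = Counter()
    labeled = []
    for movie_id, score, text in reviews:
        label = polarity(score)
        if label is None or per_movie[movie_id] >= MAX_PER_MOVIE:
            continue
        per_movie[movie_id] += 1
        labeled.append((text, label))
    return labeled
```

For example, 40 positive reviews of one movie shrink to 30, while a 5/10 review is dropped entirely.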
Resources: Sentiment analysis | NLP-progress
ModelNet
Introduced by Wu et al. in 3D ShapeNets: A Deep Representation for Volumetric Shapes. The ModelNet40 dataset contains synthetic object point clouds. As the most widely used benchmark for point cloud analysis, ModelNet40 is popular because of its variety of categories, clean shapes, and well-constructed structure. The original ModelNet40 covers 40 categories (such as airplane, car, plant, and lamp), with 9,843 models used for training and the remaining 2,468 for testing. The corresponding point clouds are sampled uniformly from the mesh surfaces and then preprocessed by moving them to the origin and scaling them into a unit sphere.
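The two preprocessing steps described above (uniform surface sampling, then centering and unit-sphere scaling) can be sketched in plain Python. This is one common way to do it, assuming area-weighted triangle selection with the square-root barycentric trick and centering on the point centroid; the exact procedure used to build any particular ModelNet40 release may differ, and the function names are hypothetical.

```python
import math
import random


def triangle_area(a, b, c):
    """Area of triangle abc in 3D: half the norm of the cross product (b - a) x (c - a)."""
    ux, uy, uz = (b[i] - a[i] for i in range(3))
    vx, vy, vz = (c[i] - a[i] for i in range(3))
    cx, cy, cz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    return 0.5 * math.sqrt(cx * cx + cy * cy + cz * cz)


def sample_surface(vertices, faces, n, rng=random):
    """Draw n points uniformly over a triangle mesh's surface.

    Each triangle is chosen with probability proportional to its area, then a
    point is drawn uniformly inside it.
    """
    tris = [(vertices[i], vertices[j], vertices[k]) for i, j, k in faces]
    areas = [triangle_area(*t) for t in tris]
    points = []
    for a, b, c in rng.choices(tris, weights=areas, k=n):
        s, r2 = math.sqrt(rng.random()), rng.random()
        # Barycentric weights (1 - s, s(1 - r2), s * r2) are uniform over the triangle.
        w0, w1, w2 = 1 - s, s * (1 - r2), s * r2
        points.append(tuple(w0 * a[d] + w1 * b[d] + w2 * c[d] for d in range(3)))
    return points


def normalize(points):
    """Move the centroid to the origin and scale so the farthest point has norm 1."""
    n = len(points)
    centroid = [sum(p[d] for p in points) / n for d in range(3)]
    centered = [tuple(p[d] - centroid[d] for d in range(3)) for p in points]
    radius = max(math.sqrt(sum(x * x for x in p)) for p in centered)
    return [tuple(x / radius for x in p) for p in centered]
```

Weighting by area matters: picking triangles uniformly would oversample small faces. `normalize` assumes the points are not all identical, otherwise the radius is zero.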
Resources: Geometric Feedback Network for Point Cloud Classification
CARLA (CAR Learning to Act)
Introduced by Dosovitskiy et al. in CARLA: An Open Urban Driving Simulator. CARLA (CAR Learning to Act) is an open simulator for urban driving, built as an open-source layer on top of Unreal Engine 4. It provides sensors in the form of RGB cameras (with customizable positions), ground-truth depth maps, ground-truth semantic segmentation maps with 12 semantic classes designed for driving (road, lane marking, traffic sign, sidewalk, and so on), bounding boxes for dynamic objects in the environment, and measurements of the agent itself (vehicle location and orientation).
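CARLA's ground-truth depth maps are delivered as ordinary RGB images, so they need decoding before use. The sketch below assumes the encoding given in the depth-camera section of the CARLA 0.9.x sensor documentation (a 24-bit value R + G·256 + B·256², normalized and scaled by a 1000 m far plane); other CARLA versions may encode depth differently, and the function names are hypothetical.

```python
def decode_depth(r, g, b, far_plane_m=1000.0):
    """Convert one pixel of an RGB-encoded depth image to metres.

    The 24-bit integer R + G*256 + B*256^2 is normalized to [0, 1] and
    scaled by the far-plane distance (assumed 1000 m).
    """
    normalized = (r + g * 256 + b * 256 * 256) / (256 ** 3 - 1)
    return far_plane_m * normalized


def decode_depth_image(rgb_rows):
    """Apply decode_depth to a nested [row][col] -> (R, G, B) image."""
    return [[decode_depth(*px) for px in row] for row in rgb_rows]
```

Because the encoding spans 24 bits, each step of the lowest channel corresponds to roughly 0.06 mm of depth.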
Resources: Synthetic Data for Deep Learning
That concludes this brief introduction to several commonly used datasets. Visit the website for many more.