Machine learning notes - explore object detection datasets using streamlit
2022-07-07 20:10:00 【Sit and watch the clouds rise】
I. Why exploring datasets matters
Datasets are usually the weakest link in most data science projects. Building a good dataset can be very difficult, so we should at least understand the dataset we use and have ways to explore and discuss it.
1. For the project team
Interpret metrics and errors, and improve dataset quality: the team can easily check what is wrong with a set of predictions. Maybe the data does not look as expected? Maybe the annotation is wrong? If so, it should be easy to fix.
Find out whether the model can handle a given situation, and plan new features: sometimes the product owner or the customers will ask whether the model can handle new labels or new situations. The team should be able to give a first answer, and propose some actions, more quickly.
2. For the annotation team
If the annotation team has a dashboard to explore the current dataset, it can use it to answer questions such as:
"How should I label this item?": by looking for similar examples in the dashboard and finding the correct label.
"What should I do in an unusual case?": by looking up situations such as occluded objects in the current dataset.
This improves data quality. It also makes the annotators more autonomous and gives the project team more time for the rest of the project.
3. For end users, integrators, and sales
They no longer have to ask what a label means: they can display examples and answer their questions by themselves.
II. Exploring the COCO object detection dataset with Streamlit
Let's look at an example of exploring an object detection dataset with Streamlit.
We will use the toaster class of the MS COCO dataset.
Why toaster? It is one of the least represented classes: the training set has 225 bounding boxes and the validation set has 9. It also looks simple, because a toaster is a widely used and relatively standard object.
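Counts like the ones above can be checked directly from the annotation file. The following is a minimal sketch, assuming the annotations follow the standard COCO JSON layout (top-level `categories` and `annotations` lists); the helper name `count_boxes_per_category` and the toy data are illustrative, not part of the original article.

```python
from collections import Counter


def count_boxes_per_category(coco: dict) -> dict:
    """Return {category name: number of bounding boxes} for a COCO-style dict."""
    id_to_name = {c["id"]: c["name"] for c in coco["categories"]}
    counts = Counter(id_to_name[a["category_id"]] for a in coco["annotations"])
    # Keep categories that have no annotation at all, with a count of 0.
    return {name: counts.get(name, 0) for name in id_to_name.values()}


# Tiny hand-made example (not real COCO data).
toy = {
    "categories": [{"id": 1, "name": "toaster"}, {"id": 2, "name": "orange"}],
    "annotations": [
        {"id": 10, "image_id": 100, "category_id": 1, "bbox": [10, 20, 50, 40]},
        {"id": 11, "image_id": 100, "category_id": 2, "bbox": [5, 5, 20, 20]},
        {"id": 12, "image_id": 101, "category_id": 2, "bbox": [30, 30, 20, 20]},
    ],
}

print(count_boxes_per_category(toy))  # {'toaster': 1, 'orange': 2}
```

With a real annotation file, the same function can be applied to the dictionary returned by `json.load`, and sorting the result by count shows which classes are the least represented.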
1. What is a toaster in the COCO dataset?
The first thing we notice when exploring the dataset is that there are two types of toasters:
2. Mislabeled data
3. Bounding box errors
4. Overlapping objects
When several objects are close together, what do you do? Do you draw one bounding box per object, or a single box for the group? The common answer is one box per object. But sometimes that is really difficult, for example with a pile of oranges.
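One way to surface such crowded images in a dashboard is to look for pairs of boxes that overlap strongly. The sketch below uses intersection over union (IoU) on COCO-style `[x, y, width, height]` boxes; the function names and the 0.3 threshold are assumptions chosen for illustration, not values from the article.

```python
from itertools import combinations


def iou_xywh(a, b):
    """Intersection over union of two [x, y, width, height] boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Width and height of the intersection rectangle (0 if the boxes are disjoint).
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def overlapping_pairs(boxes, threshold=0.3):
    """Return index pairs (i, j) whose boxes have IoU >= threshold."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(boxes), 2)
        if iou_xywh(a, b) >= threshold
    ]


boxes = [[0, 0, 10, 10], [5, 0, 10, 10], [100, 100, 10, 10]]
print(overlapping_pairs(boxes))  # [(0, 1)]
```

Images with many overlapping pairs are good candidates for a manual review of the "one box per object" rule.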
5. Occlusion
When an object is partly hidden behind another object, the usual approach is to annotate only the visible part, as long as it is larger than a given fraction of the expected object (usually > 20%). However, this may affect what you do with the annotation later (training and post-processing). Another good practice is to flag the annotation as truncated or occluded.
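The rule above can be written down as a small helper, which is useful when the annotation guideline has to be applied consistently. This is only a sketch: the function `occlusion_decision`, its arguments, and the returned keys are hypothetical, and the 20% default simply mirrors the figure quoted above.

```python
def occlusion_decision(visible_area, expected_area, min_visible=0.2):
    """Decide whether to annotate a partly hidden object.

    visible_area: area of the visible part (for example in pixels).
    expected_area: estimated area of the whole object if nothing hid it.
    min_visible: minimum visible fraction required to annotate (assumed 20%).
    """
    if expected_area <= 0:
        raise ValueError("expected_area must be positive")
    ratio = visible_area / expected_area
    return {
        "annotate": ratio > min_visible,  # annotate only above the threshold
        "occluded": ratio < 1.0,          # flag kept for training and post-processing
        "visible_ratio": ratio,
    }


print(occlusion_decision(50, 100))
print(occlusion_decision(15, 100))
```

Storing the `occluded` flag alongside the box lets you later include or exclude these objects from training or from the metrics without re-annotating.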
6. Hard to recognize
Some objects may not be recognizable: they are too small, occluded, or the image quality is not good enough. Most of the time the answer is not to annotate them at all, but you can also flag them as hard.
The real questions are: do you want to teach your model to detect these objects? Do you want them to affect your metrics? Are they a target of your final product?
7. Picture-in-picture / mirrors
This case is often overlooked. It occurs mostly in crowdsourced, crawled, or end-user-based datasets rather than in datasets captured in a controlled environment. The usual decision is to annotate these objects normally, but you can decide not to in a specific use case, or decide to flag them so that they can be distinguished later. Here they represent 5% of the training set.
III. Setting up a Streamlit dashboard
There are several ways to set up a dashboard to explore your object detection dataset.
First, you can use an off-the-shelf tool, such as the excellent FiftyOne from the Voxel51 team, or perhaps one day Google's Know Your Data. Second, you can build your own with a dashboard framework such as Streamlit or Dash. The first option makes it easier to get advanced features such as user authentication, dataset management, and model evaluation, and it is easier to maintain.
The second option may be faster to set up. It also gives you more flexibility for custom solutions, and lets you share code and practices between the project code and the different dashboards.
I will show you how to build the dashboard with Streamlit, because it is a good way to showcase Streamlit and how easily it can be integrated to explore an object detection dataset.
The complete code of this small example is here.
If you want to see the complete code of the deployed application, you can go here.
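Independently of the author's linked code, the core of such a dashboard can be sketched in a few lines. The example below is a minimal sketch, assuming COCO-style annotations loaded into a dictionary and images stored in a local folder; the names `images_with_category` and `render_dashboard`, the toy data, and the file name in it are illustrative. Streamlit and Pillow are imported lazily inside `render_dashboard` so that the data helper can be used and tested without them.

```python
from collections import defaultdict


def images_with_category(coco, category_name):
    """Map image id -> list of [x, y, width, height] boxes of one category."""
    cat_ids = {c["id"] for c in coco["categories"] if c["name"] == category_name}
    boxes = defaultdict(list)
    for ann in coco["annotations"]:
        if ann["category_id"] in cat_ids:
            boxes[ann["image_id"]].append(ann["bbox"])
    return dict(boxes)


def render_dashboard(coco, image_dir):
    """Draw a small class explorer; meant to be called from a Streamlit script."""
    import os

    import streamlit as st
    from PIL import Image, ImageDraw

    st.title("COCO class explorer")
    names = sorted(c["name"] for c in coco["categories"])
    category = st.sidebar.selectbox("Category", names)

    boxes_by_image = images_with_category(coco, category)
    st.write(f"{len(boxes_by_image)} images contain '{category}'")
    if not boxes_by_image:
        return

    files = {img["id"]: img["file_name"] for img in coco["images"]}
    image_id = st.sidebar.selectbox("Image id", sorted(boxes_by_image))

    # Draw every box of the selected category on the selected image.
    image = Image.open(os.path.join(image_dir, files[image_id])).convert("RGB")
    draw = ImageDraw.Draw(image)
    for x, y, w, h in boxes_by_image[image_id]:
        draw.rectangle([x, y, x + w, y + h], outline="red", width=3)
    st.image(image, caption=files[image_id])


# Tiny hand-made example (not real COCO data) to exercise the helper.
toy = {
    "images": [{"id": 100, "file_name": "example.jpg"}],
    "categories": [{"id": 1, "name": "toaster"}, {"id": 2, "name": "orange"}],
    "annotations": [
        {"id": 10, "image_id": 100, "category_id": 1, "bbox": [10, 20, 50, 40]},
        {"id": 11, "image_id": 100, "category_id": 2, "bbox": [5, 5, 20, 20]},
    ],
}
print(images_with_category(toy, "toaster"))
```

To try it, save the code in a script, load your annotation file with `json.load`, call `render_dashboard(coco, image_dir)` at the end of the script, and start it with `streamlit run`. Keeping the data helper separate from the rendering code is what allows the same logic to be shared between the project code and the dashboard.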