Machine learning notes: exploring object detection datasets with Streamlit
2022-07-07 20:10:00 【Sit and watch the clouds rise】
I. Why exploring datasets matters
The dataset is usually the weakest link in a data science project, and building a good one can be very difficult. We should therefore at least understand the dataset we use, and have ways to explore and discuss it.
1. For the project team
Explain metrics and errors, and improve dataset quality: the team can easily check what went wrong in a set of predictions. Maybe the data does not look as expected? Maybe the annotation is wrong? If so, it should be easy to fix.
Find out whether the model can handle a situation, and plan new features: sometimes the product owner or the customer asks whether the model can handle a new label or a new situation. The team should be able to give a first answer, and propose some actions, more quickly.
2. For the annotation team
If the annotation team has a dashboard for exploring the current dataset, it can use it to answer questions such as:
"How should I label this item?": by looking for similar examples in the dashboard and checking which label they carry.
"What should I do in an unusual case?": by finding comparable situations, such as occluded objects, in the current dataset.
This improves data quality. It also makes the annotators more autonomous and gives the project team more time for the rest of the project.
3. For end users, integrators, and sales
They no longer have to ask what a label means: they can look up examples and answer their own questions.
II. Exploring the COCO object detection dataset with Streamlit
Let's look at an example of exploring an object detection dataset with Streamlit.
We will use the toaster class of the MS COCO dataset.
Why toasters? It is one of the least represented classes: the training set contains 225 bounding boxes and the validation set contains 9. It also looks simple, since a toaster is a widely used and fairly standard object.
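Per-class box counts like the ones above can be read directly from a COCO-style annotation file. The following is a minimal sketch, assuming the usual COCO JSON layout with `categories` and `annotations` lists; the helper names, the file path in the comment, and the toy values are illustrative assumptions, not taken from the original code.

```python
import json
from collections import Counter


def load_coco(path):
    """Load a COCO-style annotation file into a plain dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def count_boxes_per_category(coco):
    """Return {category name: number of annotated bounding boxes}."""
    id_to_name = {cat["id"]: cat["name"] for cat in coco["categories"]}
    counts = Counter(ann["category_id"] for ann in coco["annotations"])
    # Keep categories with zero annotations so rare classes stay visible.
    return {name: counts.get(cat_id, 0) for cat_id, name in id_to_name.items()}


# Tiny in-memory stand-in for an annotation file (made-up values).
toy_coco = {
    "categories": [{"id": 80, "name": "toaster"}, {"id": 55, "name": "orange"}],
    "annotations": [
        {"id": 1, "image_id": 10, "category_id": 80, "bbox": [5, 5, 40, 30]},
        {"id": 2, "image_id": 10, "category_id": 55, "bbox": [60, 20, 15, 15]},
        {"id": 3, "image_id": 11, "category_id": 55, "bbox": [10, 10, 20, 20]},
    ],
}
print(count_boxes_per_category(toy_coco))  # {'toaster': 1, 'orange': 2}

# On a real dataset (the path is an assumption about your local layout):
# counts = count_boxes_per_category(load_coco("annotations/instances_val2017.json"))
```

Sorting the resulting dictionary by value is a quick way to spot the least represented classes before deciding which ones to inspect first.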
1. What is a toaster in the COCO dataset?
The first thing we notice when exploring the dataset is that there are two types of toasters:


2. Mislabeled data

3. Bounding box errors

4. Overlapping objects
What should you do when several objects are close together? Do you draw one bounding box per object, or one box around the group? The usual answer is one box per object, but that can be really difficult in practice, for example with a pile of oranges:
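One way to surface such cases in a dashboard is to list pairs of same-class boxes that overlap heavily within an image. The sketch below assumes COCO-style `[x, y, width, height]` boxes; the function names and the 0.3 threshold are illustrative assumptions, not values from the original post.

```python
from itertools import combinations


def iou_xywh(a, b):
    """Intersection over union of two [x, y, width, height] boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def overlapping_pairs(annotations, threshold=0.3):
    """Return (id, id) pairs of same-image, same-category boxes with IoU >= threshold."""
    groups = {}
    for ann in annotations:
        groups.setdefault((ann["image_id"], ann["category_id"]), []).append(ann)
    pairs = []
    for group in groups.values():
        for a, b in combinations(group, 2):
            if iou_xywh(a["bbox"], b["bbox"]) >= threshold:
                pairs.append((a["id"], b["id"]))
    return pairs


# Made-up annotations: boxes 1 and 2 overlap, box 3 is far away.
toy_annotations = [
    {"id": 1, "image_id": 7, "category_id": 55, "bbox": [0, 0, 10, 10]},
    {"id": 2, "image_id": 7, "category_id": 55, "bbox": [5, 0, 10, 10]},
    {"id": 3, "image_id": 7, "category_id": 55, "bbox": [100, 100, 10, 10]},
]
print(overlapping_pairs(toy_annotations))  # [(1, 2)]
```

Images with many flagged pairs are good candidates for a manual check of whether the annotators boxed each object or the whole group.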

5. Occlusion
When an object is partly hidden behind another one, the usual approach is to annotate only the visible part, as long as it covers more than a given proportion of the expected object (usually > 20%). This choice may, however, affect what you do later (training and post-processing). Another good practice is to flag such annotations as truncated or occluded.
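The rule above can be written down as a small helper so that the threshold and the flags are applied consistently. This is only a sketch under stated assumptions: the `BoxAnnotation` record, the `annotate_visible_part` function, and the way the visible fraction is estimated are hypothetical and not part of COCO or of the original code.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class BoxAnnotation:
    label: str
    bbox: Tuple[float, float, float, float]  # (x, y, width, height) of the visible part
    occluded: bool = False   # partly hidden behind another object
    truncated: bool = False  # partly cut off by the image border


def annotate_visible_part(
    label: str,
    visible_bbox: Tuple[float, float, float, float],
    visible_fraction: float,
    occluded: bool = False,
    truncated: bool = False,
    min_visible: float = 0.2,
) -> Optional[BoxAnnotation]:
    """Annotate the visible part only if it exceeds min_visible of the whole object.

    visible_fraction is the annotator's estimate, between 0 and 1, of how much
    of the expected object can be seen.
    """
    if visible_fraction <= min_visible:
        return None  # too little of the object is visible to annotate it
    return BoxAnnotation(label, visible_bbox, occluded=occluded, truncated=truncated)


print(annotate_visible_part("toaster", (12, 30, 40, 25), 0.6, occluded=True))
print(annotate_visible_part("toaster", (12, 30, 8, 5), 0.15, occluded=True))  # None
```

Keeping the flags on the annotation lets you decide later whether occluded or truncated boxes should be used for training, for evaluation, or for neither.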

6. Hard-to-recognize objects
Some objects may not be recognizable: they are too small, too occluded, or the image quality is not good enough. Most of the time the answer is not to annotate them at all, but you can also flag them as hard.
The real questions are: do you want to teach your model to detect these objects? Do you want them to affect your metrics? Are they a target for your final product?

7. Picture in picture / mirrors
This case is often overlooked. It occurs mostly in crowdsourced, crawled, or end-user-generated datasets rather than in datasets collected in a controlled environment. The usual decision is to annotate such objects normally, but you can decide not to for a specific use case, or flag them so that they can be distinguished later. Here they account for 5% of the training set.

III. Setting up a Streamlit dashboard
There are several ways to set up a dashboard for exploring your object detection dataset.
First, you can use a ready-made tool, such as the excellent FiftyOne from the Voxel51 team, or perhaps one day Google's Know Your Data. Second, you can build your own with a dashboard framework such as Streamlit or Dash. The first option gives you easier access to advanced features, such as user authentication, dataset management, and model evaluation, and is easier to maintain.
The second option may be faster to set up. It also gives you more flexibility for custom needs, and lets you share code and practices between the project code and the different dashboards.
I will show how to build the dashboard with Streamlit, because it is a good way to demonstrate Streamlit and how easily it can be integrated to explore an object detection dataset.
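To give an idea of what such a dashboard involves, here is a minimal sketch, not the author's application. It assumes a COCO-style annotation file plus an image folder, and that Streamlit and Pillow are installed; the paths, `index_by_category`, and the layout are illustrative assumptions.

```python
import json
from pathlib import Path

# Hypothetical locations; point them at your own copy of the dataset.
ANNOTATION_FILE = Path("annotations/instances_val2017.json")
IMAGE_DIR = Path("val2017")


def index_by_category(coco):
    """Map each category name to {image file name: [bbox, ...]}."""
    names = {cat["id"]: cat["name"] for cat in coco["categories"]}
    files = {img["id"]: img["file_name"] for img in coco["images"]}
    index = {name: {} for name in names.values()}
    for ann in coco["annotations"]:
        per_image = index[names[ann["category_id"]]]
        per_image.setdefault(files[ann["image_id"]], []).append(ann["bbox"])
    return index


def main():
    # Imported lazily so the helper above stays usable without Streamlit or Pillow.
    import streamlit as st
    from PIL import Image, ImageDraw

    st.title("Object detection dataset explorer")
    index = index_by_category(json.loads(ANNOTATION_FILE.read_text()))

    category = st.sidebar.selectbox("Category", sorted(index))
    images = index[category]
    n_boxes = sum(len(boxes) for boxes in images.values())
    st.write(f"{n_boxes} boxes in {len(images)} images")
    if not images:
        st.stop()  # nothing to display for this category

    file_name = st.sidebar.selectbox("Image", sorted(images))
    image = Image.open(IMAGE_DIR / file_name).convert("RGB")
    draw = ImageDraw.Draw(image)
    for x, y, w, h in images[file_name]:  # COCO boxes are [x, y, width, height]
        draw.rectangle([x, y, x + w, y + h], outline="red", width=3)
    st.image(image, caption=file_name)


# Streamlit reruns the script from top to bottom on every interaction; the
# dashboard is only started when the annotation file is actually present.
if __name__ == "__main__" and ANNOTATION_FILE.exists():
    main()
```

Saved as, say, `explore.py`, this would be launched with `streamlit run explore.py`. Because the whole script reruns on each widget change, a real dashboard would likely cache the parsed annotations (recent Streamlit versions provide `st.cache_data` for this) rather than reload the JSON file every time.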
The complete code for this small example is available here.
To see the complete code of the deployed application, go here.
