Machine learning notes - Exploring object detection datasets with Streamlit
2022-07-07 20:10:00 【Sit and watch the clouds rise】
One、Why exploring datasets matters
Datasets are often the weakest link in data science projects, and building a good one can be very hard. So we should at least understand the dataset we use and have ways to explore and discuss it.
1、For the project team
Explain metrics and errors, and improve dataset quality: the team can easily check what is wrong with a set of predictions. Maybe the data doesn't look as expected? Maybe an annotation is wrong? If so, it should be easy to fix.
Find out whether the model can handle a situation, and plan new features: sometimes the product owner or a customer asks whether the model can handle new labels or new situations. The team should be able to give a first answer, and propose next steps, more quickly.
2、For the annotation team
If the annotation team has a dashboard for exploring the current dataset, it can use it to answer questions such as:
"How should I label this item?": look for similar examples in the dashboard to find the correct label.
"What should I do with an unusual case?": find comparable situations, such as occluded objects, in the current dataset.
This improves data quality, and it also makes annotators more autonomous, which gives the project team more time for the rest of the project.
3、For end users, integrators, and sales
They don't have to ask what a label means; they can look up examples and answer their own questions.
Two、Exploring the COCO object detection dataset with Streamlit
Let's look at an example of using Streamlit to explore an object detection dataset.
We will use the toaster class of the MS COCO dataset.
Why toasters? It is one of the least represented classes: the training set has 225 bounding boxes and the validation set has 9. It also looks like an easy case, because a toaster is a common and fairly standard object.
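Counts like these can be computed directly from the annotation files. The sketch below assumes the standard COCO JSON layout (`categories`, `images` and `annotations` lists, each annotation carrying `category_id`, `image_id` and `bbox`); the function name `count_category` and the command-line usage are illustrative, not taken from the original post.

```python
import json
import sys
from typing import Any


def count_category(coco: dict[str, Any], name: str) -> tuple[int, int]:
    """Return (number of boxes, number of distinct images) for category `name`."""
    # COCO category ids are not contiguous, so look the id up by name.
    ids = {c["id"] for c in coco["categories"] if c["name"] == name}
    if not ids:
        raise KeyError(f"unknown category: {name}")
    anns = [a for a in coco["annotations"] if a["category_id"] in ids]
    return len(anns), len({a["image_id"] for a in anns})


if __name__ == "__main__":
    # Usage (paths are placeholders):
    #   python count_boxes.py instances_train2017.json instances_val2017.json
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as f:
            boxes, images = count_category(json.load(f), "toaster")
        print(f"{path}: {boxes} boxes in {images} images")
```

Reporting the number of images alongside the number of boxes helps spot images that contain several instances of the class.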
1、What counts as a toaster in COCO?
The first thing we notice when exploring the dataset is that there are two types of toasters:
2、Mislabeled data
3、Bounding box errors
4、Overlapping objects
When several objects are close together, what do you do? Do you draw one bounding box per object, or one for the whole group? The usual answer is one box per object, but sometimes that is genuinely difficult, as with the pile of oranges below:
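A dashboard can surface these crowded cases automatically. One possible heuristic, assumed here rather than taken from the original post, is to flag pairs of boxes in the same image whose intersection over union (IoU) exceeds a threshold. Boxes use COCO's `[x, y, width, height]` convention; `iou`, `overlapping_pairs` and the 0.3 threshold are illustrative choices.

```python
from itertools import combinations
from typing import Sequence

Box = Sequence[float]  # COCO format: [x, y, width, height]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two COCO-style boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    # Width and height of the intersection rectangle (0 if the boxes are disjoint).
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def overlapping_pairs(boxes: list[Box], threshold: float = 0.3) -> list[tuple[int, int, float]]:
    """Return (i, j, iou) for every pair of boxes whose IoU is at least `threshold`."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(boxes), 2):
        score = iou(a, b)
        if score >= threshold:
            pairs.append((i, j, score))
    return pairs


if __name__ == "__main__":
    oranges = [[0, 0, 10, 10], [5, 0, 10, 10], [40, 40, 10, 10]]
    print(overlapping_pairs(oranges))  # one pair: boxes 0 and 1, IoU = 1/3
```

Images with many flagged pairs are good candidates for a "crowded scenes" view in the dashboard, where the annotation policy can be checked by eye.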
5、Occlusion
When an object is hidden behind another object, the usual approach is to annotate only the visible surface, as long as it covers more than a given fraction of the whole object (usually > 20%). However, this choice may affect what you do later (training and post-processing). Another good practice is to flag such annotations as truncated or occluded.
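Writing the visibility rule down as code helps every annotator apply it the same way. This is a sketch under stated assumptions: the annotator records both the visible box and an estimate of the full (amodal) extent of the object, and the visible fraction is approximated by the ratio of the two box areas. The 20% threshold comes from the text; the `Annotation` structure and its `occluded` flag are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

Box = tuple[float, float, float, float]  # COCO format: (x, y, width, height)


@dataclass
class Annotation:
    bbox: Box
    occluded: bool = False  # hypothetical attribute stored alongside the box


def area(box: Box) -> float:
    return max(0.0, box[2]) * max(0.0, box[3])


def annotate_visible_part(visible: Box, full_estimate: Box,
                          min_visible: float = 0.2) -> Optional[Annotation]:
    """Annotate only if more than `min_visible` of the object is visible.

    Returns None when too little of the object is visible to be annotated.
    """
    full = area(full_estimate)
    if full == 0:
        return None
    ratio = area(visible) / full  # box-area approximation of the visible fraction
    if ratio <= min_visible:
        return None
    # Keep only the visible surface, but record that part of the object is hidden.
    return Annotation(bbox=visible, occluded=ratio < 1.0)
```

For example, a 100×50 toaster with its left 70% hidden, `annotate_visible_part((70, 0, 30, 50), (0, 0, 100, 50))`, gives an `occluded` annotation, while one that is only 10% visible returns `None`.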
6、Hard to recognize
Some objects may not be recognizable: they are too small or occluded, or the image quality is not good enough. Most of the time the answer is not to annotate them at all, but you can also flag them as hard.
The real questions are: do you want to teach your model to detect these? Do you want these objects to affect your metrics? Is detecting them a goal of your final product?
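If you keep hard objects in the dataset but do not want them to influence your metrics, one common option (PASCAL VOC handles its `difficult` flag this way) is to ignore them during matching: a detection that matches a hard object counts as neither a true positive nor a false positive, and a missed hard object is not a false negative. The sketch below assumes a boolean `hard` key on each ground-truth dict and greedy matching at IoU ≥ 0.5; it is a simplified illustration, not the official COCO or VOC evaluation code.

```python
from typing import Sequence

Box = Sequence[float]  # COCO format: [x, y, width, height]


def iou(a: Box, b: Box) -> float:
    ix = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def match_counts(detections: list[tuple[float, Box]], ground_truth: list[dict],
                 thr: float = 0.5) -> tuple[int, int, int]:
    """Return (TP, FP, FN) for one image, ignoring ground-truth boxes flagged as hard."""
    matched = [False] * len(ground_truth)
    tp = fp = 0
    # Highest-confidence detections claim ground-truth boxes first.
    for _, box in sorted(detections, key=lambda d: d[0], reverse=True):
        best, best_j = 0.0, -1
        for j, gt in enumerate(ground_truth):
            overlap = iou(box, gt["bbox"])
            if overlap > best:
                best, best_j = overlap, j
        if best < thr:
            fp += 1  # no ground-truth box overlaps enough
        elif ground_truth[best_j].get("hard", False):
            continue  # matched a hard object: ignored
        elif not matched[best_j]:
            matched[best_j] = True
            tp += 1
        else:
            fp += 1  # duplicate detection of an already matched object
    fn = sum(1 for j, gt in enumerate(ground_truth)
             if not matched[j] and not gt.get("hard", False))
    return tp, fp, fn
```

Running the same detections with and without the `hard` flag shows directly how much these objects move precision and recall, which is one way to answer the questions above with numbers.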
7、Pictures within pictures / mirror images
These are often overlooked. They appear in most crowdsourced, scraped, or end-user datasets, rather than in datasets collected in a controlled environment. The usual decision is to annotate them normally, but for a specific use case you may decide not to, or to tag them so they can be told apart later. Here they account for 5% of the training set.
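If you choose to tag these cases, the tag also makes it easy to measure how common they are and to exclude them for a use case that should ignore them. This sketch assumes a hypothetical per-image `tags` list (standard COCO files have no such field) and a hypothetical tag value `"picture_in_picture"`.

```python
from typing import Any

Image = dict[str, Any]


def split_by_tag(images: list[Image], tag: str) -> tuple[list[Image], list[Image]]:
    """Split images into (tagged, untagged) using a hypothetical 'tags' list per image."""
    tagged = [img for img in images if tag in img.get("tags", [])]
    untagged = [img for img in images if tag not in img.get("tags", [])]
    return tagged, untagged


def tag_share(images: list[Image], tag: str) -> float:
    """Fraction of images carrying `tag` (0.0 for an empty list)."""
    if not images:
        return 0.0
    tagged, _ = split_by_tag(images, tag)
    return len(tagged) / len(images)


if __name__ == "__main__":
    # Synthetic example: every 20th image is tagged.
    images = [{"id": i, "tags": ["picture_in_picture"] if i % 20 == 0 else []}
              for i in range(200)]
    print(f"{tag_share(images, 'picture_in_picture'):.0%}")  # 5%
```

The untagged half of `split_by_tag` is the training set for a use case that decides to leave these images out.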
Three、Setting up a Streamlit dashboard
There are several ways to set up a dashboard for exploring your object detection dataset.
First, you can use ready-made tools, such as the excellent FiftyOne from the Voxel51 team or, perhaps one day, Google's Know Your Data. Second, you can build your own solution with a dashboard framework such as Streamlit or Dash. The first option makes it easier to get advanced features, such as user authentication, dataset management, and model evaluation, and is easier to maintain.
The second option may be quicker to set up, gives you more flexibility for custom needs, and lets you share code and practices between the project code and different dashboards.
I will show how to do it with Streamlit, because it is a good way to present Streamlit and to show how easily it can be used to explore an object detection dataset.
The complete code for this small example is here.
The complete code of the deployed application is available here.
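As an illustration of the idea, here is a minimal sketch of such a dashboard; it is not the author's code. It assumes a local COCO annotation file and image folder (`ANNOTATIONS` and `IMAGE_DIR` are placeholder paths), uses Streamlit's `st.sidebar.selectbox`, `st.selectbox`, `st.write` and `st.image`, and draws boxes with Pillow, which is installed as a Streamlit dependency. Save it as `app.py` and start it with `streamlit run app.py`.

```python
import json
from collections import defaultdict
from pathlib import Path
from typing import Any

# Placeholder paths: point these at your local copy of COCO.
ANNOTATIONS = "annotations/instances_val2017.json"
IMAGE_DIR = "val2017"


def load_coco(path: str) -> dict[str, Any]:
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def boxes_by_image(coco: dict[str, Any], category: str) -> dict[int, list[list[float]]]:
    """Group the [x, y, w, h] boxes of one category by image id."""
    cat_ids = {c["id"] for c in coco["categories"] if c["name"] == category}
    grouped: dict[int, list[list[float]]] = defaultdict(list)
    for ann in coco["annotations"]:
        if ann["category_id"] in cat_ids:
            grouped[ann["image_id"]].append(ann["bbox"])
    return dict(grouped)


def main(annotation_file: str, image_dir: str) -> None:
    # Imported here so the helpers above can be reused without Streamlit installed.
    import streamlit as st
    from PIL import Image, ImageDraw

    coco = load_coco(annotation_file)
    names = sorted(c["name"] for c in coco["categories"])
    default = names.index("toaster") if "toaster" in names else 0
    category = st.sidebar.selectbox("Category", names, index=default)

    grouped = boxes_by_image(coco, category)
    n_boxes = sum(len(b) for b in grouped.values())
    st.write(f"**{category}**: {n_boxes} boxes in {len(grouped)} images")
    if not grouped:
        return

    image_id = st.selectbox("Image id", sorted(grouped))
    info = next(img for img in coco["images"] if img["id"] == image_id)
    image = Image.open(Path(image_dir) / info["file_name"]).convert("RGB")
    draw = ImageDraw.Draw(image)
    for x, y, w, h in grouped[image_id]:
        # Convert COCO [x, y, w, h] to the corner coordinates Pillow expects.
        draw.rectangle([x, y, x + w, y + h], outline="red", width=3)
    st.image(image, caption=f"{info['file_name']} ({len(grouped[image_id])} boxes)")


if __name__ == "__main__":
    if Path(ANNOTATIONS).exists():
        main(ANNOTATIONS, IMAGE_DIR)
    else:
        print(f"Annotation file not found: {ANNOTATIONS}")
```

Keeping the data logic in plain functions such as `boxes_by_image` is what makes it possible to share code between the project and several dashboards, as mentioned above.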