Machine learning notes - Exploring object detection datasets with Streamlit
2022-07-07 20:10:00 【Sit and watch the clouds rise】
One. Why exploring datasets matters
Datasets are usually the weakest link in data science projects. Building a good dataset can be very difficult, so we should at least understand the dataset we use and have ways to explore and discuss it.
1. For the project team
Explain metrics and errors, and improve dataset quality: the team can easily check what went wrong with a set of predictions. Maybe the data doesn't look as expected? Maybe the annotation is wrong? If so, it should be easy to fix.
Find out whether the model can handle a situation, and plan new features: sometimes the product owner or the customer asks whether the model can handle new labels or new situations. The team should be able to give a first answer, and propose some actions, faster.
2. For the annotation team
If the annotation team has a dashboard to explore the current dataset, it can use it to answer questions such as:
“How should I label this item?”: by looking for similar examples in the dashboard and finding the correct label.
“What should I do in an unusual case?”: by finding similar situations, such as occluded objects, in the current dataset.
This will improve data quality! It will also make the annotators more autonomous and give the project team more time for the rest of the project.
3. For end users, integrators, and sales
They don't have to ask what a label means, and they can show examples and answer questions by themselves.
Two. Exploring the COCO object detection dataset with Streamlit
Let's look at an example of exploring an object detection dataset with Streamlit.
We will use the toaster class of the MS COCO dataset.
Why toasters? It is one of the least represented classes: the training set has 225 bounding boxes and the validation set has 9. It also looks simple, because a toaster is a widely used and relatively standard object.
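To see how a class such as toaster compares with the others, one option is to count the boxes per category directly from the annotations. The sketch below assumes the standard COCO annotation layout (`categories` and `annotations` lists); the `toy_coco` dictionary and the function name `count_boxes_per_category` are made up for illustration and are not real COCO data.

```python
from collections import Counter


def count_boxes_per_category(coco: dict) -> dict:
    """Return {category name: number of annotated boxes} for a COCO-style dict."""
    id_to_name = {cat["id"]: cat["name"] for cat in coco["categories"]}
    counts = Counter(ann["category_id"] for ann in coco["annotations"])
    # Keep categories with zero boxes so that rare classes stay visible.
    return {name: counts.get(cat_id, 0) for cat_id, name in id_to_name.items()}


# Tiny hand-made example in COCO layout (illustrative only).
toy_coco = {
    "categories": [{"id": 80, "name": "toaster"}, {"id": 55, "name": "orange"}],
    "annotations": [
        {"id": 1, "image_id": 10, "category_id": 80, "bbox": [10, 20, 50, 40]},
        {"id": 2, "image_id": 10, "category_id": 55, "bbox": [70, 30, 20, 20]},
        {"id": 3, "image_id": 11, "category_id": 55, "bbox": [15, 15, 25, 25]},
    ],
}

counts = count_boxes_per_category(toy_coco)
# Sort so that the least represented classes come first.
rarest_first = sorted(counts.items(), key=lambda item: item[1])
print(rarest_first)  # [('toaster', 1), ('orange', 2)]
```

On the real dataset, the same function should work on the dictionary obtained with `json.load` from the COCO annotation file, assuming it follows the usual layout.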
1. What is a toaster in the COCO dataset?
The first thing we notice when exploring the dataset is that there are two types of toasters:
2. Mislabeled data
3. Bounding box errors
4. Overlapping objects
When multiple objects are close together, what do you do? Do you draw one bounding box per object, or one box for the whole group? The common answer is one box per object. But sometimes that can be really difficult, for example with a pile of oranges.
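One way to surface such crowded groups in a dashboard is to flag boxes of the same class that overlap heavily in the same image. The sketch below uses intersection over union (IoU) on COCO-style `[x, y, width, height]` boxes; the 0.3 threshold and the function names are assumptions chosen for illustration, not values from the article.

```python
from itertools import combinations


def iou_xywh(box_a, box_b) -> float:
    """Intersection over union of two [x, y, width, height] boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def overlapping_pairs(annotations, min_iou=0.3):
    """Return (id_a, id_b, iou) for same-image, same-category pairs with IoU >= min_iou."""
    groups = {}
    for ann in annotations:
        groups.setdefault((ann["image_id"], ann["category_id"]), []).append(ann)
    pairs = []
    for group in groups.values():
        for a, b in combinations(group, 2):
            score = iou_xywh(a["bbox"], b["bbox"])
            if score >= min_iou:
                pairs.append((a["id"], b["id"], score))
    return pairs


# Illustrative annotations: boxes 1 and 2 overlap, box 3 is far away.
oranges = [
    {"id": 1, "image_id": 7, "category_id": 55, "bbox": [0, 0, 10, 10]},
    {"id": 2, "image_id": 7, "category_id": 55, "bbox": [5, 0, 10, 10]},
    {"id": 3, "image_id": 7, "category_id": 55, "bbox": [100, 100, 10, 10]},
]
pairs = overlapping_pairs(oranges)
```

Images with many flagged pairs are good candidates for a manual review of the annotation guideline (one box per object, or one box for the group).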
5. Occlusion
When an object is partly hidden behind another object, the usual approach is to annotate only the visible part, as long as it is larger than a given proportion of the expected object (usually > 20%). However, this choice may affect what you do with the annotation later (training and post-processing). Another good practice is to flag the annotation as truncated or occluded.
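The rule above can be written down as a small helper, which is sometimes useful for keeping annotation guidelines and tooling consistent. This is only a sketch: `visible_fraction` is a hypothetical estimate supplied by the annotator or by a tool, the `occluded` flag is an assumed field name, and the 0.2 default mirrors the "usually > 20%" figure mentioned in the text.

```python
def occlusion_decision(visible_fraction: float, min_visible: float = 0.2) -> dict:
    """Decide whether to annotate a partly hidden object and whether to flag it."""
    if not 0.0 <= visible_fraction <= 1.0:
        raise ValueError("visible_fraction must be between 0 and 1")
    # Annotate only when enough of the object is visible.
    annotate = visible_fraction > min_visible
    # Flag annotated objects that are not fully visible.
    return {"annotate": annotate, "occluded": annotate and visible_fraction < 1.0}


mostly_hidden = occlusion_decision(0.1)
half_hidden = occlusion_decision(0.5)
fully_visible = occlusion_decision(1.0)
```

Keeping the flag on the annotation lets you decide later, at training or evaluation time, whether occluded objects should be treated differently.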
6. Hard to recognize
Some objects may be hard to recognize: they are too small, occluded, or the image quality is not good enough. Most of the time the answer is not to annotate them at all, but you can also flag them as hard.
The real questions are: Do you want to teach your model to detect these objects? Do you want these objects to affect your metrics? Is detecting them a goal of your final product?
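To make the effect on metrics concrete, the sketch below computes recall with and without the objects flagged as hard. The `hard` field and the `detected_ids` set (annotation ids matched by a prediction) are hypothetical names used for illustration; a real evaluation would first match predictions to annotations, for example by IoU.

```python
def recall(annotations, detected_ids, include_hard=True) -> float:
    """Fraction of scored annotations that were detected."""
    scored = [a for a in annotations if include_hard or not a.get("hard", False)]
    if not scored:
        return 0.0
    hits = sum(1 for a in scored if a["id"] in detected_ids)
    return hits / len(scored)


# Illustrative data: annotation 4 is flagged as hard and was missed by the model.
annotations = [
    {"id": 1},
    {"id": 2},
    {"id": 3},
    {"id": 4, "hard": True},
]
detected_ids = {1, 2, 3}

recall_all = recall(annotations, detected_ids, include_hard=True)    # 0.75
recall_easy = recall(annotations, detected_ids, include_hard=False)  # 1.0
```

The gap between the two numbers shows how much the hard objects weigh on the metric, which helps answer whether they belong in the product's goal.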
7. Picture in picture / mirrors
This case is often overlooked. It occurs mostly in crowdsourced, crawled, or end-user-based datasets rather than in datasets captured in a controlled environment. The usual decision is to annotate these objects normally, but you can decide not to for a specific use case, or flag them so that they can be distinguished later. Here they represent 5% of the training set.
Three. Setting up a Streamlit dashboard
There are several ways to set up a dashboard to explore your object detection dataset.
First, you can use off-the-shelf tools, such as the amazing FiftyOne from the Voxel51 team, or maybe one day Google's Know Your Data. Second, you can build your own solution with a dashboarding tool such as Streamlit or Dash. Choosing the first option makes it easier to get advanced features such as user authentication, dataset management, and model evaluation, and it is easier to maintain.
Setting up the second option may be faster. It also gives you more flexibility for custom needs, and lets you share code and practices between the project code and the different dashboards.
I will show you how to build the dashboard with Streamlit, because it is a good way to demonstrate Streamlit and how easily it can be integrated to explore an object detection dataset.
The complete code of this small example is available here.
If you want to see the complete code of the deployed application, you can find it here.
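As a rough idea of what such a dashboard can look like, here is a minimal sketch. It is not the author's code: the function names, the file layout, and the widget labels are assumptions, and it only relies on the standard COCO annotation layout (`images`, `categories`, `annotations`). The data helpers are kept separate from the display code so that they can be shared with the project code.

```python
import json
from pathlib import Path


def load_coco(annotation_path):
    """Load a COCO-style annotation file into a dictionary."""
    with open(annotation_path, encoding="utf-8") as f:
        return json.load(f)


def annotations_by_image(coco, category_name):
    """Group the boxes of one category by image file name."""
    cat_ids = {c["id"] for c in coco["categories"] if c["name"] == category_name}
    file_names = {img["id"]: img["file_name"] for img in coco["images"]}
    grouped = {}
    for ann in coco["annotations"]:
        if ann["category_id"] in cat_ids:
            grouped.setdefault(file_names[ann["image_id"]], []).append(ann["bbox"])
    return grouped


def render_dashboard(coco, image_dir):
    """Draw a small explorer: pick a category, then an image, and see its boxes."""
    # Imported lazily so the helpers above stay usable without Streamlit or Pillow.
    import streamlit as st
    from PIL import Image, ImageDraw

    st.title("Object detection dataset explorer")
    names = sorted(c["name"] for c in coco["categories"])
    category = st.sidebar.selectbox("Category", names)
    grouped = annotations_by_image(coco, category)
    total = sum(len(boxes) for boxes in grouped.values())
    st.write(f"{total} boxes in {len(grouped)} images")
    if not grouped:
        return
    file_name = st.sidebar.selectbox("Image", sorted(grouped))
    image = Image.open(Path(image_dir) / file_name).convert("RGB")
    draw = ImageDraw.Draw(image)
    for x, y, w, h in grouped[file_name]:
        # COCO boxes are [x, y, width, height]; Pillow expects corner coordinates.
        draw.rectangle([x, y, x + w, y + h], outline="red", width=3)
    st.image(image, caption=file_name)
```

To try it, call `render_dashboard(load_coco(...), ...)` at the bottom of a script with your own annotation file and image folder, then launch the script with `streamlit run`.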