Machine learning notes - Exploring object detection datasets with Streamlit
2022-07-07 20:10:00 【Sit and watch the clouds rise】
One、Why exploring the dataset matters
The dataset is often the weakest link in a data science project, and building a good one is hard. So we should at least understand the dataset we use and have ways to explore and discuss it.
1、For the project team
Explain metrics and errors, and improve dataset quality: the team can quickly check what went wrong with a set of predictions. Maybe the data does not look as expected? Maybe an annotation is wrong? If so, it should be easy to fix.
Find out whether the model can handle a situation, and plan new features: sometimes the product owner or customers ask whether the model can handle new labels or new situations. The team should be able to give a first answer and propose next steps faster.
2、For the annotation team
If the annotation team has a dashboard for exploring the current dataset, it can use it to answer questions such as:
“How should I label this item?”: by finding similar examples in the dashboard and checking which label they were given.
“What should I do in this unusual case?”: by finding similar situations in the current dataset, such as occluded objects, and seeing how they were annotated.
This improves data quality, and it also makes the annotators more autonomous, which gives the project team more time for the rest of the project.
3、For end users, integrators and sales
They no longer need to ask what a label means: they can look up examples and answer their own questions.
Two、Exploring the COCO object detection dataset with Streamlit
Let's look at an example of exploring an object detection dataset with Streamlit.
We will use the toaster class of the MS COCO dataset.
Why the toaster? It is one of the least represented classes: the training set contains 225 bounding boxes and the validation set only 9. It also looks like an easy case, since a toaster is a common and fairly standardized object.
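You can check class counts like these yourself. The following is a minimal sketch that assumes a COCO-format annotation file, such as the `instances_train2017.json` or `instances_val2017.json` files shipped with COCO 2017, loaded as plain JSON. It counts boxes per category, which is one way to spot under-represented classes like the toaster. The function names are illustrative and do not come from the original article.

```python
from __future__ import annotations

import json
from collections import Counter
from typing import Any


def load_coco(path: str) -> dict[str, Any]:
    """Load a COCO-format annotation file as a plain dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def count_boxes(coco: dict[str, Any], category_name: str) -> int:
    """Count the bounding boxes of one category."""
    # COCO stores category names only in "categories"; annotations use integer ids.
    name_to_id = {cat["name"]: cat["id"] for cat in coco["categories"]}
    cat_id = name_to_id[category_name]
    # Each entry of "annotations" is one object instance with its own "bbox".
    return sum(1 for ann in coco["annotations"] if ann["category_id"] == cat_id)


def boxes_per_category(coco: dict[str, Any]) -> Counter:
    """Return {category name: number of boxes}, handy for finding rare classes."""
    id_to_name = {cat["id"]: cat["name"] for cat in coco["categories"]}
    return Counter(id_to_name[ann["category_id"]] for ann in coco["annotations"])
```

For example, `count_boxes(load_coco("annotations/instances_val2017.json"), "toaster")` counts the toaster boxes in that file, and you can compare the result with the figures quoted above. `boxes_per_category(...).most_common()[-10:]` lists the ten rarest classes.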
1、What is a toaster in COCO?
The first thing we notice when exploring the dataset is that there are two kinds of toasters:
2、Mislabeled data
3、Bounding box errors
4、Overlapping objects
When several objects are close together, what do you do? Draw one bounding box per object, or one box around the whole group? The common answer is one box per object, but sometimes that is really difficult, for example with a pile of oranges:
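A dashboard can also surface overlapping boxes automatically so the team can review them. This is a minimal sketch that assumes COCO-style annotations with `id`, `image_id` and `bbox` fields, where `bbox` is `[x, y, width, height]`. It computes the intersection over union (IoU) of every pair of boxes in the same image and reports the pairs above a threshold. The 0.5 threshold is an arbitrary illustration. A very high IoU may point at a duplicated box, while moderate values may point at tightly packed objects like the oranges.

```python
from collections import defaultdict
from itertools import combinations


def iou_xywh(a, b):
    """Intersection over union of two COCO-style boxes [x, y, width, height]."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Width and height of the intersection rectangle (0 if the boxes do not touch).
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def overlapping_pairs(annotations, threshold=0.5):
    """Return (image_id, ann_id_a, ann_id_b, iou) for same-image pairs with IoU >= threshold."""
    by_image = defaultdict(list)
    for ann in annotations:
        by_image[ann["image_id"]].append(ann)
    pairs = []
    for image_id, anns in by_image.items():
        # Pairwise comparison is quadratic per image, which is fine for typical COCO images.
        for a, b in combinations(anns, 2):
            score = iou_xywh(a["bbox"], b["bbox"])
            if score >= threshold:
                pairs.append((image_id, a["id"], b["id"], score))
    return pairs
```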
5、Occlusion
When an object is hidden behind another one, the usual approach is to annotate only the visible part, as long as it covers more than a given fraction of the whole object (usually > 20%). However, this choice may affect what you do with these annotations later (training and post-processing). Another good practice is to flag such annotations as truncated or occluded.
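Writing the visibility rule down explicitly helps keep annotators consistent. This minimal sketch rests on a few assumptions:

- the annotator estimates what fraction of the object is visible;
- the 20% threshold is the rule of thumb from the text;
- the `occluded` and `truncated` keys are hypothetical attributes, not part of the COCO format.

For simplicity, the hidden part is attributed either to the image border (truncated) or to another object (occluded).

```python
MIN_VISIBLE_FRACTION = 0.20  # rule of thumb from the text: annotate only if > 20% is visible


def partial_object_annotation(visible_bbox, visible_fraction, cut_by_border=False,
                              min_fraction=MIN_VISIBLE_FRACTION):
    """Build an annotation for a partly hidden object, or return None to skip it.

    visible_bbox: COCO-style [x, y, width, height] around the visible part only.
    visible_fraction: annotator's estimate (0..1) of how much of the object is visible.
    """
    if not 0.0 <= visible_fraction <= 1.0:
        raise ValueError("visible_fraction must be between 0 and 1")
    if visible_fraction <= min_fraction:
        return None  # too little is visible: leave the object unannotated
    partial = visible_fraction < 1.0
    return {
        "bbox": list(visible_bbox),
        "visible_fraction": visible_fraction,
        # Hypothetical extra attributes (not part of the COCO format) so that
        # training or post-processing can treat partial objects differently later.
        "truncated": partial and cut_by_border,
        "occluded": partial and not cut_by_border,
    }
```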
6、Hard to recognize
Some objects may be unrecognizable: they are too small, occluded, or the image quality is not good enough. Most of the time the answer is not to annotate them at all, but you can also flag them as hard.
The real questions are: do you want your model to learn to detect these objects? Do you want them to count in your metrics? Are they a target for your final product?
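Here is a concrete look at the metrics question. This minimal sketch assumes a hypothetical boolean `hard` attribute on each ground-truth box. It also assumes that detections have already been matched to ground-truth boxes elsewhere, for example by IoU. When hard boxes are ignored, they drop out of both the numerator and the denominator of recall, so missing them is not penalized. This is similar in spirit to the `difficult` flag in PASCAL VOC evaluation, not the official COCO evaluation.

```python
def recall(ground_truth, matched_ids, ignore_hard=True):
    """Fraction of counted ground-truth boxes that a detection matched.

    ground_truth: list of dicts with an "id" and an optional, hypothetical
    boolean "hard" attribute.
    matched_ids: ids of ground-truth boxes matched by some detection
    (the matching itself happens elsewhere).
    """
    counted = [
        gt for gt in ground_truth
        if not (ignore_hard and gt.get("hard", False))
    ]
    if not counted:
        raise ValueError("no ground-truth boxes left to evaluate")
    found = sum(1 for gt in counted if gt["id"] in matched_ids)
    return found / len(counted)
```

To be consistent, detections that match an ignored hard box should then count as neither true nor false positives when computing precision.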
7、Picture-in-picture and mirror reflections
This case is often overlooked. It happens in most crowdsourced, scraped or end-user datasets, unlike datasets collected in a controlled environment. The usual decision is to annotate these objects normally, but for a specific use case you can decide not to, or you can tag them so they can be told apart later. Here they make up 5% of the training set.
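Tagging these cases keeps both decisions cheap to revisit. This minimal sketch assumes a hypothetical `tags` list on each annotation, for example `"picture_in_picture"` or `"mirror"`. It measures the share of annotations that carry a tag, which is one way to keep track of a figure like the 5% above. It also lists the affected images and filters the tagged annotations out for use cases that should not see them.

```python
def tagged_share(annotations, tag):
    """Fraction of annotations whose hypothetical "tags" list contains `tag`."""
    if not annotations:
        return 0.0
    tagged = sum(1 for ann in annotations if tag in ann.get("tags", ()))
    return tagged / len(annotations)


def images_with_tag(annotations, tag):
    """Ids of images containing at least one annotation tagged with `tag`."""
    return {ann["image_id"] for ann in annotations if tag in ann.get("tags", ())}


def without_tag(annotations, tag):
    """Drop tagged annotations, e.g. mirror reflections for a use case that ignores them."""
    return [ann for ann in annotations if tag not in ann.get("tags", ())]
```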
Three、Setting up a Streamlit dashboard
There are several ways to set up a dashboard for exploring your object detection dataset.
First, you can use an off-the-shelf tool, such as the excellent FiftyOne from the Voxel51 team, or perhaps one day Google's Know Your Data. Second, you can build your own with a dashboard framework such as Streamlit or Dash. The first option makes it easier to get advanced features, such as user authentication, dataset management, model evaluation…, and it is easier to maintain.
The second option may be faster to set up and gives you more flexibility for custom needs. It also lets you share code and practices between the project codebase and the different dashboards.
I will show how to build the dashboard with Streamlit, because this is a good way to showcase Streamlit and how easily it can be used to explore an object detection dataset.
The complete code for this small example is available here.
The complete code of the deployed application is available here.
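This is not the author's code. It is a minimal sketch of such a dashboard, assuming a COCO-format annotation file and a local image folder. The default paths are hypothetical, so point them at your own copy of COCO. The sidebar selects a category and an image, and the app draws that category's boxes with Pillow, which is installed alongside Streamlit. The pure helper `index_category` can be reused outside the app.

```python
import json
from collections import defaultdict
from pathlib import Path

# Hypothetical defaults: adjust to where your COCO files live.
DEFAULT_ANNOTATIONS = "annotations/instances_val2017.json"
DEFAULT_IMAGE_DIR = "val2017"


def index_category(coco, category_name):
    """Return {file_name: [bbox, ...]} for every image containing the category."""
    cat_ids = {c["id"] for c in coco["categories"] if c["name"] == category_name}
    file_names = {img["id"]: img["file_name"] for img in coco["images"]}
    boxes = defaultdict(list)
    for ann in coco["annotations"]:
        if ann["category_id"] in cat_ids:
            boxes[file_names[ann["image_id"]]].append(ann["bbox"])
    return dict(boxes)


def main():
    try:
        import streamlit as st
        from PIL import Image, ImageDraw
    except ImportError:
        print("This sketch needs streamlit; launch it with: streamlit run app.py")
        return

    st.title("COCO object detection explorer")
    ann_path = Path(st.sidebar.text_input("Annotation file", DEFAULT_ANNOTATIONS))
    image_dir = Path(st.sidebar.text_input("Image folder", DEFAULT_IMAGE_DIR))
    if not ann_path.is_file():
        st.error(f"Annotation file not found: {ann_path}")
        return

    coco = json.loads(ann_path.read_text(encoding="utf-8"))
    category = st.sidebar.selectbox(
        "Category", sorted(c["name"] for c in coco["categories"])
    )
    index = index_category(coco, category)
    st.write(f"{len(index)} images, {sum(len(b) for b in index.values())} boxes")
    if not index:
        return

    file_name = st.sidebar.selectbox("Image", sorted(index))
    image = Image.open(image_dir / file_name).convert("RGB")
    draw = ImageDraw.Draw(image)
    for x, y, w, h in index[file_name]:
        # COCO boxes are [x, y, width, height]; Pillow expects corner coordinates.
        draw.rectangle([x, y, x + w, y + h], outline="red", width=3)
    st.image(image, caption=f"{file_name}: {len(index[file_name])} {category} box(es)")


if __name__ == "__main__":
    main()
```

Save it as `app.py` and launch it with `streamlit run app.py`. Streamlit reruns the script on every interaction, and the full training annotation file is large. In recent Streamlit versions, moving the JSON loading into a function decorated with `st.cache_data` avoids re-parsing it each time.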