Machine learning notes - exploring object detection datasets with Streamlit

2022-07-07 20:10:00 Sit and watch the clouds rise

One. Why exploring datasets matters

         Datasets are usually the weakest link in data science projects. Building a good dataset can be very difficult, so we should at least understand the dataset we use and have ways to explore and discuss it.

1. For the project team

         Explain metrics and errors, and improve dataset quality: the team can easily check what went wrong with a set of predictions. Maybe the data doesn't look as expected? Maybe the annotation is wrong? If so, it should be easy to fix.

         Find out whether the model can handle a situation, and plan new features: sometimes the product owner or the customer asks whether the model can handle a new label or a new situation. The team should be able to give a first answer, and propose some actions, more quickly.

2. For the annotation team

         If the annotation team has a dashboard to explore the current dataset, it can use it to answer questions such as:

        “How should this item be labeled?”: by looking for similar examples in the dashboard and finding the correct label.

        “What should I do in an unusual case?”: by finding similar situations, such as occluded objects, in the current dataset.

         This will improve data quality! It will also make the annotators more autonomous and give the project team more time for the rest of the project.

3. For end users, integrators, and sales

         They don't have to ask what a label means, and they can display examples and answer their own questions.

Two. Exploring the COCO object detection dataset with Streamlit

         Let's look at an example of exploring an object detection dataset with Streamlit.

         We will use the toaster class of the MS COCO dataset.

         Why toaster? It is one of the least represented classes: the training set has 225 bounding boxes and the validation set has 9. It also looks simple, because a toaster is a widely used and fairly standard object.
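
         For readers who want to check class sizes like these themselves, here is a minimal sketch that counts bounding boxes per class in a COCO-style annotation dictionary. The `categories`, `annotations`, `category_id`, `id`, and `name` keys follow the COCO annotation format; the toy data and the function name `count_boxes_per_class` are made up for illustration. With a real dataset, the dictionary would come from `json.load` on an annotation file such as `instances_train2017.json`.

```python
from collections import Counter


def count_boxes_per_class(coco: dict) -> dict:
    """Return {category name: number of bounding boxes} for a COCO-style dict."""
    id_to_name = {cat["id"]: cat["name"] for cat in coco["categories"]}
    counts = Counter(ann["category_id"] for ann in coco["annotations"])
    # Include classes with zero boxes so rare classes stay visible.
    return {name: counts.get(cat_id, 0) for cat_id, name in id_to_name.items()}


# Toy data for illustration only; ids and boxes are not real COCO content.
toy_coco = {
    "categories": [{"id": 1, "name": "toaster"}, {"id": 2, "name": "orange"}],
    "annotations": [
        {"id": 10, "image_id": 100, "category_id": 2, "bbox": [0, 0, 10, 10]},
        {"id": 11, "image_id": 100, "category_id": 2, "bbox": [5, 5, 10, 10]},
        {"id": 12, "image_id": 101, "category_id": 1, "bbox": [20, 30, 40, 25]},
    ],
}

counts = count_boxes_per_class(toy_coco)
# Sort ascending so the least represented classes come first.
for name, n in sorted(counts.items(), key=lambda item: item[1]):
    print(f"{name}: {n}")
```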

https://share.streamlit.io/wirg/explore-object-detection-demo/main/explore_annotations.py/?label=toaster

1. What is a toaster in the COCO dataset?

         The first thing we notice when exploring the dataset is that there are two types of toasters:

 

2. Mislabeled data

3. Bounding box errors

4. Overlapping objects

         When multiple objects are close together, what do you do? Do you draw one bounding box per object, or a single box for the group? The common answer is one box per object. But sometimes that is really difficult, as with a pile of oranges.
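
         One way to surface such cases in a dataset is to list pairs of same-class boxes in the same image that overlap strongly. The sketch below is an assumption on my part, not a method described in this article: it uses intersection over union (IoU) on COCO-style `[x, y, width, height]` boxes, and the 0.3 threshold and the toy annotations are arbitrary. Note that the COCO format also has an `iscrowd` field for annotations that cover a group of objects.

```python
from itertools import combinations


def iou(box_a, box_b):
    """Intersection over union of two [x, y, width, height] boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def overlapping_pairs(annotations, min_iou=0.3):
    """Return (id_a, id_b, iou) for same-image, same-class boxes that overlap.

    Compares every pair, which is fine for a sketch; for a full dataset,
    group the annotations by image first.
    """
    pairs = []
    for a, b in combinations(annotations, 2):
        same_image = a["image_id"] == b["image_id"]
        same_class = a["category_id"] == b["category_id"]
        if same_image and same_class:
            score = iou(a["bbox"], b["bbox"])
            if score >= min_iou:
                pairs.append((a["id"], b["id"], score))
    return pairs


# Toy annotations for illustration only.
toy_annotations = [
    {"id": 1, "image_id": 7, "category_id": 1, "bbox": [0, 0, 10, 10]},
    {"id": 2, "image_id": 7, "category_id": 1, "bbox": [5, 0, 10, 10]},
    {"id": 3, "image_id": 7, "category_id": 1, "bbox": [50, 50, 10, 10]},
]
pairs = overlapping_pairs(toy_annotations)
print(pairs)
```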

5. Occluded objects

         When an object is hidden behind another object, the usual approach is to annotate only the visible part, as long as it is larger than a given fraction of the expected object (usually > 20%). However, this may affect what you do with the annotation later (training and post-processing). Another good practice is to flag the annotation as truncated or occluded.
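
         The visibility rule can be written down as a small check. This is a minimal sketch under stated assumptions: the "expected" box is the annotator's estimate of the full object, boxes use the `[x, y, width, height]` layout, and the `occluded` key and the function names are hypothetical, not part of the COCO format.

```python
def box_area(box):
    """Area of an [x, y, width, height] box; negative sizes count as zero."""
    _, _, width, height = box
    return max(0.0, width) * max(0.0, height)


def make_annotation(visible_box, expected_box, min_visible=0.2):
    """Annotate only the visible part, if it exceeds min_visible of the object.

    Returns None when too little of the object is visible to annotate.
    The "occluded" key is a hypothetical flag used for later filtering.
    """
    expected_area = box_area(expected_box)
    if expected_area <= 0:
        return None
    visible_fraction = box_area(visible_box) / expected_area
    if visible_fraction <= min_visible:
        return None
    return {"bbox": list(visible_box), "occluded": visible_fraction < 1.0}


# Half of the object is visible: annotated and flagged as occluded.
half_visible = make_annotation([0, 0, 10, 50], [0, 0, 10, 100])
# Only 10% is visible: below the 20% threshold, so not annotated.
barely_visible = make_annotation([0, 0, 10, 10], [0, 0, 10, 100])
print(half_visible, barely_visible)
```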

6. Hard to recognize

         Some objects may not be recognizable: they are too small or occluded, or the image quality is not good enough. Most of the time the answer is not to annotate them at all, but you can also flag them as hard.
         The real questions are: do you want to teach your model to detect these objects? Do you want them to affect your metrics? Are they a target of your final product?
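
         To see how much flagged objects can move a metric, here is a small sketch that computes recall with and without them. The `hard` key, the function name, and the toy data are assumptions for illustration; matching detections to ground truth is assumed to have been done already and is represented by a set of detected annotation ids.

```python
def recall(annotations, detected_ids, include_hard=True):
    """Fraction of ground-truth annotations that were detected.

    When include_hard is False, annotations flagged with the hypothetical
    "hard" key are left out of the count entirely.
    """
    kept = [a for a in annotations if include_hard or not a.get("hard", False)]
    if not kept:
        return None  # Nothing to evaluate.
    hits = sum(1 for a in kept if a["id"] in detected_ids)
    return hits / len(kept)


# Toy ground truth: annotation 4 is flagged as hard and was missed.
ground_truth = [
    {"id": 1},
    {"id": 2},
    {"id": 3},
    {"id": 4, "hard": True},
]
detected = {1, 2, 3}

recall_all = recall(ground_truth, detected, include_hard=True)
recall_easy = recall(ground_truth, detected, include_hard=False)
print(recall_all, recall_easy)
```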

7. Picture in picture / mirror

         This case is often overlooked. It happens in most crowdsourced, crawled, or end-user-based datasets, rather than in datasets collected in a controlled environment. The usual decision is to annotate such objects normally, but you can decide not to for a specific use case, or flag them so that they can be distinguished later. Here it represents 5% of the training set.

Three. Setting up the Streamlit dashboard

         There are several ways to set up a dashboard to explore your object detection dataset.

         First, you can use an off-the-shelf tool, such as the amazing FiftyOne from the Voxel51 team, or maybe one day Google's Know Your Data. Second, you can use a dashboard solution such as Streamlit or Dash to build your own. The first option gives you easier access to advanced features such as user authentication, dataset management, and model evaluation, and it is easier to maintain.

         The second option may be faster to set up. It also gives you more flexibility for custom solutions, and lets you share code and practices between the project code and the different dashboards.

         I will show you how to build the dashboard with Streamlit, because it is a good way to showcase Streamlit and how easily it can be integrated to explore object detection datasets.

         The complete code for this small example is here.

         If you want to see the complete code of the deployed application, you can find it here.
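
         To give a feel for what such a dashboard involves, here is a minimal sketch. It is not the code of the deployed application. It assumes a COCO annotation file at `annotations/instances_val2017.json` (adjust the path to your setup) and uses only basic Streamlit calls (`st.title`, `st.selectbox`, `st.write`, `st.image`). The helper names are made up, and drawing the boxes on the images and caching the loaded file are left out.

```python
import json
from collections import defaultdict
from pathlib import Path

# Assumed location of a COCO annotation file; adjust to your setup.
ANNOTATION_FILE = Path("annotations/instances_val2017.json")


def load_coco(path):
    """Load a COCO-style annotation file into a dictionary."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def boxes_by_image(coco, label):
    """Map image id -> list of [x, y, width, height] boxes for a class name."""
    wanted = {c["id"] for c in coco["categories"] if c["name"] == label}
    grouped = defaultdict(list)
    for ann in coco["annotations"]:
        if ann["category_id"] in wanted:
            grouped[ann["image_id"]].append(ann["bbox"])
    return dict(grouped)


def render(coco, max_images=20):
    """Draw the dashboard. Streamlit is imported here so the helpers work without it."""
    import streamlit as st

    st.title("Explore object detection annotations")
    labels = sorted(c["name"] for c in coco["categories"])
    label = st.selectbox("Label", labels)
    grouped = boxes_by_image(coco, label)
    total = sum(len(boxes) for boxes in grouped.values())
    st.write(f"{total} boxes in {len(grouped)} images")
    images = {img["id"]: img for img in coco["images"]}
    for image_id, boxes in list(grouped.items())[:max_images]:
        img = images[image_id]
        # Shows the raw image only; overlaying the boxes is left out of this sketch.
        st.image(img["coco_url"], caption=f"{img['file_name']}: {len(boxes)} box(es)")


# Run with `streamlit run <this file>`; nothing is rendered if the file is missing.
if __name__ == "__main__" and ANNOTATION_FILE.exists():
    render(load_coco(ANNOTATION_FILE))
```

         Keeping the filtering logic in plain functions such as `boxes_by_image` makes it possible to reuse the same code in the project and in the dashboard.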

Copyright notice
This article was written by [Sit and watch the clouds rise]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/188/202207071802552283.html