
Machine learning notes: exploring object detection datasets with Streamlit

2022-07-07 20:10:00 [Sit and watch the clouds rise]

I. Why exploring the dataset matters

The dataset is often the weakest link in a data science project, and building a good one can be very difficult. At the very least, we should understand the dataset we use and have ways to explore and discuss it.

1. For the project team

Explaining metrics and errors, and improving dataset quality: the team can easily check what went wrong with a set of predictions. Maybe the data doesn't look as expected? Maybe an annotation is wrong? If so, it should be easy to fix.
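To make "what went wrong with a set of predictions" concrete, here is a minimal sketch that lists, for one image, the predictions matching no ground-truth box (false positives) and the ground-truth boxes no prediction found (misses). It assumes boxes in COCO's [x, y, width, height] format and a greedy matching rule at IoU >= 0.5. That rule is a common convention, not the article's own code, and `match_predictions` is a hypothetical helper.

```python
# Sketch: per-image error analysis by greedy IoU matching (assumed convention).
Box = tuple[float, float, float, float]  # x, y, width, height (COCO style)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two [x, y, w, h] boxes."""
    inter_w = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def match_predictions(gts: list[Box], preds: list[tuple[Box, float]], threshold: float = 0.5):
    """Match predictions (highest score first) to unmatched ground truth.

    Returns (false_positive_indices, missed_gt_indices).
    """
    unmatched_gt = set(range(len(gts)))
    false_positives = []
    order = sorted(range(len(preds)), key=lambda i: preds[i][1], reverse=True)
    for i in order:
        box, _score = preds[i]
        best_j, best_iou = None, threshold
        for j in unmatched_gt:
            overlap = iou(box, gts[j])
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j is None:
            false_positives.append(i)  # no ground truth overlaps enough
        else:
            unmatched_gt.discard(best_j)  # each ground truth is matched once
    return false_positives, sorted(unmatched_gt)
```

In a dashboard, the returned indices are what you would highlight on the image, so the team can judge whether the model or the annotation is at fault.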

Finding out whether the model can handle a situation, and planning new features: sometimes the product owner or a customer asks whether the model can handle a new label or a new situation. The team should be able to give a first answer, and decide on next steps, more quickly.

2. For the annotation team

If the annotation team has a dashboard for exploring the current dataset, it can use it to answer questions such as:

"How should I annotate this item?": look for similar examples in the dashboard and find the correct label.

"What should I do in this edge case?": find similar situations in the current dataset, such as occluded objects.

This improves data quality! It also makes the annotators more autonomous and gives the project team more time for the rest of the project.

3. For end users, integrators, and sales

They no longer have to ask what a label means, and they can show examples and answer questions by themselves.

II. Exploring the COCO object detection dataset with Streamlit

Let's look at an example of exploring an object detection dataset with Streamlit.

We will use the toaster class of the MS COCO dataset.

Why the toaster? It is one of the least represented classes: the training set contains 225 bounding boxes and the validation set only 9. It also looks like an easy class, since a toaster is a common and fairly standardized object.
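Per-class box counts like these are easy to compute from a COCO-format annotation file. The sketch below relies only on the standard COCO fields `categories` (`id`, `name`) and `annotations` (`category_id`). Note that it counts every annotation, including `iscrowd` ones. The helper names are illustrative, not from the article.

```python
# Sketch: count bounding boxes per category in a COCO-format annotation dict.
import json
from collections import Counter


def load_coco(path: str) -> dict:
    """Parse a COCO-format annotation JSON file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def boxes_per_class(coco: dict) -> Counter:
    """Return a Counter mapping category name -> number of annotations."""
    id_to_name = {c["id"]: c["name"] for c in coco["categories"]}
    return Counter(id_to_name[a["category_id"]] for a in coco["annotations"])
```

For example, `boxes_per_class(load_coco(path))["toaster"]` gives the toaster count for whichever split `path` points to, and `.most_common()[-5:]` lists the five least represented classes.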

https://share.streamlit.io/wirg/explore-object-detection-demo/main/explore_annotations.py/?label=toaster

1. What is a toaster in the COCO dataset?

The first thing we notice when exploring the dataset is that there are two types of toasters:

2. Mislabeled data

3. Bounding box errors
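Some bounding box errors can be caught automatically before a visual review. The sketch below flags degenerate boxes and boxes that extend past the image borders. It uses the standard COCO fields `images` (`id`, `width`, `height`) and `annotations` (`id`, `image_id`, `bbox`). The tolerance for float rounding is an assumed value. Loose boxes and wrong class labels still need a human look in the dashboard.

```python
# Sketch: flag geometrically suspicious boxes in a COCO-format dict.
def find_suspicious_boxes(coco: dict, min_side: float = 1.0, tolerance: float = 0.5):
    """Return (annotation_id, reason) pairs for boxes worth reviewing."""
    images = {img["id"]: img for img in coco["images"]}
    issues = []
    for ann in coco["annotations"]:
        x, y, w, h = ann["bbox"]
        img = images[ann["image_id"]]
        if w < min_side or h < min_side:
            issues.append((ann["id"], "degenerate box"))  # near-zero width/height
        elif (
            x < -tolerance
            or y < -tolerance
            or x + w > img["width"] + tolerance
            or y + h > img["height"] + tolerance
        ):
            issues.append((ann["id"], "outside image"))  # crosses the border
    return issues
```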

4. Overlapping objects

When several objects are close together, what do you do? Do you draw one bounding box per object, or one for the whole group? The common answer is one box per object, but that can be really difficult, for example with the pile of oranges below:
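To find these cases in a dataset, one option is to list pairs of same-class boxes in the same image that overlap strongly, along with annotations marked `iscrowd` (COCO's standard flag for a box covering a group of objects). The IoU threshold of 0.3 is an arbitrary assumption. The output is only a list of candidates to inspect, not a verdict.

```python
# Sketch: surface crowded or grouped annotations for review.
from collections import defaultdict
from itertools import combinations


def iou(a, b) -> float:
    """Intersection over union of two [x, y, w, h] boxes."""
    inter_w = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def overlapping_pairs(annotations, threshold: float = 0.3):
    """Pairs of annotation ids with the same image and class and IoU >= threshold."""
    groups = defaultdict(list)
    for ann in annotations:
        groups[(ann["image_id"], ann["category_id"])].append(ann)
    pairs = []
    for anns in groups.values():
        for a, b in combinations(anns, 2):
            if iou(a["bbox"], b["bbox"]) >= threshold:
                pairs.append((a["id"], b["id"]))
    return pairs


def crowd_annotations(annotations):
    """Ids of annotations that cover a group of objects (COCO iscrowd=1)."""
    return [ann["id"] for ann in annotations if ann.get("iscrowd", 0) == 1]
```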

5. Occlusion

When an object is hidden behind another object, the usual approach is to annotate only the visible part, as long as it covers more than a given proportion of the whole object (typically > 20%). However, this choice affects what you can do with the annotation later (training and post-processing). Another good practice is to flag such annotations as truncated or occluded.
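The rule above can be written as a small decision function. It is a sketch under these assumptions: the annotator estimates the visible fraction, and `occluded` is a hypothetical extra attribute, since standard COCO annotations have no occlusion flag. With the flag stored, you can later choose to drop or down-weight occluded boxes during training.

```python
# Sketch: "annotate only the visible part if more than 20% is visible".
from dataclasses import dataclass


@dataclass
class Annotation:
    bbox: tuple[float, float, float, float]  # visible part only, [x, y, w, h]
    occluded: bool = False  # hypothetical attribute, not part of standard COCO


def annotate_visible_part(visible_bbox, visible_fraction: float, min_visible: float = 0.2):
    """Return an Annotation for the visible part, or None if too little is visible."""
    if not 0.0 <= visible_fraction <= 1.0:
        raise ValueError("visible_fraction must be in [0, 1]")
    if visible_fraction <= min_visible:
        return None  # at or below the threshold: leave the object unannotated
    return Annotation(bbox=tuple(visible_bbox), occluded=visible_fraction < 1.0)
```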

6. Hard-to-recognize objects

Some objects may be impossible to recognize: they are too small, too occluded, or the image quality is not good enough. Most of the time the answer is not to annotate them at all, but you can also flag them as hard.
The real questions are: do you want to teach your model to detect these objects? Do you want them to count in your metrics? Are they a target for your final product?
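Two quick tools help with these questions. The first buckets boxes by area, roughly following COCO's evaluation ranges (small below 32² pixels, large above 96²), so tiny objects can be browsed separately. The second keeps "hard" objects out of the metrics, similar in spirit to Pascal VOC's `difficult` flag. The `hard` key is a hypothetical attribute, not a standard COCO field.

```python
# Sketch: size buckets and exclusion of objects flagged as hard.
SMALL_MAX_AREA = 32 ** 2  # COCO "small" upper bound, in pixels
MEDIUM_MAX_AREA = 96 ** 2  # COCO "medium" upper bound, in pixels


def size_bucket(area: float) -> str:
    """Classify a box area as small, medium, or large."""
    if area < SMALL_MAX_AREA:
        return "small"
    if area < MEDIUM_MAX_AREA:
        return "medium"
    return "large"


def split_for_evaluation(annotations):
    """Split annotations into (kept, ignored) using a hypothetical 'hard' flag."""
    kept = [a for a in annotations if not a.get("hard", False)]
    ignored = [a for a in annotations if a.get("hard", False)]
    return kept, ignored
```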

7. Picture-in-picture / mirror images

This case is often overlooked. It occurs in most crowdsourced, scraped, or end-user datasets, unlike datasets collected in a controlled environment. The usual decision is to annotate these objects normally. For a specific use case, though, you may decide not to, or flag them so they can be told apart later. Here they account for 5% of the training set.
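If such objects are flagged, the share quoted above is simple to track. The sketch below assumes a hypothetical `context` attribute on each annotation (for example `"reflection"` or `"picture_in_picture"`) and treats a missing value as `"normal"`. Neither the attribute nor its values are part of standard COCO.

```python
# Sketch: share of annotations per (hypothetical) context tag.
from collections import Counter


def context_shares(annotations, key: str = "context") -> dict[str, float]:
    """Fraction of annotations for each context value; missing means 'normal'."""
    counts = Counter(a.get(key, "normal") for a in annotations)
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()} if total else {}
```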

III. Setting up a Streamlit dashboard

There are several ways to set up a dashboard for exploring your object detection dataset.

First, you can use a ready-made tool, such as the excellent FiftyOne from the Voxel51 team, or perhaps one day Google's Know Your Data. Second, you can build your own with a dashboard framework such as Streamlit or Dash. The first option gives you advanced features out of the box, such as user authentication, dataset management, and model evaluation, and is easier to maintain.

The second option may be quicker to set up and gives you more flexibility to customize the solution. It also lets you share code and practices between your project code and your dashboards.

I will show how to build the dashboard with Streamlit, since it demonstrates what Streamlit can do and how easily it can be integrated to explore an object detection dataset.
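As a rough idea of what such a dashboard involves, here is a minimal sketch. It is not the code of the demo linked above. It assumes a COCO-format annotation file and image folder at hypothetical paths, Streamlit 1.30 or later (for `st.query_params`), and Pillow for drawing boxes. Streamlit and Pillow are imported inside `main()` so the data helper can be reused without them. The app reads `?label=...` from the URL, much like the demo link, and lets you pick a label and step through the images containing it.

```python
# Sketch of a Streamlit explorer for a COCO-format dataset (paths are hypothetical).
import json
import sys
from collections import defaultdict
from pathlib import Path

ANNOTATIONS_PATH = "annotations/instances_val2017.json"  # hypothetical path
IMAGES_DIR = Path("val2017")  # hypothetical path


def load_coco(path: str) -> dict:
    """Parse a COCO-format annotation file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def index_by_label(coco: dict, label: str) -> dict[str, list[list[float]]]:
    """Map file_name -> list of [x, y, w, h] boxes for one category name."""
    cat_ids = {c["id"] for c in coco["categories"] if c["name"] == label}
    files = {img["id"]: img["file_name"] for img in coco["images"]}
    boxes = defaultdict(list)
    for ann in coco["annotations"]:
        if ann["category_id"] in cat_ids:
            boxes[files[ann["image_id"]]].append(ann["bbox"])
    return dict(boxes)


def main() -> None:
    import streamlit as st
    from PIL import Image, ImageDraw

    # Cache the parsed JSON so it is not reloaded on every rerun.
    coco = st.cache_data(load_coco)(ANNOTATIONS_PATH)
    labels = sorted(c["name"] for c in coco["categories"])

    # Pre-select the label from the URL, e.g. ...?label=toaster
    requested = st.query_params.get("label", labels[0])
    start = labels.index(requested) if requested in labels else 0
    label = st.sidebar.selectbox("Label", labels, index=start)
    st.query_params["label"] = label  # keep the current view shareable

    per_image = index_by_label(coco, label)
    n_boxes = sum(len(b) for b in per_image.values())
    st.title(f"{label}: {n_boxes} boxes in {len(per_image)} images")
    if not per_image:
        st.warning("No boxes for this label.")
        return

    file_names = sorted(per_image)
    position = st.sidebar.number_input(
        "Image index", min_value=0, max_value=len(file_names) - 1, value=0, step=1
    )
    file_name = file_names[int(position)]

    # Draw every box of the selected label on the image.
    image = Image.open(IMAGES_DIR / file_name).convert("RGB")
    draw = ImageDraw.Draw(image)
    for x, y, w, h in per_image[file_name]:
        draw.rectangle([x, y, x + w, y + h], outline="red", width=3)
    st.image(image, caption=file_name)


# `streamlit run` imports streamlit before executing this script, so the UI
# only starts there; a plain `python` import leaves main() uncalled.
if __name__ == "__main__" and "streamlit" in sys.modules:
    main()
```

Saved as, say, `explore_dataset.py` (a hypothetical name), it would be launched with `streamlit run explore_dataset.py` and opened at `http://localhost:8501/?label=toaster`. Keeping the data logic in plain functions is the design choice that lets the same helpers serve both the dashboard and the project code.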

The complete code for this small example is here.

If you want to see the complete code of the deployed application, you can find it here.