当前位置：网站首页>Sub-Category Optimization for Multi-View Multi-Pose Object Detection

Sub-Category Optimization for Multi-View Multi-Pose Object Detection

2022-06-21 05:57:00 【Wanderer001】

Reference resources Sub-Category Optimization for Multi-View Multi-Pose Object Detection - cloud + Community - Tencent cloud

Abstract

Object class detection with large changes in appearance is a basic problem in the field of computer vision . Because of the variability inside the class 、 Viewing angle and lighting , The appearance of the target category may change . For target categories with large changes in appearance , You need to use a subcategory based approach . This paper puts forward a kind of A subcategory optimization method that automatically divides a target category into an appropriate number of subcategories based on appearance changes . We do not use predefined in class subcategories based on domain knowledge or validation data sets , Instead, unsupervised clustering based on distinguishing image features is used to divide the sample space . Then the clustering performance is verified by sub - category discriminant analysis . Clustering performance based on unsupervised method and results of subcategory discriminant analysis , The optimal number of subcategories for each target category is determined . A large number of experimental results show that the use of two standards and the author's own database . The comparison shows that , Our method is superior to the most advanced one .

1、 brief introduction

Classify general target categories with large appearance changes 、 Detection and clustering are challenging tasks in computer vision . The image appearance of the object category will change due to many factors , For example, there are great changes within the category ( Different textures and colors )、 Different lighting conditions 、 Direction / Viewing direction and posture . For target categories with small intra class changes ,Viola and Jones The proposed cascade structure classifier is an effective solution . however , For more diverse patterns with large appearance changes , Such as multi view automobile 、 Cows and dogs , A more powerful classifier model is needed . One of the basic principles of this classifier is [2] and [3] Divide and conquer method used in . In their approach , When a category cannot be modeled as a whole , It is divided into several subcategories , Then the system learns a model for them . however , When the appearance changes due to many factors , It is difficult to find a major attribute to divide the sample . Manually assigning subcategory labels to training samples can be difficult , And it's time-consuming . Besides , Sub classification based on domain knowledge may not be the optimal classification task .

This article defines some simple rules , It can be used to automatically determine the optimal number of subcategories of a target category . Specially , We put forward two criteria . In the first rule , We do not use the predefined subclass method based on domain knowledge , Instead, an unsupervised clustering algorithm based on distinguishing image features is used to divide the sample space . So , We use probabilistic latent semantic analysis (pLSA) Model [4] Target specific clustering probability . Such a topic model has recently been classified in the target class [5]、[6] The success of the project has aroused great interest in the frontiers of topic optimization and sub classification . especially ,Fritz Et al. Proposed a representation , Use the topic model to decompose 、 Discover and detect visual target categories . However , In their approach , The total number of subcategories in the learning process is fixed , Therefore, their number has not been optimized on the training data set .

In the second criterion , The number of clusters under each category is taken as the number of subcategories , We use subcategory discriminant analysis (SDA) technology [7] To analyze the discrimination ability of a given number of sub categories . Since the real challenge and our goal is to find the optimal number of subcategories of an object class , We use SDA To find the discriminant power that optimizes the number of subcategories , Not like it Zhu Et al. Proposed to optimize the classification performance on cross validation data sets by using the same number of subcategories .

2、 Subclass optimization

In this section , We describe our subcategory optimization method , It combines clustering performance analysis with subcategory discriminant analysis . Start with the image , We first show our data representation . Then we describe how to apply the topic model to this representation and generate clusters for each target category . Last , Take the generated cluster as the classification , The mixing coefficient is used as the distinguishing feature , The discriminant power is analyzed .

A、 The data shows

In order to build pLSA Visual vocabulary and vocabulary of the model , We detect and describe interest points from all training images . The combination of point of interest detector and invariant local descriptor shows the interesting function of describing image and target . In this study , We use Canny The edge detector constructs the edge mapping of the image . Then the local interest points on the edge graph are detected in two stages . First , We use He and Yung Proposed technology , Local and global curvature properties based on edges , Reliably detect corners in edge mapping , And take the whole corner as a set of keys . In the second phase , Select the remaining total key points by uniformly sampling the edge of the object . Each generated key uses a radius of r = 10 Round patch Upper 128 dimension SIFT Descriptors to describe . utilize k- Mean clustering algorithm for SIFT Descriptor for vector quantization , Form a visual vocabulary . stay pLSA In the formula of , We calculate a co-occurrence table , Each of these targets represents a collection of visual terms provided for the visual vocabulary .

B、 Cluster performance analysis

In the learning phase ,pLSA The model will target d A visual word in Each observation with a cluster variable Connect , The maximum likelihood principle is used to determine $P(w \mid z)$ and $P(d \mid z)$ . For all training images , Mixing coefficient $P\left(z_{k} \mid d_{j}\right)$ Can be regarded as target characteristics , Used for classification . We calculated $P\left(z_{k} \mid d_{j}\right)$ by ：

If D_i It means the first one i A collection of class targets , that $P(z_k|D_i) = \frac{1}{n_i}\sum_{l= 1}^{n_i} P(z_k|d_l)$ It means that D_i Next z_k The probability of a cluster . here n_i It's No i Total number of class targets . $\sum_{z\in Z}p(z|D_i)=1$ , among $Z=\left\{z_{1}, z_{2}, \ldots z_{K}\right\}$ and Is the total number of clusters . about Target categories ,. We can calculate the total cluster probability sum as ,), among Li Is assigned to i A list of clusters of categories . To determine the optimal number of clusters , We need to count each possible number of clusters Use the following formula to calculate $E_{K}$ value , Then select the minimum value .

If D_i Represents the target category i Set , then For given D_i when , cluster z_k Probability . So for the second i A target class , among , also K Is the total number of clusters . Yes C Target class of . We can calculate the total cluster probability sum as , among L_i Is assigned to the A list of clusters of categories . To determine the optimal number of clusters , We need to count each possible number of clusters K Use the following formula to calculate E_K value , Then select the minimum value .

However , It can be seen from the experimental evaluation that , For more complex target categories with large changes in appearance , Using this criterion alone is not always stable . So in the next section , In addition to this rule , We will use the above clustering and mixing coefficients to determine a more stable optimization criterion .

C、 Subclass distinction analysis

Once the mixed Gaussian distribution is used to determine the data distribution of each cluster , It is easy to use the following generalized eigenvalue decomposition equation to find the discriminant vector for the best classification of data ,

S_W Is the scattering matrix in the subclass , S_B Is the scattering matrix between subclasses ( Both matrices are symmetric , Positive semidefinite ),V Is a matrix whose column corresponds to the distinguishing vector , $\Lambda$ Is the corresponding eigenvalue of a diagonal matrix ., here $p_{ij}$ and $\mu_{ij}$ They are the first i Of the categories j A subclass , A priori and mean .Ki It's a category i Number of subcategories in , Is the total number of subcategories , be equal to pLSA The total number of clusters generated by the model . We can define the scattering matrix within the subcategory as , among $x_{i j}$ For the category i Of the Samples ( Mixing coefficient ).

For subcategory discriminant analysis , We have to make the scattering matrix at the same time S_B The given measure is the largest $x_{ij}$ and S_W The calculated measure is the smallest . however , In some cases , This may not be done in parallel . under these circumstances ,(3) Possible misclassification . Fortunately, , This problem is easy to find , Because when this happens , S_B and S_W The angle between the first eigenvectors is very small . This can be formally calculated as [9],, among u_i and w_i yes S_W and S_B Eigenvector of , With the first i Large eigenvalue correlation ,m < rank( S_B ). We want the above defined The value of is as small as possible . We can calculate the normalized value of this equation for Every possible value of ,

To determine stability criteria , We calculate by pLSA For each possible cluster generated by the model E_K and P_K Average value . We are right. K = C To (C + r) Repeat the process , among r The value of can be specified by the user , To find the one that minimizes the average K Value .

3、 Experimental setup

A、 Subclass optimization performance

To measure classification and optimize performance , We used ETH-80 database , It contains 8 Category 3280 Zhang image . Each category is included in a total of 41 Shot from a viewpoint 10 An image of an object . In our experiment , A category of 10 In one example 10 Images from different views were used as test data sets , The rest of the images are used as training data sets . therefore , At each stage we used 80 Test images and 3200 Training images . Because each instance has 41 Individual view , So there are the same number (41 individual ) Training for / Testing phase . At each training stage , We use our algorithm to determine the optimal number of subcategories of the target category . chart 1(a) For one of the training stages, the optimization result shown by the circular mark on the graph . In this picture ,y The axis represents a conflict metric , This measure is achieved by taking the E_K and P_K To calculate the average of . Optimization occurs in 53 Subcategory . The number of subcategories of each target category is shown in Table 1 . chart 2 For the category car Of 8 Of the optimal subcategories 5 Subcategory ( The first 1 To 5 That's ok ) Several typical images of .

In the process of classification , We start with an image d_j Extract a visual word from w_i , Then each visual word is clustered with the highest probability of a specific word P(w_i|z_k) To classify . then , Classify targets according to the maximum number of visual words that support a particular cluster . chart 1(b) Average classification results for eight target categories . If there are no subcategories , We can only get 59.5% The recognition accuracy of . After subcategory optimization , The best average classification accuracy is 84.75%. Our average classification results are consistent with Zhu Etc. [7] Quite a . They used a feature-based approach to get 82% The recognition accuracy of . And [7] comparison , Our better results may be due to the subcategory optimization of each category , Instead of dividing each target category into the same number of subcategories .

B、ETHZ Performance of shape database

In this section , We compare our method with others [10]、[5] Method in ETHZ Compare the performance on the shape class ,ETHZ The shape class contains a total of 255 Images , These images follow the apple logo (40)、 Bottle (48)、 giraffe (87)、 glass (48) And swan (32) division . We use and [10] Compare our methods with the same parameter settings described in . Use [10] Five fold cross validation proposed in , We got the watch II Results in .

The optimization algorithm will 5 Target categories are broken down into 11 Subcategory . In the identification phase , We extracted a visual vocabulary bag (BOVW), And use the optimized model , Use our recent [11] To generate a promising hypothesis . The assumed position is then verified using a support vector machine classifier using the following features specifically $\chi^{2}$ Merge kernels ,

In the first part on the right (5) Measure the similarity between the direction gradient histograms of the pyramid (PHOG) Quantity level of feature set and pyramid L = 2 , Part II measures BOVW Similarity between feature sets , $\alpha$ and $\beta$ The weight of PHOG BOVW kernel . In all cases , Except for the apple logo , Our method performs better than the other two methods . Our average detection and location performance ratio Fritz Etc. [5] Improved 4.3%, Than Ferrari Etc. [10] Improved 12.3%

C、 Recognition performance of author database

In this section , We will show 10 Detection and location performance of class database . It consists of working with our application ( Service robots ) Related daily goals consist of , These goals are in different environments with a messy 、 The real background corresponds to . Our database contains images of multiple objects, each image , And create a truth bounding box with the ground . All in all 809 A picture , contain 2138 An object . among 630 Goals (340 Images ) Used for training ,1508 Objects (469 Images ) Used to test the system . During training , Optimization occurs in 10 Target categories 21 Subcategory . As shown in Table 3 , If you don't subclass , The average detection and location rate of our system (DLR) by 61%, by 0.64 FPPI. On the other hand , Use the optimal number of subcategories , The system will average DLR Up to 84.5%, take FPPI from 0.64 Down to 0.61. chart 3 Some test results of our method are shown , Different databases are recorded in the clutter background 、 Partial occlusion 、 Performance under significant scale and viewpoint changes .

4、 Conclusion

In this paper , We propose a subcategory optimization method , It can optimize a target category into an appropriate number of subcategories . Our method can also distinguish different target categories and use specific functions χ2 Merge kernel shape and appearance features . Experimental results show that , This method can learn the model from the image labeled by the bounding box , And there are a lot of clutter 、 Detect and locate the boundaries of new class instances in the case of scale and viewpoint changes and intra class variability .

原网站

版权声明
本文为[Wanderer001]所创，转载请带上原文链接，感谢
https://yzsam.com/2022/172/202206210550153795.html