
[Data Mining] Visual Pattern Mining: HOG Features + Cosine Similarity / K-means Clustering

2022-07-07 15:07:00 zstar-_

1. An overview of the experiment

This experiment uses the VOC2012 dataset. Image blocks are first randomly sampled from each image, HOG features are then extracted from every block, and finally visual patterns are mined with two methods: cosine similarity and k-means clustering.

2. Data set description

This experiment uses the VOC2012 dataset. VOC2012 is commonly used for object detection, image segmentation, network comparison experiments, and model evaluation. For the image segmentation task, its training/validation set contains all corresponding images from 2007-2011, with 2913 images and 6929 objects, while the test set only covers 2008-2011.
Because this dataset is mostly intended for tasks such as object detection, only 7 of its classes are used in this experiment (see Section 5.1).
Dataset download link:
http://host.robots.ox.ac.uk/pascal/VOC/voc2012/index.html

3. Introduction to the algorithms

3.1 HOG feature extraction

The Histogram of Oriented Gradients (HOG) feature is a feature descriptor used for object detection in computer vision and image processing. It builds the descriptor by computing and accumulating histograms of gradient orientations over local regions of the image. Its main steps are as follows (a code sketch follows the list):

  1. Grayscale conversion
    Convert the image to grayscale, discarding color information that is irrelevant to the descriptor.
  2. Color space normalization
    Normalize the color space of the input image with gamma correction. This adjusts the contrast of the image, reduces the influence of local shadows and illumination changes, and suppresses noise.
  3. Gradient computation
    Compute the gradient (magnitude and orientation) of every pixel in the image. This captures contour information and further weakens the influence of illumination.
  4. Cell division
    Divide the image into small cells, for example cells of 8x8 pixels.
  5. Gradient histogram per cell
    Divide 360° into 18 bins of 20° each; because the gradient sign is ignored, opposite directions are merged, leaving 9 orientation bins in total. For each cell, count how many gradients fall into each bin.
  6. Group cells into blocks
    Group several cells into a block, for example 2x2 cells per block, and normalize the gradient histograms within each block. Four normalization schemes are commonly used (L1-norm, L1-sqrt, L2-norm, and L2-Hys); the original paper reports that L2-Hys performs best [1], so L2-Hys is also adopted in this experiment.
  7. Slide the block
    Slide each block horizontally and vertically with a stride of one cell, and concatenate all the resulting feature vectors to obtain the final HOG feature.
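
As an illustration of the pipeline above (not part of the experiment's own code), the following sketch computes HOG features with skimage.feature.hog, which is also the function used later in Section 5.3. It uses the 8x8-pixel cell and 2x2-cell block configuration taken as the example in this section, not the settings used in the experiment itself; the image file name is a placeholder.

import cv2
from skimage.feature import hog

# "sample.jpg" is a placeholder file name
img = cv2.imread("sample.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)       # step 1: grayscale conversion
gray = cv2.resize(gray, (256, 256)) / 255.0        # rescale and map pixel values to [0, 1]

features = hog(
    gray,
    transform_sqrt=True,         # step 2: power-law (gamma) compression
    orientations=9,              # step 5: 9 orientation bins
    pixels_per_cell=(8, 8),      # step 4: 8x8-pixel cells
    cells_per_block=(2, 2),      # step 6: 2x2-cell blocks
    block_norm='L2-Hys',         # step 6: L2-Hys block normalization
)
print(features.shape)            # (34596,) = 31 x 31 block positions x 2 x 2 cells x 9 bins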

3.2 Cosine similarity

After the HOG feature of each image block has been obtained, blocks are grouped by computing the cosine similarity between their feature vectors. For two feature vectors A and B, the cosine similarity is

$\text{sim}(A, B) = \cos\theta = \dfrac{A \cdot B}{\|A\|\,\|B\|} = \dfrac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}}$
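
For reference, here is a minimal example of this formula using sklearn's cosine_similarity (the same function used later in Section 5.4) next to a direct NumPy computation; the two vectors are made up for illustration.

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Two made-up feature vectors; sklearn expects 2-D arrays of shape (n_samples, n_features)
a = np.array([[1.0, 2.0, 3.0]])
b = np.array([[2.0, 4.0, 6.5]])

print(cosine_similarity(a, b)[0, 0])   # ~0.999, the vectors are nearly parallel

# The same value computed directly from the formula above
a1, b1 = a.ravel(), b.ravel()
print(a1 @ b1 / (np.linalg.norm(a1) * np.linalg.norm(b1)))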

3.3 K-means clustering

After the HOG features of each image block have been obtained, K-means clustering can also be used to mine visual patterns. The K-means procedure works as follows:
First, two points are randomly initialized as cluster centers. Each point is assigned to the cluster of its nearest center; then the mean coordinates of all points in each cluster are computed and taken as the new cluster center. These two steps are repeated until the cluster centers no longer change, at which point clustering is complete.
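
The two-step loop described above can be sketched in a few lines of NumPy. This is only an illustration of the algorithm itself; the experiment uses sklearn's KMeans (Section 5.5).

import numpy as np

def kmeans(X, k=2, n_iter=100, seed=0):
    """Plain k-means: alternate assignment and center update until convergence.
    Assumes no cluster ever becomes empty (sufficient for this illustration)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]      # random initial centers
    for _ in range(n_iter):
        # Assignment step: each sample joins the cluster of its nearest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each center moves to the mean of the samples assigned to it
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):                    # centers stopped moving
            break
        centers = new_centers
    return labels, centers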

4. Frequency and discriminability metrics

4.1 Frequency metric

A pattern is called frequent if it occurs many times in the positive images. In this experiment, the frequency metric follows the evaluation criterion of Wang Qiannan [2] and is defined by the following formula:
[frequency formula from [2]; not reproduced]
where N is the total number of samples of the given class, $S_{u,v}$ is the cosine similarity between samples u and v of that class, and $T_f$ is the similarity threshold.
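
The formula itself is not reproduced above. One reading that is consistent with these variable definitions, taken here only as an assumption rather than as the exact definition from [2], is the fraction of within-class sample pairs whose similarity exceeds the threshold; a sketch:

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def frequency(features, t_f=0.6):
    """Assumed frequency metric: the fraction of within-class sample pairs (u, v)
    whose cosine similarity S_{u,v} exceeds the threshold T_f.
    This is an interpretation of [2], not necessarily its exact formula.
    `features` is an (N, d) array of HOG vectors from one class."""
    S = cosine_similarity(features)              # S[u, v] = cosine similarity of samples u and v
    iu = np.triu_indices(len(features), k=1)     # all pairs with u < v
    return float(np.mean(S[iu] > t_f))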

4.2 Discriminability metric

A pattern is called discriminative if it appears in the positive images but not in the negative images. In this experiment, discriminability is defined by the average classification accuracy of the visual pattern, using the following formula:

[discriminability formula from [2]; not reproduced]

where M is the total number of sample classes, $S_{m,l}$ is the cosine similarity between the average mined visual pattern and sample l of class m, and $T_f$ is the similarity threshold.
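
Again as an assumption rather than the exact formula from [2], one reading that matches the definitions above and the way the per-class results in Section 6 behave is: for each negative class, measure the fraction of its samples whose similarity to the mined pattern stays below the threshold, then average over the negative classes. A sketch:

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def discriminability(pattern, negative_features, t_f=0.6):
    """Assumed discriminability metric (an interpretation of [2], not its exact formula):
    for each negative class m, the fraction of its samples l whose similarity S_{m,l}
    to the mined pattern stays below the threshold T_f, averaged over the negative classes.

    pattern:           (d,) average HOG feature of the mined visual pattern (e.g. "sheep")
    negative_features: dict {class_name: (N_m, d) array of HOG features of a negative class}"""
    per_class = {}
    for name, feats in negative_features.items():
        s = cosine_similarity(pattern.reshape(1, -1), feats)[0]  # S_{m,l} for each sample l
        per_class[name] = float(np.mean(s <= t_f))               # correctly rejected fraction
    avg = float(np.mean(list(per_class.values())))
    return per_class, avg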

5. The experimental steps

5.1 Data set classification

This experiment uses the VOC2012 dataset, whose images are not pre-sorted by category. Therefore, images are first sorted into folders according to the class they contain. Seven categories are used: car, horse, cat, dog, bird, sheep, and cow.
The core code for this split is as follows:

import os
import shutil
import xml.etree.ElementTree as ET


def get_my_classes(Annotations_path, image_path, save_img_path, classes):
    xml_path = os.listdir(Annotations_path)
    # Create one output folder per class
    for i in classes:
        if not os.path.exists(os.path.join(save_img_path, i)):
            os.mkdir(os.path.join(save_img_path, i))
    for xmls in xml_path:
        print(os.path.join(Annotations_path, xmls))
        # Parse the VOC annotation file that belongs to this image
        with open(os.path.join(Annotations_path, xmls)) as in_file:
            tree = ET.parse(in_file)
        root = tree.getroot()
        # Keep only images that contain exactly one annotated object
        if len(list(root.iter('object'))) != 1:
            continue
        for obj in root.iter('object'):
            cls_name = obj.find('name').text
            print(cls_name)
        try:
            # Copy the image into the folder of its class
            shutil.copy(os.path.join(image_path, xmls[:-3] + "jpg"),
                        os.path.join(save_img_path, cls_name, xmls[:-3] + "jpg"))
        except OSError:
            # Classes outside `classes` have no folder, so the copy fails; skip them
            continue
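
A hypothetical call of this function (the directory paths are assumptions that depend on where VOC2012 was extracted; the class names follow the lowercase spelling used in the VOC annotation files):

import os

# Hypothetical paths; adjust them to wherever VOC2012 was extracted
classes = ["car", "horse", "cat", "dog", "bird", "sheep", "cow"]
os.makedirs("classified_images", exist_ok=True)   # root folder for the per-class copies
get_my_classes(
    Annotations_path="VOCdevkit/VOC2012/Annotations",
    image_path="VOCdevkit/VOC2012/JPEGImages",
    save_img_path="classified_images",
    classes=classes,
)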

The split produces one folder of images for each class.

5.2 Image block sampling

Image blocks are sampled by random cropping. Taking the center point of each image as the reference, the crop center is randomly offset within [-width/6, +width/6] horizontally and [-height/6, +height/6] vertically, and the largest square centered on the offset point that still fits inside the image is then cropped out.
For each image, 10 blocks are sampled in this way. The core code is as follows:

import os
import random

from PIL import Image

# IMAGE_INPUT_PATH and IMAGE_OUTPUT_PATH are defined elsewhere in the script
for each_image in os.listdir(IMAGE_INPUT_PATH):
    # Full path of the current image
    image_input_fullname = IMAGE_INPUT_PATH + "/" + each_image
    # Open the image with PIL
    img = Image.open(image_input_fullname)
    # Image size and center point
    x_max = img.size[0]
    y_max = img.size[1]
    mid_point_x = int(x_max / 2)
    mid_point_y = int(y_max / 2)
    for i in range(0, 10):
        # Randomly offset the center point by up to 1/6 of the width/height
        crop_x = mid_point_x + \
            random.randint(int(-mid_point_x / 3), int(mid_point_x / 3))
        crop_y = mid_point_y + \
            random.randint(int(-mid_point_y / 3), int(mid_point_y / 3))
        # Largest half-side of a square around (crop_x, crop_y) that stays inside the image
        dis_x = x_max - crop_x
        dis_y = y_max - crop_y
        dis_min = min(dis_x, dis_y, crop_x, crop_y)
        down = crop_y + dis_min
        up = crop_y - dis_min
        right = crop_x + dis_min
        left = crop_x - dis_min
        # Crop box as a 4-tuple of (left, upper, right, lower) pixel coordinates
        box = (left, up, right, down)
        roi_area = img.crop(box)
        # Output path: prefix the sample index to the original file name
        image_output_fullname = IMAGE_OUTPUT_PATH + "/" + str(i) + "_" + each_image
        # Store the cropped image block
        roi_area.save(image_output_fullname)

5.3 HOG feature extraction

Each image block is converted to grayscale, resized to 256x256, and normalized, and its HOG features are then extracted. The core code is as follows:

import cv2
from skimage.feature import hog


# Image preprocessing
def preprocessing(src):
    gray = cv2.cvtColor(src, cv2.COLOR_BGR2GRAY)  # convert the image to grayscale
    img = cv2.resize(gray, (256, 256))            # resize to 256x256
    img = img / 255.0                             # normalize pixel values to [0, 1]
    return img


# Extract HOG features
def extract_hog_features(X):
    image_descriptors = []
    for i in range(len(X)):
        # orientations:    number of orientation bins
        # pixels_per_cell: cell size in pixels
        # cells_per_block: block size in cells
        # block_norm:      block normalization method, here L2-Hys (clipped L2 norm)
        fd = hog(X[i], orientations=9, pixels_per_cell=(16, 16),
                 cells_per_block=(16, 16), block_norm='L2-Hys')
        image_descriptors.append(fd)  # collect the HOG feature of every image block
    return image_descriptors          # HOG features of all image blocks

Here the cell size is (16, 16) pixels and the block size is (16, 16) cells, so each 256x256 block forms exactly one block of 16x16 cells and yields a 16 x 16 x 9 = 2304-dimensional feature vector.
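
As a usage sketch (the folder path is an assumption), the blocks of one class can be loaded, preprocessed, and turned into the feature array X_features that the code in Sections 5.4 and 5.5 operates on:

import os
import cv2
import numpy as np

# Hypothetical folder holding the cropped blocks of one class (here: "sheep")
BLOCK_PATH = "cropped_blocks/sheep"

images = []
for fname in os.listdir(BLOCK_PATH):
    src = cv2.imread(os.path.join(BLOCK_PATH, fname))
    if src is None:                      # skip anything that is not a readable image
        continue
    images.append(preprocessing(src))    # grayscale, resize to 256x256, normalize

X_features = np.array(extract_hog_features(images))
print(X_features.shape)                  # (number of blocks, 2304)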

5.4 Mining through cosine similarity

Based on the extracted HOG features, visual patterns are mined and evaluated with the frequency and discriminability criteria. The core code is as follows:

from sklearn.metrics.pairwise import cosine_similarity

threshold = 0.6
group1 = []   # blocks similar to the seed block (candidate visual pattern)
group2 = []   # remaining blocks
group1.append(X_features[0])             # use the first block as the seed
for i in range(1, len(X_features)):
    res = cosine_similarity(X_features[0].reshape(1, -1),
                            X_features[i].reshape(1, -1))[0, 0]
    if res > threshold:
        group1.append(X_features[i])
    else:
        group2.append(X_features[i])

Following the experience reported in [2], thresholds of 0.6, 0.7, and 0.8 are used; the experimental results are shown in the next section.

5.5 Mining with k-means clustering

In this experiment, k-means clustering is used as a second way of mining visual patterns. The core code is as follows:

import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Cluster the HOG features of all image blocks into two groups
cluster = KMeans(n_clusters=2, random_state=0)
y = cluster.fit_predict(X_features)

# Visualize the cluster assignment using the first two feature dimensions
colors = ['blue', 'red']
plt.figure()
for i in range(len(y)):
    plt.scatter(X_features[i][0], X_features[i][1], color=colors[y[i]])
plt.title("Cluster assignment in the first two HOG dimensions")
plt.savefig("cluster.png")
plt.show()

Taking the class "sheep" as the target for frequent pattern mining, the first two feature dimensions of the mined positive and negative samples are visualized as follows:
[scatter plot of the cluster assignment in the first two HOG dimensions]

After the frequent visual pattern has been extracted, its discriminability is mined with the cosine similarity method as before; the results are shown in the next section.

6. Experimental results

The class selected for this experiment is "sheep". Both methods are evaluated with several cosine similarity thresholds; the numerical results are shown in the tables below.

Frequency of visual patterns mined with the cosine similarity method

Threshold    Frequency
0.6          0.623
0.7          0.863
0.8          0.999

It can be seen that, as the threshold increases, the mined visual patterns become more frequent.

Discriminability of visual patterns mined with the cosine similarity method

Threshold    car      horse    cat      dog      bird     cow      Average
0.6          0.887    0.291    0.315    0.258    0.489    0.144    0.397
0.7          0.957    0.513    0.640    0.567    0.708    0.382    0.628
0.8          0.988    0.791    0.914    0.841    0.873    0.716    0.854

Similarly, as the threshold increases, the mined visual patterns become more discriminative; at a threshold of 0.8, the average discriminability reaches its highest value of 0.854.

Some of the visual patterns mined by this method are visualized below:

[visualization of image blocks mined with the cosine similarity method]

Visual patterns are also mined with K-means clustering in this experiment; the resulting frequency value is 0.707. Discriminability is again computed with the cosine similarity method, and the results are shown in the table below:

Discriminability of visual patterns mined with k-means clustering

Threshold    car      horse    cat      dog      bird     cow      Average
0.6          0.883    0.260    0.299    0.242    0.493    0.146    0.387
0.7          0.956    0.503    0.632    0.569    0.713    0.391    0.627
0.8          0.987    0.793    0.910    0.843    0.879    0.724    0.856

As with the previous method, the mined visual patterns become more discriminative as the threshold increases; at a threshold of 0.8, the average discriminability reaches its highest value of 0.856.

Some of the visual patterns mined by this method are visualized below:
[visualization of image blocks mined with k-means clustering]
As the figures show, although the visual patterns mined by the two methods differ little numerically, their visualizations differ: the cosine similarity method mostly mines the facial features of sheep, whereas k-means clustering mostly mines their body features.

7. Summary of the experiment

This experiment uses the traditional HOG feature extraction method and mines visual patterns with cosine similarity and k-means clustering. It also shows that a given image class may contain more than one visual pattern; multiple visual patterns are not considered in this experiment, and in such cases density-based clustering [2] may be more suitable.

In addition, traditional feature extraction methods such as HOG and SIFT have limited representational power and no capacity for abstraction. For visual pattern mining, CNN-based features are currently the more mainstream and effective choice.

References

[1] N. Dalal, B. Triggs. Histograms of Oriented Gradients for Human Detection. IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), 2005, 1: 886-893.
[2] Wang Qiannan. Research on Visual Pattern Mining Algorithms Based on Deep Learning [D]. Xidian University, 2021. DOI: 10.27389/d.cnki.gxadu.2021.002631.

Complete source code

Experimental report + source code:
Link: https://pan.baidu.com/s/131RbDp_LNGhmEaFREvgFdA?pwd=8888
Extraction code: 8888

Copyright notice: this article was written by zstar-_; please include a link to the original when reposting:
https://yzsam.com/2022/188/202207071301406694.html