
[pytorch] Kaggle image classification competition: ArcFace + bounding box code study

2022-06-13 02:11:00 【liyihao76】


Code link
Background knowledge
Data preprocessing
Code details

The competition data contains images of more than 15,000 unique individual marine mammals from 30 different species (whales and dolphins), collected by 28 different research institutions. The task is to identify the individual id of each image in the test set.
Kaggle competition details and dataset download: Happywhale - Whale and Dolphin Identification

Code link

ArcFace is one of the best-performing approaches in this competition. Code:

Background knowledge

[Theory] Metric learning

Metric learning is a widely used machine-learning approach. From a set of observations it constructs a metric (distance) function that captures the distance or difference between data points, and therefore describes how similar two samples are. The learned metric returns a small distance for highly similar observations and a large distance for very different ones. When only a few samples are available, metric learning offers clear advantages in both accuracy and efficiency on classification tasks.

However, when the classification task is very complex, for example with many classes and only a few samples per class, deep metric learning (Deep Metric Learning, DML), which combines deep learning with metric learning, is the real winner. (Metric learning itself is also known as distance metric learning, Distance Metric Learning.) Compared with classical metric learning, deep metric learning can learn a nonlinear mapping of the input features.

By training a CNN-based nonlinear feature extractor (encoder), deep metric learning maps the extracted image features (embeddings) into a space where similar images lie close to each other, and then uses a distance measure such as Euclidean distance or cosine distance to tell different images apart.
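To make the distance idea concrete, here is a minimal plain-Python sketch (not the author's code): toy 3-D vectors stand in for CNN embeddings, and a query is assigned the label of its nearest gallery embedding under cosine or Euclidean distance. The vectors and labels (`whale_A`, `dolphin_B`) are hypothetical; the real pipeline would use 512-dimensional embeddings and a KNN search over the training set.

```python
import math

def euclidean(a, b):
    """Euclidean (L2) distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """1 - cosine similarity: 0 when the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def nearest_label(query, gallery, labels, dist=cosine_distance):
    """Return the label of the gallery embedding closest to the query (1-NN)."""
    best = min(range(len(gallery)), key=lambda i: dist(query, gallery[i]))
    return labels[best]

# Hypothetical toy embeddings for two individuals
gallery = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1], [0.0, 1.0, 0.9]]
labels = ["whale_A", "whale_A", "dolphin_B"]
query = [0.05, 0.95, 1.0]

print(nearest_label(query, gallery, labels))                  # dolphin_B
print(nearest_label(query, gallery, labels, dist=euclidean))  # dolphin_B
```

Cosine distance ignores vector length and compares only direction, which is why it pairs naturally with normalised embeddings such as those produced by ArcFace.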

Deep metric learning performs very well on extreme classification tasks in computer vision (many classes, too few samples per class) and is widely used in face recognition, person re-identification, image retrieval, object tracking, feature matching and similar scenarios.


[Theory] Bounding boxes and object detection

In image classification we assume that an image contains only one main object, and we only care about recognising its category. Often, however, an image contains several objects of interest, and we want to know not only their categories but also their exact positions in the image. In computer vision this task is called object detection (or object recognition).

In object detection we usually describe the spatial location of an object with a bounding box. A bounding box is a rectangle determined by the x and y coordinates of its top-left and bottom-right corners. Another common representation uses the (x, y) coordinates of the box center together with the box's width and height.
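The two representations carry the same information, so converting between them is simple arithmetic. A minimal sketch with a hypothetical box in pixel coordinates (YOLOv5 label files store the center form, additionally normalised by image width and height):

```python
def corner_to_center(box):
    """(x1, y1, x2, y2) top-left / bottom-right corners -> (cx, cy, w, h)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1)

def center_to_corner(box):
    """(cx, cy, w, h) center + size -> (x1, y1, x2, y2) corners."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

box = (60.0, 45.0, 378.0, 516.0)                # hypothetical box
print(corner_to_center(box))                    # (219.0, 280.5, 318.0, 471.0)
print(center_to_corner(corner_to_center(box)))  # (60.0, 45.0, 378.0, 516.0)
```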

Reference links:
1. CNN: Bounding box prediction 01 problem
2. CNN: Bounding box prediction - specify bounding box
3. CNN: Bounding box prediction - YOLO algo
4. CNN: 3.9 YOLO Algorithm part1
5. CNN: 3.9 YOLO Algorithm part2

[Python] The logging module

So what counts as standard logging practice in Python? Many people use print statements to output runtime information, watch it on the console, and redirect the output stream to a file when they need to keep it. This is not good practice. Python has a standard logging module for exactly this purpose: it makes logging easier, distinguishes clearly between log levels, and can record extra information such as the time and the module that emitted each message.
Let's take a look at the overall framework of the logging process .
[Figure: overall framework of the logging process]
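As a minimal sketch of that framework using only the standard library: a logger filters records by level, a handler decides where they go, and a formatter adds the time, level and module name. The logger name `happywhale_demo` is hypothetical, and an in-memory `io.StringIO` stream stands in for the console; replacing `logging.StreamHandler(stream)` with `logging.FileHandler("train.log")` would write to a file instead.

```python
import io
import logging

stream = io.StringIO()                     # stands in for the console or a log file
handler = logging.StreamHandler(stream)    # handler: where records are sent
handler.setFormatter(logging.Formatter(    # formatter: time, level, module, message
    "%(asctime)s %(levelname)s [%(module)s] %(message)s"))

logger = logging.getLogger("happywhale_demo")  # hypothetical logger name
logger.setLevel(logging.INFO)                  # drop anything below INFO
logger.addHandler(handler)
logger.propagate = False                       # don't also pass records to the root logger

logger.debug("not shown: below INFO")
logger.info("epoch %d finished, loss=%.3f", 1, 0.421)
logger.warning("validation loss did not improve")

print(stream.getvalue())
```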


Data preprocessing

First, the statistics from our previous post ([pytorch] Kaggle large image dataset: data analysis + visualization) show that image sizes vary greatly, and the whales and dolphins we want to detect appear at very different positions within the images.
Things to know before starting image preprocessing
This notebook shows some of the extreme cases.

Therefore, the first step of image processing is to locate the dolphin or whale in the picture. To do this, we use bounding boxes [YOLOv5].
Happywhale: BoundingBox [YOLOv5]
In this notebook, YOLOv5 is used to generate bounding boxes. The boxes guide the subsequent cropping, so that images of very different sizes can be cropped around the animal, which ultimately leads to better classification results.

We use the Whale Fluke dataset (data from another Kaggle competition, on locating whale tail flukes) to train and test the bounding-box model; it has 1,200 samples with bounding boxes. We then use the model trained on Whale Fluke to make predictions on our Whale and Dolphin dataset.

The bounding boxes in the Whale Fluke dataset are very large, whereas the Whale & Dolphin dataset contains both small and large ones. To compensate for this, you can try changing the scale parameter in the hyp.yaml file: its default value is 0.5, and you can try increasing it. You can also try enlarging the bboxes, for example by 1.5x or 1.7x, which helps ensure that the crop does not cut off part of the whale or dolphin.
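Enlarging a bbox by a factor such as 1.5x can be done by scaling its width and height about the box center and then clipping to the image borders. A minimal sketch, assuming corner-format boxes in pixels; the exact rule used in the referenced notebook is not shown, so `enlarge_box` is a hypothetical helper:

```python
def enlarge_box(box, factor, img_w, img_h):
    """Scale a corner-format box (x1, y1, x2, y2) about its center by `factor`,
    then clip it so the enlarged box stays inside the image."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    half_w = (x2 - x1) * factor / 2
    half_h = (y2 - y1) * factor / 2
    return (
        max(0, cx - half_w),
        max(0, cy - half_h),
        min(img_w, cx + half_w),
        min(img_h, cy + half_h),
    )

# Box well inside a 640x480 image: grows by 1.5x in each dimension
print(enlarge_box((100, 100, 300, 200), 1.5, img_w=640, img_h=480))
# (50.0, 75.0, 350.0, 225.0)

# Box near the bottom-right corner: the enlarged box is clipped at the edges
print(enlarge_box((500, 400, 620, 470), 1.5, img_w=640, img_h=480))
```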

Once the bounding boxes are determined, we crop the images to obtain the images needed for classification:
Happywhale: Cropped Dataset [YOLOv5]
Finally, after resizing, we obtain a new image dataset:
Dataset: JPEG Happywhale 384x384
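The crop-then-resize step can be sketched in plain Python on a tiny grayscale image stored as a list of rows. This is only an illustration of the two operations: nearest-neighbour resizing is a stand-in chosen for simplicity (the notebooks' actual resizing method is not shown; real pipelines typically use an image library with better interpolation), and the image and box are hypothetical. Note that resizing a non-square crop straight to 384x384 changes the aspect ratio.

```python
def crop(image, box):
    """Crop a row-major image (list of rows) to an integer box (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = (int(round(v)) for v in box)
    return [row[x1:x2] for row in image[y1:y2]]

def resize_nearest(image, out_w, out_h):
    """Nearest-neighbour resize of a row-major image to out_w x out_h pixels."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]

# Hypothetical 6x8 grayscale image; each pixel value encodes 10 * row + col
image = [[10 * r + c for c in range(8)] for r in range(6)]
patch = crop(image, (2, 1, 6, 5))          # 4x4 region around the "animal"
resized = resize_nearest(patch, 384, 384)  # same target size as the dataset
print(len(resized), len(resized[0]))       # 384 384
print(resized[0][0], resized[-1][-1])      # 12 45
```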

Code details

Setup

!pip install timm
!pip install pytorch-metric-learning[with-hooks]

The open-source metric learning library pytorch-metric-learning bundles many commonly used metric learning components (losses, miners, samplers, testers, trainers) and is a very useful tool.
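The post has not reached the model code yet, so the following is not the author's implementation but a plain-Python sketch of the ArcFace idea that pytorch-metric-learning packages as `losses.ArcFaceLoss`. Embeddings and per-class weight vectors are L2-normalised, every class logit is s*cos(theta_j), and the true class instead gets an additive angular margin, s*cos(theta_y + m). This makes the true class harder to win, pushing embeddings of the same individual closer together. The values margin=0.5 rad and scale=64 are assumptions for the demo; 0.5 rad is about 28.6 degrees, and the library's `ArcFaceLoss` takes its margin in degrees. The toy class centres and embedding are hypothetical.

```python
import math

def arcface_logits(embedding, class_weights, target, margin=0.5, scale=64.0):
    """ArcFace logits: scale*cos(theta_j) for non-target classes and
    scale*cos(theta_y + margin) for the target class (margin in radians)."""
    def unit(v):
        n = math.sqrt(sum(x * x for x in v))
        return [x / n for x in v]

    e = unit(embedding)
    logits = []
    for j, w in enumerate(class_weights):
        # cosine of the angle between embedding and class centre, clamped for acos
        cos_t = max(-1.0, min(1.0, sum(a * b for a, b in zip(e, unit(w)))))
        if j == target:
            cos_t = math.cos(math.acos(cos_t) + margin)  # add the angular margin
        logits.append(scale * cos_t)
    return logits

def cross_entropy(logits, target):
    """Softmax cross-entropy, computed stably by subtracting the max logit."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]

weights = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical class centres for 2 individuals
emb = [0.8, 0.6]                    # embedding somewhat closer to class 0

plain = arcface_logits(emb, weights, target=0, margin=0.0)  # ordinary cosine softmax
arc = arcface_logits(emb, weights, target=0)                # with the ArcFace margin

# The margin lowers the target logit, so the loss on this sample rises
print(cross_entropy(plain, 0), cross_entropy(arc, 0))
```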

import os
import glob
import pandas as pd
import numpy as np
import logging
import timm
from tqdm.notebook import tqdm # Progress bar 

import torch
import torch.nn as nn
import torch.optim as optim

from torch.utils.data import Dataset, DataLoader
from torchvision.io import ImageReadMode, read_image
from torchvision.transforms import Compose, Lambda, Normalize, AutoAugment, AutoAugmentPolicy

import pytorch_metric_learning
import pytorch_metric_learning.utils.logging_presets as LP
from pytorch_metric_learning.utils import common_functions
from pytorch_metric_learning import losses, miners, samplers, testers, trainers
from pytorch_metric_learning.utils.accuracy_calculator import AccuracyCalculator
from pytorch_metric_learning.utils.inference import InferenceModel

for handler in logging.root.handlers[:]:
    logging.root.removeHandler(handler) # remove exactly the preexisting handler object

logging.getLogger().setLevel(logging.INFO)  # get the root logger and set the minimum log level to INFO
logging.info("VERSION %s" % pytorch_metric_learning.__version__)  # log the library version

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)  # cuda:0
if torch.cuda.is_available():  # get_device_name fails on CPU-only machines
    print(torch.cuda.get_device_name(0))  # NVIDIA RTX A6000 (the author's GPU)

Parameters

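MODEL_NAME = 'tf_efficientnet_b4_ns'  # timm backbone
N_CLASSES = 15587  # number of unique individuals (individual_id)
OUTPUT_SIZE = 1792  # feature size of the EfficientNet-B4 backbone
EMBEDDING_SIZE = 512  # size of the learned embedding
N_EPOCH = 15
BATCH_SIZE = 16
ACCUMULATION_STEPS = int(256 / BATCH_SIZE)  # 16 steps -> effective batch size of 256
MODEL_LR = 1e-3
PCT_START = 0.3
PATIENCE = 5
N_WORKER = 2  # DataLoader worker processes
N_NEIGHBOURS = 750  # neighbours for the KNN lookup used later
# Assumed path (not given in the original): point it at your (cropped) training images
TRAIN_DIR = './happy-whale-and-dolphin/train_images'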
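ACCUMULATION_STEPS = 256 / 16 = 16 suggests gradient accumulation: reaching an effective batch size of 256 while only 16 images fit in memory at once. The training loop is not shown yet, so this is an assumption about how the constant is used. The usual PyTorch pattern divides each micro-batch loss by ACCUMULATION_STEPS, calls `backward()` on every micro-batch, and calls `optimizer.step()` only once every ACCUMULATION_STEPS batches. The plain-Python sketch below checks the underlying arithmetic on a toy 1-D least-squares model with hypothetical data: averaging the gradients of 16 micro-batches gives the same gradient as one 256-sample batch.

```python
import random

def grad_mse(w, batch):
    """Gradient d/dw of mean((w*x - y)^2) over a batch of (x, y) pairs."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

BATCH_SIZE = 16
ACCUMULATION_STEPS = int(256 / BATCH_SIZE)  # 16 micro-batches

# Hypothetical data: y = 3x + noise
rng = random.Random(0)
xs = [rng.uniform(-1, 1) for _ in range(256)]
data = [(x, 3.0 * x + rng.gauss(0, 0.1)) for x in xs]
w = 0.5  # current model weight

# Gradient of one big batch of 256 samples
full = grad_mse(w, data)

# Accumulate over 16 micro-batches of 16, scaling each by 1 / ACCUMULATION_STEPS
# (the counterpart of dividing the loss by ACCUMULATION_STEPS before backward())
accum = 0.0
for step in range(ACCUMULATION_STEPS):
    micro = data[step * BATCH_SIZE:(step + 1) * BATCH_SIZE]
    accum += grad_mse(w, micro) / ACCUMULATION_STEPS

print(abs(full - accum) < 1e-9)  # True
```

The equivalence is exact here only because all micro-batches have the same size. Layers such as BatchNorm still see batches of 16, so accumulation is not identical to true large-batch training in a CNN.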

Read the CSV data

df = pd.read_csv('./happy-whale-and-dolphin/train.csv')
df.head()

[Output of df.head(): the first rows of train.csv, with columns image, species and individual_id]

df['label'] = df.groupby('individual_id').ngroup()
df['label'].describe()

This maps each distinct individual_id (not species) to an integer label from 0 to N_CLASSES - 1.

df.groupby
groupby splits the original DataFrame by the grouping column (here individual_id) into groups: one sub-DataFrame per distinct value. See: Pandas tutorial | Super easy-to-use groupby usage explained
GroupBy.ngroup(ascending: bool = True) numbers the groups from 0 to (number of groups - 1) and returns, for every row, the number of the group that row belongs to.
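What `ngroup` computes here can be sketched in plain Python: with groupby's default `sort=True`, groups are numbered in sorted key order, and each row receives its group's number. The IDs below are hypothetical placeholders, not real individual_ids.

```python
def ngroup(keys):
    """Number each distinct key 0..k-1 in sorted order and return the number
    for every row, mirroring df.groupby(col).ngroup() with sort=True."""
    code = {k: i for i, k in enumerate(sorted(set(keys)))}
    return [code[k] for k in keys]

individual_ids = ["id_c", "id_a", "id_c", "id_b"]  # hypothetical IDs
print(ngroup(individual_ids))  # [2, 0, 2, 1]
```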
Data summary: df.describe()
It returns a statistics table for the numeric columns, one row per statistic: count, mean, standard deviation, min, the quartiles and max. This is very useful for getting a first understanding of the data. For datetime columns it reports time-related statistics instead, such as the start and end times.
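For a single numeric column, such as `df['label'].describe()` above, the statistics can be reproduced with the standard library. This is a rough sketch, not pandas' implementation: pandas reports the sample standard deviation (ddof=1) and linearly interpolated quartiles, which correspond to `statistics.stdev` and `statistics.quantiles(..., method="inclusive")`. The input values are hypothetical.

```python
import statistics

def describe(values):
    """Rough stdlib equivalent of pandas Series.describe() for numeric data."""
    # method="inclusive" interpolates linearly between data points
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample standard deviation (ddof=1)
        "min": min(values),
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": max(values),
    }

print(describe([0, 1, 2, 2, 5]))  # hypothetical label values
```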

Split the dataset

Training set and validation set

valid_proportion = 0.1

valid_df = df.sample(frac=valid_proportion, replace=False, random_state=1).copy()
train_df = df[~df['image'].isin(valid_df['image'])].copy()

print(train_df.shape) # (45930, 4)
print(valid_df.shape) # (5103, 4)
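The printed shapes are consistent with holding out round(0.1 x 51033) = 5103 of the 51,033 rows. The same kind of split can be sketched with the standard library by sampling row indices without replacement. This uses Python's own RNG, so the chosen rows differ from those picked by pandas' `sample`; only the sizes and the disjointness carry over. Because rows rather than individuals are sampled, the same individual can appear in both sets.

```python
import random

def split_indices(n_rows, valid_proportion=0.1, seed=1):
    """Hold out round(n_rows * valid_proportion) randomly chosen rows
    (sampled without replacement) for validation; the rest are for training."""
    rng = random.Random(seed)
    n_valid = round(n_rows * valid_proportion)
    valid = set(rng.sample(range(n_rows), n_valid))
    train = [i for i in range(n_rows) if i not in valid]
    return train, sorted(valid)

train_idx, valid_idx = split_indices(51033)
print(len(train_idx), len(valid_idx))  # 45930 5103
```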

Reset the index of both DataFrames, since we will use it for KNN lookups later (after reset_index(drop=True), each row's index equals its position, 0 to n-1):

train_df.reset_index(drop=True, inplace=True)
valid_df.reset_index(drop=True, inplace=True)

Read image data

Create a Dataset class for loading the images.

class HappyWhaleDataset(Dataset):
    def __init__(
        self,
        df: pd.DataFrame,
        image_dir: str,
        return_labels=True,
        augment=True,  # apply AutoAugment (use only for training)
    ):
        self.df = df
        self.images = self.df["image"]
        self.image_dir = image_dir
        transforms = [AutoAugment(AutoAugmentPolicy.IMAGENET)] if augment else []
        # AutoAugment operates on uint8 tensors; scale to [0, 1] floats afterwards
        transforms.append(Lambda(lambda x: x / 255))
        self.image_transform = Compose(transforms)
        self.return_labels = return_labels

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        image_path = os.path.join(self.image_dir, self.images.iloc[idx])
        # force 3 channels, in case some images are stored as grayscale
        image = read_image(path=image_path, mode=ImageReadMode.RGB)
        image = self.image_transform(image)

        if self.return_labels:
            label = self.df['label'].iloc[idx]  # iloc: select a row by its position
            return image, label
        else:
            return image

train_dataset = HappyWhaleDataset(df=train_df, image_dir=TRAIN_DIR, return_labels=True)
len(train_dataset)  # 45930
# no random augmentation on the validation set
valid_dataset = HappyWhaleDataset(df=valid_df, image_dir=TRAIN_DIR, return_labels=True, augment=False)
len(valid_dataset)  # 5103

dataset_dict = {"train": train_dataset, "val": valid_dataset}

A look at the training set:
[Figure: sample images from the training set]

To be continued …


Copyright notice
This article was written by [liyihao76]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/02/202202280546486224.html