[PyTorch] Image augmentation

2022-07-26 06:16:00 【Li Junfeng】

Preface

Training a neural network usually requires a large number of labeled images. With too little data, the model easily overfits. However, suitable data is not always available because labeling is expensive, so it pays to make the most of the data we already have.

Image augmentation

Put simply, image augmentation generates new labeled images from the labeled images we already have. This may sound too good to be true, but it is an effective technique.
Consider this picture:
It is a rose. Now look at the picture below:

It is still a rose.
As you can see, adjusting brightness, contrast and similar properties, or cropping the image, quickly produces a different picture that keeps the same label.
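The idea can be sketched with plain tensor arithmetic. This is a minimal illustration under stated assumptions, not how torchvision implements these operations. The helpers `adjust_brightness_contrast` and `random_crop` are hypothetical, and the image is assumed to be a float tensor of shape (C, H, W) with values in [0, 1].

```python
from typing import Optional

import torch


def adjust_brightness_contrast(img: torch.Tensor, brightness: float, contrast: float) -> torch.Tensor:
    """Hypothetical helper: scale brightness, then stretch contrast around the mean."""
    # Brightness: multiply every pixel by a factor (>1 brighter, <1 darker).
    out = img * brightness
    # Contrast: move pixels away from (>1) or toward (<1) the image's mean intensity.
    mean = out.mean()
    out = (out - mean) * contrast + mean
    # Keep values in the valid [0, 1] range.
    return out.clamp(0.0, 1.0)


def random_crop(img: torch.Tensor, size: int, generator: Optional[torch.Generator] = None) -> torch.Tensor:
    """Hypothetical helper: cut a random size x size window out of a (C, H, W) image."""
    _, h, w = img.shape
    top = int(torch.randint(0, h - size + 1, (1,), generator=generator))
    left = int(torch.randint(0, w - size + 1, (1,), generator=generator))
    return img[:, top:top + size, left:left + size]


# A labeled example: a random 3x64x64 stand-in for the rose photo, class index 0.
g = torch.Generator().manual_seed(0)
rose, label = torch.rand(3, 64, 64, generator=g), 0

# The transformed image is a new training sample that keeps the original label.
new_img = random_crop(adjust_brightness_contrast(rose, 1.2, 0.8), 48, generator=g)
new_pair = (new_img, label)
print(new_img.shape)  # torch.Size([3, 48, 48])
```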

torchvision

torchvision is a very convenient package for computer vision. In particular, torchvision.transforms covers almost all of the operations mentioned above.
Let's look at an example:

import pathlib

import cv2 as cv
import torch
import torchvision
from torch.utils.data import Dataset
from torchvision import transforms


class Flower_Dataset(Dataset):
    """Images stored as <path>/<class_name>/<image_file>; the folder name is the label."""

    def __init__(self, path, is_train, augs):
        data_root = pathlib.Path(path)
        all_image_paths = list(data_root.glob('*/*'))
        self.all_image_paths = [str(p) for p in all_image_paths]
        # Sorted class-folder names -> integer label indices
        label_names = sorted(item.name for item in data_root.glob('*/') if item.is_dir())
        label_to_index = {label: index for index, label in enumerate(label_names)}
        # cv.imread returns BGR; convert to RGB so colors (and the RGB mean/std below) are correct
        self.all_image = [cv.cvtColor(cv.imread(p), cv.COLOR_BGR2RGB) for p in self.all_image_paths]
        self.all_image_labels = [label_to_index[p.parent.name] for p in all_image_paths]
        if is_train:
            # Training: resize, apply the random augmentations, convert to a tensor, normalize
            self.transformer = transforms.Compose([
                transforms.ToPILImage(),
                transforms.Resize((224, 224)),
                augs,
                transforms.ToTensor(),
                transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
            ])
        else:
            # Evaluation: same preprocessing, no random augmentation
            self.transformer = transforms.Compose([
                transforms.ToPILImage(),
                transforms.Resize((224, 224)),
                transforms.ToTensor(),
                transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
            ])

    def __getitem__(self, index):
        img = self.all_image[index]
        # New random augmentation parameters are sampled every time an item is fetched
        img = self.transformer(img)
        label = torch.tensor(self.all_image_labels[index])
        return img, label

    def __len__(self):
        return len(self.all_image_paths)


# Random horizontal flip plus random brightness, contrast, saturation and hue changes
color_aug = torchvision.transforms.ColorJitter(brightness=0.5, contrast=0.5, saturation=0.5, hue=0.5)
augs = torchvision.transforms.Compose([torchvision.transforms.RandomHorizontalFlip(), color_aug])

This defines a dataset with image augmentation. During training, each image is randomly flipped horizontally, and its brightness, contrast, saturation and hue are randomly changed. At evaluation time, images are only resized and normalized.

Use horizontal and vertical flips with caution, because some objects are no longer the same thing once flipped. For example, when recognizing English letters, a vertically flipped b becomes p (and a horizontally flipped b becomes d). Such images add no useful data. Instead, they mislead the network with wrong labels.
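A toy bitmap makes the letter example concrete. The 5x3 grid below is a hypothetical drawing of a lowercase "b" (1 = ink). `torch.flip` reverses rows for a vertical flip and columns for a horizontal flip, which is what torchvision's `RandomVerticalFlip` and `RandomHorizontalFlip` do to an image's height and width axes.

```python
import torch

# Hypothetical 5x3 bitmap of the lowercase letter "b" (1 = ink).
letter_b = torch.tensor([
    [1, 0, 0],
    [1, 0, 0],
    [1, 1, 1],
    [1, 0, 1],
    [1, 1, 1],
])

flipped_v = torch.flip(letter_b, dims=[0])  # reverse rows: upside-down "b" looks like "p"
flipped_h = torch.flip(letter_b, dims=[1])  # reverse columns: mirrored "b" looks like "d"


def show(bitmap: torch.Tensor) -> None:
    """Print a 0/1 bitmap using '#' for ink and '.' for background."""
    for row in bitmap.tolist():
        print("".join("#" if v else "." for v in row))


show(flipped_v)
print()
show(flipped_h)
```

If such a flipped sample kept the label "b", the network would be trained to call a "p" shape a "b".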

Benefits

Image augmentation plays an important role in image recognition.

It is cheap. At the cost of a little extra computation, you get much more data. By randomly cropping an image, changing its brightness and so on, one picture can in principle yield countless variants.
It reduces overfitting. The various transformations let the network see more varied images, which improves its ability to generalize.

Limitations and caveats

Although image augmentation has many benefits, it also has drawbacks to keep in mind.

Be careful with flipping, as mentioned above.
It can make the training data unlike real conditions, so the training and test sets follow different distributions. Take a vending machine that must recognize the items customers take out of it. Normally the machine is lit, so the items look bright. If augmentation produces a large number of dim images, the network learns to recognize dim objects rather than the bright ones it will actually see.
It can change what the label means. For example, a random crop might cut a cat down to just its tail, which would clearly mislead the network when it has to recognize cylindrical objects such as sticks. In other words, an augmented image may no longer match its original label.

Copyright notice
This article was written by [Li Junfeng]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/207/202207260611041408.html