[PyTorch] Image augmentation
2022-07-26 06:16:00 【Li Junfeng】
Preface
Training a neural network usually requires a large number of labeled images; with too little data the model is prone to overfitting or underfitting. Suitable data is not always available, however, because labeling is expensive, so it pays to make the most of the data you already have.
Image augmentation
Put simply, image augmentation generates new images from images that are already labeled. That may sound surprising, but it is an effective technique.
Consider a picture of a rose. Now look at a modified copy of the same picture: it is still a rose.
As this shows, adjusting the brightness or contrast, or cropping the picture, quickly produces a different picture that keeps the original label.
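To make the idea concrete, here is a minimal pure-Python sketch of those three operations on a tiny grayscale image stored as nested lists. The helper names (`adjust_brightness`, `adjust_contrast`, `crop`) and the 4x4 image are made up for illustration; a real pipeline would use a library such as torchvision rather than these helpers.

```python
def adjust_brightness(img, factor):
    """Scale every pixel by `factor`, clamping to the 0-255 range."""
    return [[min(255, max(0, round(p * factor))) for p in row] for row in img]


def adjust_contrast(img, factor):
    """Push pixels away from the mean (factor > 1) or toward it (factor < 1)."""
    pixels = [p for row in img for p in row]
    mean = sum(pixels) / len(pixels)
    return [[min(255, max(0, round(mean + factor * (p - mean)))) for p in row]
            for row in img]


def crop(img, top, left, height, width):
    """Return the height x width window whose top-left corner is (top, left)."""
    return [row[left:left + width] for row in img[top:top + height]]


# A hypothetical 4x4 grayscale image labeled "rose".
image = [
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [90, 100, 110, 120],
    [130, 140, 150, 160],
]
label = "rose"

# Each variant is a new training sample that reuses the original label.
augmented = [
    (adjust_brightness(image, 1.5), label),
    (adjust_contrast(image, 0.5), label),
    (crop(image, 1, 1, 2, 2), label),
]
```

One labeled picture has become three additional labeled samples without any new annotation work.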
torchvision
torchvision is a very convenient package for computer vision. Its torchvision.transforms module in particular covers almost all of the operations mentioned above.
Let's look at an example :
import pathlib

import cv2 as cv
import torch
import torchvision
from torch.utils.data import Dataset
from torchvision import transforms


class Flower_Dataset(Dataset):
    def __init__(self, path, is_train, augs):
        data_root = pathlib.Path(path)
        # Expected layout: <path>/<label name>/<image file>
        all_image_paths = list(data_root.glob('*/*'))
        self.all_image_paths = [str(p) for p in all_image_paths]
        label_names = sorted(item.name for item in data_root.glob('*/') if item.is_dir())
        label_to_index = dict((label, index) for index, label in enumerate(label_names))
        # cv.imread returns BGR; convert to RGB so the channel order matches
        # the per-channel statistics passed to Normalize below.
        self.all_image = [cv.cvtColor(cv.imread(p), cv.COLOR_BGR2RGB)
                          for p in self.all_image_paths]
        self.all_image_labels = [label_to_index[p.parent.name] for p in all_image_paths]
        if is_train:
            # Augmentation is applied to training samples only.
            self.transformer = transforms.Compose([
                transforms.ToPILImage(),
                transforms.Resize((224, 224)),
                augs,
                transforms.ToTensor(),
                transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
            ])
        else:
            self.transformer = transforms.Compose([
                transforms.ToPILImage(),
                transforms.Resize((224, 224)),
                transforms.ToTensor(),
                transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
            ])

    def __getitem__(self, index):
        img = self.all_image[index]
        # Random transforms are re-sampled on every access.
        img = self.transformer(img)
        label = self.all_image_labels[index]
        label = torch.tensor(label)
        return img, label

    def __len__(self):
        return len(self.all_image_paths)


color_aug = torchvision.transforms.ColorJitter(brightness=0.5, contrast=0.5, saturation=0.5, hue=0.5)
augs = torchvision.transforms.Compose([torchvision.transforms.RandomHorizontalFlip(), color_aug])
This defines a dataset with image augmentation: for training samples it randomly flips the picture horizontally and randomly jitters its brightness, contrast, saturation, and hue.
- Use horizontal and vertical flips with caution, because some objects mean something different once flipped. In English letter recognition, for example, the letter b becomes d after a horizontal flip and p after a vertical flip. Such samples give the neural network no new information; on the contrary, they mislead it.
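The letter example can be checked with a small pure-Python sketch. The 5x3 bitmaps below are hand-drawn for illustration (they do not come from a real font), and `hflip`, `vflip`, and `label_after` are hypothetical helpers, not torchvision functions.

```python
# Coarse 5x3 bitmaps of lowercase letters ("#" = ink, "." = background).
B = ["#..",
     "#..",
     "###",
     "#.#",
     "###"]
D = ["..#",
     "..#",
     "###",
     "#.#",
     "###"]
P = ["###",
     "#.#",
     "###",
     "#..",
     "#.."]
TEMPLATES = {"b": B, "d": D, "p": P}


def hflip(glyph):
    """Mirror left-right by reversing every row."""
    return [row[::-1] for row in glyph]


def vflip(glyph):
    """Mirror top-bottom by reversing the order of the rows."""
    return glyph[::-1]


def label_after(glyph):
    """Return the letter whose template matches `glyph`, or None."""
    for name, template in TEMPLATES.items():
        if glyph == template:
            return name
    return None


print(label_after(hflip(B)))  # d
print(label_after(vflip(B)))  # p
```

If the flipped sample kept the label b, the network would be trained on a d or a p that is marked as a b, which is exactly the kind of mislabeled sample the caution above describes.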
Purpose and benefits
Image augmentation plays an important role in image recognition.
- It is cheap: for a little extra computation you obtain a lot of data. By randomly cropping a picture, changing its brightness, and so on, you can in theory produce a practically unlimited number of pictures.
- It helps avoid overfitting: the transformations let the neural network see more varied pictures, which improves its ability to generalize.
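A rough count shows how quickly the number of variants grows even before any color change is applied. The sketch below only counts combinations of crop position and horizontal flip; `count_variants` is a hypothetical helper and the 256 and 224 pixel sizes are example values, not taken from the code above.

```python
def count_variants(img_h, img_w, crop_h, crop_w, with_hflip=True):
    """Count crop-position and flip combinations for one image.

    This is an upper bound on distinct results: a uniform or symmetric
    image can produce identical pixels for different parameters.
    """
    if crop_h > img_h or crop_w > img_w:
        raise ValueError("crop must fit inside the image")
    positions = (img_h - crop_h + 1) * (img_w - crop_w + 1)
    return positions * (2 if with_hflip else 1)


# One 256x256 picture, cropped to 224x224, optionally flipped.
print(count_variants(256, 256, 224, 224))  # 2178
```

Brightness, contrast, saturation, and hue factors are drawn from continuous ranges, so adding them makes the number of possible variants effectively unbounded.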
Limitations and caveats
Image augmentation is useful, but it also has shortcomings and points that need attention.
- Be careful with flips, as mentioned above.
- It may cause underfitting. This usually happens when the augmented pictures differ from what the model will see in practice. Take a vending machine that has to identify the items customers take out of it. Under normal conditions the inside of the machine is lit, so the objects are bright. If augmentation produces a large number of dim pictures, the training set and the test set follow different distributions, and the neural network learns to recognize dim objects instead of bright ones.
- It may mislead the neural network. When cropping, for example, a picture of a cat can be cut down to just its tail, which looks like a stick or some other cylindrical object but still carries the label "cat". In other words, augmentation can change what the correct label of a picture should be.
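One way to reduce the cropping risk, assuming each picture comes with a bounding box for the labeled object (which the dataset above does not provide), is to reject crops that keep too little of the object. This is only a sketch: `retained_fraction`, `sample_safe_crop`, and the 0.5 threshold are illustrative choices, not part of torchvision.

```python
import random


def retained_fraction(box, crop):
    """Fraction of the object box that lies inside the crop.

    Both arguments are (top, left, bottom, right) tuples in pixels,
    and the box is assumed to have a positive area.
    """
    top = max(box[0], crop[0])
    left = max(box[1], crop[1])
    bottom = min(box[2], crop[2])
    right = min(box[3], crop[3])
    intersection = max(0, bottom - top) * max(0, right - left)
    box_area = (box[2] - box[0]) * (box[3] - box[1])
    return intersection / box_area


def sample_safe_crop(img_h, img_w, crop_h, crop_w, box,
                     min_fraction=0.5, max_tries=100, rng=random):
    """Draw random crops until one keeps at least `min_fraction` of the box.

    Returns the crop as (top, left, bottom, right), or None if no
    acceptable crop was found within `max_tries` attempts.
    """
    for _ in range(max_tries):
        top = rng.randint(0, img_h - crop_h)
        left = rng.randint(0, img_w - crop_w)
        crop = (top, left, top + crop_h, left + crop_w)
        if retained_fraction(box, crop) >= min_fraction:
            return crop
    return None


# A hypothetical 100x100 picture whose object occupies rows and columns 40-60.
cat_box = (40, 40, 60, 60)
safe_crop = sample_safe_crop(100, 100, 50, 50, cat_box, rng=random.Random(0))
```

The same idea applies to the vending machine example: keeping the brightness range narrow, so that augmented pictures stay close to the lighting seen in deployment, limits the gap between training and test distributions.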