PyTorch crop images differentiably
2022-07-03 15:08:00 【trium_ KW】
Intro
PyTorch provides a variety of ways to crop images. For example, torchvision.transforms provides several functions to crop PIL images, and an answer on the PyTorch Forums shows how to crop an image in a differentiable way (differentiable with respect to the image). However, sometimes we need the cropping operation itself to be fully differentiable, i.e. also differentiable with respect to the crop box. How can we implement that?
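For contrast, here is a minimal sketch of the ordinary slicing-based crop (toy sizes and box values are arbitrary). Gradients reach the pixels of the image, but the box is a set of Python integers, so nothing can be learned about where to crop.

```python
import torch

# A toy batch of images with shape (batch, channels, height, width).
I = torch.rand(2, 3, 8, 8, requires_grad=True)

# Crop box as integer pixel indices (arbitrary example values).
top, left, height, width = 2, 1, 4, 5

# Plain slicing: differentiable with respect to the pixels of I ...
crop = I[:, :, top:top + height, left:left + width]
crop.sum().backward()

# ... but the box is made of Python ints, so it receives no gradient.
# d(crop.sum())/dI is 1 inside the crop window and 0 everywhere else.
mask = torch.zeros(8, 8)
mask[top:top + height, left:left + width] = 1.0
print(torch.equal(I.grad[0, 0], mask))  # True
```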
Theory: Affine transformation
Before reaching the answer, we first need to learn about the image coordinate system used by PyTorch's sampling functions. It is a left-handed Cartesian system with its origin at the center of the image, with $x$ pointing right (along the width) and $y$ pointing down (along the height). The coordinates are normalized to the range $[-1, 1]$, where $(-1, -1)$ indicates the top-left corner and $(1, 1)$ the bottom-right corner, as pointed out by the doc.
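The layout can be inspected with an identity transform. This is a minimal sketch with a toy 4x4 output and `align_corners=False` (the setting used in the code later on) that prints where a few output pixels sample from. With `align_corners=False`, the values ±1 refer to the outer edges of the corner pixels, so the pixel centers sit slightly inside, at ±0.75 for a side of 4 pixels.

```python
import torch
import torch.nn.functional as F

# Identity affine transform (batch of one): the output covers the whole input.
theta = torch.tensor([[[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0]]])  # shape (1, 2, 3)

# Sampling grid for a 1x1x4x4 output. grid[0, i, j] is the normalized (x, y)
# location that output row i, column j is read from.
grid = F.affine_grid(theta, size=(1, 1, 4, 4), align_corners=False)

print(grid.shape)      # torch.Size([1, 4, 4, 2])
print(grid[0, 0, 0])   # top-left pixel center: (-0.75, -0.75)
print(grid[0, 0, -1])  # moving along the width increases x: (0.75, -0.75)
print(grid[0, -1, 0])  # moving along the height increases y: (-0.75, 0.75)
```

With `align_corners=True` instead, the same identity grid would place the corner pixel centers exactly at ±1.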
Let $(x, y)$ be the top-left corner of the cropped image, expressed in the coordinate system of the original image; likewise, let $(x', y')$ be the bottom-right corner of the cropped image. In the cropped image's own coordinate system, $(x, y)$ corresponds to $(-1, -1)$ and $(x', y')$ corresponds to $(1, 1)$. We'd like a function $f$ that maps every point of the cropped image system to the original image system. Since only scaling and translation are involved, $f$ can be parameterized by an affine transformation matrix $\Theta$ of the form
$$\Theta = \begin{pmatrix} \theta_{11} & 0 & \theta_{13}\\ 0 & \theta_{22} & \theta_{23}\\ 0 & 0 & 1 \end{pmatrix}$$
where $\theta_{12} = \theta_{21} = 0$ since no rotation or shearing is involved. Denote by $\mathbf{u}_H$ the homogeneous coordinate of $\mathbf{u} = \begin{pmatrix} u & v \end{pmatrix}^T$, i.e. $\mathbf{u}_H = \begin{pmatrix} \mathbf{u}^T & 1 \end{pmatrix}^T$. Then $\Theta$ maps $\mathbf{u}_H$, expressed in the cropped image system, to $\mathbf{x}_H$, expressed in the original image system: $\mathbf{x}_H = \Theta \mathbf{u}_H$. Applying this to the two corners gives
$$\begin{pmatrix} x & x'\\ y & y'\\ 1 & 1 \end{pmatrix} = \begin{pmatrix} \theta_{11} & 0 & \theta_{13}\\ 0 & \theta_{22} & \theta_{23}\\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} -1 & 1\\ -1 & 1\\ 1 & 1 \end{pmatrix}$$
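Written out row by row, the matrix equation is just four scalar equations, two per axis (the third row reduces to $1 = 1$):

$$
\begin{aligned}
-\theta_{11} + \theta_{13} &= x, & \theta_{11} + \theta_{13} &= x',\\
-\theta_{22} + \theta_{23} &= y, & \theta_{22} + \theta_{23} &= y'.
\end{aligned}
$$

Subtracting the equations in each pair isolates the scale terms $\theta_{11}, \theta_{22}$, and adding them isolates the translation terms $\theta_{13}, \theta_{23}$.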
Solving the equations,
$$\Theta = \begin{pmatrix} \frac{x'-x}{2} & 0 & \frac{x'+x}{2}\\ 0 & \frac{y'-y}{2} & \frac{y'+y}{2}\\ 0 & 0 & 1 \end{pmatrix}$$
where $x' \ge x$ and $y' \ge y$.
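The closed form can be checked numerically. This is a small sketch using the same example corners as the code below: it applies the 3x3 homogeneous $\Theta$ to the cropped-image corners $(-1,-1)$ and $(1,1)$, and also to the crop center $(0,0)$, which should land on the center of the box.

```python
import torch

# Example corners in normalized coordinates (same values as the article's code).
x, y, x_, y_ = -0.5, -0.3, 0.7, 0.8

# Full 3x3 homogeneous matrix Theta from the closed-form solution.
Theta = torch.tensor([[(x_ - x) / 2, 0.0, (x_ + x) / 2],
                      [0.0, (y_ - y) / 2, (y_ + y) / 2],
                      [0.0, 0.0, 1.0]])

# Homogeneous points of the cropped image, one per column:
# top-left (-1, -1), bottom-right (1, 1) and center (0, 0).
u_H = torch.tensor([[-1.0, 1.0, 0.0],
                    [-1.0, 1.0, 0.0],
                    [1.0, 1.0, 1.0]])

x_H = Theta @ u_H
# Columns are approximately (x, y, 1), (x', y', 1) and ((x+x')/2, (y+y')/2, 1).
print(x_H)
```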
Coding time
We’ll need two functions:
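- torch.nn.functional.affine_grid, to convert the $\Theta$ parameterization into $f$, i.e. a sampling grid (flow field) that stores, for every pixel of the cropped image, its normalized coordinate in the original image;
- torch.nn.functional.grid_sample, to read (bilinearly interpolate) the original image at those coordinates, which produces the cropped image.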
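A quick shape check, with arbitrary toy sizes and an arbitrary `theta`, shows how the two functions fit together: `affine_grid` turns a batch of 2x3 matrices into one normalized $(x, y)$ location per output pixel, and `grid_sample` reads the input at those locations.

```python
import torch
import torch.nn.functional as F

B, C, H, W = 2, 3, 32, 48   # arbitrary toy input sizes
H_out, W_out = 16, 20       # arbitrary output (crop) size
I = torch.rand(B, C, H, W)

# Any batch of (B, 2, 3) affine matrices works here; these values are arbitrary.
theta = torch.tensor([[0.5, 0.0, 0.1],
                      [0.0, 0.5, -0.2]]).expand(B, 2, 3)

# affine_grid: theta (B, 2, 3) -> grid (B, H_out, W_out, 2); grid[b, i, j] is the
# normalized (x, y) location in I that output pixel (i, j) is read from.
grid = F.affine_grid(theta, size=(B, C, H_out, W_out), align_corners=False)

# grid_sample: bilinearly interpolates I at those locations -> (B, C, H_out, W_out).
out = F.grid_sample(I, grid, mode="bilinear", align_corners=False)
print(grid.shape, out.shape)
```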
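import torch
import torch.nn.functional as F

B, C, H, W = 16, 3, 224, 224  # batch size, input channels, original image height and width
# Let `I` be our original image batch
I = torch.rand(B, C, H, W)

# Set (x, y) and (x', y') to define the rectangular region to crop.
# These are example values in normalized coordinates; in practice (x, y, x_, y_)
# might be predicted by a network, i.e. be tensors in the computation graph.
x, y = torch.tensor(-0.5, requires_grad=True), torch.tensor(-0.3, requires_grad=True)
x_, y_ = torch.tensor(0.7, requires_grad=True), torch.tensor(0.8, requires_grad=True)

# Set the affine parameters. torch.stack (rather than torch.tensor) keeps theta
# connected to x, y, x_, y_, so gradients can flow back to the crop box.
zero = torch.zeros(())
theta = torch.stack([
    torch.stack([(x_ - x) / 2, zero, (x_ + x) / 2]),
    torch.stack([zero, (y_ - y) / 2, (y_ + y) / 2]),
]).unsqueeze(0).expand(B, -1, -1)  # shape (B, 2, 3)

# Compute the flow field (sampling grid); `size` is the output size, so
# H//2 x W//2 also rescales the crop.
# The `align_corners` option must be the same throughout the code.
f = F.affine_grid(theta, size=(B, C, H // 2, W // 2), align_corners=False)
I_cropped = F.grid_sample(I, f, align_corners=False)  # shape (B, C, H//2, W//2)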
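As a sanity check, here is a minimal sketch on a tiny 8x8 ramp image (all sizes and box values are arbitrary). It converts a pixel-aligned box to normalized coordinates using $2p/n - 1$ for a pixel edge $p$ along a side of $n$ pixels (this formula assumes `align_corners=False`), crops with `affine_grid` + `grid_sample`, compares the result against plain slicing, and then backpropagates to the box itself.

```python
import torch
import torch.nn.functional as F

B, C, H, W = 1, 1, 8, 8
# Ramp image: I[r, c] = 8 * r + c, so it is linear in both directions.
I = torch.arange(H * W, dtype=torch.float32).reshape(B, C, H, W)

# Hypothetical pixel-aligned box: rows 2..5 and columns 1..4 (edges 2..6 and 1..5).
top, bottom, left, right = 2, 6, 1, 5

# With align_corners=False, pixel edge p along a side of n pixels sits at the
# normalized coordinate 2 * p / n - 1.
box = torch.tensor([2 * left / W - 1, 2 * top / H - 1,
                    2 * right / W - 1, 2 * bottom / H - 1], requires_grad=True)
x, y, x_, y_ = box.unbind()

# Build theta with torch.stack so it stays connected to `box` in the graph.
zero = torch.zeros_like(x)
theta = torch.stack([
    torch.stack([(x_ - x) / 2, zero, (x_ + x) / 2]),
    torch.stack([zero, (y_ - y) / 2, (y_ + y) / 2]),
]).unsqueeze(0)  # shape (1, 2, 3)

f = F.affine_grid(theta, size=(B, C, bottom - top, right - left), align_corners=False)
I_cropped = F.grid_sample(I, f, align_corners=False)

# The differentiable crop reproduces plain slicing for this pixel-aligned box ...
print(torch.allclose(I_cropped, I[:, :, top:bottom, left:right], atol=1e-4))

# ... and, unlike slicing, gradients flow back to the box coordinates.
I_cropped.mean().backward()
print(box.grad)  # approximately [2, 16, 2, 16] for this ramp image
```

Because the test image is a linear ramp (each row adds 8, each column adds 1), the gradient with respect to the box is constant here; on real images it depends on the local image gradients, and for boxes that are not pixel-aligned the crop blends neighboring pixels through bilinear interpolation.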