PyTorch crop images differentiably
2022-07-03 15:08:00 【trium_ KW】
Intro
PyTorch provides a variety of means to crop images. For example, torchvision.transforms provides several functions to crop PIL images, and an answer on the PyTorch Forum shows how to crop an image in a differentiable way (differentiable with respect to the image). However, sometimes we need the cropping action itself to be differentiable, that is, differentiable with respect to the crop coordinates as well. How shall we implement that?
Theory: Affine transformation
Before reaching the answer, we first need to learn about the image coordinate system in PyTorch. It is a left-handed Cartesian system whose origin is at the center of the image. The coordinates are normalized to the range $[-1,1]$, where $(-1,-1)$ indicates the top-left corner and $(1,1)$ indicates the bottom-right corner, as pointed out by the doc.
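To make the convention concrete, here is a small sketch that asks `F.affine_grid` for the sampling grid of an identity transform on a tiny 2x2 image. The variable names are only illustrative. With `align_corners=True` the values $-1$ and $1$ refer to the centers of the corner pixels; with `align_corners=False` they refer to the outer edges of the corner pixels, so the pixel centers of a 2x2 image sit at $\pm 0.5$.

```python
import torch
import torch.nn.functional as F

# Identity transform: every output coordinate maps to itself.
identity = torch.tensor([[[1.0, 0.0, 0.0],
                          [0.0, 1.0, 0.0]]])  # shape (1, 2, 3)

# The grid has shape (N, H, W, 2); the last dimension holds (x, y).
grid_corners = F.affine_grid(identity, size=(1, 1, 2, 2), align_corners=True)
grid_centers = F.affine_grid(identity, size=(1, 1, 2, 2), align_corners=False)

top_left = grid_corners[0, 0, 0]        # first row, first column
bottom_right = grid_corners[0, -1, -1]  # last row, last column
print(top_left, bottom_right)
print(grid_centers[0, 0, 0], grid_centers[0, -1, -1])
```

Note that the first row of the grid carries the smallest $y$ value, which is why $(-1,-1)$ is the top-left corner rather than the bottom-left one.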
Let $(x,y)$ be the top-left corner of the cropped image with respect to the coordinate system of the original image; likewise, let $(x',y')$ denote the bottom-right corner of the cropped image. It’s clear that $(x,y)$ corresponds to $(-1,-1)$ in the cropped image coordinate system, and $(x',y')$ corresponds to $(1,1)$. We’d like a function $f$ that maps every point in the cropped image from the cropped image system to the original image system. Since only scaling and translation are involved, the function $f$ can be parameterized by an affine transformation matrix $\Theta$ such that
$$\Theta = \begin{pmatrix} \theta_{11} & 0 & \theta_{13}\\ 0 & \theta_{22} & \theta_{23}\\ 0 & 0 & 1 \end{pmatrix}$$
where $\theta_{12}=\theta_{21}=0$ since skewing is not involved. Let $\mathbf{u}_H$ denote the homogeneous coordinate of $\mathbf{u}=\begin{pmatrix}u & v\end{pmatrix}^T$, i.e. $\mathbf{u}_H=\begin{pmatrix}\mathbf{u}^T & 1\end{pmatrix}^T$. Then $\Theta$ maps $\mathbf{u}_H$ in the cropped image system to $\mathbf{x}_H$ in the original image system, i.e. $\mathbf{x}_H = \Theta \mathbf{u}_H$. Applying this to the two corners gives
$$\begin{pmatrix} x & x'\\ y & y'\\ 1 & 1 \end{pmatrix} = \begin{pmatrix} \theta_{11} & 0 & \theta_{13}\\ 0 & \theta_{22} & \theta_{23}\\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} -1 & 1\\ -1 & 1\\ 1 & 1 \end{pmatrix}$$
Solving the equations,
$$\Theta = \begin{pmatrix} \frac{x'-x}{2} & 0 & \frac{x'+x}{2}\\ 0 & \frac{y'-y}{2} & \frac{y'+y}{2}\\ 0 & 0 & 1 \end{pmatrix}$$
where $x'\ge x$ and $y'\ge y$.
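As a quick numerical check of the solution, the sketch below builds $\Theta$ from a pair of example corners and multiplies it with the homogeneous coordinates of the cropped image's corners and center. `crop_theta` is a hypothetical helper written only for this check, and the corner values are arbitrary.

```python
import torch

def crop_theta(x, y, x_, y_):
    """Hypothetical helper: build the 3x3 matrix Theta from the crop corners."""
    return torch.tensor([
        [(x_ - x) / 2, 0.0, (x_ + x) / 2],
        [0.0, (y_ - y) / 2, (y_ + y) / 2],
        [0.0, 0.0, 1.0],
    ])

x, y, x_, y_ = -0.5, -0.3, 0.7, 0.8  # arbitrary example corners
Theta = crop_theta(x, y, x_, y_)

# Columns: top-left corner, bottom-right corner and center of the
# cropped image, in homogeneous cropped-image coordinates.
U = torch.tensor([
    [-1.0, 1.0, 0.0],
    [-1.0, 1.0, 0.0],
    [1.0, 1.0, 1.0],
])
X = Theta @ U  # the same points in original-image coordinates
print(X)
```

The first two columns of `X` should reproduce $(x,y,1)^T$ and $(x',y',1)^T$, and the third column gives the center of the crop, $\left(\frac{x'+x}{2}, \frac{y'+y}{2}, 1\right)^T$.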
Coding time
We’ll need two functions:
- torch.nn.functional.affine_grid, to convert the $\Theta$ parameterization to $f$
- torch.nn.functional.grid_sample, to sample the original image at the coordinate that $f$ assigns to each cropped image coordinate
import torch
import torch.nn.functional as F
# batch size, input channels, original image height and width
B, C, H, W = 16, 3, 224, 224
# Let `I` be our original image
I = torch.rand(B, C, H, W)
# Set the (x,y) and (x',y') to define the rectangular region to crop
# Some exemplary coordinates; in practice, (x, y, x_, y_) might be
# predicted as tensors in the computation graph
x, y = -0.5, -0.3
x_, y_ = 0.7, 0.8
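# Set the affine parameters (only the top two rows of Theta are needed).
# Note: torch.tensor copies plain numbers; if the coordinates are tensors
# that require grad, assemble theta with torch.stack to keep the graph
theta = torch.tensor([
    [(x_ - x) / 2, 0, (x_ + x) / 2],
    [0, (y_ - y) / 2, (y_ + y) / 2],
]).unsqueeze_(0).expand(B, -1, -1)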
# compute the flow field;
# where size is the output size (scaling involved)
# `align_corners` option must be the same throughout the code
f = F.affine_grid(theta, size=(B, C, H//2, W//2), align_corners=False)
I_cropped = F.grid_sample(I, f, align_corners=False)
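When the crop coordinates are themselves tensors that need gradients, one possible way to keep the crop fully differentiable is to assemble `theta` with `torch.stack`. The sketch below is a minimal illustration under that assumption; `differentiable_crop`, `box` and `out_size` are hypothetical names introduced here, not part of PyTorch.

```python
import torch
import torch.nn.functional as F

def differentiable_crop(image, box, out_size):
    """Hypothetical helper: crop `image` (B, C, H, W) with `box` (B, 4).

    Each row of `box` is (x, y, x_, y_) in normalized [-1, 1] coordinates.
    """
    x, y, x_, y_ = box.unbind(dim=1)
    zeros = torch.zeros_like(x)
    # torch.stack keeps the computation graph, unlike torch.tensor([...])
    theta = torch.stack([
        torch.stack([(x_ - x) / 2, zeros, (x_ + x) / 2], dim=1),
        torch.stack([zeros, (y_ - y) / 2, (y_ + y) / 2], dim=1),
    ], dim=1)  # shape (B, 2, 3)
    B, C = image.shape[:2]
    grid = F.affine_grid(theta, size=(B, C, *out_size), align_corners=False)
    return F.grid_sample(image, grid, align_corners=False)

image = torch.rand(2, 3, 8, 8)
box = torch.tensor([[-0.5, -0.5, 0.5, 0.5]]).repeat(2, 1).requires_grad_()
cropped = differentiable_crop(image, box, out_size=(4, 4))

# This box is aligned with pixel boundaries, so the result can be
# compared against plain slicing of the central 4x4 region
matches_slice = torch.allclose(cropped, image[:, :, 2:6, 2:6], atol=1e-5)

# Gradients flow back to the crop coordinates
cropped.sum().backward()
print(matches_slice, box.grad.shape)
```

The example box covers exactly the central 4x4 pixels of the 8x8 image, so with `align_corners=False` and a 4x4 output every sample should land on a pixel center, which makes the comparison with slicing a convenient sanity check. For boxes that are not pixel-aligned, the default bilinear interpolation of `grid_sample` is used.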