PyTorch crop images differentiably
2022-07-03 15:08:00 【trium_ KW】
Intro
PyTorch provides a variety of means to crop images. For example, torchvision.transforms provides several functions to crop PIL images, and a PyTorch Forum answer shows how to crop an image in a differentiable way (differentiable with respect to the image). However, sometimes we need the cropping action itself to be fully differentiable, that is, differentiable with respect to the crop coordinates as well. How shall we implement that?
Theory: Affine transformation
Before reaching the answer, we first need to learn about the image coordinate system in PyTorch. It is a left-handed Cartesian system whose origin is at the center of the image. The coordinates are normalized to the range $[-1,1]$, where $(-1,-1)$ indicates the top-left corner and $(1,1)$ indicates the bottom-right corner, as pointed out by the doc.
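The coordinate convention can be checked directly. The following is a small sketch, not part of the original post: it asks `affine_grid` for the grid of an identity transform on a hypothetical 4x6 image and reads off the corner coordinates. With `align_corners=True` the corner pixel centers sit exactly at $\pm 1$; with `align_corners=False` the values $\pm 1$ refer to the outer edges of the corner pixels, so the first pixel center lies slightly inside.

```python
import torch
import torch.nn.functional as F

# Identity transform: the sampling grid covers the whole image.
theta = torch.tensor([[[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0]]])

# The grid has shape (N, H, W, 2); the last dimension stores (x, y).
grid = F.affine_grid(theta, size=(1, 1, 4, 6), align_corners=True)
top_left = grid[0, 0, 0]        # coordinate of row 0, column 0
bottom_right = grid[0, -1, -1]  # coordinate of the last row, last column

# With align_corners=False, -1 and 1 mark the image borders instead,
# so the first pixel center is offset by half a pixel from (-1, -1).
grid_edges = F.affine_grid(theta, size=(1, 1, 4, 6), align_corners=False)
first_center = grid_edges[0, 0, 0]

print(top_left, bottom_right, first_center)
```

Note that x grows to the right and y grows downwards, which is why the system is left-handed.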
Let $(x,y)$ be the top-left corner of the cropped image with respect to the coordinate system of the original image; likewise, we denote by $(x',y')$ the bottom-right corner of the cropped image. It's clear that $(x,y)$ corresponds to $(-1,-1)$ with respect to the cropped image coordinate system, and $(x',y')$ corresponds to $(1,1)$. We'd like a function $f$ that maps from the cropped image system to the original image system for every point in the cropped image. Since only scaling and translation are involved, the function $f$ can be parameterized by an affine transformation matrix $\Theta$ such that
$$\Theta = \begin{pmatrix} \theta_{11} & 0 & \theta_{13}\\ 0 & \theta_{22} & \theta_{23}\\ 0 & 0 & 1 \end{pmatrix}$$
where $\theta_{12}=\theta_{21}=0$ since skewing is not involved. Denote by $\mathbf{u}_H$ the homogeneous coordinate of $\mathbf{u}=\begin{pmatrix}u & v\end{pmatrix}^T$, such that $\mathbf{u}_H=\begin{pmatrix}\mathbf{u}^T & 1\end{pmatrix}^T$. $\Theta$ maps $\mathbf{u}_H$ in the cropped image system to $\mathbf{x}_H$ in the original image system, i.e. $\mathbf{x}_H = \Theta \mathbf{u}_H$. Applying this to the two corners at once,
$$\begin{pmatrix} x & x'\\ y & y'\\ 1 & 1 \end{pmatrix} = \begin{pmatrix} \theta_{11} & 0 & \theta_{13}\\ 0 & \theta_{22} & \theta_{23}\\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} -1 & 1\\ -1 & 1\\ 1 & 1 \end{pmatrix}$$
Equating the entries gives $-\theta_{11}+\theta_{13}=x$ and $\theta_{11}+\theta_{13}=x'$ from the first row, and $-\theta_{22}+\theta_{23}=y$ and $\theta_{22}+\theta_{23}=y'$ from the second. Solving the equations,
$$\Theta = \begin{pmatrix} \frac{x'-x}{2} & 0 & \frac{x'+x}{2}\\ 0 & \frac{y'-y}{2} & \frac{y'+y}{2}\\ 0 & 0 & 1 \end{pmatrix}$$
where $x'\ge x$, $y'\ge y$.
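As a quick sanity check of the solved matrix, the sketch below builds $\Theta$ for one example box and maps the corners of the cropped system back to the original system. It uses plain Python only; the helper names `crop_theta` and `map_point` are hypothetical and chosen for illustration.

```python
def crop_theta(x, y, x_, y_):
    """Return the 3x3 matrix Theta for the crop box (x, y)-(x_, y_)."""
    return [
        [(x_ - x) / 2, 0.0, (x_ + x) / 2],
        [0.0, (y_ - y) / 2, (y_ + y) / 2],
        [0.0, 0.0, 1.0],
    ]


def map_point(theta, u, v):
    """Map the point (u, v) of the cropped system to the original system."""
    uh = (u, v, 1.0)  # homogeneous coordinate
    xh = [sum(t * c for t, c in zip(row, uh)) for row in theta]
    return xh[0], xh[1]


theta = crop_theta(-0.5, -0.3, 0.7, 0.8)
top_left = map_point(theta, -1.0, -1.0)     # should land on (x, y)
bottom_right = map_point(theta, 1.0, 1.0)   # should land on (x', y')
center = map_point(theta, 0.0, 0.0)         # should land on the box center
print(top_left, bottom_right, center)
```

Up to floating-point rounding, the two corners come back as $(-0.5,-0.3)$ and $(0.7,0.8)$, and the origin of the cropped system lands on the box center $(0.1,0.25)$, which is the translation column of $\Theta$.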
Coding time
We’ll need two functions:
torch.nn.functional.affine_grid: converts the $\Theta$ parameterization to $f$, i.e. a grid holding the original image coordinate of each cropped image coordinate
torch.nn.functional.grid_sample: samples the original image at those coordinates to produce the cropped image
import torch
import torch.nn.functional as F

# batch size, input channels, original image height and width
B, C, H, W = 16, 3, 224, 224
# Let `I` be our original image
I = torch.rand(B, C, H, W)
# Set (x, y) and (x', y') to define the rectangular region to crop.
# These are some exemplary coordinates; in practice, (x, y, x_, y_)
# might be predicted as a tensor in the computation graph.
x, y = -0.5, -0.3
x_, y_ = 0.7, 0.8
# Set the affine parameters: `affine_grid` expects only the first two
# rows of Theta, with shape (B, 2, 3)
theta = torch.tensor([
    [(x_ - x) / 2, 0, (x_ + x) / 2],
    [0, (y_ - y) / 2, (y_ + y) / 2],
]).unsqueeze(0).expand(B, -1, -1)
# Compute the flow field, where `size` is the output size (scaling involved).
# The `align_corners` option must be the same throughout the code.
f = F.affine_grid(theta, size=(B, C, H // 2, W // 2), align_corners=False)
I_cropped = F.grid_sample(I, f, align_corners=False)
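When (x, y, x_, y_) is predicted by a network, passing the values through `torch.tensor` would copy them out of the computation graph. The following is a minimal sketch of one way to keep the graph intact, assuming the boxes arrive as a (B, 4) tensor in normalized coordinates. The helper name `differentiable_crop` and its arguments are hypothetical, not part of PyTorch.

```python
import torch
import torch.nn.functional as F


def differentiable_crop(image, box, out_size):
    """Crop `image` (B, C, H, W) to the normalized boxes `box` (B, 4).

    Each row of `box` holds (x, y, x_, y_) in [-1, 1] coordinates.
    `out_size` is the (height, width) of the cropped output.
    """
    x, y, x_, y_ = box.unbind(dim=1)
    zeros = torch.zeros_like(x)
    # Assemble the 2x3 affine matrices with torch.stack so that
    # gradients can flow back into `box`.
    theta = torch.stack([
        torch.stack([(x_ - x) / 2, zeros, (x_ + x) / 2], dim=1),
        torch.stack([zeros, (y_ - y) / 2, (y_ + y) / 2], dim=1),
    ], dim=1)  # shape (B, 2, 3)
    B, C = image.shape[:2]
    grid = F.affine_grid(theta, size=(B, C, *out_size), align_corners=False)
    return F.grid_sample(image, grid, align_corners=False)


torch.manual_seed(0)
image = torch.rand(2, 3, 32, 32)
# Stand-in for a predicted box; here it is simply a leaf tensor with gradients.
box = torch.tensor([[-0.5, -0.3, 0.7, 0.8]]).repeat(2, 1).requires_grad_()
cropped = differentiable_crop(image, box, out_size=(16, 16))
cropped.sum().backward()
print(cropped.shape, box.grad.shape)
```

After `backward()`, `box.grad` holds the gradient of the summed crop with respect to the four box coordinates, which is what makes the cropping action itself trainable.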