Cropping images differentiably in PyTorch
2022-07-03 15:08:00 【trium_ KW】
Intro
PyTorch provides several ways to crop images. For example, torchvision.transforms provides several functions for cropping PIL images, and an answer on the PyTorch Forum shows how to crop an image in a differentiable way (differentiable with respect to the image). Sometimes, however, we need the cropping action itself to be fully differentiable, that is, differentiable with respect to the crop coordinates as well. How shall we implement that?
Theory: Affine transformation
Before reaching the answer, we first need to learn about the image coordinate system in PyTorch. It is a left-handed Cartesian system whose origin is at the center of the image, with the x-axis pointing right and the y-axis pointing down. Coordinates are normalized to the range $[-1,1]$, where $(-1,-1)$ indicates the top-left corner and $(1,1)$ indicates the bottom-right corner, as pointed out by the doc.
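To make the normalized coordinates concrete, here is a minimal sketch that converts a pixel-aligned box into the $[-1,1]$ range. The helper names are hypothetical, and the formula assumes the align_corners=False convention, under which $-1$ and $1$ refer to the outer edges of the corner pixels rather than to their centers.

```python
def pixel_edge_to_normalized(k, n):
    """Map pixel edge index k (0..n) on an axis of n pixels to [-1, 1].

    Assumption: align_corners=False, so edge 0 maps to -1 and edge n maps to 1.
    """
    return 2.0 * k / n - 1.0


def pixel_box_to_normalized(left, top, right, bottom, width, height):
    """Convert the pixel box [left, right) x [top, bottom) to (x, y, x', y')."""
    return (
        pixel_edge_to_normalized(left, width),
        pixel_edge_to_normalized(top, height),
        pixel_edge_to_normalized(right, width),
        pixel_edge_to_normalized(bottom, height),
    )


# Example boxes on a 224 x 224 image
full = pixel_box_to_normalized(0, 0, 224, 224, 224, 224)      # whole image
center = pixel_box_to_normalized(56, 56, 168, 168, 224, 224)  # central half
```

Here `full` comes out as the corners $(-1,-1)$ and $(1,1)$, and `center` as $(-0.5,-0.5)$ and $(0.5,0.5)$.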
Let $(x,y)$ be the top-left corner of the cropped image with respect to the coordinate system of the original image; likewise, let $(x',y')$ be the bottom-right corner of the cropped image. It's clear that $(x,y)$ corresponds to $(-1,-1)$ in the cropped image coordinate system, and $(x',y')$ corresponds to $(1,1)$. We'd like a function $f$ that maps every point in the cropped image from the cropped image system to the original image system. Since only scaling and translation are involved, the function $f$ can be parameterized by an affine transformation matrix $\Theta$ such that

$$
\Theta = \begin{pmatrix} \theta_{11} & 0 & \theta_{13}\\ 0 & \theta_{22} & \theta_{23}\\ 0 & 0 & 1 \end{pmatrix}
$$

where $\theta_{12}=\theta_{21}=0$ since no rotation or skewing is involved. Denote by $\mathbf{u}_H$ the homogeneous coordinate of $\mathbf{u}=\begin{pmatrix}u & v\end{pmatrix}^T$, such that $\mathbf{u}_H=\begin{pmatrix}\mathbf{u}^T & 1\end{pmatrix}^T$. $\Theta$ maps $\mathbf{u}_H$ in the cropped image system to $\mathbf{x}_H$ in the original image system, i.e. $\mathbf{x}_H = \Theta \mathbf{u}_H$. Thus,

$$
\begin{pmatrix} x & x'\\ y & y'\\ 1 & 1 \end{pmatrix} = \begin{pmatrix} \theta_{11} & 0 & \theta_{13}\\ 0 & \theta_{22} & \theta_{23}\\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} -1 & 1\\ -1 & 1\\ 1 & 1 \end{pmatrix}
$$

Solving the equations,

$$
\Theta = \begin{pmatrix} \frac{x'-x}{2} & 0 & \frac{x'+x}{2}\\ 0 & \frac{y'-y}{2} & \frac{y'+y}{2}\\ 0 & 0 & 1 \end{pmatrix}
$$

where $x' \ge x$ and $y' \ge y$.
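As a quick numerical check of this result, the sketch below builds $\Theta$ for an arbitrary example box and maps the corners and the center of the cropped system through it. The function names crop_theta and map_point are hypothetical, and plain Python is used so that the arithmetic stays visible.

```python
def crop_theta(x, y, x_, y_):
    """Return the 3x3 matrix Theta for the crop box (x, y)-(x', y')."""
    return [
        [(x_ - x) / 2, 0.0, (x_ + x) / 2],
        [0.0, (y_ - y) / 2, (y_ + y) / 2],
        [0.0, 0.0, 1.0],
    ]


def map_point(theta, u, v):
    """Map the cropped-image point (u, v) to original-image coordinates."""
    u_h = (u, v, 1.0)  # homogeneous coordinate of (u, v)
    x_h = [sum(t * c for t, c in zip(row, u_h)) for row in theta]
    return x_h[0], x_h[1]


# Arbitrary example box: (x, y) = (-0.5, -0.3), (x', y') = (0.7, 0.8)
theta = crop_theta(-0.5, -0.3, 0.7, 0.8)
top_left = map_point(theta, -1.0, -1.0)    # should land on (x, y)
bottom_right = map_point(theta, 1.0, 1.0)  # should land on (x', y')
center = map_point(theta, 0.0, 0.0)        # midpoint of the crop box
```

Up to floating-point rounding, the corners of the cropped system land on $(x,y)$ and $(x',y')$, and its origin lands on the midpoint of the box, $(0.1, 0.25)$ for these values.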
Coding time
We’ll need two functions:
- torch.nn.functional.affine_grid to convert the $\Theta$ parameterization to $f$
- torch.nn.functional.grid_sample to find the corresponding original image coordinate from each cropped image coordinate
```python
import torch
import torch.nn.functional as F

# batch size, input channels, original image height and width
B, C, H, W = 16, 3, 224, 224

# Let `I` be our original image
I = torch.rand(B, C, H, W)

# Set (x, y) and (x', y') to define the rectangular region to crop.
# These are arbitrary example coordinates; in practice, (x, y, x_, y_)
# might be predicted as tensors in the computation graph.
x, y = -0.5, -0.3
x_, y_ = 0.7, 0.8

# Set the affine parameters (the top two rows of Theta), shared by the batch
theta = torch.tensor([
    [(x_ - x) / 2, 0, (x_ + x) / 2],
    [0, (y_ - y) / 2, (y_ + y) / 2],
]).unsqueeze(0).expand(B, -1, -1)

# Compute the flow field; `size` is the output size (scaling involved).
# The `align_corners` option must be the same throughout the code.
f = F.affine_grid(theta, size=(B, C, H // 2, W // 2), align_corners=False)
I_cropped = F.grid_sample(I, f, align_corners=False)
```
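When the box coordinates are themselves tensors in the computation graph, wrapping them in torch.tensor would copy their values and cut the graph, so the sketch below assembles theta with torch.stack instead. The helper crop_differentiable is a hypothetical name, not a PyTorch API, and the pixel-edge formula in the sanity check assumes align_corners=False.

```python
import torch
import torch.nn.functional as F


def crop_differentiable(image, box, out_size):
    """Crop `image` (B, C, H, W) to normalized boxes `box` (B, 4).

    Each box row is (x, y, x', y'); `out_size` is the output (height, width).
    """
    x, y, x_, y_ = box.unbind(dim=1)
    zero = torch.zeros_like(x)
    # torch.stack keeps the graph, so gradients can flow back to `box`
    theta = torch.stack([
        torch.stack([(x_ - x) / 2, zero, (x_ + x) / 2], dim=1),
        torch.stack([zero, (y_ - y) / 2, (y_ + y) / 2], dim=1),
    ], dim=1)  # shape (B, 2, 3)
    B, C = image.shape[:2]
    grid = F.affine_grid(theta, size=(B, C, *out_size), align_corners=False)
    return F.grid_sample(image, grid, align_corners=False)


torch.manual_seed(0)
image = torch.rand(2, 3, 8, 8)
H, W = image.shape[-2:]

# Sanity check: a box aligned to pixel edges should reproduce plain slicing.
# Pixel edge k on an axis of n pixels sits at 2 * k / n - 1.
top, left, bottom, right = 2, 1, 6, 7
aligned = torch.tensor([[2 * left / W - 1, 2 * top / H - 1,
                         2 * right / W - 1, 2 * bottom / H - 1]]).expand(2, -1)
sliced = image[:, :, top:bottom, left:right]
resampled = crop_differentiable(image, aligned, (bottom - top, right - left))
matches_slicing = torch.allclose(resampled, sliced, atol=1e-4)

# Gradient check: the box coordinates receive gradients through the crop
box = torch.tensor([[-0.5, -0.3, 0.7, 0.8]]).repeat(2, 1).requires_grad_()
crop = crop_differentiable(image, box, out_size=(4, 4))
crop.sum().backward()  # box.grad now has shape (2, 4)
```

The slicing comparison only holds when the output size equals the pixel size of the box, so that every output sample falls on a source pixel center. For any other size, grid_sample interpolates bilinearly between neighboring pixels.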