PyTorch: crop images differentiably
2022-07-03 15:08:00 【trium_ KW】
Intro
PyTorch provides a variety of means to crop images. For example, torchvision.transforms provides several functions to crop PIL images, and an answer on the PyTorch Forum shows how to crop an image in a differentiable way (differentiable with respect to the image). However, sometimes we need the cropping action itself to be fully differentiable, that is, differentiable with respect to the crop coordinates as well. How shall we implement that?
Theory: Affine transformation
Before reaching the answer, we first need to learn about the image coordinate system in PyTorch. It is a left-handed Cartesian system whose origin is at the center of the image. Coordinates are normalized to the range $[-1, 1]$, where $(-1, -1)$ indicates the top-left corner and $(1, 1)$ indicates the bottom-right corner, as pointed out by the doc.
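To make the normalization concrete, here is a minimal sketch that converts a pixel-edge position into this normalized range. The helper name `pixel_edge_to_normalized` is hypothetical (not part of PyTorch), and the formula assumes the `align_corners=False` convention, in which -1 and 1 refer to the outer edges of the corner pixels rather than their centers.

```python
def pixel_edge_to_normalized(p, size):
    """Map a pixel-edge position p in [0, size] to the normalized range [-1, 1].

    Assumes the align_corners=False convention: -1 and 1 are the outer
    edges of the first and last pixel along this axis.
    """
    return 2.0 * p / size - 1.0


W = 224  # image width in pixels (example value)
print(pixel_edge_to_normalized(0, W))    # -1.0 (left edge)
print(pixel_edge_to_normalized(112, W))  # 0.0 (image center)
print(pixel_edge_to_normalized(224, W))  # 1.0 (right edge)
```

The same formula applies along the vertical axis with the image height in place of the width.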
Let $(x, y)$ be the top-left corner of the cropped image in the coordinate system of the original image; likewise, let $(x', y')$ be the bottom-right corner of the cropped image. Clearly, $(x, y)$ corresponds to $(-1, -1)$ in the coordinate system of the cropped image, and $(x', y')$ corresponds to $(1, 1)$. We'd like a function $f$ that maps every point of the cropped image from the cropped image system to the original image system. Since only scaling and translation are involved, $f$ can be parameterized by an affine transformation matrix $\Theta$ such that
$$\Theta = \begin{pmatrix} \theta_{11} & 0 & \theta_{13}\\ 0 & \theta_{22} & \theta_{23}\\ 0 & 0 & 1 \end{pmatrix}$$
where $\theta_{12}=\theta_{21}=0$ since skewing is not involved. Let $\mathbf{u}_H$ denote the homogeneous coordinate of $\mathbf{u}=\begin{pmatrix}u & v\end{pmatrix}^T$, such that $\mathbf{u}_H=\begin{pmatrix}\mathbf{u}^T & 1\end{pmatrix}^T$. $\Theta$ maps $\mathbf{u}_H$ in the cropped image system to $\mathbf{x}_H$ in the original image system, i.e. $\mathbf{x}_H = \Theta \mathbf{u}_H$. Thus,
$$\begin{pmatrix} x & x'\\ y & y'\\ 1 & 1 \end{pmatrix} = \begin{pmatrix} \theta_{11} & 0 & \theta_{13}\\ 0 & \theta_{22} & \theta_{23}\\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} -1 & 1\\ -1 & 1\\ 1 & 1 \end{pmatrix}$$
Solving the equations,
$$\Theta = \begin{pmatrix} \frac{x'-x}{2} & 0 & \frac{x'+x}{2}\\ 0 & \frac{y'-y}{2} & \frac{y'+y}{2}\\ 0 & 0 & 1 \end{pmatrix}$$
where $x' \ge x$ and $y' \ge y$.
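The solving step can be spelled out as follows. The first row of the matrix equation gives two scalar equations in $\theta_{11}$ and $\theta_{13}$, and the second row gives the analogous pair in $\theta_{22}$ and $\theta_{23}$:

$$\begin{aligned} -\theta_{11} + \theta_{13} &= x, & \theta_{11} + \theta_{13} &= x',\\ -\theta_{22} + \theta_{23} &= y, & \theta_{22} + \theta_{23} &= y'. \end{aligned}$$

Subtracting and adding each pair yields the entries of $\Theta$ above. As a quick numeric check, take the example values $(x, y) = (-0.5, -0.3)$ and $(x', y') = (0.7, 0.8)$, the same ones used in the code below:

$$\Theta = \begin{pmatrix} 0.6 & 0 & 0.1\\ 0 & 0.55 & 0.25\\ 0 & 0 & 1 \end{pmatrix}, \qquad \Theta \begin{pmatrix} -1\\ -1\\ 1 \end{pmatrix} = \begin{pmatrix} -0.5\\ -0.3\\ 1 \end{pmatrix}, \qquad \Theta \begin{pmatrix} 1\\ 1\\ 1 \end{pmatrix} = \begin{pmatrix} 0.7\\ 0.8\\ 1 \end{pmatrix}.$$

The two corners of the cropped image land on $(x, y)$ and $(x', y')$ as required.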
Coding time
We'll need two functions:
- torch.nn.functional.affine_grid, to convert the $\Theta$ parameterization to $f$
- torch.nn.functional.grid_sample, to find the corresponding original image coordinate from each cropped image coordinate
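Before using the two functions for cropping, it may help to see them on the simplest case. The sketch below feeds an identity transform (scale 1, translation 0) through `affine_grid` and `grid_sample`; the tensor sizes are arbitrary example values. `affine_grid` takes only the top two rows of $\Theta$, as a tensor of shape (N, 2, 3), and returns one (x, y) sampling location per output pixel.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
image = torch.rand(1, 3, 8, 8)  # a small random example image (N, C, H, W)

# Identity transform: the top two rows of Theta with unit scale, no shift.
theta = torch.tensor([[[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0]]])

# The grid holds, for every output pixel, the normalized (x, y) location
# to read from in the input image.
grid = F.affine_grid(theta, size=(1, 3, 8, 8), align_corners=False)
print(grid.shape)  # torch.Size([1, 8, 8, 2])

# Sampling with the identity grid at the input resolution should give
# back the input image, up to floating-point error.
resampled = F.grid_sample(image, grid, align_corners=False)
print(torch.allclose(resampled, image, atol=1e-4))
```

Cropping then amounts to replacing the identity entries of `theta` with the scale and translation terms derived above.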
import torch
import torch.nn.functional as F

# batch size, input channels, original image height and width
B, C, H, W = 16, 3, 224, 224

# Let `I` be our original image
I = torch.rand(B, C, H, W)

# Set (x, y) and (x', y') to define the rectangular region to crop.
# These are example coordinates; in practice, (x, y, x_, y_) might be
# predicted as tensors in the computation graph.
x, y = -0.5, -0.3
x_, y_ = 0.7, 0.8

# Set the affine parameters. `affine_grid` expects a (B, 2, 3) tensor,
# i.e. the top two rows of Theta.
theta = torch.tensor([
    [(x_ - x) / 2, 0.0, (x_ + x) / 2],
    [0.0, (y_ - y) / 2, (y_ + y) / 2],
]).unsqueeze(0).expand(B, -1, -1)

# Compute the flow field; `size` is the output size (scaling involved).
# The `align_corners` option must be the same throughout the code.
f = F.affine_grid(theta, size=(B, C, H // 2, W // 2), align_corners=False)
I_cropped = F.grid_sample(I, f, align_corners=False)
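When the crop coordinates are themselves tensors in the computation graph, building `theta` with `torch.tensor(...)` would copy the values and cut them off from autograd. A minimal sketch of one way around this is to assemble `theta` with `torch.stack`. The function name `differentiable_crop`, the (B, 4) box layout and the `mean()` loss are all assumptions made for illustration; they do not come from the original post.

```python
import torch
import torch.nn.functional as F


def differentiable_crop(image, boxes, out_hw):
    """Crop `image` (B, C, H, W) to `boxes` (B, 4), given as (x, y, x_, y_)
    in normalized [-1, 1] coordinates; `out_hw` is the output (height, width).
    """
    x, y, x_, y_ = boxes.unbind(dim=1)
    zeros = torch.zeros_like(x)
    # torch.stack keeps the box entries connected to the autograd graph.
    theta = torch.stack([
        torch.stack([(x_ - x) / 2, zeros, (x_ + x) / 2], dim=1),
        torch.stack([zeros, (y_ - y) / 2, (y_ + y) / 2], dim=1),
    ], dim=1)  # shape (B, 2, 3)
    size = (image.shape[0], image.shape[1], out_hw[0], out_hw[1])
    grid = F.affine_grid(theta, size=size, align_corners=False)
    return F.grid_sample(image, grid, align_corners=False)


torch.manual_seed(0)
image = torch.rand(2, 3, 32, 32)
# One box per image; in practice these might come from a network.
boxes = torch.tensor([[-0.5, -0.3, 0.7, 0.8],
                      [-0.2, -0.6, 0.4, 0.1]], requires_grad=True)

cropped = differentiable_crop(image, boxes, (16, 16))
cropped.mean().backward()  # stand-in for a real loss
print(cropped.shape)       # torch.Size([2, 3, 16, 16])
print(boxes.grad.shape)    # torch.Size([2, 4])
```

With the default bilinear interpolation, gradients reach the box coordinates through the sampling grid, so a loss on the cropped image can update whatever produced the box. Each image in the batch also gets its own box here, unlike the shared `theta` in the listing above.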