Trying to understand alias-free-gan.

alias-free-gan-explanation

Trying to understand alias-free-gan in my own way.

[Chinese Version 中文版本]

CC-BY-4.0 License. Tzu-Heng Lin

Motivation of this article: I have been reading the paper for several days, and it is written in a way that I find really hard to understand. So I decided to rephrase the main ideas of the paper in my own words. Some explanations may differ from the original paper. Of course, I may have made mistakes, so please feel free to correct me.

Disclaimer: This article reflects only my personal understanding. Readers are encouraged to read the original paper. Details related to implementation are not discussed here.

Karras, Tero, et al. Alias-Free Generative Adversarial Networks. arXiv preprint arXiv:2106.12423 (2021).

[Original Paper] [Code]

Overall Logic:

  • Modeling
    • Signals flowing through the network are interpreted as continuous signals. The feature maps actually used are just discrete samples of them.
  • Problem Identifying
    • Current network architectures have no clear mechanism that restricts the generator to synthesizing images in a strictly hierarchical manner. Moreover, because the frequencies of the feature maps do not satisfy the condition of the Nyquist-Shannon Sampling Theorem, aliasing happens.
  • Problem Solving
    • Redesign the network so that it is alias-free and strictly follows the hierarchical synthesis manner.
  • Analysis
    • We can show that alias-free generators are translation or rotation equivariant.
    • We can show that equivariant generators do not encounter the "texture sticking" phenomenon.

1. Motivation

1.1 Continuous and Discrete Signals

image-20210916233015961
First, we need to interpret the information flowing through the network in a more appropriate way, using signal processing.
  • The authors borrow concepts from signal processing and interpret the information flowing through the network as spatially infinite continuous signals. The feature maps we actually use are just discrete samples of these continuous signals inside a targeted canvas, and can be seen as a convenient encoding of them. If we take the unit square [0, 1]^2 of the continuous signal as the targeted canvas, the size of a feature map then represents the sampling rate used when converting the continuous signal to a discrete one.
  • The high/low frequencies we talk about are the frequencies obtained in the frequency domain after applying the Fourier transform to the continuous signals.
  • Since this procedure is sampling, the condition of the Nyquist-Shannon Sampling Theorem needs to be satisfied. That is, the highest frequency of the continuous signal must be smaller than half of the sampling rate (this is often called the Nyquist Frequency), or else aliasing happens. (See Figure below.)
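To make the sampling condition concrete, here is a minimal NumPy sketch (my own illustration, not taken from the paper). The function name `apparent_frequency` and the numbers (sampling rate 16, test frequencies 3 and 13) are assumptions chosen for the example. It samples two sine waves, one below and one above the Nyquist frequency, and reports the frequency that actually shows up in the samples.

```python
import numpy as np

def apparent_frequency(f_signal, sampling_rate, n_samples=64):
    """Return the strongest frequency found in the samples of a sine wave."""
    t = np.arange(n_samples) / sampling_rate            # sample positions
    samples = np.sin(2 * np.pi * f_signal * t)          # discrete samples
    spectrum = np.abs(np.fft.rfft(samples))             # magnitude spectrum
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / sampling_rate)
    return freqs[np.argmax(spectrum)]

s = 16.0                                # sampling rate, Nyquist frequency is 8
below = apparent_frequency(3.0, s)      # 3 < 8: represented faithfully
above = apparent_frequency(13.0, s)     # 13 > 8: folds back to 16 - 13 = 3
print(below, above)
```

Both calls report the same frequency, so after sampling the two signals can no longer be told apart. This is the aliasing that the rest of the article tries to avoid.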

1.2 Problems of Existing Architecture

Ideal way for GANs to synthesize information:

  • Hierarchical Manner: From shallow to deep layers, features are synthesized from coarse to fine, from low to high frequencies. (For example, synthesizing a face would follow an order like: overall contour of the face -> ... -> skin -> pores, beard, other textures on skin)

Problems for existing GANs:

  • Existing GAN architectures have no mechanism that restricts the generator to synthesizing images in a strictly hierarchical manner. They do limit the resolution of the feature maps in each layer, so that feature maps in shallow layers cannot represent high-frequency signals. However, the new frequencies generated by the operations in each layer are not guaranteed to stay below the corresponding Nyquist Frequency. When this condition is violated, aliasing happens: high frequencies show up as low frequencies in the frequency domain and contaminate the whole signal.

1.3 Main Contribution

We want to design a network architecture that strictly follows the ideal hierarchical manner of synthesizing information. Every layer is restricted to synthesizing only frequencies in the range designated to it, which removes the problem of aliasing. (That's why the paper is called Alias-Free GAN, IMO.)

2. Method

2.1 Basic Op Redesign

Existing GANs contain basic operations such as Conv, Upsampling, Downsampling, and Nonlinearity. Below, we analyze each of them in turn to see whether it suffers from aliasing and, if so, how to fix it.

  • Conv

    • Convolution is used to locally reorganize signals, producing signals that better meet our expectations.
    • Convolution itself does not introduce new frequencies. (Convolution in the time domain is equivalent to multiplication in the frequency domain, so any frequency whose amplitude was 0 remains 0.)
  • Downsampling (See Figure Below)

    • Resample a signal to a lower sampling rate (s -> s', where s > s'). It is used to shrink the viable area in the spectrum.

    • Notice that the new sampling rate could be smaller than twice the highest frequency of the original signal. Thus, we first need to apply a Low Pass Filter that restricts the frequencies of the original signal to less than half of the lowered sampling rate. Only then can we do the actual downsampling (dropping points).

      upsample
  • Upsampling (See Figure Below)

    • Resample a signal to a higher sampling rate (s -> s', where s < s'). It is used to add headroom in the spectrum, enlarging the viable area so that subsequent layers can introduce new frequencies. Note that upsampling itself does not introduce new frequencies.

    • The procedure first interleaves the original samples with zeros, then applies a Low Pass Filter to remove the imaging in the frequency domain. Note that the LPF used here has cutoff = s/2 and operates at sampling rate s'.

    • The upsampling and downsampling procedures above might seem a little confusing to readers who have not studied signal processing. However, they are the standard, widely used procedures for resampling signals, and they become quite intuitive with the help of the Figure above.

      upsample
  • Nonlinearity (See Video)

    • Elementwise nonlinearity (e.g. ReLU). It is used to introduce new frequencies.
    • The new frequencies introduced by a nonlinearity consist of two parts: a 1st part that satisfies the condition of the sampling theorem, and a 2nd part that does not. We want to preserve the former and eliminate the latter. However, if we apply the nonlinearity directly to the discrete feature map, the 2nd part immediately creates aliasing.
    • Thus, the authors propose a very interesting method: first upsample the signal by m (usually set to 2), then apply the nonlinearity, and finally downsample the signal back. The upsampling raises the Nyquist Frequency, adding headroom so that the newly introduced 2nd-part frequencies do not alias. The downsampling (which includes an LPF that eliminates the 2nd-part frequencies) then converts the signal back to its original sampling rate.
  • Low Pass Filter

    • Notice that the downsampling, upsampling, and nonlinearity operations introduced above all use an LPF.
    • The authors use a Kaiser-Windowed Sinc Filter (an FIR LPF) because it allows direct control over the transition band and attenuation.
    • Two very good links on LPF and the Kaiser window: link1, link2.
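The upsample, nonlinearity, downsample recipe above can be sketched in 1D with NumPy. This is only an illustration under my own assumptions, not the authors' implementation: the helper names (`kaiser_lowpass`, `upsample`, `downsample`, `filtered_relu`, `amplitude`), the filter length of 63 taps, and beta = 8 are all made up for the example, and the cutoff sits at the critical value s/2.

```python
import numpy as np

def kaiser_lowpass(num_taps, cutoff, sampling_rate, beta=8.0):
    """Kaiser-windowed sinc FIR low-pass filter with unit DC gain."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    fc = cutoff / sampling_rate                      # cutoff in cycles/sample
    h = 2 * fc * np.sinc(2 * fc * n) * np.kaiser(num_taps, beta)
    return h / h.sum()

def upsample(x, m, s, num_taps=63):
    """Zero-stuff by m, then remove spectral images (cutoff s/2, rate s*m)."""
    up = np.zeros(len(x) * m)
    up[::m] = x * m                                  # gain m offsets the zeros
    return np.convolve(up, kaiser_lowpass(num_taps, s / 2, s * m), mode="same")

def downsample(x, m, s_high, num_taps=63):
    """Low-pass to half of the lowered rate, then keep every m-th sample."""
    h = kaiser_lowpass(num_taps, (s_high / m) / 2, s_high)
    return np.convolve(x, h, mode="same")[::m]

def filtered_relu(x, s, m=2):
    """Upsample -> ReLU -> downsample."""
    return downsample(np.maximum(upsample(x, m, s), 0.0), m, s * m)

def amplitude(x, f, s):
    """Amplitude of frequency f in the central half of x (avoids edge effects)."""
    mid = x[len(x) // 4 : 3 * len(x) // 4]
    k = int(round(f * len(mid) / s))
    return 2 * np.abs(np.fft.rfft(mid)[k]) / len(mid)

s, f = 64.0, 20.0                                    # Nyquist frequency is 32
x = np.sin(2 * np.pi * f * np.arange(256) / s)
naive = np.maximum(x, 0.0)                           # ReLU on the raw samples
clean = filtered_relu(x, s)

# ReLU creates a harmonic at 2f = 40 > 32, which aliases to 64 - 40 = 24.
alias_naive = amplitude(naive, 24.0, s)
alias_clean = amplitude(clean, 24.0, s)
print(alias_naive, alias_clean)
```

In this toy setup the aliased component at frequency 24 is much weaker in the filtered version than in the naive one, while the fundamental at 20 is preserved. With m = 2 some higher harmonics still fold back, so the aliasing is strongly reduced rather than removed completely.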

2.2 Equivariant and Texture Sticking

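Equivariance means that when the input is transformed, the output is transformed in the same way. For example, when the input translates, the output translates by the same amount. We can define two kinds of equivariance: Translation Equivariance and Rotation Equivariance.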

Translation Equivariant

  • We can show that the alias-free network is naturally translation equivariant.

    • According to the theoretical analysis above, if we treat the signal as an infinite continuous signal in the time domain throughout the network, shifting the signal in the time domain does not change its amplitude in the frequency domain (only the phase changes). Therefore, no matter how the input signal is moved up, down, left, or right in the time domain, the output of each layer moves along with it, and so does the final output signal.
  • The authors define a metric to evaluate translation equivariance: EQ-T. Basically, it measures the difference between two sets of images, obtained by translating either the input or the output of the synthesis network by the same random amount.

    image-20210917130927681
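As far as I understand the paper, EQ-T is reported as a peak signal-to-noise ratio (PSNR) in dB between the two image sets, with a dynamic range of 2 for images in -1..+1. The sketch below shows that idea on a toy example. Everything in it is an assumption made for illustration: `toy_generator` is a hypothetical stand-in for a real generator, the "sticky" pattern imitates detail glued to pixel coordinates, and an integer circular shift stands in for a real (possibly subpixel) translation.

```python
import numpy as np

def eq_db(a, b, i_max=2.0):
    """PSNR in dB between two image sets; i_max is the dynamic range (-1..+1)."""
    mse = np.mean((a - b) ** 2)
    return 10.0 * np.log10(i_max ** 2 / mse)

def toy_generator(z, pattern, stickiness):
    """Hypothetical generator: content that follows the input, plus a fixed
    pattern that stays glued to pixel coordinates."""
    return 0.5 * z + stickiness * pattern

def translate(img, shift):
    """Integer circular shift along the last axis (stand-in for a translation)."""
    return np.roll(img, shift, axis=-1)

rng = np.random.default_rng(0)
z = rng.uniform(-1.0, 1.0, size=(8, 32, 32))     # batch of toy inputs
pattern = rng.standard_normal((32, 32))          # fixed, position-bound detail
shift = 5

def eq_t(stickiness):
    a = toy_generator(translate(z, shift), pattern, stickiness)  # translate input
    b = translate(toy_generator(z, pattern, stickiness), shift)  # translate output
    return eq_db(a, b)

print(eq_t(0.1), eq_t(0.01))
```

The less position-bound detail the toy generator adds, the higher the score. A perfectly equivariant generator would give zero error, and therefore an unbounded PSNR.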

Rotation Equivariant

  • For rotation equivariance, we need some modifications to Conv and LPF.

    • Conv: The kernel needs to be radially symmetric in the time domain. This is easy to understand: if you rotate the input signal, the most intuitive and simple way to keep up is to apply the same rotation to the Conv kernels, so that there is no relative movement between the two and the operation is equivalent to the original one. A radially symmetric kernel is unchanged by any rotation, so this holds automatically.
    • Low Pass Filter: The kernel also needs to be radially symmetric in the time domain. The explanation is similar to that for Conv.
  • The authors define a metric to evaluate rotation equivariance: EQ-R.

    image-20210917131037667
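The role of radial symmetry can be checked numerically in a small case. The sketch below is my own illustration, with assumed helper names (`circular_conv2d`, `radial_kernel`, `commutes_with_rot90`). On a discrete grid only rotations by multiples of 90 degrees are exact, so it tests only that case. Arbitrary angles need the continuous-signal view taken in the paper.

```python
import numpy as np

def circular_conv2d(x, k):
    """2D convolution with wrap-around borders, written out tap by tap."""
    r = k.shape[0] // 2
    out = np.zeros_like(x)
    for di in range(-r, r + 1):
        for dj in range(-r, r + 1):
            out += k[di + r, dj + r] * np.roll(x, (di, dj), axis=(0, 1))
    return out

def radial_kernel(size=5, sigma=1.0):
    """Kernel whose value depends only on the distance to its center."""
    c = np.arange(size) - size // 2
    r2 = c[:, None] ** 2 + c[None, :] ** 2
    k = np.exp(-r2 / (2 * sigma ** 2))
    return k / k.sum()

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 16))
k_radial = radial_kernel()
k_generic = rng.standard_normal((5, 5))          # no symmetry at all

def commutes_with_rot90(k):
    """Check conv(rotate(x)) == rotate(conv(x)) for a 90 degree rotation."""
    return np.allclose(circular_conv2d(np.rot90(x), k),
                       np.rot90(circular_conv2d(x, k)))

print(commutes_with_rot90(k_radial), commutes_with_rot90(k_generic))
```

The radially symmetric kernel commutes with the rotation, while the generic kernel does not. A 1x1 kernel is the trivial radially symmetric case.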

Texture Sticking (video)

image-20210916002435902
  • We can show that equivariant networks do not exhibit this phenomenon. Texture sticking manifests as high- and low-frequency features failing to transform together at the same speed. If the network is equivariant, all features must transform together at the same speed, so the phenomenon naturally does not occur.

2.3 Detailed Design of Overall Network Architecture

image-20210917020314793
Apart from the changes to the basic operations, there are other changes to the network architecture.
  • (config B,H) Fourier Features

    • (B) Change the original 'learned constant input' to 'Fourier Features'.

      • According to the previous analysis, the input we are essentially dealing with is an infinite continuous signal, so the authors use Fourier Features here, which are naturally spatially infinite. The discrete input signal can be sampled from the continuous expression. At the same time, because there is an actual continuous expression, we can easily translate and rotate the signal, then sample it and feed it into the network, which makes computing EQ-T and EQ-R convenient.

      • What exactly do the Fourier Features look like? The authors' official implementation is not known yet. According to rosinality/alias-free-gan-pytorch, each feature map represents a sin or cos signal of some frequency along the x or y direction (which makes 4 feature maps per frequency). Code is implemented here: plot_fourier_features.py.

    • (H) Transformed Fourier Features (Appendix F)

      • The above Fourier features are rotated and translated in the time domain before being fed into the network, with the transformation computed from w (that is, the style w also controls the input signal). w -> t = (rc, rs, tx, ty), t = t/sqrt(rc^2+rs^2). Code is implemented here: plot_fourier_features.py
  • (config E) Expand the feature maps by a 10px margin

    • In the theoretical assumptions above, the signals are spatially infinite, and the Conv, Upsampling, and Downsampling computations at the edge also use values outside the boundary of the targeted canvas. We can therefore use the following approach to approximate the infinite feature map:

      • Expand the feature map by a 10px margin.

      • If the feature map is upsampled, the margin is upsampled too, so we need to crop it after upsampling to keep it at 10px.

      • If there is no upsampling, no extra care is needed.

        image-20210919175525459
  • (config E,G,T) Sampling rate and LPF design

    • (E) According to the above analysis, a very intuitive approach (critical sampling) is to set the cutoff fc of the low-pass filter to half of the sampling rate, s/2, and to set the half-width of the transition band fh to (\sqrt{2}-1) (s/2).

    • (G) However, doing so is actually dangerous: our low-pass filter is only an approximation, not an ideal rectangular window in the frequency domain, so some frequencies around the critical point can still leak through. The authors therefore set the cutoff fc to s/2-fh. Intuitively, this keeps less and filters out more, which is safer for avoiding aliasing. The exception is the last few layers, where the cutoff is still set to s/2 because these layers really need more high-frequency features.

    • (T) The authors found that the attenuation of the aforementioned low-pass filter is still insufficient for the low-resolution layers. The original design applies the same fixed rules to every layer. Here the authors propose to design each layer separately. They want as much attenuation as possible in the low-resolution layers, while keeping more high-frequency features in the high-resolution layers.

      • The rightmost figure below shows an N=14 generator design. The last two layers are critically sampled.
      • The cutoff fc (blue line) grows geometrically from fc = 2 in the first layer to fc = sN/2 in the first critically sampled layer.
      • The minimum acceptable stopband frequency ft (yellow line) starts at f_{t,0} = 2^2.1 and grows geometrically, but more slowly than the cutoff fc. For the last two layers, ft = fc * 2^0.3.
        • f_{t,0} provides an effective way to trade training speed for equivariance quality.
      • The sampling rate s is set to twice the smallest power of two that is at least ft (but not exceeding the final output resolution).
      • Half-width of the transition band: fh = max(s/2, ft) - fc.
      • The number of layers N no longer depends entirely on the output resolution. The authors set the number of layers to 14 for all resolutions.
      image-20210917215737862 image-20210917214706969 image-20210917012305268
  • (config R) Rotation Equivariance. As stated above, we need to change Conv and LPF to use radially symmetric kernels.

    • Conv: replace all 3x3 conv with 1x1.
    • LPF: use jinc filter with the same Kaiser Window: image-20210917222512704
  • (config C, D) Others

    • (C) Remove per-pixel noise. Since the spectrum of Gaussian noise has the same intensity at all frequencies, it obviously does not satisfy the sampling theorem.
    • (D) Simplify the generator, including:
      • mapping network 8->2
      • eliminate mixing regularization
      • eliminate path length regularization
      • eliminate skip connection, change to normalization using EMA of sigma
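The per-layer rules listed under config T can be turned into a small calculation. The sketch below follows the numbers stated in the text (fc from 2 to sN/2, f_{t,0} = 2^2.1, ft = fc * 2^0.3 for the critically sampled layers, N = 14). The function name `layer_schedule`, the default resolution of 1024, and in particular the exact interpolation exponent are my assumptions, so treat it as an illustration rather than the official schedule.

```python
import numpy as np

def layer_schedule(resolution=1024, num_layers=14, num_critical=2,
                   first_cutoff=2.0, first_stopband=2 ** 2.1,
                   last_stopband_rel=2 ** 0.3):
    """Per-layer cutoff fc, stopband ft, sampling rate s and half-width fh."""
    last_cutoff = resolution / 2                        # fc = sN/2
    last_stopband = last_cutoff * last_stopband_rel     # ft = fc * 2^0.3
    # Assumed interpolation: the exponent runs from 0 to 1 and is held at 1
    # for the critically sampled layers at the end.
    i = np.arange(num_layers)
    e = np.minimum(i / (num_layers - num_critical), 1.0)
    fc = first_cutoff * (last_cutoff / first_cutoff) ** e       # geometric growth
    ft = first_stopband * (last_stopband / first_stopband) ** e
    # s is twice the smallest power of two >= ft, capped at the output resolution.
    s = np.exp2(np.ceil(np.log2(np.minimum(2 * ft, resolution))))
    fh = np.maximum(s / 2, ft) - fc                     # half-width of transition band
    return fc, ft, s, fh

fc, ft, s, fh = layer_schedule()
for idx in range(len(fc)):
    print(f"layer {idx:2d}: fc={fc[idx]:7.2f} ft={ft[idx]:7.2f} "
          f"s={s[idx]:6.0f} fh={fh[idx]:7.2f}")
```

Printing the table makes the design intent visible: early layers get a wide transition band relative to their cutoff (strong attenuation), while the last two layers are critically sampled at the output resolution.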

3. Experiments

3.1 Dataset

  • FFHQ-U and MetFaces-U: unaligned versions of FFHQ and MetFaces. Differences from the original versions: axis-aligned crop, original image angle preserved, random crop of the face region, no mirroring.
  • AFHQv2: The original AFHQ used inappropriate downsampling, which resulted in aliasing. The new version uses PIL's Lanczos.
  • Beaches: 20,155 photos, 512x512

3.2 Quantitative and Qualitative Results

image-20210917205044756
  • FFHQ (1024×1024)
    • # of params of the three generators: 30.0M, 22.3M, 15.8M
    • Training time (GPU hours): 1106, 1576 (+42%), 2248 (+103%)
  • Equivariance (video, video)
  • The Texture Sticking phenomenon disappears (video, video)

3.3 Ablation Study

image-20210917205203726 image-20210917205257000
  • Mixing reg. does no harm, but is of little use (Appendix A).
  • Per-pixel noise compromises equivariance significantly.
  • Fixed Fourier Features harm FID.
  • path length reg. harms FID, but improves equivariance (strange behavior). (Path length regularization is in principle at odds with translation equivariance, as it penalizes image changes upon latent space walk and thus encourages texture sticking. We suspect that the counterintuitive improvement in equivariance may come from slightly blurrier generated images, at a cost of poor FID.)
  • Capacity: halving the number of feature maps harms FID, but the network remains equivariant. Doubling the number improves FID, yet takes 4x the training time.
  • Different window functions for the sinc/jinc filter: Kaiser, Lanczos, Gaussian. Lanczos is best on FID yet compromises equivariance. Gaussian leads to clearly worse FID.
  • The p4 symmetry G-CNN is not even close to Alias-Free-R on rotation equivariance.

3.4 Feature Map Visualization

video

image-20210917204255781