2022-07-01 11:24:00 【Cassia tora】
Exposure: A White-Box Photo Post-Processing Framework Reading Notes
The paper was published in ACM Transactions on Graphics (TOG) in 2018.
1 Abstract
Problem:
Post-retouching can significantly improve image quality, but ordinary users lack the expertise to retouch their photos.
Status quo:
Existing automatic retouching systems require paired training images for supervised learning, and such paired datasets are difficult to obtain.
Approach of this paper:
A deep learning method is proposed that trains on unpaired data, i.e., a collection of photos exhibiting a retouching style that users like, which is easy to collect.
Training on unpaired data is realized through an end-to-end learning framework in which the retouching operations are represented as a sequence of resolution-independent, differentiable filters that can be trained jointly within convolutional neural networks (CNNs).
A deep reinforcement learning (RL) method guided by a generative adversarial network (GAN) determines the sequence of filters and their parameters for the input image; based on the current state of the image, it learns to decide which action to take next.
Contributions:
- An end-to-end photo post-processing model built from a set of differentiable filters.
- The model is optimized with reinforcement learning, so the system generates meaningful operation sequences that give users insight into a given artistic style, instead of only outputting black-box results.
- A GAN structure makes it possible to learn photo retouching without paired images. According to the authors, this is the first GAN that scales with image resolution without producing distortion artifacts in the image.
- The method not only provides an effective end-to-end post-processing tool for ordinary users, but also helps advanced users reverse-engineer the style of automatic black-box filters.
2 The Model
Given an input RAW photo, the goal is to generate a retouched result that matches a given photo collection representing a particular style. This section details how the retouching process is modeled.
2.1 Motivation
Photo retouching is performed as a series of editing steps, each of which is adjusted based on the output of the previous step. This dependence on visual feedback exists even within a single step of the process, as shown in the figure:
Feedback is crucial for choosing an operation and its parameters. The assumption is that an automatic retouching system would also benefit from feedback and could learn more effectively how to select and apply individual operations based on that feedback, instead of inferring the final output directly from the input. In addition, modeling retouching as a sequence of standard post-processing operations helps preserve the realism of the image and makes the automatic process easier for users to understand.
The order of the different operations also needs to be learned by the system. Unlike previous work, this requires less supervision: only a collection of retouched photos is needed for training.
2.2 Post-processing as a sequence of decisions
Based on the above motivation, the retouching process can be modeled as a sequential decision problem, a common setting in reinforcement learning (RL). RL is a subfield of machine learning concerned with how an agent (here, the entity performing the retouching) acts in an environment to maximize cumulative reward. Below is a brief introduction to the basic concepts of RL and how photo retouching is expressed as an RL problem.
In RL, the problem is expressed as $P=(S,A)$, where $S$ is the state space and $A$ is the action space. For photo retouching, $S$ is the image space, including the RAW input images and all intermediate results of the automatic process, and $A$ is the set of all filtering operations. A transition function $p: S \times A \to S$ maps a state $s \in S$, after taking action $a \in A$, to the resulting state $s' \in S$, so a state transition can be written as $s_{i+1}=p(s_i,a_i)$. Applying a series of filters to the input RAW image produces a trajectory of states and actions:
$s_i \in S$: state
$a_i \in A$: action
$N$: number of actions
$s_N$: terminal (stop) state
The overall process is illustrated in the figure below:
The core element of RL is the reward function $r: S \times A \to \mathbb{R}$, which evaluates how good it is to take a particular action in a given state.
The goal is to find a policy $\pi$ that maximizes the cumulative reward of the decision process. To this end, a stochastic policy agent is used, where the policy $\pi: S \to \mathbb{P}(A)$ maps the current state $s \in S$ to a probability density function over the set of actions, $\mathbb{P}(A)$. When the agent enters a state, it samples an action from this probability density, receives a reward, and then moves to the next state according to the transition function.
Given a trajectory $t=(s_0,a_0,s_1,a_1,\dots,s_N)$, the return $r_k^{\gamma}$ is defined as the sum of discounted rewards after $s_k$:
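Written out, a form of the return consistent with the symbols above (a reconstruction, so the paper's exact notation may differ) is:
$$r_k^{\gamma}=\sum_{i=k}^{N-1}\gamma^{\,i-k}\,r(s_i,a_i)$$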
$\gamma \in [0,1]$ is a discount factor that places more weight on rewards in the near future.
To evaluate a policy, the paper defines the objective:
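A reconstruction of this objective, consistent with the symbols listed below, is the expected return starting from an input image drawn from the input dataset:
$$J(\pi)=\mathbb{E}\big[\,r_0^{\gamma}\;\big|\;s_0\sim S_0,\ \pi\,\big]$$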
$s_0$: the input image
$\mathbb{E}$: expectation
$S_0$: the input dataset
Intuitively, this objective describes the expected return of the policy $\pi$ over all possible trajectories.
The agent's goal is to maximize the objective $J(\pi)$, which is tied through the reward function $r$ to the final image quality, since high-quality images (states) receive larger rewards.
The expected total discounted reward of a state and of a state-action pair is defined by the state-value function $V$ and the action-value function $Q$:
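In standard notation (a reconstruction consistent with the text), these are:
$$V^{\pi}(s)=\mathbb{E}\big[\,r_k^{\gamma}\;\big|\;s_k=s,\ \pi\,\big],\qquad Q^{\pi}(s,a)=\mathbb{E}\big[\,r_k^{\gamma}\;\big|\;s_k=s,\ a_k=a,\ \pi\,\big]$$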
To fit the problem into this RL framework, an action is broken into two parts: a discrete choice of filter, $a_1$, and a continuous decision on the filter parameters, $a_2$. The policy likewise consists of two parts, $\pi=(\pi_1,\pi_2)$: $\pi_1$ is a function that takes a state and returns a probability distribution over the filters, i.e., the choice of $a_1$; $\pi_2$ is a function that takes $(s,a_1)$ and directly outputs $a_2$ ($\pi_1$ is stochastic and must be sampled).
Sampling continuous random variables poses practical challenges, so, following recent practice, $\pi_2$ is treated as deterministic.
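A minimal sketch of this two-part action selection, assuming hypothetical `policy1` and `policy2` callables whose outputs already went through the softmax and tanh mentioned later in Section 4 (an illustration, not the paper's code):
```python
import numpy as np

def select_action(state, policy1, policy2, rng=None):
    """Two-part action: sample a discrete filter choice, then regress its parameters.

    policy1(state) -> probability vector over the n filters (pi_1, after softmax);
    policy2(state, filter_id) -> continuous parameter vector (pi_2, after tanh).
    """
    rng = rng or np.random.default_rng()
    probs = policy1(state)                            # pi_1: distribution over filters
    filter_id = int(rng.choice(len(probs), p=probs))  # stochastic choice of a_1
    params = policy2(state, filter_id)                # pi_2: deterministic a_2
    return filter_id, params
```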
3 Filter Design
This section discusses the design of the filters, i.e., the action space $A$ of the model.
3.1 Design principles
Differentiable. For gradient-based optimization of the policy $\pi$, each filter must be differentiable with respect to its parameters so that the CNNs can be trained through backpropagation. So that a filter can be modeled simply as a basic neural-network layer, the paper approximates filters by replacing smooth curves with piecewise-linear functions.
Resolution-independent. Most editing decisions can be made at low resolution, so to save time and memory the operations can be determined on a downsampled version of the RAW image. That is, the filter parameters are estimated on a low-resolution (64×64) version of the original image, and the same filter is then applied to the original high-resolution image. For this to work, filters must be resolution-independent.
Understandable. Filters should represent intuitive operations so that users can understand the generated operation sequence. This also allows them to further adjust the parameters if they wish.
The three design principles are illustrated in the figure:
3.2 Filter details
Based on the above principles, the paper develops filters that map an input pixel value $p_I=(r_I,g_I,b_I)$ to an output pixel value $p_O=(r_O,g_O,b_O)$. Exposure correction, white balance, and color-curve adjustment can all be modeled with such per-pixel mapping functions. The table below lists the operations implemented in the system, and the figure shows visualizations of the corresponding operations:
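As an illustration of such per-pixel, resolution-independent filters, here is a sketch of gain-style exposure and white-balance operations. The exact parameterization in the paper is not restated in these notes, so treat the 2^E exposure gain and the per-channel white-balance gains as plausible assumptions rather than the paper's definitions:
```python
import numpy as np

def exposure_filter(img, ev):
    """Scale linear-RGB pixel values by 2**ev, where ev is the learned scalar parameter."""
    return np.clip(img * (2.0 ** ev), 0.0, 1.0)

def white_balance_filter(img, gains):
    """Multiply each channel by its own gain; gains is a length-3 parameter vector."""
    return np.clip(img * np.asarray(gains).reshape(1, 1, 3), 0.0, 1.0)
```
Because both act independently on each pixel, applying them to a 64×64 proxy or to the full-resolution image gives consistent results.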
Color-curve adjustment, i.e., a channel-independent monotone mapping function, needs special treatment to make its filter differentiable. The paper approximates each curve as a monotone piecewise-linear function, as shown in the figure:
Suppose a curve is represented by $L$ parameters $\{t_0,t_1,\dots,t_{L-1}\}$. Define the prefix sum of the parameters as $T_k=\sum_{l=0}^{k-1}t_l$, so that the points on the curve are $(k/L,\,T_k/T_L)$. With this representation, an input intensity $x \in [0,1]$ is mapped to:
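One way to write this mapping that is consistent with the knot points above (a reconstruction, so the paper's exact form may differ slightly):
$$f(x)=\frac{1}{T_L}\sum_{j=0}^{L-1}\operatorname{clip}(L\,x-j,\,0,\,1)\; t_j$$
Each term contributes all of $t_j$ once $x$ has passed the $j$-th knot, and a linear fraction of it while $x$ lies inside that segment.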
Note that the mapping function is now expressed in terms of differentiable parameters, making it differentiable with respect to both $x$ and the parameters $\{t_l\}$.
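A NumPy sketch of this curve filter, assuming the clipped-sum form above (in the actual training graph the same arithmetic would be written with the deep-learning framework's ops so gradients flow to the parameters):
```python
import numpy as np

def piecewise_linear_curve(x, t):
    """Map intensities x in [0, 1] through a monotone piecewise-linear curve.

    t: non-negative parameters {t_0, ..., t_{L-1}}; the curve passes through the
    points (k/L, T_k/T_L), where T_k is the prefix sum of the parameters.
    """
    t = np.asarray(t, dtype=np.float64)
    L = len(t)
    x = np.asarray(x, dtype=np.float64)[..., None]   # broadcast over all pixels
    j = np.arange(L, dtype=np.float64)               # segment indices 0..L-1
    contrib = np.clip(L * x - j, 0.0, 1.0) * t       # per-segment contribution
    return contrib.sum(axis=-1) / t.sum()            # normalize by T_L
```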
4 Learning
The overall training loop is as follows:
4.1 Function approximation with DNNs
This work uses CNNs. There are two policy networks: one maps images to action probabilities for $\pi_1$ (after a softmax), the other to filter parameters for $\pi_2$ (after a tanh). Their parameters are denoted $\theta_1$ and $\theta_2$, and the training goal is to optimize $\theta=(\theta_1,\theta_2)$ so as to maximize the objective $J(\pi_\theta)$. In addition, a value network and a discriminator network are learned; they support the training procedure described later.
These networks share the same architecture, shown in the figure below, differing only in the number of output neurons according to what they predict.
Each CNN uses four convolutional layers, each with 4×4 filters and a stride of 2. They are followed by a fully connected layer that reduces the features to 128 dimensions, and a final fully connected layer that regresses these features into the parameters required by each network. After the first fully connected layer, dropout (50%, during both training and testing) is applied to provide noise for the generator.
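A sketch of this shared trunk in tf.keras, assuming a 64×64 RGB input; the per-layer channel counts (32/64/128/256) and the leaky-ReLU activation are assumptions, since the notes only specify the kernel size, stride, and the 128-wide fully connected layer:
```python
import tensorflow as tf

def make_head(n_outputs):
    """Four 4x4 stride-2 convolutions, an FC layer down to 128 features with
    50% dropout, then a task-specific output layer."""
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 4, strides=2, padding="same",
                               input_shape=(64, 64, 3)),
        tf.keras.layers.LeakyReLU(),
        tf.keras.layers.Conv2D(64, 4, strides=2, padding="same"),
        tf.keras.layers.LeakyReLU(),
        tf.keras.layers.Conv2D(128, 4, strides=2, padding="same"),
        tf.keras.layers.LeakyReLU(),
        tf.keras.layers.Conv2D(256, 4, strides=2, padding="same"),
        tf.keras.layers.LeakyReLU(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128),
        tf.keras.layers.LeakyReLU(),
        # The notes keep 50% dropout active at test time as well; Keras disables
        # dropout at inference unless the model is called with training=True.
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(n_outputs),  # softmax (pi_1) or tanh (pi_2) is applied on top
    ])

# e.g. filter_head = make_head(n_filters); parameter_head = make_head(n_params)
```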
4.2 Policy network training
The policy networks are trained with policy gradient methods, which optimize the parameterized policy for expected return using gradient-based updates. Since the policy $\pi$ consists of two parts $(\pi_1,\pi_2)$ corresponding to the two decision steps (filter selection and parameter selection), the two parts are learned in an interleaved fashion.
For filter selection, $\pi_1$ is sampled; it is a discrete probability distribution $\pi_1(F_k)=\mathbb{P}[a_1=F_k]$ over all filters $F=\{F_1,F_2,\dots,F_n\}$. Because the partial derivative $\partial J(\pi)/\partial \pi_1(F_k)$ cannot be computed directly, the policy gradient theorem is applied to obtain an unbiased Monte Carlo estimate of the gradient of $J(\pi)$ with respect to $\pi_1$. For filter parameter selection, the policy $\pi_2$ is deterministic and therefore easier to optimize in the continuous space; here the deterministic policy gradient theorem is used. The policy gradients are thus expressed as
$Q$ is the value function defined above.
$\rho^\pi$: the discounted state distribution, defined as
To compute these gradients, the actor-critic framework is applied: the actor is represented by the policy networks, and the critic is the value network, which learns to approximate the state-value function $V^\pi$ with a CNN parameterized by $\nu$. Using the critic, the action-value function $Q^\pi$ can be computed by expanding its definition and expressing it in terms of the state-value function:
The value network is optimized by minimizing:
$\delta$: the temporal-difference (TD) error, $\delta=r(s,a)+\gamma V(p(s,a))-V(s)$
$\delta$ is also a Monte Carlo estimate of the advantage $A(s,a)=Q(s,a)-V(s)$, i.e., how much the value of taking action $a$ in state $s$ exceeds the expected value over actions. It is used when computing the gradient for $\pi_1$. The gradient for $\pi_2$ does not need a Monte Carlo estimate, so it is computed directly by applying the chain rule to the gradient of $Q$, rather than using the advantage $A$.
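A minimal sketch of how the TD error and advantage feed the two updates, with plain scalars standing in for network outputs. The squared-TD-error critic objective and the exact loss signs are the usual actor-critic conventions, assumed here rather than quoted from the paper:
```python
def td_error(reward, v_s, v_s_next, gamma=1.0):
    """delta = r(s,a) + gamma * V(p(s,a)) - V(s); a Monte Carlo estimate of the advantage."""
    return reward + gamma * v_s_next - v_s

def actor_critic_losses(reward, v_s, v_s_next, log_prob_a1, q_value, gamma=1.0):
    """Per-transition losses; delta is treated as a constant in the actor terms."""
    delta = td_error(reward, v_s, v_s_next, gamma)
    critic_loss = delta ** 2          # value network: squared TD error
    pi1_loss = -delta * log_prob_a1   # REINFORCE-style term for the discrete filter choice
    pi2_loss = -q_value               # deterministic pi_2: ascend Q(s, (a_1, pi_2(s, a_1))) via the chain rule
    return critic_loss, pi1_loss, pi2_loss
```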
Rewards and discount factor:
The ultimate goal is to obtain the best final result after all operations. To this end, the reward is defined as the incremental improvement of a quality score (modeled by the discriminator network in Section 4.3) plus a penalty term (described in Section 4.4). The discount factor is set to $\gamma=1$, and the agent is allowed to edit the input image five times. This number of edits is chosen to balance the expressiveness and conciseness of the operation sequence. A fixed number of steps is used because it makes training more stable than learning online when to stop.
4.3 Quality assessment via adversarial learning
To make the results as close to the target dataset as possible, a GAN is used. It consists of two parts, a generator and a discriminator, which are optimized adversarially: the discriminator is trained to decide whether an image comes from the target dataset or was produced by the generator, while the generator aims to fool the discriminator so that it cannot tell the difference. The two networks are trained simultaneously and ideally reach an equilibrium in which the generated images are close to the target.
This work uses a popular variant of the traditional GAN, the Wasserstein GAN (WGAN), which measures the difference between two probability distributions with the earth mover's distance (EMD). It has been shown to stabilize GAN training and avoid vanishing gradients. The loss of the discriminator $D$ is defined as:
The discriminator $D$ is modeled as a CNN with parameters $w$. The generator's 'negative loss' (the quality score), whose increment forms part of the reward in this system, is
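A sketch of the Wasserstein critic objective and the resulting quality score under the usual WGAN sign convention; the paper's specific Lipschitz regularization (e.g. weight clipping or a gradient penalty) is not restated here:
```python
import numpy as np

def critic_loss(d_fake, d_real):
    """Wasserstein critic loss: push scores of target-set images up, generated images down."""
    return np.mean(d_fake) - np.mean(d_real)

def quality_score(d_fake):
    """Generator-side 'negative loss'; its increment after an edit serves as the RL reward."""
    return float(np.mean(d_fake))
```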
4.4 Training strategy
To address the difficulty of training RL algorithms and GANs, the following techniques are used to stabilize the training process.
Exploitation vs. exploration:
There is a trade-off between exploitation and exploration, i.e., between investing more effort in refining the current strategy and trying new actions in search of potentially larger future returns. This is particularly challenging for the two-stage decision problem here, because focusing on one filter may leave the parameter policies of the other filters insufficiently learned. To avoid this local minimum, $\pi_1$ is penalized if its action proposal is too concentrated, i.e., has low entropy. This is done by reducing the reward:
In addition, it was found that the agent may reuse a filter during retouching, for example by applying two consecutive exposure adjustments instead of merging them into a single step. To obtain more concise retouching solutions, a filter-reuse penalty "teaches" the agent to avoid such behavior: if the agent uses a filter twice, the second use incurs an extra penalty of -1. To implement this, the agent needs to know which filters have already been applied earlier in the process. Encouraging the agent to exploit the full potential of each filter also leads to more exploration of different filters.
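A sketch of how these terms could combine with the incremental quality score. Only the -1 reuse penalty is stated explicitly in the notes; the entropy term is written here as a bonus proportional to the proposal entropy (equivalent to penalizing low entropy), and its weight is a hypothetical hyperparameter:
```python
import numpy as np

def shaped_reward(score_before, score_after, pi1_probs, used_filters, chosen_filter,
                  entropy_weight=0.05, reuse_penalty=1.0):
    """Incremental quality score with a low-entropy penalty and a filter-reuse penalty."""
    reward = score_after - score_before                      # improvement of the quality score
    entropy = -np.sum(pi1_probs * np.log(pi1_probs + 1e-8))  # entropy of the filter proposal
    reward += entropy_weight * entropy                       # concentrated (low-entropy) proposals earn less
    if chosen_filter in used_filters:                        # second use of the same filter
        reward -= reuse_penalty                              # the extra -1 penalty from the notes
    return reward
```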
Out-of-order training:
Images at successive steps along a single trajectory can be highly correlated, and this correlation is harmful to both RL and GAN training. To address it, the paper trains in an out-of-order fashion rather than sequentially. Specifically, instead of starting and finishing a small number of trajectories at the same time (e.g., a batch of 64), a large number of in-progress trajectories are maintained in a trajectory buffer. In each training iteration, a batch of images (not necessarily at the same stage) is sampled from the buffer, one editing step is applied to them, and the edited images are put back into the buffer; a minimal sketch follows the list of benefits below.
This mechanism has two benefits:
(1) For RL, it partly plays the role of an experience replay buffer, providing a smoother training data distribution;
(2) For GAN training, it works much like a 'history' buffer, which also helps reduce model oscillation.
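A minimal sketch of the trajectory-buffer scheme, with hypothetical `edit_one_step` and `sample_raw` callables standing in for one pass through the policies/filters and for drawing a fresh RAW input; how finished trajectories are replaced is an assumption:
```python
import random

def train_iteration(buffer, edit_one_step, sample_raw, batch_size=64, max_steps=5):
    """Sample partially edited images, advance each by one editing step, and put them back.

    buffer: list of (image, steps_done) pairs at arbitrary stages of their trajectories.
    """
    for idx in random.sample(range(len(buffer)), batch_size):
        image, steps_done = buffer[idx]
        image = edit_one_step(image)             # choose a filter + parameters and apply them
        if steps_done + 1 >= max_steps:
            buffer[idx] = (sample_raw(), 0)      # trajectory finished: restart with a new RAW image
        else:
            buffer[idx] = (image, steps_done + 1)
```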
5 Results
This section covers the implementation details of the system, its validation, and applications.
Implementation details:
(1) The system is implemented in TensorFlow.
(2) To estimate the retouching actions and parameters, the high-resolution input image is downsampled to 64×64 px, striking a balance between performance and network size. The resulting small networks help prevent overfitting, give fast inference, and make it easy to embed the model into applications. The estimated actions and parameters are then applied to the full-resolution image at runtime.
(3) All networks are optimized with Adam. The learning rates are $1.5\times10^{-5}$ for the policy networks, $5\times10^{-5}$ for the discriminator, and $5\times10^{-4}$ for the value network. During training, these learning rates decay exponentially to $10^{-3}$ of their original values.
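A sketch of one of these optimizers in TensorFlow; the number of decay steps is a hypothetical choice, since the notes only state the endpoints of the exponential decay:
```python
import tensorflow as tf

# Policy-network optimizer: start at 1.5e-5 and decay exponentially to 1e-3 of that value.
schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1.5e-5,
    decay_steps=100_000,  # hypothetical: after this many steps the rate is decay_rate * initial
    decay_rate=1e-3,
)
policy_optimizer = tf.keras.optimizers.Adam(learning_rate=schedule)
```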
Efficiency and model size:
Thanks to the resolution-independent filter design, inference takes 30 ms on an NVIDIA TITAN X (Maxwell) GPU, and the model is small (< 30 MB).
Datasets:
(1) MIT-Adobe FiveK dataset, randomly divided into three mutually disjoint parts:
Part 1: 2,000 input RAW images;
Part 2: 2,000 images retouched by expert C;
Part 3: 1,000 input RAW images for testing.
(2) 500px dataset: professionally retouched photos of two artists collected from 500px.com. The style within each of the two collections is relatively consistent; they contain 369 and 397 photos respectively.
Error metric:
The similarity between the generated images and the target images is evaluated according to the distributions of image attributes. Brightness, contrast, and saturation are used as three descriptive features of image style, and histogram intersection is used to measure the distance between their distributions over the output images and the target images.
Supplementary material:
The quantities used for the histogram intersection are defined as follows:
- Brightness is defined as the mean pixel luminance.
- Contrast is defined as twice the variance of the pixel luminance.
- Saturation is defined as the mean pixel saturation (the "S" value in HSL color space).
Each quantity is binned over the interval [0,1] into 32 equal bins, i.e., [0,1/32), [1/32,2/32), ....
However, with only 1,000 sample images, each bin contains on average only about 31.25 images, which leads to significant measurement noise. Therefore, the data for the histogram intersection is augmented by cropping 16 patches from each image and measuring the histograms over these 16,000 image patches.
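A sketch of the histogram-intersection metric over one such attribute; normalizing the histograms to probabilities is assumed:
```python
import numpy as np

def histogram_intersection(values_a, values_b, bins=32):
    """Intersection of two normalized 32-bin histograms over [0, 1]; 1.0 means identical distributions."""
    edges = np.linspace(0.0, 1.0, bins + 1)
    h_a, _ = np.histogram(values_a, bins=edges)
    h_b, _ = np.histogram(values_b, bins=edges)
    h_a = h_a / h_a.sum()
    h_b = h_b / h_b.sum()
    return float(np.minimum(h_a, h_b).sum())

# values_a / values_b: e.g. per-patch mean luminance of the output set and the target set.
```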
5.1 End-to-end post-processing and style learning
Three groups of experiments were carried out using the RAW images from Part 1 of the MIT-Adobe FiveK dataset as input, with the expert-C images from the FiveK dataset (Part 2) and the images of two artists from 500px.com as target datasets.
First experiment: using the expert-C images (Part 2) as the target; the visual results are shown in the figure and the quantitative results in the table:
Second experiment: style learning for 500px artist A; the visual results are shown in the figure and the quantitative results in the table:
Third experiment: style learning for 500px artist B; the visual results are shown in the figure and the quantitative results in the table:
(In the second and third experiments, Pix2pix could not produce comparison results, because images downloaded from the web generally have no matching training pairs.)
Generalization:
Applying the network to another set of RAW photos also gives good results, as shown in the figure:
5.2 Reverse-engineering black-box filters
The method not only produces visually pleasing results, it also reveals how the process is carried out step by step, as shown in the figure below:
To the best of the authors' knowledge, this is the first time such interpretable results have been obtained in a deep-learning-based image processing system.
With the help of this system, it is even possible to write explicit code for a black-box filter according to the estimated sequence of operations, as shown in the figure below:
6 CONCLUDING REMARKS
Inspired by the way expert photographers retouch photos, this paper presents a general framework for automatic photo post-processing, which consists of three main parts:
(1) reinforcement learning, to reveal an understandable solution composed of common image operations;
(2) a generative adversarial network, to allow training from unpaired image data;
(3) differentiable, resolution-independent filters, which make it possible for the network to optimize the editing operators on images of arbitrary resolution.
Pixel-level denoising is difficult to model as a resolution-independent differentiable filter, so the input image should be denoised before using this framework.
Sometimes the method fails to produce good tones for human faces, and it may not improve input photos with poor content, composition, or lighting conditions, as shown in the figure: