2022-07-01 11:24:00 【Cassia tora】
Exposure: A White-Box Photo Post-Processing Framework Reading Notes
The paper was published in ACM Transactions on Graphics (TOG) in 2018.
1 Abstract
Problem:
Post-retouching can significantly improve image quality, but ordinary users lack the expertise to retouch their photos.
Status quo:
Existing automatic retouching systems require paired training images for supervised learning, and such paired datasets are difficult to obtain.
Approach of this paper:
A deep learning method is proposed that trains on unpaired data, i.e., a collection of photos exhibiting a retouching style that users like, which is easy to collect.
Training on unpaired data is realized through an end-to-end learning framework in which the retouching operations are represented as a sequence of resolution-independent, differentiable filters that can be trained jointly within convolutional neural networks (CNNs).
A deep reinforcement learning (RL) method guided by a generative adversarial network (GAN) determines the sequence of filters and their parameters for the input image; based on the current state of the image, it learns to decide which action to take next.
Contributions:
- An end-to-end photo post-processing model built from a set of differentiable filters.
- The model is optimized with reinforcement learning, so the system generates meaningful operation sequences that give users insight into a given artistic style, instead of only outputting black-box results.
- A GAN structure makes it possible to learn photo retouching without paired images. According to the authors, this is the first GAN that scales with image resolution without producing distortion artifacts in the image.
- The method not only provides an effective end-to-end post-processing tool for ordinary users, but also helps advanced users reverse-engineer the style of automatic black-box filters.
2 The Model
Given an input RAW photo, the goal is to generate a retouched result that matches a given photo collection representing a particular style. This section details how the retouching process is modeled.
2.1 Motivation
Photo retouching is performed as a series of editing steps, each of which is adjusted based on the output of the previous step. This dependence on visual feedback exists even within a single step of the process, as shown in the figure:
Feedback is crucial for choosing an operation and its parameters. The assumption is that an automatic retouching system would also benefit from feedback and could learn more effectively how to select and apply individual operations based on that feedback, instead of inferring the final output directly from the input. In addition, modeling retouching as a sequence of standard post-processing operations helps preserve the realism of the image and makes the automatic process easier for users to understand.
The order of the different operations also needs to be learned by the system. Unlike previous work, this requires less supervision: only a collection of retouched photos is needed for training.
2.2 Post-processing as a sequence of decisions
Based on the above motivation, the retouching process can be modeled as a sequential decision problem, a common setting in reinforcement learning (RL). RL is a subfield of machine learning concerned with how an agent (here, the entity performing the retouching) acts in an environment to maximize cumulative reward. Below is a brief introduction to the basic concepts of RL and how photo retouching is expressed as an RL problem.
In RL, the problem is expressed as $P=(S,A)$, where $S$ is the state space and $A$ is the action space. For photo retouching, $S$ is the image space, including the RAW input images and all intermediate results of the automatic process, and $A$ is the set of all filtering operations. A transition function $p: S \times A \to S$ maps a state $s \in S$, after taking action $a \in A$, to the resulting state $s' \in S$, so a state transition can be written as $s_{i+1}=p(s_i,a_i)$. Applying a series of filters to the input RAW image produces a trajectory of states and actions:
$s_i \in S$: state
$a_i \in A$: action
$N$: number of actions
$s_N$: terminal (stop) state
The overall process is illustrated in the figure below:
The core element of RL is the reward function $r: S \times A \to \mathbb{R}$, which evaluates how good it is to take a particular action in a given state.
The goal is to find a policy $\pi$ that maximizes the cumulative reward of the decision process. To this end, a stochastic policy agent is used, where the policy $\pi: S \to \mathbb{P}(A)$ maps the current state $s \in S$ to a probability density function over the set of actions, $\mathbb{P}(A)$. When the agent enters a state, it samples an action from this probability density, receives a reward, and then moves to the next state according to the transition function.
Given a trajectory $t=(s_0,a_0,s_1,a_1,\dots,s_N)$, the return $r_k^{\gamma}$ is defined as the sum of discounted rewards after $s_k$:
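Written out, a form of the return consistent with the symbols above (a reconstruction, so the paper's exact notation may differ) is:
$$r_k^{\gamma}=\sum_{i=k}^{N-1}\gamma^{\,i-k}\,r(s_i,a_i)$$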
$\gamma \in [0,1]$ is a discount factor that places more weight on rewards in the near future.
To evaluate a policy, the paper defines the objective:
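A reconstruction of this objective, consistent with the symbols listed below, is the expected return starting from an input image drawn from the input dataset:
$$J(\pi)=\mathbb{E}\big[\,r_0^{\gamma}\;\big|\;s_0\sim S_0,\ \pi\,\big]$$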
$s_0$: the input image
$\mathbb{E}$: expectation
$S_0$: the input dataset
Intuitively, this objective describes the expected return of the policy $\pi$ over all possible trajectories.
The agent's goal is to maximize the objective $J(\pi)$, which is tied through the reward function $r$ to the final image quality, since high-quality images (states) receive larger rewards.
The expected total discounted reward of a state and of a state-action pair is defined by the state-value function $V$ and the action-value function $Q$:
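In standard notation (a reconstruction consistent with the text), these are:
$$V^{\pi}(s)=\mathbb{E}\big[\,r_k^{\gamma}\;\big|\;s_k=s,\ \pi\,\big],\qquad Q^{\pi}(s,a)=\mathbb{E}\big[\,r_k^{\gamma}\;\big|\;s_k=s,\ a_k=a,\ \pi\,\big]$$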
To fit the problem into this RL framework, an action is broken into two parts: a discrete choice of filter, $a_1$, and a continuous decision on the filter parameters, $a_2$. The policy likewise consists of two parts, $\pi=(\pi_1,\pi_2)$: $\pi_1$ is a function that takes a state and returns a probability distribution over the filters, i.e., the choice of $a_1$; $\pi_2$ is a function that takes $(s,a_1)$ and directly outputs $a_2$ ($\pi_1$ is stochastic and must be sampled).
Sampling continuous random variables poses practical challenges, so, following recent practice, $\pi_2$ is treated as deterministic.
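A minimal sketch of this two-part action selection, assuming hypothetical `policy1` and `policy2` callables whose outputs already went through the softmax and tanh mentioned later in Section 4 (an illustration, not the paper's code):
```python
import numpy as np

def select_action(state, policy1, policy2, rng=None):
    """Two-part action: sample a discrete filter choice, then regress its parameters.

    policy1(state) -> probability vector over the n filters (pi_1, after softmax);
    policy2(state, filter_id) -> continuous parameter vector (pi_2, after tanh).
    """
    rng = rng or np.random.default_rng()
    probs = policy1(state)                            # pi_1: distribution over filters
    filter_id = int(rng.choice(len(probs), p=probs))  # stochastic choice of a_1
    params = policy2(state, filter_id)                # pi_2: deterministic a_2
    return filter_id, params
```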
3 Filter Design
This section discusses the design of the filters, i.e., the action space $A$ of the model.
3.1 Design principles
Differentiable. For gradient-based optimization of the policy $\pi$, each filter must be differentiable with respect to its parameters so that the CNNs can be trained through backpropagation. So that a filter can be modeled simply as a basic neural-network layer, the paper approximates filters by replacing smooth curves with piecewise-linear functions.
Resolution-independent. Most editing decisions can be made at low resolution, so to save time and memory the operations can be determined on a downsampled version of the RAW image. That is, the filter parameters are estimated on a low-resolution (64×64) version of the original image, and the same filter is then applied to the original high-resolution image. For this to work, filters must be resolution-independent.
Understandable. Filters should represent intuitive operations so that users can understand the generated operation sequence. This also allows them to further adjust the parameters if they wish.
The three design principles are illustrated in the figure:
3.2 Filter details
Based on the above principles, the paper develops filters that map an input pixel value $p_I=(r_I,g_I,b_I)$ to an output pixel value $p_O=(r_O,g_O,b_O)$. Exposure correction, white balance, and color-curve adjustment can all be modeled with such per-pixel mapping functions. The table below lists the operations implemented in the system, and the figure shows visualizations of the corresponding operations:
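As an illustration of such per-pixel, resolution-independent filters, here is a sketch of gain-style exposure and white-balance operations. The exact parameterization in the paper is not restated in these notes, so treat the 2^E exposure gain and the per-channel white-balance gains as plausible assumptions rather than the paper's definitions:
```python
import numpy as np

def exposure_filter(img, ev):
    """Scale linear-RGB pixel values by 2**ev, where ev is the learned scalar parameter."""
    return np.clip(img * (2.0 ** ev), 0.0, 1.0)

def white_balance_filter(img, gains):
    """Multiply each channel by its own gain; gains is a length-3 parameter vector."""
    return np.clip(img * np.asarray(gains).reshape(1, 1, 3), 0.0, 1.0)
```
Because both act independently on each pixel, applying them to a 64×64 proxy or to the full-resolution image gives consistent results.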
Color-curve adjustment, i.e., a channel-independent monotone mapping function, needs special treatment to make its filter differentiable. The paper approximates each curve as a monotone piecewise-linear function, as shown in the figure:
Suppose a curve is represented by $L$ parameters $\{t_0,t_1,\dots,t_{L-1}\}$. Define the prefix sum of the parameters as $T_k=\sum_{l=0}^{k-1}t_l$, so that the points on the curve are $(k/L,\,T_k/T_L)$. With this representation, an input intensity $x \in [0,1]$ is mapped to:
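One way to write this mapping that is consistent with the knot points above (a reconstruction, so the paper's exact form may differ slightly):
$$f(x)=\frac{1}{T_L}\sum_{j=0}^{L-1}\operatorname{clip}(L\,x-j,\,0,\,1)\; t_j$$
Each term contributes all of $t_j$ once $x$ has passed the $j$-th knot, and a linear fraction of it while $x$ lies inside that segment.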
Note that the mapping function is now expressed in terms of differentiable parameters, making it differentiable with respect to both $x$ and the parameters $\{t_l\}$.
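A NumPy sketch of this curve filter, assuming the clipped-sum form above (in the actual training graph the same arithmetic would be written with the deep-learning framework's ops so gradients flow to the parameters):
```python
import numpy as np

def piecewise_linear_curve(x, t):
    """Map intensities x in [0, 1] through a monotone piecewise-linear curve.

    t: non-negative parameters {t_0, ..., t_{L-1}}; the curve passes through the
    points (k/L, T_k/T_L), where T_k is the prefix sum of the parameters.
    """
    t = np.asarray(t, dtype=np.float64)
    L = len(t)
    x = np.asarray(x, dtype=np.float64)[..., None]   # broadcast over all pixels
    j = np.arange(L, dtype=np.float64)               # segment indices 0..L-1
    contrib = np.clip(L * x - j, 0.0, 1.0) * t       # per-segment contribution
    return contrib.sum(axis=-1) / t.sum()            # normalize by T_L
```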
4 Learning
The overall training loop is as follows:
4.1 Function approximation with DNNs
This work uses CNNs. There are two policy networks: one maps images to action probabilities for $\pi_1$ (after a softmax), the other to filter parameters for $\pi_2$ (after a tanh). Their parameters are denoted $\theta_1$ and $\theta_2$, and the training goal is to optimize $\theta=(\theta_1,\theta_2)$ so as to maximize the objective $J(\pi_\theta)$. In addition, a value network and a discriminator network are learned; they support the training procedure described later.
These networks share the same architecture, shown in the figure below, differing only in the number of output neurons according to what they predict.
Each CNN uses four convolutional layers, each with 4×4 filters and a stride of 2. They are followed by a fully connected layer that reduces the features to 128 dimensions, and a final fully connected layer that regresses these features into the parameters required by each network. After the first fully connected layer, dropout (50%, during both training and testing) is applied to provide noise for the generator.
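A sketch of this shared trunk in tf.keras, assuming a 64×64 RGB input; the per-layer channel counts (32/64/128/256) and the leaky-ReLU activation are assumptions, since the notes only specify the kernel size, stride, and the 128-wide fully connected layer:
```python
import tensorflow as tf

def make_head(n_outputs):
    """Four 4x4 stride-2 convolutions, an FC layer down to 128 features with
    50% dropout, then a task-specific output layer."""
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 4, strides=2, padding="same",
                               input_shape=(64, 64, 3)),
        tf.keras.layers.LeakyReLU(),
        tf.keras.layers.Conv2D(64, 4, strides=2, padding="same"),
        tf.keras.layers.LeakyReLU(),
        tf.keras.layers.Conv2D(128, 4, strides=2, padding="same"),
        tf.keras.layers.LeakyReLU(),
        tf.keras.layers.Conv2D(256, 4, strides=2, padding="same"),
        tf.keras.layers.LeakyReLU(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128),
        tf.keras.layers.LeakyReLU(),
        # The notes keep 50% dropout active at test time as well; Keras disables
        # dropout at inference unless the model is called with training=True.
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(n_outputs),  # softmax (pi_1) or tanh (pi_2) is applied on top
    ])

# e.g. filter_head = make_head(n_filters); parameter_head = make_head(n_params)
```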
4.2 Policy network training
The policy networks are trained with policy gradient methods, which optimize the parameterized policy for expected return using gradient-based updates. Since the policy $\pi$ consists of two parts $(\pi_1,\pi_2)$ corresponding to the two decision steps (filter selection and parameter selection), the two parts are learned in an interleaved fashion.
For filter selection, $\pi_1$ is sampled; it is a discrete probability distribution $\pi_1(F_k)=\mathbb{P}[a_1=F_k]$ over all filters $F=\{F_1,F_2,\dots,F_n\}$. Because the partial derivative $\partial J(\pi)/\partial \pi_1(F_k)$ cannot be computed directly, the policy gradient theorem is applied to obtain an unbiased Monte Carlo estimate of the gradient of $J(\pi)$ with respect to $\pi_1$. For filter parameter selection, the policy $\pi_2$ is deterministic and therefore easier to optimize in the continuous space; here the deterministic policy gradient theorem is used. The policy gradients are thus expressed as
$Q$ is the value function defined above.
$\rho^\pi$: the discounted state distribution, defined as
To compute these gradients, the actor-critic framework is applied: the actor is represented by the policy networks, and the critic is the value network, which learns to approximate the state-value function $V^\pi$ with a CNN parameterized by $\nu$. Using the critic, the action-value function $Q^\pi$ can be computed by expanding its definition and expressing it in terms of the state-value function:
The value network is optimized by minimizing:
$\delta$: the temporal-difference (TD) error, $\delta=r(s,a)+\gamma V(p(s,a))-V(s)$
$\delta$ is also a Monte Carlo estimate of the advantage $A(s,a)=Q(s,a)-V(s)$, i.e., how much the value of taking action $a$ in state $s$ exceeds the expected value over actions. It is used when computing the gradient for $\pi_1$. The gradient for $\pi_2$ does not need a Monte Carlo estimate, so it is computed directly by applying the chain rule to the gradient of $Q$, rather than using the advantage $A$.
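A minimal sketch of how the TD error and advantage feed the two updates, with plain scalars standing in for network outputs. The squared-TD-error critic objective and the exact loss signs are the usual actor-critic conventions, assumed here rather than quoted from the paper:
```python
def td_error(reward, v_s, v_s_next, gamma=1.0):
    """delta = r(s,a) + gamma * V(p(s,a)) - V(s); a Monte Carlo estimate of the advantage."""
    return reward + gamma * v_s_next - v_s

def actor_critic_losses(reward, v_s, v_s_next, log_prob_a1, q_value, gamma=1.0):
    """Per-transition losses; delta is treated as a constant in the actor terms."""
    delta = td_error(reward, v_s, v_s_next, gamma)
    critic_loss = delta ** 2          # value network: squared TD error
    pi1_loss = -delta * log_prob_a1   # REINFORCE-style term for the discrete filter choice
    pi2_loss = -q_value               # deterministic pi_2: ascend Q(s, (a_1, pi_2(s, a_1))) via the chain rule
    return critic_loss, pi1_loss, pi2_loss
```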
Rewards and discount factor:
The ultimate goal is to obtain the best final result after all operations. To this end, the reward is defined as the incremental improvement of a quality score (modeled by the discriminator network in Section 4.3) plus a penalty term (described in Section 4.4). The discount factor is set to $\gamma=1$, and the agent is allowed to edit the input image five times. This number of edits is chosen to balance the expressiveness and conciseness of the operation sequence. A fixed number of steps is used because it makes training more stable than learning online when to stop.
4.3 Quality assessment via adversarial learning
To make the results as close to the target dataset as possible, a GAN is used. It consists of two parts, a generator and a discriminator, which are optimized adversarially: the discriminator is trained to decide whether an image comes from the target dataset or was produced by the generator, while the generator aims to fool the discriminator so that it cannot tell the difference. The two networks are trained simultaneously and ideally reach an equilibrium in which the generated images are close to the target.
This work uses a popular variant of the traditional GAN, the Wasserstein GAN (WGAN), which measures the difference between two probability distributions with the earth mover's distance (EMD). It has been shown to stabilize GAN training and avoid vanishing gradients. The loss of the discriminator $D$ is defined as:
The discriminator $D$ is modeled as a CNN with parameters $w$. The generator's 'negative loss' (the quality score), whose increment forms part of the reward in this system, is
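A sketch of the Wasserstein critic objective and the resulting quality score under the usual WGAN sign convention; the paper's specific Lipschitz regularization (e.g. weight clipping or a gradient penalty) is not restated here:
```python
import numpy as np

def critic_loss(d_fake, d_real):
    """Wasserstein critic loss: push scores of target-set images up, generated images down."""
    return np.mean(d_fake) - np.mean(d_real)

def quality_score(d_fake):
    """Generator-side 'negative loss'; its increment after an edit serves as the RL reward."""
    return float(np.mean(d_fake))
```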
4.4 Training strategy
To address the difficulty of training RL algorithms and GANs, the following techniques are used to stabilize the training process.
Exploitation vs. exploration:
There is a trade-off between exploitation and exploration, i.e., between investing more effort in refining the current strategy and trying new actions in search of potentially larger future returns. This is particularly challenging for the two-stage decision problem here, because focusing on one filter may leave the parameter policies of the other filters insufficiently learned. To avoid this local minimum, $\pi_1$ is penalized if its action proposal is too concentrated, i.e., has low entropy. This is done by reducing the reward:
In addition, it was found that the agent may reuse a filter during retouching, for example by applying two consecutive exposure adjustments instead of merging them into a single step. To obtain more concise retouching solutions, a filter-reuse penalty "teaches" the agent to avoid such behavior: if the agent uses a filter twice, the second use incurs an extra penalty of -1. To implement this, the agent needs to know which filters have already been applied earlier in the process. Encouraging the agent to exploit the full potential of each filter also leads to more exploration of different filters.
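A sketch of how these terms could combine with the incremental quality score. Only the -1 reuse penalty is stated explicitly in the notes; the entropy term is written here as a bonus proportional to the proposal entropy (equivalent to penalizing low entropy), and its weight is a hypothetical hyperparameter:
```python
import numpy as np

def shaped_reward(score_before, score_after, pi1_probs, used_filters, chosen_filter,
                  entropy_weight=0.05, reuse_penalty=1.0):
    """Incremental quality score with a low-entropy penalty and a filter-reuse penalty."""
    reward = score_after - score_before                      # improvement of the quality score
    entropy = -np.sum(pi1_probs * np.log(pi1_probs + 1e-8))  # entropy of the filter proposal
    reward += entropy_weight * entropy                       # concentrated (low-entropy) proposals earn less
    if chosen_filter in used_filters:                        # second use of the same filter
        reward -= reuse_penalty                              # the extra -1 penalty from the notes
    return reward
```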
Out-of-order training:
Images at successive steps along a single trajectory can be highly correlated, and this correlation is harmful to both RL and GAN training. To address it, the paper trains in an out-of-order fashion rather than sequentially. Specifically, instead of starting and finishing a small number of trajectories at the same time (e.g., a batch of 64), a large number of in-progress trajectories are maintained in a trajectory buffer. In each training iteration, a batch of images (not necessarily at the same stage) is sampled from the buffer, one editing step is applied to them, and the edited images are put back into the buffer; a minimal sketch follows the list of benefits below.
This mechanism has two benefits:
(1) For RL, it partly plays the role of an experience replay buffer, providing a smoother training data distribution;
(2) For GAN training, it works much like a 'history' buffer, which also helps reduce model oscillation.
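A minimal sketch of the trajectory-buffer scheme, with hypothetical `edit_one_step` and `sample_raw` callables standing in for one pass through the policies/filters and for drawing a fresh RAW input; how finished trajectories are replaced is an assumption:
```python
import random

def train_iteration(buffer, edit_one_step, sample_raw, batch_size=64, max_steps=5):
    """Sample partially edited images, advance each by one editing step, and put them back.

    buffer: list of (image, steps_done) pairs at arbitrary stages of their trajectories.
    """
    for idx in random.sample(range(len(buffer)), batch_size):
        image, steps_done = buffer[idx]
        image = edit_one_step(image)             # choose a filter + parameters and apply them
        if steps_done + 1 >= max_steps:
            buffer[idx] = (sample_raw(), 0)      # trajectory finished: restart with a new RAW image
        else:
            buffer[idx] = (image, steps_done + 1)
```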
5 Results
This section covers the implementation details of the system, its validation, and applications.
Implementation details:
(1) The system is implemented in TensorFlow.
(2) To estimate the retouching actions and parameters, the high-resolution input image is downsampled to 64×64 px, striking a balance between performance and network size. The resulting small networks help prevent overfitting, give fast inference, and make it easy to embed the model into applications. The estimated actions and parameters are then applied to the full-resolution image at runtime.
(3) All networks are optimized with Adam. The learning rates are $1.5\times10^{-5}$ for the policy networks, $5\times10^{-5}$ for the discriminator, and $5\times10^{-4}$ for the value network. During training, these learning rates decay exponentially to $10^{-3}$ of their original values.
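A sketch of one of these optimizers in TensorFlow; the number of decay steps is a hypothetical choice, since the notes only state the endpoints of the exponential decay:
```python
import tensorflow as tf

# Policy-network optimizer: start at 1.5e-5 and decay exponentially to 1e-3 of that value.
schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1.5e-5,
    decay_steps=100_000,  # hypothetical: after this many steps the rate is decay_rate * initial
    decay_rate=1e-3,
)
policy_optimizer = tf.keras.optimizers.Adam(learning_rate=schedule)
```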
Efficiency and model size:
Thanks to the resolution-independent filter design, inference takes 30 ms on an NVIDIA TITAN X (Maxwell) GPU, and the model is small (< 30 MB).
Datasets:
(1) MIT-Adobe FiveK dataset, randomly divided into three mutually disjoint parts:
Part 1: 2,000 input RAW images;
Part 2: 2,000 images retouched by expert C;
Part 3: 1,000 input RAW images for testing.
(2) 500px dataset: professionally retouched photos of two artists collected from 500px.com. The style within each of the two collections is relatively consistent; they contain 369 and 397 photos respectively.
Error metric:
The similarity between the generated images and the target images is evaluated according to the distributions of image attributes. Brightness, contrast, and saturation are used as three descriptive features of image style, and histogram intersection is used to measure the distance between their distributions over the output images and the target images.
Supplementary material:
The quantities used for the histogram intersection are defined as follows:
- Brightness is defined as the mean pixel luminance.
- Contrast is defined as twice the variance of the pixel luminance.
- Saturation is defined as the mean pixel saturation (the "S" value in HSL color space).
Each quantity is binned over the interval [0,1] into 32 equal bins, i.e., [0,1/32), [1/32,2/32), ....
However, with only 1,000 sample images, each bin contains on average only about 31.25 images, which leads to significant measurement noise. Therefore, the data for the histogram intersection is augmented by cropping 16 patches from each image and measuring the histograms over these 16,000 image patches.
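A sketch of the histogram-intersection metric over one such attribute; normalizing the histograms to probabilities is assumed:
```python
import numpy as np

def histogram_intersection(values_a, values_b, bins=32):
    """Intersection of two normalized 32-bin histograms over [0, 1]; 1.0 means identical distributions."""
    edges = np.linspace(0.0, 1.0, bins + 1)
    h_a, _ = np.histogram(values_a, bins=edges)
    h_b, _ = np.histogram(values_b, bins=edges)
    h_a = h_a / h_a.sum()
    h_b = h_b / h_b.sum()
    return float(np.minimum(h_a, h_b).sum())

# values_a / values_b: e.g. per-patch mean luminance of the output set and the target set.
```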
5.1 End-to-end post-processing and style learning
Three groups of experiments were carried out using the RAW images from Part 1 of the MIT-Adobe FiveK dataset as input, with the expert-C images from the FiveK dataset (Part 2) and the images of two artists from 500px.com as target datasets.
First experiment: using the expert-C images (Part 2) as the target; the visual results are shown in the figure and the quantitative results in the table:
Second experiment: style learning for 500px artist A; the visual results are shown in the figure and the quantitative results in the table:
Third experiment: style learning for 500px artist B; the visual results are shown in the figure and the quantitative results in the table:
(In the second and third experiments, Pix2pix could not produce comparison results, because images downloaded from the web generally have no matching training pairs.)
Generalization:
Applying the network to another set of RAW photos also gives good results, as shown in the figure:
5.2 Reverse-engineering black-box filters
The method not only produces visually pleasing results, it also reveals how the process is carried out step by step, as shown in the figure below:
To the best of the authors' knowledge, this is the first time such interpretable results have been obtained in a deep-learning-based image processing system.
With the help of this system, it is even possible to write explicit code for a black-box filter according to the estimated sequence of operations, as shown in the figure below:
6 CONCLUDING REMARKS
Inspired by the way expert photographers retouch photos, this paper presents a general framework for automatic photo post-processing, which consists of three main parts:
(1) reinforcement learning, to reveal an understandable solution composed of common image operations;
(2) a generative adversarial network, to allow training from unpaired image data;
(3) differentiable, resolution-independent filters, which make it possible for the network to optimize the editing operators on images of arbitrary resolution.
Pixel-level denoising is difficult to model as a resolution-independent differentiable filter, so the input image should be denoised before using this framework.
Sometimes the method fails to produce good tones for human faces, and it may not improve input photos with poor content, composition, or lighting conditions, as shown in the figure: