Train on 100 images for 1 hour and restyle photos at will, with a demo at the end of the article | SIGGRAPH 2021
2022-07-06 16:54:00 【ByteDance Technology】
Turning a photo into a stylized portrait now has a new state-of-the-art technique.
Start with photos of ordinary people:
They can be turned automatically into animated-film characters, with big eyes and smooth skin, while keeping many features of the original person:
They can also take on a martial-arts game style, with pointed chins for the women and flowing hair for the men, as if they were about to start cultivating immortality the next second:
Or they can become figures in an oil painting. The facial contours stay full, but the light, shadow, lines, and texture convey a Renaissance feel:
More unusual styles are no problem either. For example, start with these two portrait photos:
They can become hand-drawn illustrations, where the strokes in every detail (eyelashes, lips, hair) look remarkably real:
Or sculptures, which look like photos taken in a museum:
Or sketches, as if you had walked into a classroom at an academy of fine arts. Even the openwork earrings of the person on the right are rendered accurately:
Or dolls, with a strong three-dimensional look, like polished product shots from an online store:
The results hold up well across gender, age, skin tone, hairstyle, and facial features:
The model behind these transformations is called AgileGAN. It was developed jointly by ByteDance's overseas technical team and Nanyang Technological University in Singapore, and has been accepted to SIGGRAPH 2021, the top conference in computer graphics.
Does generating such attractive, high-resolution images require piling up massive amounts of data and compute?
No.
To train a new style, you only need a training set of about 100 images of men and women. One hour of training on a single NVIDIA Tesla V100 yields a model that generates 1024×1024 high-resolution images.
The model also supports rich photo-editing functions. Take this original image, for example:
You can modify the lighting of the generated photo, which can even rescue a shot taken against the light:
You can even change the viewing angle of the generated photo:
The team behind the technology also plans to apply it to products such as Douyin and Feishu, so in the near future you may get a more engaging interactive experience in these apps.
The good news is that a demo of AgileGAN is already available (see the link at the end of the article), so you can try it on photos of yourself, your friends and family, or your idol.
What kind of GAN can "draw" such realistic results?
The model that produces these effects has two parts:
The front end is a hierarchical variational autoencoder (hVAE) that maps the input image into StyleGAN2's latent space; the back end is a stylized generator that decodes it.
Both parts build on a pretrained StyleGAN2, a GAN model that can generate a wide variety of faces.
Faces generated by StyleGAN2
StyleGAN2 has two latent spaces for image generation. One is the Z space, which follows a standard Gaussian distribution; the other is the W space, obtained through a series of nonlinear mappings. The W space is disentangled, but its distribution is very complex. When practitioners use StyleGAN2 to reconstruct a user-supplied image, they usually choose the W space.
This time, however, the research team found that if the latent distribution of the input images conforms to the Gaussian distribution of the original StyleGAN2 latent space, the generated images in various styles have less noise and look better.
To achieve this, the team abandoned the commonly used W space in favor of the Z space with its standard Gaussian distribution, and added a hierarchical dimension to it so that it can express more complex images.
They then model the distribution variationally and train the front-end hierarchical variational autoencoder, whose back end is the pretrained StyleGAN2 generator.
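The idea of encoding into a Gaussian Z space, with one code per generator layer, can be sketched with the standard VAE reparameterization and KL term. This is a minimal plain-Python sketch under assumptions, not the authors' implementation: the function names, the layer count, and the latent size are all hypothetical, and a real encoder would predict `mu` and `log_var` from the input image.

```python
import math
import random


def sample_latent(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, 1)), summed over all dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0
                     for m, lv in zip(mu, log_var))


def sample_hierarchical(layers, rng):
    """Sample one latent code per layer and accumulate the KL penalty.

    layers: list of (mu, log_var) pairs, one pair per generator layer
    (hypothetical stand-ins for the encoder's outputs).
    """
    codes = [sample_latent(mu, lv, rng) for mu, lv in layers]
    kl = sum(kl_to_standard_normal(mu, lv) for mu, lv in layers)
    return codes, kl


# Illustrative sizes only; the real model uses far larger ones.
NUM_LAYERS, DIM = 3, 4
rng = random.Random(0)
layers = [([0.0] * DIM, [0.0] * DIM) for _ in range(NUM_LAYERS)]
codes, kl = sample_hierarchical(layers, rng)
```

When every predicted mean and log-variance is zero, the KL term vanishes, which is the sense in which the encoded codes match a standard Gaussian prior.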
To better preserve the user's attributes in the output, they also propose an attribute-aware generator: starting from the pretrained StyleGAN2 model, they fine-tune the generator to produce different styles such as cartoon and hand-drawn. An early-stopping strategy is used to avoid overfitting the small training set.
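The article does not say which stopping criterion is used. As an illustration only, here is a generic patience-based early-stopping check, which is a swapped-in, commonly used technique and not necessarily the authors' strategy. The function name, the `patience` and `min_delta` parameters, and the score values are hypothetical.

```python
def stopping_iteration(scores, patience=3, min_delta=1e-3):
    """Return the evaluation index at which training would stop, or None.

    scores: one validation score per evaluation, lower is better.
    Training stops once `patience` consecutive evaluations fail to
    improve on the best score so far by more than `min_delta`.
    """
    best = float("inf")
    stale = 0  # consecutive evaluations without a real improvement
    for i, score in enumerate(scores):
        if score < best - min_delta:
            best = score
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return i
    return None


# Hypothetical scores recorded while fine-tuning on a small style set.
scores = [1.0, 0.8, 0.7, 0.71, 0.72, 0.70, 0.69]
stop_at = stopping_iteration(scores, patience=3)
```

With a small training set, stopping on a plateau like this keeps the fine-tuned generator from drifting too far from the pretrained one.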
These two training stages are independent of each other and can be run in parallel.
The structure of the hierarchical variational autoencoder is shown in the figure:
Finally, how is AgileGAN's output quality measured?
It is evaluated from two angles. The first is "does it look good", meaning whether it matches users' preferences. The second is "does it look the part", meaning the generated images need to resemble the target art style.
To judge "does it look good", the research team recruited 100 ordinary participants and showed each of them 10 randomly selected sets of generated images, produced by AgileGAN and by several well-known earlier models, and asked them to pick the best image.
As the middle column of the table below shows, 57.9% of the votes went to AgileGAN.
To check "does it look the part", they computed the Fréchet Inception Distance (FID) for several GAN models. FID is a common metric for scoring GANs: it compares the neural-network feature distribution of the art-style images with that of the generated images. The lower the score, the higher the fidelity of the generated images. AgileGAN again came out as the highest-fidelity model.
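To make the comparison of feature distributions concrete, here is a simplified sketch of the Fréchet distance between two Gaussians fitted to feature vectors. It assumes diagonal covariances so that it runs in plain Python; the full FID uses Inception-network features and full covariance matrices. The function names and the tiny feature lists are hypothetical stand-ins.

```python
import math


def mean_and_var(features):
    """Per-dimension mean and (population) variance of feature vectors."""
    n = len(features)
    dim = len(features[0])
    mean = [sum(f[j] for f in features) / n for j in range(dim)]
    var = [sum((f[j] - mean[j]) ** 2 for f in features) / n
           for j in range(dim)]
    return mean, var


def frechet_distance_diag(mu1, var1, mu2, var2):
    """Frechet distance between two Gaussians with diagonal covariances.

    ||mu1 - mu2||^2 + sum(var1 + var2 - 2 * sqrt(var1 * var2))
    """
    mean_term = sum((a - b) ** 2 for a, b in zip(mu1, mu2))
    cov_term = sum(v1 + v2 - 2.0 * math.sqrt(v1 * v2)
                   for v1, v2 in zip(var1, var2))
    return mean_term + cov_term


# Hypothetical 2-D features standing in for Inception activations.
style_feats = [[0.0, 1.0], [2.0, 3.0]]
generated_feats = [[1.0, 2.0], [3.0, 2.0]]

mu_s, var_s = mean_and_var(style_feats)
mu_g, var_g = mean_and_var(generated_feats)
distance = frechet_distance_diag(mu_s, var_s, mu_g, var_g)
```

Identical feature sets give a distance of zero, and the distance grows as either the means or the spreads of the two sets diverge.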
From ByteDance's Intelligent Creation team in North America
The researchers behind this work are from ByteDance and Nanyang Technological University in Singapore. Some of the ByteDance researchers belong to the Intelligent Creation team based in Mountain View in North America, the technical team that builds all kinds of popular effects for Douyin and TikTok.
AgileGAN took as long as 8 months to build. The initial inspiration came from a set of drawings the team saw on a social network, in which an artist turned various portraits into cartoons. The works had the exaggerated look of cartoons (round heads, big eyes, flowing hair) while properly preserving the facial features of the original portraits, such as a high nose bridge or deep-set eyes.
They realized that if an algorithm could produce a similar effect, it could offer users a better and more engaging interactive experience.
So they tried SEAN, CycleGAN, MUNIT, 3D warping, and other approaches, optimizing and debugging each idea extensively in search of the most advanced and most practical solution. They finally settled on combining the ideas of Toonify and StyleGAN, identified some of its core limitations, and solved them creatively to get the best results from the model. In August this year, AgileGAN will also be presented at SIGGRAPH 2021, the top conference in graphics.
Beyond AgileGAN, members of the ByteDance Intelligent Creation team based in Mountain View and Los Angeles have also worked on a range of character-related technologies, including 3D, virtual humans, and image generation.
For example, turning into the Mona Lisa inside a painting frame:
Or putting a virtual wig on a person. Look closely and you will see that these dynamic wigs are not only lifelike but also match the real lighting in the scene.
There is also the "turn into a beauty" effect, which shows users what they would look like as the opposite gender:
A change of look with the "bald challenge":
The "animated old photos" effect, which sets the you of years ago in motion:
Or making still images move, like the portraits on the walls of Hogwarts, which can also dance along with a person's movements:
These effects will be applied in Douyin, TikTok, and other apps to bring users richer and more novel experiences.
Related links
Paper:
https://guoxiansong.github.io/homepage/paper/AgileGAN.pdf
Project page:
https://guoxiansong.github.io/homepage/agilegan.html
Online demo:
http://www.agilegan.com/