AI clothing generation helps you complete the last step of clothing design
2022-06-25 02:30:00 【Paddlepaddle】
This article was first published on the PaddlePaddle official account under the title:
AI clothing generation helps you finish the last step of fashion design

How can AI empower the fashion design industry? This is a question that Hong Li, a PaddlePaddle Developer Technical Expert, has been thinking about. After a designer conceives and sketches a garment, generating its overall appearance with one click would let them refine the design based on the cut, style, and other aspects of the finished product. Once the basic idea of the project was settled, Hong Li began putting it into practice on the AI Studio platform using the PaddlePaddle framework. The project can currently generate garments, and we look forward to discussing further optimizations (for example, more diverse design presentations) with more developers. The following is Hong Li's account.
Project background
To set the basic design goals for the clothing generation project, I first looked for relevant techniques. One difference between garment generation and other generation tasks is that the output must be a “clean” garment with no fancy background. In other words, the generator must focus on the clothes rather than produce a complete picture. So when designing the loss, I took the clothing mask into account. For the model, the input is semantic segmentation information and the output is a picture of the garment. I came across a demo of SPADE on a video site. Its effect, reminiscent of the folk tale “Ma Liang's magic paintbrush”, appealed to me, and the SPADE architecture broadly met my needs for the model. However, when I trained with SPADE, the model proved difficult to train. By chance, I then found a newer paper, Semantically Multi-modal Image Synthesis.
A detailed introduction to this paper:
https://aistudio.baidu.com/aistudio/projectdetail/3454453
This paper builds on the SPADE framework and shows good results on the DeepFashion dataset. The output of its Encoder preserves spatial structure information, which I think is why the model is easier to train. Building on this, I was able to modify the first version of my project with little effort, which produced the prototype of the model framework for the garment generation project.
Project practice
Click to get the project's address:
https://aistudio.baidu.com/aistudio/projectdetail/3405079
Problems and data tuning
In the early stage of model training I ran into two main problems:
The first was choosing a training set.
The second was that training was very slow: a single batch took tens of seconds.
To solve these problems, I processed the data in the following steps:
1. I chose the FGVC6 dataset. It provides accurate annotated regions for each part of a garment, 46 classes in total, as shown in Figure 1.

Figure 1. Examples from the FGVC dataset [1]
2. At test time, the model no longer needs the ground truth (GT) image as input; only the semantic segmentation information is required. This is explained in more detail below.
3. The semantic segmentation Tensor fed to the model has the format [batch_size,class_num,H,W], concretely [4,46,256,256]: 46 is the number of labels, and batch_size is 4 during training.
4. In addition, the loss computation should take the clothing mask into account.
5. Because my input data is 256*256, both the images and the semantic segmentation maps need to be resized so that H and W are 256. The images in the original dataset are very large, with H and W reaching several thousand pixels, so the resize operation is time-consuming. Some readers may ask: why not crop instead? There are several reasons:
First, this is a clothing generation project, so the model should consider the garment as a whole and see as little purely local information as possible. This avoids judging the whole from a small part and lets the model keep the bigger picture in view.
Second, in a real picture the garment occupies a small area and its position is not fixed, so a 256*256 region cut from a 2500*2500 image has a high probability of being completely black and providing no information.
Finally, cropping makes it likely that the model rarely sees some labels. Shoes, for example, occupy a very small proportion of the image, which can easily make the generated results for them inaccurate.
6. To address the slow training, I tuned in two steps. At first I resized online, and a single batch took tens of seconds. Since the input images are only 256*256, the forward-pass tensor computation was unlikely to be the bottleneck, which pointed to data preprocessing. So, with guidance from a PaddlePaddle developer mentor, I first tried resizing offline and saving the results as npy files, which let training start smoothly. In addition, when saving the semantic segmentation during offline resizing, I originally stored each npy as [256,256,class_num]. This is very sparse and takes up a lot of space, so only about 1,000 samples could be saved. Since each pixel actually has only one label, I changed the storage format to [256,256,1], which allowed about 10,000 samples to be saved and greatly improved storage efficiency.
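The storage change in step 6 and the tensor format in step 3 can be illustrated with a minimal NumPy sketch. It is not the project's actual preprocessing code: the function names, the nearest-neighbour resize, the uint8 dtype, and the random stand-in annotation are all assumptions made for illustration. It stores one class id per pixel as [256,256,1] and expands it to a one-hot [class_num,H,W] tensor only when the data is loaded.

```python
import numpy as np

NUM_CLASSES = 46   # number of labels stated in the article
SIZE = 256         # target H and W stated in the article


def resize_nearest(label_map, size=SIZE):
    """Nearest-neighbour resize of a 2-D integer label map.

    Nearest-neighbour sampling keeps every output pixel a valid class id;
    interpolating between ids would produce meaningless values.
    """
    h, w = label_map.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return label_map[rows[:, None], cols[None, :]]


def to_compact(label_map):
    """Keep one class id per pixel: shape [H, W, 1], dtype uint8 (assumed)."""
    return label_map.astype(np.uint8)[..., None]


def to_one_hot(compact, num_classes=NUM_CLASSES):
    """Expand [H, W, 1] class ids to a [class_num, H, W] float32 one-hot tensor."""
    ids = compact[..., 0]
    classes = np.arange(num_classes, dtype=ids.dtype)[:, None, None]
    return (ids[None, :, :] == classes).astype(np.float32)


# Hypothetical large annotation standing in for a real dataset sample.
rng = np.random.default_rng(0)
raw = rng.integers(0, NUM_CLASSES, size=(1000, 1500))

compact = to_compact(resize_nearest(raw))   # the array one would pass to np.save
one_hot = to_one_hot(compact)               # expanded only at load time
batch = np.stack([one_hot] * 4)             # batch_size = 4, as in the article

print(compact.shape, one_hot.shape, batch.shape)
print(one_hot.nbytes // compact.nbytes)     # size ratio for these assumed dtypes
```

The size ratio printed at the end depends on the dtypes chosen here, so it should not be read as the ratio observed in the original project.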
Thoughts on model training and loss
1. I used a GAN as the overall form of the model. The generator follows the model architecture of Semantically Multi-modal Image Synthesis; the discriminator is a Multihead Discriminator, which supports feature alignment in the discriminator.
2. The discriminator has three tasks: it must judge the Ground Truth as True, judge the images produced by the generator as False, and also judge the semantic segmentation as False (an improvement of mine). The last task helps the generator produce more complex and realistic textures.
3. To focus the model on the regions that contain clothing, the generator's featloss only considers the part covered by the binary (0,1) clothing mask.
4. In spade.py, I changed nn.conv2d(46,128) to an ordinary convolution rather than a grouped convolution, because 46 is not divisible by group_num = 4.
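The divisibility constraint in item 4 is simple arithmetic, sketched below. In the usual definition of grouped convolution, the group count must divide both the input and the output channel counts. The helper name `valid_group_counts` is hypothetical and exists only for this illustration.

```python
def valid_group_counts(in_channels, out_channels):
    """Return every group count that divides both channel counts."""
    limit = min(in_channels, out_channels)
    return [g for g in range(1, limit + 1)
            if in_channels % g == 0 and out_channels % g == 0]


in_channels, out_channels, group_num = 46, 128, 4

# 46 = 2 * 23, so splitting 46 input channels into 4 equal groups leaves a remainder.
print(in_channels % group_num)                          # 2
print(valid_group_counts(in_channels, out_channels))    # [1, 2]
```

With 46 input channels, only 1 group (an ordinary convolution) or 2 groups would satisfy the constraint, which is consistent with the choice of an ordinary convolution above.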

Figure 2. Semantically Multi-modal Image Synthesis model architecture
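The mask-restricted feature loss from item 3 above can be sketched in NumPy as follows. This is an illustrative assumption, not the project's code: the L1 distance, the function name `masked_l1`, and the tensor shapes are chosen for the example, and in practice the mask would need to be resized to match each discriminator feature map.

```python
import numpy as np


def masked_l1(fake_feat, real_feat, mask, eps=1e-8):
    """Mean absolute difference over positions where mask == 1.

    fake_feat, real_feat: arrays of shape [N, C, H, W]
    mask: array of shape [N, 1, H, W] with values 0 or 1 (1 = clothing)
    """
    diff = np.abs(fake_feat - real_feat) * mask      # zero outside the clothing
    n_active = mask.sum() * fake_feat.shape[1]       # masked pixels times channels
    return diff.sum() / (n_active + eps)


# Toy features: they differ by 0.5 inside the mask and by 1.0 outside it.
fake = np.zeros((1, 2, 4, 4), dtype=np.float32)
real = np.ones((1, 2, 4, 4), dtype=np.float32)
real[..., 1:3, 1:3] = 0.5
mask = np.zeros((1, 1, 4, 4), dtype=np.float32)
mask[..., 1:3, 1:3] = 1.0

masked = masked_l1(fake, real, mask)
plain = float(np.mean(np.abs(fake - real)))
print(masked, plain)   # the masked loss ignores the background mismatch
```

In this toy case the plain mean is dominated by the background, while the masked version reflects only the clothing region, which is the behaviour the article asks of featloss.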
Loss visualization
Finally, I visualized the losses, as shown in Figure 3, where:
d_real_loss: the discriminator judges real images as True;
d_fake_loss: the discriminator judges images produced by the generator as False;
d_seg_ganloss: the discriminator judges the semantic segmentation as False;
d_all_loss: d_real_loss + d_fake_loss + d_seg_ganloss;
g_ganloss: requires that images produced by the generator be judged as True by the discriminator;
g_featloss: aligns the discriminator features of generated images with those of real images;
g_vggloss: perceptual loss between generated images and the GT, computed with VGG;
g_styleloss: style loss between generated images and the GT, computed with Gram matrices;
kldloss: KL divergence from the standard normal distribution;
g_loss: g_ganloss + g_featloss + g_vggloss + g_styleloss + kldloss.

Figure 3. Loss visualization
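Two of the terms listed above have standard closed forms, and the totals are plain sums. The sketch below is an assumption-laden illustration rather than the project's implementation: the encoder is assumed to output a mean and a log-variance, the style loss is assumed to be an L1 distance between Gram matrices, the Gram normalisation is one common choice, and the numeric loss values are made up.

```python
import numpy as np


def kld_loss(mu, logvar):
    """KL( N(mu, exp(logvar)) || N(0, 1) ), summed over all elements."""
    return -0.5 * np.sum(1.0 + logvar - mu ** 2 - np.exp(logvar))


def gram_matrix(feat):
    """Gram matrix of a [C, H, W] feature map, normalised by C*H*W (assumed)."""
    c, h, w = feat.shape
    flat = feat.reshape(c, h * w)
    return flat @ flat.T / (c * h * w)


def style_loss(fake_feat, real_feat):
    """L1 distance between Gram matrices (the distance is an assumption)."""
    return float(np.mean(np.abs(gram_matrix(fake_feat) - gram_matrix(real_feat))))


# Hypothetical per-term values, only to show how the totals are formed.
d_terms = {"d_real_loss": 0.5, "d_fake_loss": 0.25, "d_seg_ganloss": 0.25}
g_terms = {"g_ganloss": 0.5, "g_featloss": 1.0, "g_vggloss": 2.0,
           "g_styleloss": 0.25, "kldloss": 0.25}

d_all_loss = sum(d_terms.values())   # unweighted sum, as written in the article
g_loss = sum(g_terms.values())

print(d_all_loss, g_loss)
print(kld_loss(np.ones(4), np.zeros(4)))   # mean shifted by 1 in 4 dimensions
```

The sums are unweighted because the article lists them that way; many GAN training setups scale individual terms, and whether this project does so is not stated in the source.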
Results
It's show time! From left to right: the model's generated result, the Ground Truth (which can be understood as the model's reference answer), and a visualization of the semantic segmentation given to the model as input.
Figure 4. Results
Review and reflection
There is still much to improve in this project, for example whether the model framework can be optimized to give finer-grained control over clothing features, or how to better increase the diversity of what the model generates. I will continue to study image generation and look forward to publishing better projects in the future. You are welcome to get in touch with me.
References
[1] Semantic Image Synthesis with Spatially-Adaptive Normalization
[2] Semantically Multi-modal Image Synthesis