2021 CIKM |GF-VAE: A Flow-based Variational Autoencoder for Molecule Generation
2022-07-26 04:01:00 【Stunned flounder (】

Paper: https://dl.acm.org/doi/epdf/10.1145/3459637.3482260
Code: https://github.com/chshm/GF-VAE
GF-VAE: a flow-based variational autoencoder for molecule generation
Molecule generation is a challenging but meaningful task: generated molecules must obey chemical valence rules while optimizing a given objective. Recent, more effective methods combine molecular graphs with generative models, but they are computationally expensive. The authors therefore propose GF-VAE, a flow-based variational autoencoder (VAE) model for molecular graph generation. The model adds a flow-based decoder to the original VAE. The encoder mainly serves to speed up training of the decoder, and the decoder in turn improves the performance of the encoder. Because flow models are invertible, generation is carried out simply by running the decoder in reverse. GF-VAE thus inherits the advantages of both VAEs and flow-based methods. The model is validated on molecule generation and reconstruction, smoothness of the learned latent space, property optimization, and constrained property optimization.
Model
In recent years, generative models have mainly comprised generative adversarial networks (GANs), variational autoencoders (VAEs), and flow-based models. All three have been applied to molecule generation, for example the GAN-based MolGAN and GCPN, the VAE-based CVAE and JT-VAE, and the flow-based GraphNVP and MoFlow.
GF-VAE seamlessly integrates a VAE with a normalizing flow model for one-shot molecular graph generation. It uses the flow generator to guide the VAE encoder toward more meaningful high-level molecular representations, and at the same time uses the VAE encoder to make the flow generator more lightweight. As shown in the figure below:

$L_A$/$L_B$ denotes the number of times the entire module is stacked to obtain better mapping performance, and $K_A$/$K_B$ denotes the number of times the coupling layer in the corresponding box is stacked. The best combination of these parameters is selected in the subsequent experiments.
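The invertibility that lets the decoder be run in reverse can be illustrated with a generic affine coupling layer. This is only a minimal sketch, not the GF-VAE layer itself: the split into halves, the exponential scaling, and the toy `toy_scale` / `toy_shift` functions are assumptions made for illustration.

```python
import math
from typing import Callable, List, Sequence

Net = Callable[[Sequence[float]], List[float]]


def coupling_forward(x: Sequence[float], scale_net: Net, shift_net: Net) -> List[float]:
    """Keep the first half unchanged; transform the second half conditioned on it."""
    half = len(x) // 2
    x1, x2 = list(x[:half]), list(x[half:])
    s, t = scale_net(x1), shift_net(x1)
    y2 = [v * math.exp(si) + ti for v, si, ti in zip(x2, s, t)]
    return x1 + y2


def coupling_inverse(y: Sequence[float], scale_net: Net, shift_net: Net) -> List[float]:
    """Exact inverse: the first half is untouched, so s and t can be recomputed."""
    half = len(y) // 2
    y1, y2 = list(y[:half]), list(y[half:])
    s, t = scale_net(y1), shift_net(y1)
    x2 = [(v - ti) * math.exp(-si) for v, si, ti in zip(y2, s, t)]
    return y1 + x2


# Hypothetical stand-ins for the learned scale and shift networks.
def toy_scale(h: Sequence[float]) -> List[float]:
    return [math.tanh(sum(h))] * len(h)


def toy_shift(h: Sequence[float]) -> List[float]:
    return [0.5 * v for v in h]


x = [0.3, -1.2, 2.0, 0.7]          # even length, so both halves have the same size
y = coupling_forward(x, toy_scale, toy_shift)
x_rec = coupling_inverse(y, toy_scale, toy_shift)
```

Stacking several such layers (the role played by $K_A$/$K_B$ above) keeps the whole mapping invertible, because each layer can be undone in reverse order.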
Data
- QM9
QM9 contains 134k molecules with at most 9 atoms of 4 different types.
- ZINC-250K
ZINC-250K consists of 250k molecules with at most 38 atoms of 9 different types.
Baselines
VAE-based models
- CVAE
- GVAE
- GraphVAE
Flow-based models
- GraphNVP
- GRF
Evaluation metrics
- Validity (V): the percentage of chemically valid molecules among all generated molecules;
- Novelty (N): the percentage of valid generated molecules that do not appear in the training set;
- Uniqueness (U): the percentage of unique valid molecules among all generated molecules;
- Reconstruction: the percentage of molecules that can be reconstructed from their own latent vectors;
- Score (S): the product of Validity, Novelty, and Uniqueness.
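The generation metrics listed above can be sketched as follows. This is an illustrative reading of the definitions, not the authors' evaluation code: the function name `generation_metrics`, the choice of denominators, and the `is_valid` callback (which in practice would be a chemistry toolkit check) are assumptions. Reconstruction is left out because it requires the trained model.

```python
from typing import Callable, Dict, Iterable, Set


def generation_metrics(
    generated: Iterable[str],
    training_set: Set[str],
    is_valid: Callable[[str], bool],
) -> Dict[str, float]:
    """Compute Validity, Novelty, Uniqueness and their product (Score)."""
    molecules = list(generated)
    if not molecules:
        raise ValueError("need at least one generated molecule")

    valid = [m for m in molecules if is_valid(m)]
    unique_valid = set(valid)
    novel = [m for m in valid if m not in training_set]

    validity = len(valid) / len(molecules)            # valid / all generated
    novelty = len(novel) / len(valid) if valid else 0.0  # unseen / valid
    uniqueness = len(unique_valid) / len(molecules)   # unique valid / all generated
    return {
        "validity": validity,
        "novelty": novelty,
        "uniqueness": uniqueness,
        "score": validity * novelty * uniqueness,
    }


# Toy data: the strings and the validity rule are made up for illustration.
toy_generated = ["CCO", "CCO", "CC", "bad", "C"]
toy_training = {"CC"}
metrics = generation_metrics(toy_generated, toy_training, lambda s: s != "bad")
```

Molecules are compared as strings here, so the inputs are assumed to be in a canonical form; otherwise two spellings of the same molecule would be counted as distinct.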
Experiments

(a) and (b) are two molecules randomly sampled from QM9. (c) and (d) show heatmaps of the atom-atom cosine similarity for the molecules in (a) and (b), computed from the encoder's embedding vectors. (e) and (f) show the similarity computed from the atom embedding vectors of the GF-VAE encoder.
The smoothness of the learned latent space is checked qualitatively in two ways. The first is a grid search along two random orthogonal directions, which finds the neighborhood of a randomly selected molecule in the latent space and visualizes it. The second is interpolation between the latent points of two molecular graphs. The Tanimoto index is used as the measure of chemical similarity, and the similarity values are displayed as a heatmap.
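The atom-atom cosine similarity behind such heatmaps can be computed as in the sketch below. The embeddings are made-up numbers, and `cosine_similarity_matrix` is a hypothetical helper name; a real run would take the per-atom vectors produced by the encoder.

```python
import math
from typing import List, Sequence


def cosine_similarity_matrix(embeddings: Sequence[Sequence[float]]) -> List[List[float]]:
    """Return an n x n matrix whose (i, j) entry is cos(e_i, e_j)."""
    norms = [math.sqrt(sum(v * v for v in e)) for e in embeddings]
    n = len(embeddings)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if norms[i] == 0.0 or norms[j] == 0.0:
                continue  # similarity with a zero vector is left at 0.0
            dot = sum(a * b for a, b in zip(embeddings[i], embeddings[j]))
            sim[i][j] = dot / (norms[i] * norms[j])
    return sim


# Three hypothetical atom embeddings: atoms 0 and 2 point in the same direction.
atom_embeddings = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]
similarity = cosine_similarity_matrix(atom_embeddings)
```

Each row of `similarity` corresponds to one row of the heatmap, with values between -1 and 1.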

Panel (a) of the figure shows that the learned latent space is smooth, because neighboring latent points correspond to molecules that differ only slightly. Panel (b) likewise shows that, except for the first step, interpolating between two latent points changes the molecular graph only slightly. This may be because the learned latent space does not follow a uniform distribution: similar molecules are packed closely together, while dissimilar molecules are loosely scattered around them.
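The two latent-space probes described above can be sketched as follows. Only the latent points are produced here; each point would still have to be decoded into a molecule by running the flow decoder in reverse, which is not shown. The function names, the Gram-Schmidt construction of the directions, and the `radius` and `steps` parameters are assumptions for illustration.

```python
import math
import random
from typing import List, Sequence, Tuple


def random_orthogonal_pair(dim: int, rng: random.Random) -> Tuple[List[float], List[float]]:
    """Draw two random unit vectors and make the second orthogonal to the first."""
    u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm_u = math.sqrt(sum(a * a for a in u))
    u = [a / norm_u for a in u]
    proj = sum(a * b for a, b in zip(u, v))
    v = [b - proj * a for a, b in zip(u, v)]   # Gram-Schmidt step
    norm_v = math.sqrt(sum(b * b for b in v))
    v = [b / norm_v for b in v]
    return u, v


def latent_grid(z0: Sequence[float], u: Sequence[float], v: Sequence[float],
                radius: float, steps: int) -> List[List[List[float]]]:
    """Grid of latent points around z0 in the plane spanned by u and v (steps >= 2)."""
    offsets = [radius * (2.0 * i / (steps - 1) - 1.0) for i in range(steps)]
    return [[[z + a * du + b * dv for z, du, dv in zip(z0, u, v)]
             for b in offsets] for a in offsets]


def interpolate(z_a: Sequence[float], z_b: Sequence[float], steps: int) -> List[List[float]]:
    """Linear interpolation from z_a to z_b, endpoints included (steps >= 2)."""
    ts = [k / (steps - 1) for k in range(steps)]
    return [[(1.0 - t) * a + t * b for a, b in zip(z_a, z_b)] for t in ts]


rng = random.Random(0)
u, v = random_orthogonal_pair(8, rng)
z_center = [rng.gauss(0.0, 1.0) for _ in range(8)]
grid = latent_grid(z_center, u, v, radius=1.0, steps=5)
path = interpolate(z_center, [0.0] * 8, steps=6)
```

With an odd number of steps, the middle cell of `grid` is the original latent point, so the decoded neighborhood is centered on the chosen molecule.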
Molecular optimization
- One is property optimization, which generates new molecules with the best property scores.
- The other is constrained property optimization, which means finding molecules that are similar to a given molecule but have better chemical properties.
The quantitative estimate of drug-likeness (QED) and penalized logP (plogP) are chosen as the target properties. Molecular similarity is measured by the Tanimoto similarity of Morgan fingerprints.
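For reference, the Tanimoto similarity of two binary fingerprints is the size of their intersection divided by the size of their union. The sketch below represents each fingerprint as the set of its on-bit indices; the bit sets are invented, and a cheminformatics toolkit would normally supply the actual Morgan fingerprints.

```python
from typing import Set


def tanimoto(fp_a: Set[int], fp_b: Set[int]) -> float:
    """Tanimoto similarity |A ∩ B| / |A ∪ B| of two sets of on-bit indices."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(fp_a & fp_b) / union


# Hypothetical on-bit sets for two molecules.
fp_original = {1, 2, 3}
fp_optimized = {2, 3, 4}
sim = tanimoto(fp_original, fp_optimized)
```

The value ranges from 0 (no shared bits) to 1 (identical fingerprints).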
Table 4. Molecules with the top three plogP and QED scores after optimization on ZINC-250K
Table 5. Constrained plogP optimization on ZINC-250K


Constrained property optimization. Each arrow points from the original molecule to the optimized molecule. The values to the left and right of an arrow are the property improvement and the similarity of the given molecular pair, respectively.
References
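The selection step of constrained property optimization can be sketched as picking, among candidate molecules, the one with the largest property value that still stays similar enough to the original. The `Candidate` type, the `min_similarity` threshold, and all numbers are hypothetical; how candidates are produced from the latent space is not shown.

```python
from typing import NamedTuple, Optional, Sequence


class Candidate(NamedTuple):
    smiles: str        # placeholder identifier of the candidate molecule
    prop: float        # property value, e.g. plogP
    similarity: float  # similarity to the original molecule


def best_constrained(original_prop: float,
                     candidates: Sequence[Candidate],
                     min_similarity: float) -> Optional[Candidate]:
    """Return the most improved candidate meeting the similarity constraint, or None."""
    best: Optional[Candidate] = None
    for cand in candidates:
        if cand.similarity < min_similarity:
            continue  # violates the similarity constraint
        if cand.prop <= original_prop:
            continue  # no improvement over the original molecule
        if best is None or cand.prop > best.prop:
            best = cand
    return best


# Made-up candidates for an original molecule with property value 1.0.
toy_candidates = [
    Candidate("A", 3.0, 0.30),
    Candidate("B", 2.5, 0.70),
    Candidate("C", 0.5, 0.90),
    Candidate("D", 2.0, 0.65),
]
chosen = best_constrained(1.0, toy_candidates, min_similarity=0.6)
improvement = chosen.prop - 1.0 if chosen is not None else None
```

The pair `(improvement, chosen.similarity)` corresponds to the two values reported beside each arrow in the figure.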
https://baijiahao.baidu.com/s?id=1729293722854317823&wfr=spider&for=pc