2020 ACM | MoFlow: An Invertible Flow Model for Generating Molecular Graphs
2022-07-30 08:05:00 【Dazed flounder】

Paper: https://arxiv.org/abs/2006.10137
Code: https://github.com/calvin-zcx/moflow
MoFlow is an invertible flow model for generating molecular graphs.
Graph generative models usually involve two steps: learning latent representations and generating molecular graphs. However, because of chemical constraints and the combinatorial complexity of molecular graphs, generating novel and chemically valid molecular graphs from latent representations is very challenging. The authors therefore propose MoFlow, a flow-based graph generative model that learns an invertible mapping between molecular graphs and their latent representations. MoFlow first generates bonds (edges) with a Glow-based model, then generates atoms (nodes) conditioned on the given bonds with a graph conditional flow, and finally assembles them into a chemically valid molecular graph with a post-hoc validity correction. The model is evaluated on four tasks: molecular graph generation and reconstruction, visualization of the continuous latent space, property optimization, and constrained property optimization. The results indicate that the model is both efficient and effective.
Model
Flow framework
Flow-based models aim to learn a sequence of invertible transformations $f_\Theta = f_L \circ \dots \circ f_1$ between complex high-dimensional data $X \sim P_X(X)$ and latent variables $Z \sim P_Z(Z)$ of the same dimension, where the latent distribution $P_Z(Z)$ is easy to model (for example, under a strong independence assumption across the latent dimensions). The complex data in the original space are modeled through the latent space by $Z = f_\Theta(X)$.
Sampling from the original data: a sample $X \sim P_X(X)$ is obtained by drawing $Z \sim P_Z(Z)$ and applying the inverse mapping, $X = f_\Theta^{-1}(Z)$.
Let $Z = f_\Theta(X) = f_L \circ \dots \circ f_1(X)$ and $H_l = f_l(H_{l-1})$, where each $f_l$ ($l = 1, \dots, L$, $L \in \mathbb{N}^+$) is an invertible mapping, $H_0 = X$, $H_L = Z$, and $P_Z(Z)$ follows a standard isotropic Gaussian with independent dimensions. The change-of-variables formula gives the log-likelihood of $X$:

$$\log P_X(X) = \log P_Z(Z) + \log\left|\det\frac{\partial Z}{\partial X}\right| = \sum_i \log P_{Z_i}(Z_i) + \sum_{l=1}^{L} \log\left|\det\frac{\partial f_l}{\partial H_{l-1}}\right|$$

where $P_{Z_i}(Z_i)$ is the probability of the $i$-th dimension of $Z$, and $f_\Theta = f_L \circ \dots \circ f_1$ is a learnable deep invertible neural network.
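The change-of-variables computation can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: each layer is a toy elementwise affine map $h \mapsto a \odot h + b$ (not the layers used in the paper), and the names `flow_forward`, `flow_inverse` and `log_likelihood` are hypothetical.

```python
import math

def std_normal_logpdf(z):
    """Log-density of a standard normal, summed over independent dimensions."""
    return sum(-0.5 * (zi * zi + math.log(2.0 * math.pi)) for zi in z)

def flow_forward(x, layers):
    """Apply Z = f_L(...f_1(X)) and accumulate sum_l log|det(df_l/dH_{l-1})|.

    Each layer is a toy elementwise affine map h -> a * h + b (an assumption
    for illustration); its Jacobian is diagonal with entries a.
    """
    h, logdet = list(x), 0.0
    for a, b in layers:
        h = [ai * hi + bi for ai, hi, bi in zip(a, h, b)]
        logdet += sum(math.log(abs(ai)) for ai in a)
    return h, logdet

def flow_inverse(z, layers):
    """Invert the flow layer by layer: X = f_1^{-1}(...f_L^{-1}(Z))."""
    h = list(z)
    for a, b in reversed(layers):
        h = [(hi - bi) / ai for ai, hi, bi in zip(a, h, b)]
    return h

def log_likelihood(x, layers):
    """log P_X(x) = log P_Z(z) + log|det(dZ/dX)|."""
    z, logdet = flow_forward(x, layers)
    return std_normal_logpdf(z) + logdet

# Two toy layers acting on 2-dimensional data.
layers = [([2.0, 0.5], [1.0, -1.0]), ([1.5, 3.0], [0.0, 0.2])]
x = [0.3, -0.7]
z, logdet = flow_forward(x, layers)
x_rec = flow_inverse(z, layers)  # recovers x up to floating-point error
```

Sampling works the other way round: draw `z` from a standard normal and pass it through `flow_inverse`.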
Invertible affine coupling layers
To keep the Jacobian determinant cheap to compute, NICE and RealNVP design an affine coupling transformation $Z = f_\Theta(X)$:
- Split $X$ into two parts $X = (X_{1:d}, X_{d+1:n})$. The invertible mapping is:

$$Z_{1:d} = X_{1:d}, \qquad Z_{d+1:n} = X_{d+1:n} \odot e^{S_\Theta(X_{1:d})} + T_\Theta(X_{1:d})$$

- Only $X_{d+1:n}$ undergoes the affine transformation. The expressive power comes from the scale function $S_\Theta: \mathbb{R}^d \rightarrow \mathbb{R}^{n-d}$ and the transformation function $T_\Theta: \mathbb{R}^d \rightarrow \mathbb{R}^{n-d}$, which can be arbitrary neural networks.
- The Jacobian determinant is efficient to compute:

$$\det\frac{\partial Z}{\partial X} = \exp\Big(\sum_j S_\Theta(X_{1:d})_j\Big)$$
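A minimal sketch of a RealNVP-style affine coupling layer follows. The functions `toy_scale` and `toy_shift` are hypothetical stand-ins for the neural networks $S_\Theta$ and $T_\Theta$; any functions from $\mathbb{R}^d$ to $\mathbb{R}^{n-d}$ would do, because the inverse never needs to invert them.

```python
import math

def coupling_forward(x, d, scale_fn, shift_fn):
    """Z_{1:d} = X_{1:d};  Z_{d+1:n} = X_{d+1:n} * exp(S(X_{1:d})) + T(X_{1:d})."""
    x1, x2 = x[:d], x[d:]
    s, t = scale_fn(x1), shift_fn(x1)
    z2 = [xi * math.exp(si) + ti for xi, si, ti in zip(x2, s, t)]
    # The Jacobian is triangular with diagonal (1, ..., 1, exp(s)),
    # so log|det| is simply the sum of the scale outputs.
    logdet = sum(s)
    return x1 + z2, logdet

def coupling_inverse(z, d, scale_fn, shift_fn):
    """Undo the coupling; S and T are only evaluated, never inverted."""
    z1, z2 = z[:d], z[d:]
    s, t = scale_fn(z1), shift_fn(z1)
    x2 = [(zi - ti) * math.exp(-si) for zi, si, ti in zip(z2, s, t)]
    return z1 + x2

# Hypothetical stand-ins for S_Theta and T_Theta, mapping R^2 -> R^2.
def toy_scale(x1):
    return [math.tanh(x1[0] + x1[1]), 0.5 * x1[0]]

def toy_shift(x1):
    return [x1[0] * x1[1], x1[1] - 1.0]

x = [0.4, -1.2, 2.0, 0.1]
z, logdet = coupling_forward(x, 2, toy_scale, toy_shift)
x_rec = coupling_inverse(z, 2, toy_scale, toy_shift)  # recovers x
```

The first half passes through unchanged, which is why stacked coupling layers alternate which half is transformed.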

Splitting Dimensions
Flow-based models such as RealNVP and Glow use a squeeze operation that reshapes the spatial dimensions $X^{c \times n \times n}$ into $X^{(ch^2) \times \frac{n}{h} \times \frac{n}{h}}$ to obtain more channels, and then split the channels into two halves for the coupling layer. Each layer of a deep flow model transforms the dimensions that the previous layer left unchanged, so that all dimensions end up being transformed.
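The squeeze operation can be illustrated on nested lists. This is a sketch only: the ordering of the new channels is an assumption (implementations differ), and the names `squeeze` and `unsqueeze` are illustrative.

```python
def squeeze(x, h):
    """Reshape a c x n x n nested list into (c*h*h) x (n/h) x (n/h).

    Every h x h spatial block is spread over h*h separate channels.
    The channel ordering used here is one possible convention.
    """
    c, n = len(x), len(x[0])
    assert n % h == 0, "spatial size must be divisible by the squeeze factor"
    m = n // h
    out = []
    for ch in range(c):
        for di in range(h):
            for dj in range(h):
                out.append([[x[ch][i * h + di][j * h + dj] for j in range(m)]
                            for i in range(m)])
    return out

def unsqueeze(y, h):
    """Inverse of squeeze: rebuild the c x n x n layout."""
    c, m = len(y) // (h * h), len(y[0])
    n = m * h
    x = [[[0] * n for _ in range(n)] for _ in range(c)]
    for k, plane in enumerate(y):
        ch, di, dj = k // (h * h), (k // h) % h, k % h
        for i in range(m):
            for j in range(m):
                x[ch][i * h + di][j * h + dj] = plane[i][j]
    return x

# A 2 x 4 x 4 input becomes 8 x 2 x 2 with squeeze factor h = 2.
x = [[[ch * 100 + i * 4 + j for j in range(4)] for i in range(4)] for ch in range(2)]
y = squeeze(x, 2)
first_half, second_half = y[:len(y) // 2], y[len(y) // 2:]  # channel split for coupling
```

Because the operation only rearranges entries, it is trivially invertible and contributes nothing to the log-determinant.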
Numerical stability by actnorm
To ensure the numerical stability of flow-based models, Glow introduces an activation normalization (actnorm) layer. This layer normalizes each channel over the batch dimension with an affine transformation whose scale and bias are learned. The per-channel scale and bias are initialized from the mean and the inverse standard deviation computed over a batch of data.
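A small sketch of the data-dependent initialization follows. It assumes the Glow-style form $y = s \odot (x + b)$ with $s = 1/\text{std}$ and $b = -\text{mean}$, and it simplifies each sample to one value per channel (no spatial dimensions). The function names are illustrative.

```python
import math

def actnorm_init(batch):
    """Data-dependent init: per-channel scale = 1/std and bias = -mean.

    `batch` is a list of samples, each a list with one value per channel.
    """
    n, c = len(batch), len(batch[0])
    mean = [sum(s[k] for s in batch) / n for k in range(c)]
    var = [sum((s[k] - mean[k]) ** 2 for s in batch) / n for k in range(c)]
    scale = [1.0 / math.sqrt(v + 1e-6) for v in var]  # small epsilon avoids division by zero
    bias = [-m for m in mean]
    return scale, bias

def actnorm_forward(x, scale, bias):
    """y = scale * (x + bias); log|det| = sum of log|scale| over channels."""
    y = [s * (xi + b) for xi, s, b in zip(x, scale, bias)]
    logdet = sum(math.log(abs(s)) for s in scale)
    return y, logdet

def actnorm_inverse(y, scale, bias):
    """x = y / scale - bias."""
    return [yi / s - b for yi, s, b in zip(y, scale, bias)]

batch = [[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]]
scale, bias = actnorm_init(batch)
normalized = [actnorm_forward(sample, scale, bias)[0] for sample in batch]
```

After initialization the normalized batch has approximately zero mean and unit variance per channel; during training the scale and bias are updated like any other parameters.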
MOFLOW MODEL
The overall model works as follows. A molecular graph $M$ is represented by a feature matrix $A$ for the atoms and an adjacency tensor $B$ for the bonds. Inference: the graph conditional flow (GCF) for atoms, $f_{A|B}$, transforms $A$ given $B$ into the conditional latent vector $Z_{A|B}$, and the Glow for bonds, $f_B$, transforms $B$ into the latent vector $Z_B$. The latent space follows a Gaussian distribution. Generation: the generation process applies the inverse of the transformations above, followed by a validity correction procedure that ensures chemical validity.
$A \in \mathbb{R}^{n \times k}$: the atom matrix, for at most $n$ atoms and $k$ atom types. $A(i,k)=1$ means that atom $i$ has type $k$.
$B \in \mathbb{R}^{c \times n \times n}$: the bond tensor, where $c$ is the number of bond types. $B(c,j,i)=1$ means that the bond between atom $i$ and atom $j$ has type $c$.
A molecular graph is the pair $M = (A, B) \in \mathbb{R}^{n \times k} \times \mathbb{R}^{c \times n \times n}$, that is, an element of the Cartesian product of the atom-matrix space and the bond-tensor space. The molecular graph $M$ can therefore be viewed as an undirected graph with multiple node types and multiple edge types.
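To make the representation concrete, the sketch below encodes a small molecule as a one-hot atom matrix and bond tensor. Several details are assumptions for illustration: the helper `encode_molecule` is hypothetical, bond channel `order - 1` stores bonds of that order, and padding rows are left all-zero.

```python
def encode_molecule(atoms, bonds, atom_types, n, c=3):
    """Build a one-hot atom matrix A (n x k) and bond tensor B (c x n x n).

    atoms: list of element symbols.
    bonds: list of (i, j, order) tuples with order in 1..c.
    Rows of A beyond len(atoms) stay all-zero (padding up to n atoms).
    """
    k = len(atom_types)
    A = [[0] * k for _ in range(n)]
    B = [[[0] * n for _ in range(n)] for _ in range(c)]
    for i, symbol in enumerate(atoms):
        A[i][atom_types.index(symbol)] = 1
    for i, j, order in bonds:
        B[order - 1][i][j] = 1
        B[order - 1][j][i] = 1  # undirected graph: each channel is symmetric
    return A, B

# Carbon dioxide, O=C=O, padded to n = 4 atoms with k = 4 atom types.
A, B = encode_molecule(["O", "C", "O"], [(0, 1, 2), (1, 2, 2)],
                       atom_types=["C", "N", "O", "F"], n=4)
```

Here `A[1]` marks atom 1 as carbon, and the double-bond channel `B[1]` holds both C=O bonds.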
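To capture the combinatorial structure of the atoms and bonds of a molecular graph, the authors decompose the generative model into two parts:

$$P_M(M) = P_M((A, B)) \approx P_{A|B}(A \mid B; \theta_{A|B}) \, P_B(B; \theta_B)$$

The parameters $\theta_B$ and $\theta_{A|B}$ are learned by maximizing the expected log-likelihood:

$$\arg\max_{\theta_B, \theta_{A|B}} \; \mathbb{E}_{M=(A,B) \sim p_{\text{data}}} \big[ \log P_{A|B}(A \mid B; \theta_{A|B}) + \log P_B(B; \theta_B) \big]$$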
Graph conditional flow for atoms
Given a bond tensor $B$, the goal of the atom flow is to generate the right atom matrix $A$ so that a valid molecule $M$ can be assembled. It consists of two parts.
- B-conditional flow: $Z_{A|B} \mid B = f_{A|B}(A \mid B)$ is an invertible, dimension-preserving mapping whose Jacobian can be computed as in any flow model. Then:

$$\log P_{A|B}(A \mid B) = \log P_{Z_{A|B}}(Z_{A|B}) + \log\left|\det\frac{\partial Z_{A|B}}{\partial A}\right|$$

- Graph coupling layer: by introducing a graph convolution structure, the authors design the scale function $S_\Theta$ and the transformation function $T_\Theta$ of each graph coupling layer. Both functions are built by stacking multiple graphconv->BatchNorm1d->ReLu layers, followed by a multilayer perceptron (MLP) output layer.

Glow for Bonds
The bond flow learns an invertible mapping $f_B$. By the change-of-variables formula, it yields the log-likelihood of the bonds, and the bond tensor is generated through the inverse mapping. Any flow model could be used for the bond tensor; the authors build the bond flow $f_B$ on a variant of the Glow framework. They again follow the affine coupling layer scheme to construct the invertible mapping. In each affine coupling layer, the input $B$ is split along the channel dimension $c$ into two parts $B = (B_1, B_2)$, giving the output $Z_B = (Z_{B1}, Z_{B2})$.
The authors build the affine coupling layers from multiple 3×3 conv2d->BatchNorm2d->ReLu layers. With the coupling $Z_{B1} = B_1$, $Z_{B2} = B_2 \odot \mathrm{Sigmoid}(S_\Theta(B_1)) + T_\Theta(B_1)$, where a Sigmoid replaces the exponential scale for numerical stability, the log-determinant of the Jacobian of each affine coupling layer is:

$$\log\left|\det\frac{\partial Z_B}{\partial B}\right| = \sum_j \log \mathrm{Sigmoid}(S_\Theta(B_1))_j$$

Validity correction
Molecules must obey the valency constraint of every atom, but assembling a molecule from a generated bond tensor and atom matrix may produce a chemically invalid result. A validity check is therefore defined for each atom $i$:

$$\sum_{c,j} c \times B(c, i, j) \le \mathrm{Valency}(\mathrm{Atom}_i) + Ch$$

Here $B \in \{0,1\}^{c \times n \times n}$ is the one-hot bond tensor over bond orders $c \in \{1,2,3\}$ (single, double, triple), and $Ch$ denotes the formal charge. The authors take the formal charge into account because a charged atom may form additional bonds. They only consider $Ch=1$ for a small set of positively charged atoms and set $Ch=0$ for all other atoms.
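The per-atom check can be sketched as below, assuming it takes the form $\sum_{c,j} c \times B(c,i,j) \le \mathrm{Valency}(\mathrm{Atom}_i) + Ch$. The `VALENCY` table holds typical textbook valencies, channel index `ch` is assumed to store bonds of order `ch + 1`, and the function name is hypothetical.

```python
# Typical valencies, listed here for illustration only.
VALENCY = {"C": 4, "N": 3, "O": 2, "F": 1}

def atom_is_valid(B, i, symbol, charge=0):
    """Check sum_{c,j} c * B(c, i, j) <= Valency(atom_i) + Ch.

    B is a c x n x n nested list; channel index ch holds bonds of order ch + 1.
    """
    n = len(B[0])
    bond_order_sum = sum((ch + 1) * B[ch][i][j]
                         for ch in range(len(B)) for j in range(n))
    return bond_order_sum <= VALENCY[symbol] + charge

def empty_bonds(n, c=3):
    return [[[0] * n for _ in range(n)] for _ in range(c)]

# Nitrogen (atom 0) with four single bonds to atoms 1..4.
B = empty_bonds(5)
for j in range(1, 5):
    B[0][0][j] = 1
    B[0][j][0] = 1

neutral_ok = atom_is_valid(B, 0, "N")            # 4 > 3, so the check fails
charged_ok = atom_is_valid(B, 0, "N", charge=1)  # 4 <= 3 + 1, so it passes
```

The example shows why the formal charge term matters: the same bonding pattern is rejected for a neutral nitrogen but accepted for a positively charged one.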
Experiments
The authors evaluate MoFlow with experiments on the following four aspects:
1) Molecular graph generation and reconstruction: can the model reconstruct all molecules in the dataset, and generate as many valid molecular graphs as possible that are not in the dataset?
2) Visualization of the latent space: does MoFlow embed molecular graphs into a continuous latent space with reasonable chemical similarity?
3) Property optimization: can the model generate molecular graphs with optimized properties?
4) Constrained property optimization: can the model generate novel molecular graphs with optimized properties while keeping chemical similarity as high as possible?
Datasets
The authors use two datasets in their experiments, QM9 and ZINC250K.
Evaluation metrics
The metrics are:
1) Validity: the percentage of chemically valid molecules among all generated molecules;
2) Uniqueness: the percentage of unique valid molecules among all generated molecules;
3) Novelty: the percentage of generated valid molecules that are not in the dataset;
4) Reconstruction: the percentage of molecules in the dataset that the model can reconstruct exactly;
5) N.U.V.: the percentage of generated molecules that are novel, unique, and valid at the same time;
6) Validity without check/correction: the validity obtained when the validity check and correction step is skipped.
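The generation metrics can be computed from a list of generated molecules as sketched below. The denominators are assumptions (they vary between papers): validity, uniqueness and N.U.V. are taken over all generated molecules, novelty over the valid ones. The `is_valid` callable is a hypothetical placeholder for a real chemistry check.

```python
def generation_metrics(generated, is_valid, training_set):
    """Compute validity, uniqueness, novelty and N.U.V. for generated molecules.

    generated: list of molecule strings (for example canonical SMILES).
    is_valid: callable returning True for chemically valid molecules
              (a placeholder here; a cheminformatics toolkit would supply it).
    training_set: collection of molecule strings in the dataset.
    """
    train = set(training_set)
    n_gen = len(generated)
    valid = [m for m in generated if is_valid(m)]
    unique_valid = set(valid)
    novel_valid = [m for m in valid if m not in train]
    novel_unique_valid = unique_valid - train
    return {
        "validity": len(valid) / n_gen,
        "uniqueness": len(unique_valid) / n_gen,
        "novelty": len(novel_valid) / len(valid) if valid else 0.0,
        "nuv": len(novel_unique_valid) / n_gen,
    }

# Toy data: "XX" stands for an invalid molecule, "CC" is in the training set.
generated = ["CCO", "CCO", "CC", "XX", "CN"]
metrics = generation_metrics(generated, lambda m: m != "XX", {"CC"})
```

On the toy data, four of five molecules are valid, three distinct valid molecules exist, and two of those are absent from the training set.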