2022 ICML | Pocket2Mol: Efficient Molecular Sampling Based on 3D Protein Pockets
2022-06-13 04:30:00 【Dazed flounder】
Paper: https://arxiv.org/abs/2205.07249
Code: https://github.com/pengxingang/Pocket2Mol
Pocket2Mol: Efficient Molecular Sampling Based on 3D Protein Pockets
This post introduces an ICML paper by Xingang Peng et al.: Pocket2Mol: Efficient Molecular Sampling Based on 3D Protein Pockets. The authors propose Pocket2Mol, a new sampling approach that can satisfy the multiple geometric constraints imposed by pockets. It is a two-module E(3)-equivariant generative network that captures both the spatial and bonding relationships between atoms of the binding pocket, and samples new drug candidates from a tractable distribution conditioned on the pocket representation, without relying on Markov chain Monte Carlo (MCMC). The improvements to pocket-based drug design are as follows: 1) a new deep geometric neural network is developed to accurately model the three-dimensional structure of the pocket; 2) a new sampling strategy is designed for more efficient conditional sampling of 3D coordinates; 3) the model is able to sample chemical bonds between atom pairs. Experimental results show that molecules sampled from Pocket2Mol have significantly better binding affinity and other drug properties, such as drug-likeness and synthetic accessibility.
Introduction
Early approaches integrated evaluation functions (such as the docking score between the sampled molecule and the pocket) into pocket-free models to guide the candidate search [1]. Another class of models converts the 3D pocket structure into SMILES strings or 2D molecular graphs [2], rather than modeling the interaction between the small-molecule structure and the 3D pocket. Conditional generative models have also been developed to model the 3D atomic density distribution inside the 3D pocket structure; the challenge then shifts to the efficiency of the structure-sampling algorithm over the learned distribution. In addition, previous models overemphasize the 3D positions of atoms while neglecting the formation of chemical bonds, which in practice leads to unrealistic connections between atoms.
Related work
Molecular generation based on three-dimensional protein pockets
- An improved GAN model represents molecules within the protein pocket in a latent space and decodes these representations into SMILES strings with a captioning network. Alternatively, two structural descriptors are designed to encode the pocket, and a conditional RNN generates the SMILES.
- Another line of work considers the 3D structures of both pockets and small molecules. A ligand neural network is proposed to generate 3D molecular structures, and Monte Carlo tree search is used to optimize candidate molecules that bind to specific pockets.
Equivariant network based on vector features
Global rotation equivariance for 3D objects is usually achieved with GNNs. However, these networks require the input and hidden features of every layer to be invariant scalars, which is incompatible with vector features such as the side-chain directions of each amino acid.
Generation of atomic positions
- A common strategy is to predict the distribution of distances between the new atom and all previously placed atoms, and then sample a position from the resulting distribution.
- Another strategy is to build a local spherical coordinate system and predict the position in this local space, but the conversion between Euclidean and spherical coordinates is inefficient and not straightforward.
Method
The core idea of Pocket2Mol is to learn the probability distribution of the atom or bond type at each position in the pocket, conditioned on the atoms already present. To learn this context-specific distribution, the authors use an autoregressive strategy that predicts randomly masked parts of the training drugs.
The generation process
- Formally, the protein pocket is represented as a set of atoms with their coordinates, denoted $\mathcal{P}^{(pro)}$.
- The generated molecular fragment with $n$ atoms is denoted $\mathcal{G}^{(mol)}_n$; its $i$-th heavy atom is described by its element type, its coordinates, and its valence bonds with the other atoms. Denoting the model by $\phi$, the generation process iteratively samples the next atom from $\phi$ conditioned on the pocket $\mathcal{P}^{(pro)}$ and the current fragment $\mathcal{G}^{(mol)}_n$.
The generation process consists of four main steps, as shown in Figure 1 (a minimal code sketch of the loop follows the step list).
(1) The frontier predictor of the model, $f_{fro}$, predicts the frontier atoms of the current molecular fragment. Frontiers are defined as atoms of the molecule to which new atoms can be covalently attached. If no atom is a frontier, the current molecule is complete and the generation process stops.
(2) Next, the model samples an atom from the frontier set as the focal atom.
(3) Then, based on the focal atom, the position predictor $f_{pos}$ predicts the relative position of the new atom. Finally, the element predictor $f_{ele}$ and the bond-type predictor $f_{bond}$ predict the probabilities of the element types and of the bond types with existing atoms, from which the element type and valence bonds of the new atom are sampled.
(4) In this way, the new atom is added to the current molecular fragment, and the generation process continues until no frontier atoms remain.
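As a summary of the four steps, here is a minimal, hypothetical Python sketch of the autoregressive loop; the encoder and the predictor callables (`f_fro`, `f_pos`, `f_ele`, `f_bond`) and their signatures are placeholders for illustration, not the actual Pocket2Mol API.

```python
import random

def generate_molecule(pocket, encoder, f_fro, f_pos, f_ele, f_bond, max_atoms=60):
    """Hypothetical sketch of the four-step autoregressive generation loop."""
    fragment = []                                  # list of (element, position, bonds)
    for _ in range(max_atoms):
        context = encoder(pocket, fragment)        # scalar + vector representations
        if fragment:
            # (1) predict which atoms of the current fragment are frontiers
            frontier_probs = f_fro(context)        # {atom index: probability}
            frontiers = [i for i, p in frontier_probs.items() if p > 0.5]
            if not frontiers:                      # no frontier -> molecule is complete
                break
            # (2) sample a focal atom from the frontier set
            focus = random.choice(frontiers)
        else:
            # first step: use a pocket atom as the focus (cf. the training setup)
            focus = random.randrange(len(pocket))
        # (3) sample a relative position, then the element type and bond types
        position = f_pos(context, focus)
        element = f_ele(context, position)
        bonds = f_bond(context, position, fragment)
        # (4) add the new atom and continue
        fragment.append((element, position, bonds))
    return fragment
```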
Model structure
Based on the above generation process, the model consists of four modules: an encoder, a frontier predictor, a position predictor, and an element-and-bond predictor.
E(3)-equivariant neural network
Representing the vertices and edges of a 3D graph with both scalar and vector features helps enhance the expressive power of the neural network. All vertices and edges of the protein pocket $\mathcal{P}^{(pro)}$ and the molecular fragment $\mathcal{G}^{(mol)}_n$ are therefore associated with scalar and vector features to better capture the 3D geometric information.
The original GVP is modified by adding a vector nonlinear activation to its output vectors.
Encoder
The protein pocket and the molecular fragment are represented as a k-nearest-neighbor (KNN) graph, where each vertex is an atom connected to its k nearest neighbors. The input vector vertex features include the atomic coordinates, and the vector edge features are the unit direction vectors of the edges in 3D space.
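As a concrete illustration of this graph construction, the following is a minimal sketch with plain NumPy arrays (not the repository code) that builds the KNN edges and the unit direction vectors used as vector edge features:

```python
import numpy as np

def knn_graph(coords: np.ndarray, k: int = 32):
    """Build a KNN graph: each atom is connected to its k nearest neighbors.

    coords: (N, 3) array of atomic coordinates.
    Returns the (2, N*k) edge index and the unit direction vectors of the edges.
    """
    n = coords.shape[0]
    k = min(k, n - 1)
    diff = coords[:, None, :] - coords[None, :, :]          # (N, N, 3) pairwise offsets
    dist = np.linalg.norm(diff, axis=-1)                    # (N, N) pairwise distances
    np.fill_diagonal(dist, np.inf)                          # exclude self-loops
    neighbors = np.argsort(dist, axis=1)[:, :k]             # (N, k) nearest indices
    src = np.repeat(np.arange(n), k)
    dst = neighbors.reshape(-1)
    edge_vec = coords[dst] - coords[src]                    # source of the vector edge feature
    edge_dir = edge_vec / (np.linalg.norm(edge_vec, axis=1, keepdims=True) + 1e-8)
    return np.stack([src, dst]), edge_dir
```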
First, multiple embedding layers embed the vertex features $(v^{(0)}_i, \vec{v}^{(0)}_i)$ and the edge features $(e^{(0)}_{ij}, \vec{e}^{(0)}_{ij})$. Then $L$ message-passing modules $M_l\ (l = 1, \dots, L)$ and update modules $U_l\ (l = 1, \dots, L)$ are interleaved to learn local structure representations.
In the message-passing module, vector messages are computed by multiplying the vector features of vertices and edges with scalar features and summing them, so that information is exchanged between vertices and edges and between scalar and vector features. The update module then merges the aggregated messages back into the vertex features.
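The exact formulas are given in the paper; the simplified sketch below is my own illustration of this scalar/vector mixing pattern (the feature shapes and the gating choice are assumptions). It shows how the vector part stays rotation-equivariant because vectors are only scaled by invariant scalars.

```python
import torch

def message_pass(v_s, v_v, e_s, e_v, edge_index):
    """Simplified scalar/vector message passing (illustration, not the paper's equations).

    v_s: (N, F) scalar node features     v_v: (N, F, 3) vector node features
    e_s: (E, F) scalar edge features     e_v: (E, F, 3) vector edge features
    edge_index: (2, E) long tensor of (source, target) indices
    """
    src, dst = edge_index
    # scalar messages: product of invariant node and edge features
    m_s = v_s[src] * e_s                                              # (E, F)
    # vector messages: scale equivariant vectors by an invariant gate, so that
    # rotating the inputs rotates the messages in exactly the same way
    gate = torch.sigmoid(m_s.mean(dim=1, keepdim=True)).unsqueeze(-1) # (E, 1, 1)
    m_v = gate * (v_v[src] + e_v)                                     # (E, F, 3)
    # aggregate the messages at the target nodes
    agg_s = torch.zeros_like(v_s).index_add_(0, dst, m_s)
    agg_v = torch.zeros_like(v_v).index_add_(0, dst, m_v)
    # residual update of the node features
    return v_s + agg_s, v_v + agg_v
```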
Prediction
Frontier prediction: a geometric vector MLP (GV-MLP) is defined as a GVP block followed by a GVL block, denoted $G_{mlp}$. The frontier predictor takes the features of atom $i$ as input and predicts the frontier probability $p_{fro}$ with a GV-MLP layer.
Position predictor:
The position predictor takes the features of the focal atom $i$ as input and predicts the relative position of the new atom. Because the vector features of the model are equivariant, they can directly generate relative coordinates $\Delta r_i$ with respect to the focal atom's coordinates $r_i$. The output of the position predictor is modeled as a Gaussian mixture with diagonal covariances, $p(\Delta r_i) = \sum_{k=1}^{K} \pi_i^{(k)} \mathcal{N}\big(\mu_i^{(k)}, \Sigma_i^{(k)}\big)$, whose weights, means, and covariances are predicted by separate neural networks.
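For illustration, the following is a minimal sketch of sampling a relative coordinate from such a diagonal-covariance Gaussian mixture; the parameter shapes are assumptions made for the example, not taken from the paper.

```python
import numpy as np

def sample_relative_position(pi, mu, sigma2, rng=None):
    """Sample Δr_i from p(Δr_i) = Σ_k π_k · N(μ_k, diag(σ²_k)).

    pi:     (K,)   mixture weights summing to 1
    mu:     (K, 3) component means (relative coordinates)
    sigma2: (K, 3) diagonal variances of each component
    """
    rng = rng or np.random.default_rng()
    k = rng.choice(len(pi), p=pi)                    # pick a mixture component
    return mu[k] + np.sqrt(sigma2[k]) * rng.standard_normal(3)

# usage: the new atom is placed at the focal atom position plus the sampled offset
# delta_r = sample_relative_position(pi, mu, sigma2)
# new_position = r_i + delta_r
```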
Element and bond predictors: after the position of the new atom $i$ has been predicted, the element-and-bond predictor predicts the element type of atom $i$ and its valence bonds with all atoms $q\ (\forall q \in \mathcal{V}^{(mol)})$ of the existing molecular fragment. Figure 2 shows the structure of the prediction network.
First, the $k$ nearest-neighbor atoms $j \in \mathrm{KNN}(i)$ are collected from all atoms, and a message-passing module integrates the local information of these neighbors into the position of the new atom $i$ to obtain its representation $(v_i, \vec{v}_i)$, from which the element type of atom $i$ is predicted.
In a parallel path, the edge between atoms $i$ and $q$ is represented as $(z_{iq}, \vec{z}_{iq})$, obtained by concatenating the features of atom $i$, the features of atom $q$, and the edge features $e_{iq}$, and passing the result through a GV-MLP block.
Here $(e'_{iq}, \vec{e}_{iq})$ denotes the input edge features after processing by a GV-MLP block.
For the vector features, a new attention module is proposed.
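On top of these representations, the element type and bond types are read out by classification heads. The sketch below is a generic illustration of such heads, taking the scalar parts of the atom representation $(v_i, \vec{v}_i)$ and of the edge representations $(z_{iq}, \vec{z}_{iq})$ as inputs; the layer sizes, class counts, and module names are assumptions, not the paper's architecture.

```python
import torch
from torch import nn

class ElementBondHeads(nn.Module):
    """Illustrative classification heads for the element and bond types."""

    def __init__(self, atom_dim=128, edge_dim=128, n_elements=8, n_bond_types=4):
        super().__init__()
        self.element_head = nn.Sequential(
            nn.Linear(atom_dim, atom_dim), nn.ReLU(),
            nn.Linear(atom_dim, n_elements),    # element classes (a "Nothing" class is added in training)
        )
        self.bond_head = nn.Sequential(
            nn.Linear(edge_dim, edge_dim), nn.ReLU(),
            nn.Linear(edge_dim, n_bond_types),  # e.g. none / single / double / triple
        )

    def forward(self, atom_repr, edge_repr):
        # atom_repr: (B, atom_dim) scalar features of the new atom i
        # edge_repr: (B, n_existing, edge_dim) scalar features of the edges (i, q)
        element_logits = self.element_head(atom_repr)
        bond_logits = self.bond_head(edge_repr)
        return element_logits, bond_logits
```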
Training
In the training phase, atoms of the molecule are randomly masked and the model is trained to recover them. Specifically, for each pocket-ligand pair, a mask ratio is sampled from the uniform distribution U[0,1] and the corresponding number of molecular atoms is masked. The remaining molecular atoms that share valence bonds with the masked atoms are defined as frontiers. The position predictor and the element-and-bond predictor then try to recover a masked atom bonded to each frontier by predicting its position, its element type, and its bonds with the remaining molecular atoms. If all molecular atoms are masked, the frontiers are instead defined as the protein atoms within 4 Å of any masked atom, and the masked atoms around these frontiers are recovered. For element-type prediction, an extra Nothing class is added to indicate that no atom should be placed at the queried position. During training, besides the positions of masked atoms used for element-type prediction, negative positions are sampled from the surrounding space and labeled as Nothing.
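A minimal sketch of this masking step, under the assumption that the ligand is given as a list of atom indices and a set of bonded index pairs (placeholder data structures, not the repository's):

```python
import random

def mask_molecule(mol_atoms, bonds):
    """Randomly mask ligand atoms and return the masked set plus the frontiers.

    mol_atoms: list of atom indices of the ligand
    bonds:     set of (i, j) index pairs that share a covalent bond
    """
    ratio = random.uniform(0.0, 1.0)               # mask ratio sampled from U[0, 1]
    n_mask = round(ratio * len(mol_atoms))
    masked = set(random.sample(mol_atoms, n_mask))
    remaining = [a for a in mol_atoms if a not in masked]
    # frontiers: remaining atoms that are covalently bonded to a masked atom
    frontiers = [a for a in remaining
                 if any((a, m) in bonds or (m, a) in bonds for m in masked)]
    # if every ligand atom is masked, the paper instead takes protein atoms
    # within 4 Å of the masked atoms as frontiers (not shown here)
    return masked, remaining, frontiers
```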
The frontier loss $L_{fro}$ is the binary cross-entropy loss of frontier prediction. The position loss $L_{pos}$ is the negative log-likelihood of the positions of the masked atoms. For element-type and bond-type prediction, cross-entropy classification losses are used, denoted $L_{ele}$ and $L_{bond}$.
The overall loss function is the sum of the above four losses: $L = L_{fro} + L_{pos} + L_{ele} + L_{bond}$.
The encoder and all three predictors are optimized jointly with the Adam optimizer.
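In sketch form, assuming the four loss terms have already been computed as scalar tensors; the learning rate and module names are assumptions, not the repository's settings:

```python
import torch
from torch import nn

def make_optimizer(encoder: nn.Module, f_fro: nn.Module, f_pos: nn.Module,
                   f_ele_bond: nn.Module, lr: float = 1e-4) -> torch.optim.Adam:
    """One Adam optimizer shared by the encoder and the three predictors."""
    params = (list(encoder.parameters()) + list(f_fro.parameters())
              + list(f_pos.parameters()) + list(f_ele_bond.parameters()))
    return torch.optim.Adam(params, lr=lr)

def training_step(optimizer, l_fro, l_pos, l_ele, l_bond):
    """One optimization step on L = L_fro + L_pos + L_ele + L_bond."""
    loss = l_fro + l_pos + l_ele + l_bond
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```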
Results
Pocket2Mol is an E(3)-equivariant generative graph neural network that models the chemical and geometric features of 3D protein pockets and uses a new, efficient algorithm to sample 3D drug candidates. Experiments show that the molecules generated by Pocket2Mol not only have better binding affinity and chemical properties, but also more realistic and accurate structures.
References
- Structure-based de novo drug design using 3D deep generative models
- From target to drug: Generative modeling for the multimodal structure-based ligand design. Molecular Pharmaceutics
- De novo molecule design through the molecular generative model conditioned by 3D information of protein binding sites