AIChE | A de novo drug design framework integrating mathematical programming and a deep learning model

2022-06-10 17:09:00 DrugAI

This article introduces recent work by Professor Jian Du's team at Dalian University of Technology, published in AIChE Journal: "De novo drug design framework based on mathematical programming method and deep learning model". The team developed a deep learning model that identifies target-ligand complexes with high binding affinity and integrated it with a mathematical programming model, enabling high-throughput inverse optimization design of small-molecule drugs. Two case studies, the design of alternatives to rivaroxaban (a cardiovascular drug) and to axitinib (an antitumor drug), demonstrate the generality and effectiveness of the proposed de novo drug design framework.

1

Introduction

De novo drug design methods commonly include genetic algorithms and deep generative methods. However, both tend to become trapped in local optima. Mathematical programming is an optimization approach widely used in systems engineering, and it can also be applied to molecular design problems. In this approach, a mixed-integer nonlinear programming (MINLP) model consisting of an objective function, molecular structure constraints, and molecular property constraints is formulated and solved, which enables high-throughput inverse design of molecular structures with optimal target properties. Specifically, an MINLP model optimizes combinations of groups by using explicit mathematical formulas (such as the octet rule and the valence bond rule) together with gradient-based deterministic optimization algorithms (such as BARON). The model does not need to enumerate and evaluate every group combination: all feasible solutions that satisfy the structure and property constraints can be obtained quickly, and the optimal solution that maximizes or minimizes the target property can be identified.

However, when the nonlinear equations in the MINLP model are too complex (that is, when the model is strongly nonconvex), solving the model directly is very difficult. To address this, some researchers have proposed decomposition algorithms for strongly nonconvex MINLP models. In theory, when the property constraints of the MINLP model are independent of one another, solving the model with a decomposition algorithm can find the global optimum within the chemical space defined by the MINLP model.
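Structural rules of this kind reduce to simple algebraic conditions on group counts. As a minimal sketch, the snippet below checks the octet rule in the form commonly used in group-contribution molecular design: the sum over groups of n_j (2 - v_j) must equal 2m, where n_j is the number of groups of type j, v_j is the free valence of the group, and m is 1 for acyclic, 0 for monocyclic, and -1 for bicyclic molecules. The group set, the valence table, and the function name are illustrative assumptions and are not taken from the paper.

```python
# Free valences of a few illustrative groups (hypothetical subset, not the paper's group set).
VALENCE = {"CH3": 1, "CH2": 2, "CH": 3, "C": 4, "OH": 1}


def satisfies_octet_rule(counts: dict, m: int = 1) -> bool:
    """Check sum_j n_j * (2 - v_j) == 2 * m for a vector of group counts.

    m = 1 for acyclic, 0 for monocyclic, -1 for bicyclic structures.
    """
    lhs = sum(n * (2 - VALENCE[group]) for group, n in counts.items())
    return lhs == 2 * m


propane = {"CH3": 2, "CH2": 1}       # CH3-CH2-CH3, no free bonds left over
dangling = {"CH3": 1, "CH2": 1}      # CH3-CH2- leaves one unsatisfied bond
cyclohexane = {"CH2": 6}             # single ring, so m = 0

print(satisfies_octet_rule(propane))
print(satisfies_octet_rule(dangling))
print(satisfies_octet_rule(cyclohexane, m=0))
```

Because the condition is linear in the integer counts, a solver can enforce it as a constraint instead of testing each combination after the fact.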

Although mathematical programming has been very successful in designing small-molecule solvents, two challenges remain in applying it to drug candidate design. First, drug structures (especially ring structures) are larger and more complex than those of small-molecule solvents, so more ring groups are needed to build an MINLP model for drug candidate design, which increases the scale of the model and the difficulty of solving it. Second, even when the MINLP model is solved successfully, a traditional MINLP model tends to produce ring structures that are structurally feasible but chemically unusual. For example, if the MINLP model selects the ring group "aC-C#CH", it may design molecules such as "CC(C)c1c(C#C)c(C#C)c(C#C)c(C#C)c1C#C".

To address these two challenges, the authors use the Bemis–Murcko algorithm to batch-extract the scaffolds of the drug molecules in the DrugBank database, obtaining 2,898 drug scaffolds in total, and introduce these scaffolds into the traditional MINLP model to ensure that the designed drug candidates have reasonable structures. However, if all 2,898 scaffolds were used to design drug candidates, the MINLP problem would still be large. Since drug candidates with similar scaffolds may have similar properties, a scaffold-based similarity algorithm is applied before building the MINLP model to identify the subset of scaffolds similar to the scaffold of the target drug, which greatly reduces the problem size.
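The article does not say which similarity measure the scaffold (skeleton) search uses. As a sketch under that caveat, the snippet below assumes Tanimoto similarity over fingerprint bit sets, a common choice in cheminformatics, and keeps only the scaffolds above a threshold. The scaffold names, the bit sets, and the threshold of 0.5 are hypothetical placeholders.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similar_subset(target: set, library: dict, threshold: float = 0.5) -> list:
    """Return (name, similarity) pairs at or above the threshold, most similar first."""
    scored = [(name, tanimoto(target, bits)) for name, bits in library.items()]
    hits = [pair for pair in scored if pair[1] >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical fingerprints: each scaffold is a set of "on" bit indices.
target_scaffold = {1, 2, 3, 4}
scaffold_library = {
    "scaffold_A": {1, 2, 3, 4, 5},
    "scaffold_B": {1, 2, 9},
    "scaffold_C": {1, 2, 3},
}

for name, score in similar_subset(target_scaffold, scaffold_library):
    print(name, round(score, 2))
```

Shrinking the scaffold library this way reduces the number of discrete choices the MINLP model has to consider, which is the motivation given in the text.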

In addition, a deep learning model for predicting binding affinity is built and used as the objective function of the MINLP model, so that the MINLP model can design small-molecule drug candidates with optimal binding affinity.

2

Method

The authors first build a deep learning classification model that predicts the probability of high binding affinity. Its inputs are the SMILES string of the ligand and the amino acid sequence of the target, and its output is the probability of high/low binding affinity, as shown in Figure 1.

Figure 1. Deep learning model structure

The ligand SMILES is converted into Mol2vec descriptors to represent the structural features of the ligand molecule, and the ligand feature matrix is fed to a gate-augmented attention layer for feature extraction. Similarly, the target amino acid sequence is converted into a high-order amino acid sequence representation to capture the structural features of the target. The target feature matrix is fed to a convolutional neural network to identify the key amino acid subsequences, and the dimension-reduced sequence is then fed to a gate-augmented attention layer for feature extraction. Finally, the ligand and target matrices are concatenated and passed to fully connected layers to predict the probability of high/low binding affinity.

The authors then propose an optimization-based de novo drug design framework that integrates the deep learning model for binding affinity prediction with the MINLP model, as shown in Figure 2.

Figure 2. Optimization-based de novo drug design framework

The framework is built in the following steps:

(a) Build the drug database.

(b) Extract drug scaffolds from the DrugBank drug database using the Bemis–Murcko algorithm in RDKit. Before building the MINLP model, a scaffold-based similarity algorithm searches the scaffold database for the subset of scaffolds G1 that are similar to the scaffold of the target drug. A subset of commonly used groups G2 is also selected. G1 and G2 serve as the inputs to the MINLP model.

(c) Build the MINLP model, which consists of an objective function, drug structure constraints, and drug property constraints. The details of the MINLP model are as follows:

Objective function:

Constraints:

  1. Deep learning constraint: Equation (1) represents the deep learning model used to identify target-ligand complexes with high binding affinity.
  2. Drug structure constraints: Equation (2) represents the structural constraints, namely the octet rule m1, the valence bond rule m2, and the chemical complexity m3, which ensure that combining scaffolds and groups yields molecules with reasonable structures.
  3. Drug property constraints: Equations (3) and (4) represent the "Lipinski's rule of five" properties (relative molecular mass MW, number of hydrogen bond acceptors HBA, number of hydrogen bond donors HBD, octanol-water partition coefficient logP, and number of rotatable bonds ROT (ROTfrag)), as well as the synthetic accessibility score SA and the synthetic complexity score SC.
  4. Other constraints: Equation (5) represents an improved SMILES-based isomer generation algorithm, which automatically converts the scaffold-group vector of a drug candidate into the corresponding drug SMILES strings.

(d) Solve the MINLP model with the decomposition algorithm. If no optimal drug candidate satisfies all constraints, return to step (c), relax the constraint bounds, and solve the MINLP model again.

(e) Further verify the optimal solution of the MINLP model by molecular docking and molecular dynamics simulation.
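The interplay of steps (c) and (d) can be illustrated with a toy screening loop: filter candidates by upper bounds on their properties, pick the one with the highest predicted probability of high binding affinity, and relax the bounds if nothing is feasible. This is only a sketch and not the decomposition algorithm of the paper. The bounds are the classic Lipinski thresholds used as placeholders, and the candidate names, property values, relaxation factor, and function names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    mw: float      # relative molecular mass
    hba: int       # hydrogen bond acceptors
    hbd: int       # hydrogen bond donors
    logp: float    # octanol-water partition coefficient
    p_high: float  # predicted probability of high binding affinity


# Placeholder upper bounds (classic Lipinski thresholds, not the paper's settings).
BOUNDS = {"mw": 500.0, "hba": 10, "hbd": 5, "logp": 5.0}


def is_feasible(cand: Candidate, bounds: dict) -> bool:
    """True if every bounded property is at or below its upper bound."""
    return all(getattr(cand, key) <= limit for key, limit in bounds.items())


def select_best(cands: list, bounds: dict, relax: float = 1.1, max_rounds: int = 3):
    """Return (best candidate or None, bounds used), relaxing bounds when infeasible."""
    for _ in range(max_rounds + 1):
        pool = [c for c in cands if is_feasible(c, bounds)]
        if pool:
            return max(pool, key=lambda c: c.p_high), bounds
        # No feasible candidate: loosen every upper bound and try again.
        bounds = {key: limit * relax for key, limit in bounds.items()}
    return None, bounds


candidates = [
    Candidate("cand_A", 430.0, 6, 1, 2.5, 0.91),
    Candidate("cand_B", 520.0, 7, 2, 3.0, 0.99),  # violates the MW bound
    Candidate("cand_C", 480.0, 8, 1, 4.1, 0.95),
]

best, used_bounds = select_best(candidates, BOUNDS)
print(best.name, used_bounds["mw"])
```

In the actual framework the feasible set is generated by solving the MINLP model, not by filtering a fixed list, but the relax-and-resolve loop in step (d) follows the same logic.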

3

Results

The loss function (cross-entropy loss, CEL) and the monitoring metric (AUC) on the training/validation/test sets are 0.369/0.442/0.401 and 0.914/0.880/0.901, respectively.

The training process and classification performance of the deep learning model are shown in Figure 3. Figures 3(a) and 3(b) show how the CEL and AUC of the training and validation sets change with the number of epochs. Figure 3(c) shows the binary-classification confusion matrix used to evaluate the classification performance of the model: in the test set, the numbers of true negatives and true positives are much larger than the numbers of false negatives and false positives, which indicates that the model distinguishes high from low binding affinity well. Figure 3(d) shows the ROC curve of the test set (solid line); the test-set AUC of 0.901 indicates good classification performance.

Figure 3. Deep learning model training process and classification performance
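For readers unfamiliar with the metrics reported above, the following sketch computes the binary cross-entropy loss, the ROC AUC (as the fraction of positive/negative pairs ranked correctly), and the confusion-matrix counts in plain Python. The labels and predicted probabilities are made-up toy values, and the 0.5 decision threshold is an assumption; none of this reproduces the data of the paper.

```python
import math


def cross_entropy(y_true, p_pred, eps=1e-12):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


def roc_auc(y_true, p_pred):
    """AUC as the probability that a positive scores above a negative (ties count 0.5)."""
    pos = [p for y, p in zip(y_true, p_pred) if y == 1]
    neg = [p for y, p in zip(y_true, p_pred) if y == 0]
    wins = sum(1.0 if pp > pn else 0.5 if pp == pn else 0.0 for pp in pos for pn in neg)
    return wins / (len(pos) * len(neg))


def confusion_counts(y_true, p_pred, threshold=0.5):
    """Return (TP, FP, TN, FN) for a given decision threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, p_pred):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


labels = [1, 1, 0, 0]            # 1 = high binding affinity (toy data)
probs = [0.9, 0.4, 0.6, 0.2]     # predicted probability of high binding affinity

print(cross_entropy(labels, probs))
print(roc_auc(labels, probs))
print(confusion_counts(labels, probs))
```

AUC does not depend on the decision threshold, which is why it is a convenient monitoring metric alongside the loss.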

De novo drug design is then carried out by integrating the deep learning model with the MINLP model. Take the design of rivaroxaban alternatives as an example. First, the scaffold-based similarity algorithm searches the database of 2,898 drug scaffolds for the subset of scaffolds similar to the rivaroxaban scaffold (G1), yielding 14 scaffolds. These 14 scaffolds and a subset of groups (G2) are then input to the MINLP model, and the objective function and the upper and lower bounds of the structure and property constraints are set. The MINLP model is then solved with the decomposition strategy. In subproblem 1, mathematical programming under the structure constraints and the linear property constraints yields N1 = 17,659 feasible solutions (represented as scaffold-group vectors) through high-throughput inverse design. This step takes 116 seconds on a desktop computer (Intel(R) Core(TM) i7-10700F CPU @ 2.90 GHz, 24.0 GB RAM).

In subproblems 2 and 3, the improved SMILES-based isomer generation algorithm (the "other constraints") generates N2 = 159,170 drug candidate SMILES strings from the N1 scaffold-group vectors (2,161 seconds). The nonlinear property constraints then eliminate the candidates that violate them, leaving N3 = 42,932 drug candidate SMILES for further analysis. First, the 42,932 designed SMILES were searched in the PubChem database: 2,261 (5.25%) of the designed structures already exist in PubChem, which shows that the MINLP-based drug design model can both find existing candidate drugs and design new ones (94.75%). Second, ECFP fingerprints and principal component analysis (PCA) were used to build a chemical space map of the 42,932 designed candidates, as shown in Figure 4. The integers in the legend on the right (0 to 7) denote 8 scaffolds, and "4" corresponds to rivaroxaban. Figure 4 shows that the designed candidates are widely distributed in chemical space, which indicates that the MINLP-based drug design model has great potential for designing structurally diverse drug candidates similar to rivaroxaban.

Figure 4. PCA cluster plot of the MINLP model design results

In subproblem 4, the deep learning model predicts the probability of high binding affinity for each of the 42,932 designed candidates, and the candidates are ranked in descending order. The ranking shows that four designed drug candidate SMILES have a higher probability of high binding affinity than rivaroxaban (98.76%). Compared with the existing drug (rivaroxaban: SC = 4.7152), the best designed candidate not only has high binding affinity but also has lower synthetic complexity (SC = 3.1661). Finally, the best designed candidate, with a 98.78% probability of high binding affinity, was subjected to molecular docking (414 seconds on a desktop computer, Intel Core i5-10500 CPU @ 3.10 GHz) and molecular dynamics simulation (about 8 hours at an advanced supercomputing center, AMD EPYC 7502 CPU @ 2.5 GHz, 64 cores) to verify the reliability of the deep learning model and the effectiveness of the MINLP model in drug candidate design.

In molecular docking, the target is factor Xa (PDB entry 2w26) and the ligand is the optimal design result of the MINLP model. The binding energy obtained from docking is ΔGbind = -42.39 kJ/mol. From the relation ΔGbind = RT ln Ki/d, the binding affinity is Ki/d = 0.037 μmol/L (< 1 μmol/L indicates high binding affinity), which indicates that the designed candidate has high binding affinity for factor Xa. In addition, the candidate with the lowest probability of high binding affinity (25.82%) was also evaluated by molecular docking, giving ΔGbind = -29.97 kJ/mol and Ki/d = 5.61 μmol/L (≥ 1 μmol/L indicates low binding affinity). This result shows that the deep learning model reliably distinguishes whether a target-ligand complex has high or low binding affinity. The molecular dynamics simulation results are shown in Figures 5 and 6.

Figure 5. Molecular dynamics simulation results

Figure 6. Target-ligand binding mode
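The conversion from docking binding energy to binding affinity quoted in the docking results can be checked with a few lines of arithmetic. The sketch below rearranges ΔGbind = RT ln Ki/d into Ki/d = exp(ΔGbind / RT). The temperature is not stated in the article, so T = 298.15 K is an assumption, and the function name is illustrative.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol*K)


def ki_umol_per_l(dg_bind_kj_mol: float, temperature_k: float = 298.15) -> float:
    """Convert a binding free energy (kJ/mol) to Ki/d in umol/L via dG = RT ln(K).

    The default temperature of 298.15 K is an assumption, not a value from the article.
    """
    k_mol_per_l = math.exp(dg_bind_kj_mol / (R_KJ * temperature_k))
    return k_mol_per_l * 1e6


best = ki_umol_per_l(-42.39)    # best designed candidate
worst = ki_umol_per_l(-29.97)   # candidate with the lowest predicted probability

print(f"{best:.3f} umol/L, high affinity: {best < 1.0}")
print(f"{worst:.2f} umol/L, high affinity: {worst < 1.0}")
```

At this assumed temperature the two energies give values close to the reported 0.037 μmol/L and 5.61 μmol/L, on opposite sides of the 1 μmol/L cutoff.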

4

Summary

In summary, this work presents an optimization-based de novo drug design framework. The framework first develops a deep learning model that predicts the probability that a target-ligand complex has high/low binding affinity and integrates it into a mathematical programming model. This enables high-throughput inverse optimization design of new drug candidates that have optimal binding affinity and satisfy "Lipinski's rule of five" as well as the requirements on the synthetic accessibility score and the synthetic complexity score. Finally, the optimal design results are verified by molecular docking and molecular dynamics simulation.

Reference

Yujing Zhao, Qilei Liu*, Xinyuan Wu, Lei Zhang, Jian Du*, Qingwei Meng. De novo drug design framework based on mathematical programming method and deep learning model. AIChE Journal. 2022, e17748. https://doi.org/10.1002/aic.17748
