
2020 Bioinformatics | GraphDTA: predicting drug target binding affinity with graph neural networks

2022-07-06 22:02:00 Stunned flounder (



Paper: https://academic.oup.com/bioinformatics/article/37/8/1140/5942970?login=false
Code: https://github.com/thinng/GraphDTA

Abstract

Developing new drugs is costly, time-consuming, and often accompanied by safety issues. Drug repurposing avoids the expensive and lengthy development process by finding new uses for already approved drugs. To repurpose drugs effectively, it is useful to know which proteins are targeted by which drugs. Computational models that estimate the interaction strength of new drug-target pairs could speed up drug repurposing. Several models have been proposed for this task. However, these models represent drugs as strings, which is not a natural way to represent molecules. We propose a new model called GraphDTA that represents drugs as graphs and uses graph neural networks to predict drug-target affinity. We show that graph neural networks not only predict drug-target affinity better than non-deep-learning models, but also outperform competing deep learning methods. Our results confirm that deep learning models are appropriate for drug-target binding affinity prediction, and that representing drugs as graphs can lead to further improvements.

Introduction

There are several computational approaches to predicting drug-target affinity (DTA):

  1. Molecular docking, which predicts the stable 3D structure of the drug-target complex via a scoring function.
  2. Collaborative filtering. For example, the SimBoost model uses affinity similarities among drugs and among targets to construct new features.
  3. Neural networks trained on one-dimensional representations of the drug and protein sequences. For example, the DeepDTA model uses 1D representations and 1D convolutions (with pooling) to capture predictive patterns in the data.

Drug representation
A SMILES string can be converted into a molecular graph with the open-source RDKit software, and a drug feature vector is then obtained by representation learning with a graph convolutional network. Each node is a multi-dimensional binary feature vector expressing five pieces of information: the atom symbol, the number of adjacent atoms, the number of adjacent hydrogen atoms, the implicit valence of the atom, and whether the atom is part of an aromatic structure.
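The five-part node vector described above can be sketched in plain Python. This is a minimal illustration, not the GraphDTA implementation: it assumes the atom properties have already been extracted (for example with RDKit), and the short `ATOM_SYMBOLS` list, the `MAX_COUNT` bound, and the function names are all hypothetical.

```python
# Hypothetical, shortened symbol list; the last entry catches unseen elements.
ATOM_SYMBOLS = ["C", "N", "O", "S", "F", "Cl", "Br", "Unknown"]
# Assumed upper bound for degree, hydrogen count and implicit valence.
MAX_COUNT = 10


def one_hot(value, choices):
    """Return a 0/1 list with a single 1 at the position of `value`.

    Values that are not in `choices` are mapped to the last slot.
    """
    if value not in choices:
        value = choices[-1]
    return [1 if value == c else 0 for c in choices]


def atom_features(symbol, degree, num_hs, implicit_valence, is_aromatic):
    """Concatenate the five pieces of atom information into one binary vector."""
    counts = list(range(MAX_COUNT + 1))
    return (
        one_hot(symbol, ATOM_SYMBOLS)        # atom symbol
        + one_hot(degree, counts)            # number of adjacent atoms
        + one_hot(num_hs, counts)            # number of adjacent hydrogens
        + one_hot(implicit_valence, counts)  # implicit valence
        + [1 if is_aromatic else 0]          # aromatic flag
    )


# Illustrative values for an aromatic carbon atom.
vec = atom_features("C", degree=2, num_hs=1, implicit_valence=1, is_aromatic=True)
print(len(vec), sum(vec))
```

With these assumed list sizes each atom becomes a 42-dimensional vector; a longer symbol list would change only the length, not the scheme.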

Protein representation
Because it is difficult to represent a protein's structure as a graph, the protein is represented with a one-hot encoding of its sequence. The protein sequence is retrieved from the UniProt database using the gene name of the target. The sequence is a string of ASCII characters representing amino acids. Each amino acid type is encoded as an integer according to its associated alphabetic symbol [for example, alanine (A) is 1, cysteine (C) is 3, aspartic acid (D) is 4, and so on], so that the protein can be expressed as an integer sequence.
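As a rough sketch of this integer encoding: the snippet below assumes symbols are numbered by their position in the uppercase alphabet, which reproduces the A = 1, C = 3, D = 4 example, and pads or truncates every sequence to a fixed length. The `max_len` default and the use of 0 for padding and unknown characters are assumptions made for illustration; the vocabulary in the released code may differ.

```python
import string

# Assumption: symbols are numbered by alphabetical position, starting at 1.
AMINO_TO_INT = {ch: i + 1 for i, ch in enumerate(string.ascii_uppercase)}


def encode_protein(sequence, max_len=1000):
    """Map an amino-acid string to a fixed-length list of integers.

    Characters outside A-Z become 0, sequences longer than `max_len` are
    truncated, and shorter ones are right-padded with 0.
    """
    codes = [AMINO_TO_INT.get(ch, 0) for ch in sequence.upper()[:max_len]]
    return codes + [0] * (max_len - len(codes))


print(encode_protein("ACD", max_len=5))
```

A fixed length lets sequences of different sizes be batched into one tensor before the embedding layer.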

Model architecture based on molecular graphs

The authors propose a new DTA prediction model that combines a graph neural network with a conventional CNN. The protein sequence is first encoded categorically, then passed through an embedding layer in which each encoded character is represented by a 128-dimensional vector. Next, three 1D convolutional layers learn abstract features at different levels from the input. Finally, a max-pooling layer produces the representation vector of the input protein sequence. This approach is similar to the existing baseline models. For drugs, the authors use molecular graphs and test four graph neural network variants: GCN (Kipf and Welling, 2017), GAT (Veličković et al., 2018), GIN (Xu et al., 2019), and a combined GAT-GCN architecture.
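To make the drug branch more concrete, here is a dependency-free sketch of one GCN propagation step in the form given by Kipf and Welling, ReLU(D^-1/2 (A + I) D^-1/2 H W), followed by a global max pool that turns node features into a single molecule vector. The function names, the toy two-atom graph, and the choice of max pooling as the readout are illustrative assumptions, not the authors' code.

```python
import math


def gcn_layer(adj, feats, weight):
    """One GCN step: add self-loops, normalise symmetrically, aggregate,
    apply a linear map, then ReLU.

    adj:    n x n adjacency matrix (lists of 0/1)
    feats:  n x f node feature matrix
    weight: f x o weight matrix
    """
    n = len(adj)
    # A + I: every node also receives its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # D^-1/2 (A + I) D^-1/2
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    n_in, n_out = len(weight), len(weight[0])
    # Neighbourhood aggregation: norm @ feats
    agg = [[sum(norm[i][k] * feats[k][f] for k in range(n)) for f in range(n_in)]
           for i in range(n)]
    # Linear transform followed by ReLU: relu(agg @ weight)
    return [[max(0.0, sum(agg[i][f] * weight[f][o] for f in range(n_in)))
             for o in range(n_out)]
            for i in range(n)]


def global_max_pool(node_feats):
    """Collapse per-node features into one graph-level vector."""
    return [max(col) for col in zip(*node_feats)]


# Toy graph: two bonded atoms with a single input feature each.
adj = [[0, 1], [1, 0]]
feats = [[1.0], [3.0]]
weight = [[1.0, -1.0]]
hidden = gcn_layer(adj, feats, weight)
print(global_max_pool(hidden))
```

In a full model the pooled drug vector would be concatenated with the protein vector from the CNN branch and passed to fully connected layers that output the affinity.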

Experiments and results

The researchers compare GraphDTA mainly against non-deep-learning models and popular deep learning models. Model quality is measured with two metrics: the concordance index (CI), which indicates how consistently the ordering of predicted values agrees with the actual values, and the mean squared error (MSE). To make the results more comparable, the models are evaluated on both the Davis and Kiba datasets.
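Both metrics are simple to compute. The sketch below uses the usual pairwise definition of the concordance index: over all pairs whose true affinities differ, a correctly ordered prediction scores 1 and a tied prediction scores 0.5. The function names are illustrative, and the quadratic loop is written for clarity rather than speed.

```python
def mse(y_true, y_pred):
    """Mean squared error between true and predicted affinities."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs whose predicted order matches the true order."""
    score = 0.0
    pairs = 0
    n = len(y_true)
    for i in range(n):
        for j in range(n):
            if y_true[i] > y_true[j]:      # comparable pair: i truly binds stronger
                pairs += 1
                diff = y_pred[i] - y_pred[j]
                if diff > 0:
                    score += 1.0           # predicted in the right order
                elif diff == 0:
                    score += 0.5           # tie in the predictions
    return score / pairs if pairs else 0.0


print(concordance_index([1.0, 2.0, 3.0], [0.1, 0.4, 0.9]))
print(mse([1.0, 2.0], [1.0, 4.0]))
```

A CI of 1.0 means every pair is ranked correctly and 0.5 corresponds to random ordering, so higher CI and lower MSE both indicate a better model.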

Model results on the Davis dataset

On both datasets, the graph representation model based on the combined GAT-GCN achieves the best prediction performance.

Conclusion

In this work, the researchers propose a new method for computing drug-target binding affinity, called GraphDTA. Its aim is to make drug development less difficult, reduce the time and cost of finding new drug-target interactions, and shorten the drug development cycle. The model uses two-dimensional graph-structured data reconstructed from SMILES strings, which expresses more complete information about the drug, so the method achieves better prediction performance.
