2020 Bioinformatics | GraphDTA: predicting drug target binding affinity with graph neural networks
2022-07-06 22:02:00 【Stunned flounder (】
Paper: https://academic.oup.com/bioinformatics/article/37/8/1140/5942970?login=false
Code: https://github.com/thinng/GraphDTA
Abstract
New drugs are expensive and time-consuming to develop, and they often come with safety issues. Drug repurposing avoids much of this expensive and lengthy development process by finding new uses for already-approved drugs. To repurpose drugs effectively, it helps to know which proteins are targeted by which drugs. Computational models that estimate the interaction strength of new drug-target pairs could speed up drug repurposing, and several models have been proposed for this task. However, these models represent drugs as strings, which is not a natural way to represent molecules. We propose a new model called GraphDTA that represents drugs as graphs and uses graph neural networks to predict drug-target affinity. We show that graph neural networks not only predict drug-target affinity better than non-deep-learning models, but also outperform competing deep learning methods. Our results confirm that deep learning models are suitable for drug-target binding affinity prediction, and that representing drugs as graphs can lead to further improvements.
Introduction
There are several computational approaches to drug-target affinity (DTA) prediction:
- Molecular docking, which uses a scoring function to predict the stability of the 3D structure of the drug-target complex.
- Collaborative filtering. For example, the SimBoost model uses affinity similarities among drugs and among targets to construct new features.
- Neural networks trained on one-dimensional representations of drug and protein sequences. For example, the DeepDTA model uses 1D representations and 1D convolutions (with pooling) to capture predictive patterns in the data.
Drug representation
SMILES strings are converted into molecular graphs with the open-source RDKit software, and a graph neural network then learns the drug feature vector from the graph. Each node (atom) is described by a multi-dimensional binary (0/1) feature vector that encodes five properties: the atom symbol, the number of adjacent atoms, the number of adjacent hydrogen atoms, the implicit valence of the atom, and whether the atom is part of an aromatic structure.
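As a rough illustration of these node features, here is a minimal Python sketch that concatenates the five atom properties into one 0/1 vector. It assumes the property values have already been read from the molecule (with RDKit they would typically come from atom methods such as `GetSymbol()`, `GetDegree()`, `GetTotalNumHs()`, `GetImplicitValence()` and `GetIsAromatic()`). The shortened symbol list `SYMBOLS`, the 0-10 count range and the helper names are hypothetical and are not taken from the GraphDTA code.

```python
# Hypothetical, shortened atom-symbol vocabulary; the real model uses a longer list.
SYMBOLS = ["C", "N", "O", "S", "F", "Cl", "Br", "P", "I", "Unknown"]
COUNT_RANGE = list(range(11))  # assumed range 0..10 for degree, H count, implicit valence


def one_hot(value, choices):
    """0/1 list marking `value` among `choices`; unseen values map to the last slot."""
    if value not in choices:
        value = choices[-1]
    return [1 if value == c else 0 for c in choices]


def atom_features(symbol, degree, num_hs, implicit_valence, is_aromatic):
    """Concatenate the five per-atom properties into one binary feature vector."""
    return (
        one_hot(symbol, SYMBOLS)               # atom symbol
        + one_hot(degree, COUNT_RANGE)         # number of adjacent atoms
        + one_hot(num_hs, COUNT_RANGE)         # number of adjacent hydrogens
        + one_hot(implicit_valence, COUNT_RANGE)
        + [1 if is_aromatic else 0]            # aromaticity flag
    )


v = atom_features("C", 2, 1, 1, True)  # e.g. an aromatic carbon
print(len(v), sum(v))
```

With this toy vocabulary every atom yields a 44-dimensional vector (10 + 11 + 11 + 11 + 1), with one active entry per one-hot group plus the aromaticity bit. Mapping unseen values to the last slot keeps unexpected atoms from crashing the encoder, at the cost of merging them into a single bucket.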
Protein representation
Because proteins are difficult to represent as graphs, each protein is represented by a categorical encoding of its sequence. The protein sequence for each target's gene name is retrieved from the UniProt database. The sequence is a string of ASCII characters, one per amino acid. Each amino acid type is encoded as an integer based on its letter symbol [for example, alanine (A) is 1, cysteine (C) is 3, aspartic acid (D) is 4, and so on], so that each protein becomes a sequence of integers.
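The integer encoding described above can be sketched in a few lines of Python. The alphabet string (uppercase letters without J, which reproduces A=1, C=3, D=4), the fixed length of 1000, zero-padding, and mapping unrecognized characters to 0 are assumptions made for this illustration.

```python
# Assumed alphabet: A-Z without J, so A=1, B=2, C=3, D=4, ...; 0 is reserved for padding.
SEQ_VOCAB = "ABCDEFGHIKLMNOPQRSTUVWXYZ"
CHAR_TO_INT = {ch: i + 1 for i, ch in enumerate(SEQ_VOCAB)}
MAX_SEQ_LEN = 1000  # assumed fixed input length


def encode_protein(sequence, max_len=MAX_SEQ_LEN):
    """Map an amino-acid string to a fixed-length list of ints.

    Longer sequences are truncated, shorter ones are zero-padded,
    and unrecognized characters are mapped to 0 (an assumption)."""
    codes = [CHAR_TO_INT.get(ch, 0) for ch in sequence.upper()[:max_len]]
    return codes + [0] * (max_len - len(codes))


print(encode_protein("ACD", max_len=5))  # [1, 3, 4, 0, 0]
```

A fixed length lets a batch of proteins be stacked into one integer matrix, which an embedding layer can then turn into dense vectors.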
Model architecture
The authors propose a new DTA prediction model that combines graph neural networks with a conventional CNN, as shown in the figure below. The protein sequence is first categorically encoded and then passed through an embedding layer, in which each encoded character is represented by a 128-dimensional vector. Next, three 1D convolutional layers learn features at different levels of abstraction from the input. Finally, a max-pooling layer produces the representation vector of the input protein sequence. This approach is similar to existing baseline models. For drugs, the authors use molecular graphs and test four graph neural network variants: GCN (Kipf and Welling, 2017), GAT (Veličković et al., 2018), GIN (Xu et al., 2019), and a combined GAT-GCN architecture.
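To make the two-branch data flow concrete, here is a minimal NumPy forward-pass sketch with random weights. It is not the authors' implementation (their code is on GitHub and uses a deep learning framework) and shows only the GCN variant. The number of GCN layers, channel counts, kernel size, hidden size, the 26-entry embedding vocabulary and all function names are hypothetical; only the 128-dimensional embedding, the three 1D convolutions, max pooling and the GCN propagation rule follow the description above.

```python
import numpy as np


def gcn_layer(adj, h, w):
    """One GCN layer (Kipf and Welling): ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    a_hat = adj + np.eye(adj.shape[0])              # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))   # every degree >= 1 after self-loops
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ h @ w, 0.0)


def conv1d_relu(x, kernels):
    """'Valid' 1D convolution + ReLU. x: (length, in_ch); kernels: (k, in_ch, out_ch)."""
    k = kernels.shape[0]
    windows = [np.einsum("ki,kio->o", x[t:t + k], kernels)
               for t in range(x.shape[0] - k + 1)]
    return np.maximum(np.stack(windows), 0.0)


def drug_branch(adj, node_feats, gcn_weights):
    h = node_feats
    for w in gcn_weights:
        h = gcn_layer(adj, h, w)
    return h.max(axis=0)                            # global max pooling over atoms


def protein_branch(codes, embedding, conv_kernels):
    x = embedding[codes]                            # (length, 128) embedding lookup
    for kernels in conv_kernels:                    # three stacked 1D conv layers
        x = conv1d_relu(x, kernels)                 # needs length >= 3 * (k - 1) + 1
    return x.max(axis=0)                            # max pooling over positions


def predict_affinity(adj, node_feats, codes, p):
    z = np.concatenate([drug_branch(adj, node_feats, p["gcn"]),
                        protein_branch(codes, p["embed"], p["conv"])])
    hidden = np.maximum(z @ p["fc1"], 0.0)          # dense layer on the joint vector
    return float(hidden @ p["fc2"])                 # scalar affinity estimate


def init_params(rng, atom_dim=44, vocab=26, embed_dim=128, channels=32, kernel=8, hidden=64):
    """Random weights with hypothetical sizes (not the paper's configuration)."""
    gcn = [rng.normal(0, 0.1, (atom_dim, channels)),
           rng.normal(0, 0.1, (channels, channels)),
           rng.normal(0, 0.1, (channels, channels))]
    conv = [rng.normal(0, 0.1, (kernel, embed_dim, channels)),
            rng.normal(0, 0.1, (kernel, channels, channels)),
            rng.normal(0, 0.1, (kernel, channels, channels))]
    return {"gcn": gcn, "conv": conv,
            "embed": rng.normal(0, 0.1, (vocab, embed_dim)),
            "fc1": rng.normal(0, 0.1, (2 * channels, hidden)),
            "fc2": rng.normal(0, 0.1, (hidden,))}


rng = np.random.default_rng(0)
params = init_params(rng)
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # toy 3-atom chain
atoms = rng.integers(0, 2, (3, 44)).astype(float)               # toy 0/1 atom features
codes = rng.integers(1, 26, 60)                                 # toy encoded protein
print(predict_affinity(adj, atoms, codes, params))
```

Because the weights are random, the printed number is meaningless; the point is the shape flow. Each branch collapses a variable-size input (any number of atoms, any sequence length above the convolution minimum) into a fixed 32-dimensional vector via max pooling, which is what allows the two vectors to be concatenated and fed to dense layers.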
Experiments and results
The researchers mainly compare non-deep-learning models with the more popular deep learning models, using two metrics to measure model quality: the concordance index (CI), which measures how often the predicted affinities rank pairs of drug-target interactions in the same order as the measured affinities, and the mean squared error (MSE). To make the results more comparable, the models are evaluated on both the Davis and KIBA datasets.
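Both metrics are straightforward to compute. The sketch below follows the usual CI definition: over all pairs whose measured affinities differ, count how often the prediction for the higher-affinity pair is also higher, with tied predictions counted as 0.5. The function names are illustrative, and the O(n²) double loop is written for clarity rather than speed.

```python
def mse(y_true, y_pred):
    """Mean squared error between measured and predicted affinities."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def concordance_index(y_true, y_pred):
    """Fraction of pairs with y_true[i] > y_true[j] whose predictions are ordered
    the same way; tied predictions count as half-correct."""
    concordant, total = 0.0, 0
    n = len(y_true)
    for i in range(n):
        for j in range(n):
            if y_true[i] > y_true[j]:
                total += 1
                if y_pred[i] > y_pred[j]:
                    concordant += 1.0
                elif y_pred[i] == y_pred[j]:
                    concordant += 0.5
    return concordant / total


y_true = [5.0, 6.0, 7.0, 8.0]
y_pred = [5.2, 5.9, 7.5, 7.4]
print(concordance_index(y_true, y_pred), mse(y_true, y_pred))
```

Here five of the six ordered pairs are ranked correctly (only 8.0 vs 7.0 is swapped), so CI = 5/6 ≈ 0.833, and MSE ≈ 0.165. The two metrics are complementary: CI only looks at ranking, while MSE penalizes the size of each error.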
Model results on the Davis dataset
On both datasets, the model using the combined GAT-GCN graph representation achieves the best prediction performance.
Conclusion
In this work, the researchers propose a new method for computational drug-target binding affinity prediction, called GraphDTA, with the aim of making drug development easier, reducing the time and cost of finding new drug-target interactions, and shortening the drug development cycle. The model uses two-dimensional graph-structured data reconstructed from SMILES, which captures more complete information about each drug, and this is why the method achieves better prediction performance.