Biological network analysis using deep learning
2022-06-30 12:07:00 (Zhiyuan Community)
Today we introduce "Biological network analysis with deep learning", a review article published in Briefings in Bioinformatics.

Abstract
This article describes biological networks and reviews the principles and basic algorithms of neural networks. It then discusses applications of graph neural networks (GNNs) in bioinformatics, such as protein structure prediction and in silico drug discovery and development. Finally, it highlights two application areas, gene regulatory networks and disease diagnosis, where deep learning is emerging as a new tool for solving classic problems.
1. Background
One advantage of deep learning is its ability to detect complex patterns in data, which makes it well suited to bioinformatics. There, data represent complex, interdependent relationships between biological entities and processes that are often inherently noisy and occur at multiple scales. Moreover, deep learning methods have been extended to graph-structured data, making them a promising technology for biological network analysis. This article first introduces biological networks, then describes typical learning tasks on them, and finally discusses the most popular applications of GNNs in bioinformatics.
2. Biological networks
DNA, RNA, proteins and metabolites play crucial roles in the molecular mechanisms of cellular processes. The structures and interactions of these entities can be represented as a graph, which consists of a set of nodes and a set of edges representing connections between nodes. For example, a molecule can be represented as a graph whose nodes are atoms and whose edges are the bonds between them. Similarly, many biological processes can be modeled with entities as nodes and their interactions or relationships as edges. Networks provide a simple, intuitive representation of heterogeneous and complex biological processes. They also make it possible to apply graph theory, machine learning and deep learning techniques to model and understand complex molecular mechanisms.
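To make the molecule-as-graph idea concrete, here is a minimal sketch that stores a molecule as an undirected adjacency list. It assumes a heavy-atom convention (hydrogens omitted), and the helper name `build_graph` is illustrative, not taken from the article or any library.

```python
# Minimal sketch: a molecule as an undirected graph (nodes = atoms, edges = bonds).
# Assumption: hydrogens are omitted, a common "heavy-atom" simplification.

def build_graph(num_nodes, edges):
    """Return an adjacency list {node_index: set of neighbor indices}."""
    adj = {i: set() for i in range(num_nodes)}
    for u, v in edges:
        # Bonds are symmetric, so record the edge in both directions.
        adj[u].add(v)
        adj[v].add(u)
    return adj

# Ethanol (CH3-CH2-OH), heavy atoms only: C0-C1 and C1-O2.
atoms = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]
graph = build_graph(len(atoms), bonds)

for idx in sorted(graph):
    print(atoms[idx], idx, "bonded to", sorted(graph[idx]))
```

The same structure works for a PPI or DDI network by using protein or drug identifiers as node keys; practical pipelines usually also attach features such as atom type or bond order to nodes and edges.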
As mentioned earlier, biological networks can be defined at different levels. Besides graph representations of biomolecules, used to study molecular properties and functions, common biological networks include protein-protein interaction (PPI) networks, gene regulatory networks (GRNs), metabolic networks and drug-drug interaction (DDI) networks. These networks are briefly introduced below.
Protein-Protein Interaction Networks. PPI networks represent interactions between proteins. PPIs are essential for almost all cellular functions, from the assembly of cellular structural components to transcription, translation and active transport. In a PPI network, nodes correspond to proteins and edges connect proteins that interact.
Gene Regulatory Networks. GRNs represent the complex mechanisms that regulate gene expression. Regulation takes place at different stages on the path from DNA to protein, such as transcription, translation and splicing. An intuitive explanation of these complex, interrelated mechanisms is that proteins are both the products of gene expression and its controllers. In a GRN, each node represents a gene, and a directed edge from one gene to another means that the first gene directly regulates the expression of the second, without mediation by other genes.
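Because regulation has a direction, a GRN is naturally a directed graph. The sketch below keeps two adjacency maps so that both "what does this gene regulate?" and "what regulates this gene?" are cheap lookups. The class name `GRN` and the gene names are hypothetical placeholders, not real regulatory data.

```python
from collections import defaultdict

class GRN:
    """Hypothetical minimal gene regulatory network: a directed graph of genes."""

    def __init__(self):
        self._targets = defaultdict(set)     # regulator -> genes it directly regulates
        self._regulators = defaultdict(set)  # gene -> genes that directly regulate it

    def add_regulation(self, regulator, target):
        """Record a directed edge: `regulator` directly regulates `target`."""
        self._targets[regulator].add(target)
        self._regulators[target].add(regulator)

    def targets_of(self, gene):
        return sorted(self._targets.get(gene, set()))

    def regulators_of(self, gene):
        return sorted(self._regulators.get(gene, set()))

grn = GRN()
grn.add_regulation("geneA", "geneB")
grn.add_regulation("geneA", "geneC")
grn.add_regulation("geneC", "geneB")

print(grn.regulators_of("geneB"))  # ['geneA', 'geneC']
# Edges are directed, so geneB regulates nothing in this toy network.
print(grn.targets_of("geneB"))     # []
```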
Metabolic Networks. Metabolic networks use graphs to represent metabolism, the collection of all chemical reactions that take place in an organism to sustain life. Given its complexity, a metabolic network is usually broken down into metabolic pathways, that is, series of chemical reactions that together carry out a specific metabolic function. The network maps each metabolite to a node and each reaction to a directed edge, labeled with the enzyme that catalyzes it.
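A metabolic network can be stored the same way, with each directed edge labeled by its enzyme. The sketch below encodes the first three steps of glycolysis (simplified: cofactors such as ATP and the reversibility of some reactions are ignored) and uses breadth-first search to recover a pathway as a chain of reactions. `find_pathway` is an illustrative helper, not a library function.

```python
from collections import deque

# (substrate, product, catalysing enzyme); first steps of glycolysis, simplified.
reactions = [
    ("glucose", "glucose-6-phosphate", "hexokinase"),
    ("glucose-6-phosphate", "fructose-6-phosphate", "phosphoglucose isomerase"),
    ("fructose-6-phosphate", "fructose-1,6-bisphosphate", "phosphofructokinase"),
]

# Directed adjacency list: metabolite -> [(product, enzyme), ...]
graph = {}
for substrate, product, enzyme in reactions:
    graph.setdefault(substrate, []).append((product, enzyme))
    graph.setdefault(product, [])

def find_pathway(graph, start, goal):
    """BFS over directed edges; returns [(substrate, product, enzyme), ...] or None."""
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for product, enzyme in graph[node]:
            if product not in visited:
                visited.add(product)
                queue.append((product, path + [(node, product, enzyme)]))
    return None

for step in find_pathway(graph, "glucose", "fructose-1,6-bisphosphate"):
    print(step)
```

Because edges are directed, searching from a product back to its substrate returns `None` in this toy model.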
Drug-Drug Interaction Networks. DDI networks aim to model the interactions between different drugs: drugs are represented as nodes and their interactions as edges. Unlike the previous networks, DDI networks do not represent biological processes. However, because they are a meaningful representation of knowledge about drug interactions, researchers have become increasingly interested in them. In fact, DDI networks are widely used in polypharmacy research.
3. Learning tasks on graphs
At a high level, learning tasks on graphs fall into node classification, link prediction, graph classification (or regression) and graph embedding. Each task is explained below.
Node Classification. Node classification is a typical task in biological network analysis, for example predicting the unknown function of a protein from the functions of its neighbors in a PPI network. The input graph contains some labeled nodes but many unlabeled ones, and the goal is to classify the unlabeled nodes. This is usually solved with semi-supervised learning: the algorithm takes the whole network as input during training and produces predictions for all nodes, but the loss is computed only on the nodes that have ground-truth labels. The trained model is then used to classify the remaining unlabeled nodes.
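The key detail of this semi-supervised setup, that every node receives a prediction but only labeled nodes contribute to the loss, can be shown with a masked cross-entropy. This sketch assumes some model (for example a GNN, not shown) has already produced class probabilities for every node; the function name and toy numbers are hypothetical.

```python
import math

def masked_cross_entropy(probs, labels, labeled_mask):
    """Mean negative log-likelihood over labeled nodes only.

    probs:        class-probability list for EVERY node (model output)
    labels:       true class index per node (None where unknown)
    labeled_mask: True where the node has a ground-truth label
    """
    losses = [
        -math.log(p[y])
        for p, y, is_labeled in zip(probs, labels, labeled_mask)
        if is_labeled
    ]
    return sum(losses) / len(losses)

# Four proteins, two hypothetical function classes; only proteins 0 and 2 are labeled.
probs = [[0.9, 0.1], [0.3, 0.7], [0.2, 0.8], [0.6, 0.4]]
labels = [0, None, 1, None]
mask = [True, False, True, False]

loss = masked_cross_entropy(probs, labels, mask)  # uses nodes 0 and 2 only

# After training, unlabeled nodes are assigned their most probable class.
predictions = {
    i: max(range(len(p)), key=p.__getitem__)
    for i, (p, m) in enumerate(zip(probs, mask))
    if not m
}
print(round(loss, 4), predictions)
```

Changing the predicted probabilities of the unlabeled nodes 1 and 3 leaves `loss` unchanged; those nodes benefit from training only indirectly, through the model parameters and message passing they share with labeled nodes.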
Link Prediction. Current knowledge of interactions in biological networks is often incomplete; for example, it is often unknown which genes regulate the expression of a given gene in a GRN. Predicting these missing edges is called link prediction. It is a semi-supervised learning problem that uses the known links in the graph to predict other links that may exist.
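Link prediction is often framed as ranking unobserved node pairs by a plausibility score. As a simple stand-in for the learned scorers discussed in the article, this sketch uses the classic common-neighbors heuristic (pairs that share many interaction partners are more likely to be linked) on a hypothetical toy network.

```python
from itertools import combinations

def common_neighbor_scores(adj):
    """Score every currently unlinked node pair by its number of shared neighbors."""
    scores = {}
    for u, v in combinations(sorted(adj), 2):
        if v not in adj[u]:
            scores[(u, v)] = len(adj[u] & adj[v])
    return scores

# Hypothetical toy network containing only the known (observed) links.
adj = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "D"},
    "D": {"B", "C", "E"},
    "E": {"D"},
}

scores = common_neighbor_scores(adj)
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
for pair, score in ranked:
    print(pair, score)
```

A GNN-based method replaces the hand-crafted score with one computed from learned node embeddings, but the workflow of scoring and ranking unobserved pairs is the same.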
Graph Classification or Regression. When the data consist of many individual networks, such as a dataset of 3D molecular structures, the goal becomes predicting a property of each network, such as molecular solubility or toxicity. This task, called graph classification (or regression), takes a dataset of graphs as input and classifies (or regresses) each individual graph. It is the most common supervised learning problem on graphs.
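The step that makes graph-level prediction possible is a readout that turns a variable number of node vectors into one fixed-size graph vector. This sketch shows only that readout (mean pooling over one-hot atom types) followed by a logistic-regression prediction; the message-passing layers a real GNN applies first are omitted, and the atom vocabulary, weights and bias are hypothetical and untrained.

```python
import math

ATOM_TYPES = ["C", "N", "O"]  # assumed minimal atom vocabulary

def one_hot(atom):
    """Node feature: one-hot encoding of the atom type."""
    return [1.0 if atom == t else 0.0 for t in ATOM_TYPES]

def mean_readout(node_features):
    """Average node vectors into one fixed-size graph vector, whatever the graph size."""
    n = len(node_features)
    return [sum(column) / n for column in zip(*node_features)]

def predict_property(graph_vector, weights, bias):
    """Logistic regression on the pooled vector, returning a probability."""
    z = sum(w * x for w, x in zip(weights, graph_vector)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Two molecules of different size (heavy atoms only); message passing is omitted.
molecules = {"ethanol": ["C", "C", "O"], "methylamine": ["C", "N"]}
weights, bias = [0.5, -1.0, 2.0], 0.0  # hypothetical, untrained parameters

for name, atoms in molecules.items():
    vector = mean_readout([one_hot(a) for a in atoms])
    print(name, [round(x, 3) for x in vector], round(predict_property(vector, weights, bias), 3))
```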
Graph Embedding. Graph embedding finds a low-dimensional representation, as a fixed-size vector, of a whole graph (such as a PPI network) or of elements within it (such as proteins). This is usually achieved through unsupervised learning. Representing a node or graph as a fixed-size vector allows any off-the-shelf machine learning algorithm to use it, so learning graph embeddings is often a preprocessing step before standard machine learning algorithms are applied to a specific task.
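One widely used unsupervised recipe for node embeddings, in the style of DeepWalk, first turns the graph into short random walks and then trains a skip-gram model (for example gensim's `Word2Vec`) so that nodes appearing near each other in walks receive similar vectors. The sketch below covers only the walk-generation step on a hypothetical toy PPI network; skip-gram training is not shown, and the parameter defaults are arbitrary.

```python
import random

def random_walks(adj, walks_per_node=2, walk_length=5, seed=0):
    """Generate uniform random walks starting from every node (DeepWalk-style corpus)."""
    rng = random.Random(seed)  # seeded for reproducibility
    walks = []
    for _ in range(walks_per_node):
        for start in sorted(adj):
            walk = [start]
            while len(walk) < walk_length:
                neighbors = sorted(adj[walk[-1]])
                if not neighbors:  # dead end: stop this walk early
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks

# Hypothetical toy PPI network.
ppi = {
    "P1": {"P2", "P3"},
    "P2": {"P1", "P3"},
    "P3": {"P1", "P2", "P4"},
    "P4": {"P3"},
}

for walk in random_walks(ppi):
    print(" -> ".join(walk))
```

Each walk is then treated like a sentence and each node like a word, so a skip-gram model can learn one fixed-size vector per protein.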
4. Applications in biology
Protein structure prediction. Predicting the 3D structure of a protein from its amino acid sequence is known as the protein folding problem. AlphaFold represents a breakthrough approach that set a new baseline for both deep learning and traditional methods. Like other methods, AlphaFold starts from the amino acid sequence as the basis for predicting the 3D structure. This input is combined with additional features collected from protein databases, and a CNN predicts a discrete probability distribution over the distances between all pairs of amino acids, together with probability distributions over torsion angles. Compared with earlier methods that only predicted whether two residues are in contact, predicting distances and their full distributions yields more accurate results. The predicted distances and torsion angles, together with a penalty for overlapping atoms, are combined into a score, called a potential, that assesses the quality of a candidate structure. Gradient descent on this potential then iteratively refines the predicted structure. This approach produced very good results and demonstrated the potential of deep learning for challenging bioinformatics problems.
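The final stage described above, turning predicted pairwise distances into coordinates by gradient descent on a potential, can be illustrated with a toy distance-geometry sketch. This is not AlphaFold's procedure: AlphaFold optimizes torsion angles against a potential built from full distance distributions, whereas this sketch moves 3D points directly to match a single target distance per pair and omits the atom-overlap penalty. The reference chain that supplies the "predicted" distances is hypothetical.

```python
import math
import random

def stress(coords, target):
    """Potential: sum of squared errors between current and target pairwise distances."""
    return sum((math.dist(coords[i], coords[j]) - d) ** 2 for (i, j), d in target.items())

def refine(coords, target, lr=0.01, steps=2000):
    """Plain gradient descent on `stress`, moving the 3D coordinates directly."""
    coords = [list(c) for c in coords]  # work on a copy
    for _ in range(steps):
        grads = [[0.0, 0.0, 0.0] for _ in coords]
        for (i, j), d in target.items():
            diff = [a - b for a, b in zip(coords[i], coords[j])]
            r = max(math.sqrt(sum(x * x for x in diff)), 1e-9)
            # d/dx_i of (r - d)^2 is 2 (r - d) (x_i - x_j) / r; x_j gets the opposite sign.
            coef = 2.0 * (r - d) / r
            for k in range(3):
                grads[i][k] += coef * diff[k]
                grads[j][k] -= coef * diff[k]
        for c, g in zip(coords, grads):
            for k in range(3):
                c[k] -= lr * g[k]
    return coords

# Hypothetical "predicted" distances, taken from a reference chain with 3.8 A steps.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (3.8, 3.8, 3.8)]
target = {
    (i, j): math.dist(reference[i], reference[j])
    for i in range(len(reference))
    for j in range(i + 1, len(reference))
}

rng = random.Random(0)
start = [[rng.uniform(0.0, 10.0) for _ in range(3)] for _ in reference]
final = refine(start, target)
print(round(stress(start, target), 3), "->", round(stress(final, target), 6))
```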
Disease diagnosis. In the past few years, using deep learning for disease diagnosis has attracted great interest in the research community. However, few of these studies use biological networks. One group of researchers proposed combining spectral clustering with convolutional neural networks (CNNs), integrating gene expression data with PPI networks to predict lung cancer. They tried different configurations of the proposed method to find the best-performing one and evaluated them in terms of accuracy, precision and recall.
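The accuracy, precision and recall metrics used to compare such diagnostic classifiers are simple functions of the confusion counts. This sketch assumes a binary task (1 = disease, 0 = healthy) with hypothetical labels; scikit-learn's `accuracy_score`, `precision_score` and `recall_score` compute the same quantities.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0  # of predicted cases, how many are real
    recall = tp / (tp + fn) if tp + fn else 0.0     # of real cases, how many were found
    return accuracy, precision, recall

# Hypothetical labels for eight samples (1 = disease, 0 = healthy).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 0, 1, 0]
accuracy, precision, recall = classification_metrics(y_true, y_pred)
print(accuracy, round(precision, 3), recall)
```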
In addition, Rhee et al. proposed another example of deep learning on biological networks, used to classify breast cancer subtypes. Their approach combines a GCN with a relation network (RN) and takes as input a PPI network enriched with gene expression data. The GCN lets the method learn local graph information, while the RN captures complex patterns among sets of nodes. The outputs of the GCN and the RN are combined to obtain the classification. The method was compared with support vector machines, random forests, k-nearest neighbors, and multinomial and Gaussian naive Bayes, with performance estimated through Monte Carlo cross-validation experiments. The proposed method outperformed the baselines on all metrics used, showing that learning feature representations of PPI networks with a GCN can significantly help capture patterns in gene expression data.
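The GCN half of such a model can be sketched with the widely used propagation rule of Kipf and Welling, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), where A is the adjacency matrix, D the degree matrix of A + I, H the node features (here, for instance, gene expression values on a PPI network) and W a weight matrix. Whether Rhee et al. use exactly this normalization is an assumption, the relation-network half is not shown, and the toy graph and weights are hypothetical.

```python
import math

def gcn_layer(adj, H, W):
    """One GCN layer, ReLU(D^-1/2 (A + I) D^-1/2 H W), using plain lists.

    adj: n x n symmetric 0/1 adjacency matrix
    H:   n x f node feature matrix
    W:   f x k weight matrix
    """
    n = len(adj)
    # Add self-loops so each node keeps part of its own features.
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric normalization by node degrees.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # Aggregate neighbor features: norm @ H.
    agg = [[sum(norm[i][m] * H[m][c] for m in range(n)) for c in range(len(H[0]))]
           for i in range(n)]
    # Linear transform then ReLU: relu(agg @ W).
    return [[max(0.0, sum(agg[i][c] * W[c][k] for c in range(len(W))))
             for k in range(len(W[0]))]
            for i in range(n)]

# Hypothetical 3-protein chain P0 - P1 - P2 with one expression value per node.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
H = [[1.0], [2.0], [3.0]]
W = [[1.0, -1.0]]  # maps 1 input feature to 2 output features

out = gcn_layer(adj, H, W)
for row in out:
    print([round(x, 3) for x in row])
```

Stacking two or more such layers lets information flow beyond immediate neighbors, which is how the GCN captures local graph structure around each gene.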
The next two examples apply GNNs to RNA-disease and gene-disease association networks. In the first, the input is a graph linking diseases and RNAs, called an RNA-disease network. The authors combine a GCN with a graph attention network to capture both global and local structural information of the input, with the goal of predicting RNA-disease associations. In the second, the authors use two GCNs together with matrix factorization: disease and gene features and their similarity graphs are fed into two parallel GCNs, and the resulting embeddings are combined by an inner product to make predictions.
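The final combination step, scoring an association by the inner product of two learned embeddings, is the same decoder used in matrix factorization. This sketch assumes the embeddings have already been produced (in the work described, by the GCNs) and adds a sigmoid to map scores into (0, 1), a common choice but an assumption here; all names and vectors are hypothetical.

```python
import math

def association_scores(gene_emb, disease_emb):
    """Score each (gene, disease) pair by the sigmoid of the inner product of their embeddings."""
    scores = {}
    for gene, u in gene_emb.items():
        for disease, v in disease_emb.items():
            dot = sum(a * b for a, b in zip(u, v))
            scores[(gene, disease)] = 1.0 / (1.0 + math.exp(-dot))
    return scores

# Hypothetical 2-dimensional embeddings, as if output by two parallel GCNs.
gene_emb = {"gene1": [1.0, 0.0], "gene2": [0.0, 1.0]}
disease_emb = {"diseaseX": [2.0, -1.0]}

for pair, score in association_scores(gene_emb, disease_emb).items():
    print(pair, round(score, 3))
```

Collecting these scores for every gene and disease gives a full association matrix, which is why this decoder pairs naturally with matrix factorization.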
5. Summary
Although deep learning methods are promising, they also have limitations and many open problems. A major one is the lack of interpretability: because deep learning algorithms behave as black boxes, doctors and patients are often reluctant to trust a model's output without sufficient understanding of how the prediction was made. Another problem is the need for large labeled datasets, because deep neural networks have a large number of parameters that must be learned.
Despite these challenges, deep learning on graphs remains an active research area and has produced exciting results across many bioinformatics disciplines. We look forward to the continued development of deep learning for biological network analysis.