Biological network analysis using deep learning
2022-06-30 12:07:00 【Zhiyuan community】
Today we present a review article published in Briefings in Bioinformatics: "Biological network analysis with deep learning".

Abstract
This article describes biological networks and reviews the principles and basic algorithms of neural networks. It then discusses applications of graph neural networks (GNNs) in bioinformatics, such as protein structure prediction and in silico drug discovery and development. Finally, it highlights gene regulatory networks and disease diagnosis, application areas where deep learning is emerging as a new tool for solving classic problems.
1. Background
One strength of deep learning is its ability to detect complex patterns in data. This makes it well suited to bioinformatics, where data describe complex, interdependent relationships between biological entities and processes that are inherently noisy and occur at multiple scales. Deep learning methods have also been extended to graph-structured data, which makes them a promising approach to biological network analysis. The article first introduces biological networks, then describes the typical learning tasks on them, and finally discusses the most popular applications of GNNs in bioinformatics.
2. Biological networks
DNA, RNA, proteins and metabolites play crucial roles in the molecular mechanisms of cellular processes. The structure of these entities and their interactions can be represented as a graph, which consists of a set of nodes and a set of edges representing connections between nodes. For example, a molecule can be represented as a graph in which nodes are atoms and edges are the bonds between them. Similarly, many biological processes can be modeled with entities as nodes and the interactions or relationships between them as edges. Networks provide a simple and intuitive representation of heterogeneous and complex biological processes. They also make it possible to apply graph theory, machine learning and deep learning techniques to model and understand complex molecular mechanisms.
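To make the molecule-as-graph idea concrete, here is a minimal sketch that stores a molecule as a node list and an edge list and derives an adjacency list from them. The choice of water and all names (`atoms`, `bonds`, `build_adjacency`) are illustrative assumptions, not taken from the review.

```python
from collections import defaultdict

# Nodes are atoms (index -> element symbol); edges are covalent bonds.
# Water (H2O) is used purely as an illustrative example.
atoms = {0: "O", 1: "H", 2: "H"}
bonds = [(0, 1), (0, 2)]


def build_adjacency(edges):
    """Return an undirected adjacency list {node: sorted list of neighbours}."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    return {node: sorted(neighbours) for node, neighbours in adjacency.items()}


adjacency = build_adjacency(bonds)
# The degree of a node is the number of bonds its atom takes part in.
degree = {node: len(neighbours) for node, neighbours in adjacency.items()}
```

The same two structures (a node table and an edge list) are enough to describe any of the biological networks introduced below; only the meaning of nodes and edges changes.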
As noted above, biological networks can be defined at different levels. Besides graph representations of biological entities, which are used to study molecular properties and functions, common biological networks include protein-protein interaction (PPI) networks, gene regulatory networks (GRNs), metabolic networks and drug-drug interaction (DDI) networks. Each is briefly introduced below.
Protein-Protein Interaction Networks: PPI networks represent interactions between proteins. PPIs are essential for almost all cell functions, from the assembly of cellular structural components to transcription, translation and active transport. In a PPI network, nodes correspond to proteins and edges denote interactions between the proteins they connect.
Gene Regulatory Networks: A GRN represents the complex mechanisms that regulate gene expression. Regulation takes place at different stages on the way from DNA to protein, such as transcription, translation and splicing. An intuitive way to understand these complex, interrelated mechanisms is that proteins are both the products of gene expression and its controllers. In a GRN, each node represents a gene, and a direct link between two genes means that one gene directly regulates the expression of the other, without the mediation of other genes.
Metabolic Networks: Metabolic networks use graphs to represent metabolism, the set of all chemical reactions that take place in an organism to sustain life. Because of this complexity, metabolic networks are usually decomposed into metabolic pathways, that is, series of chemical reactions that together perform a specific metabolic function. A metabolic network maps each metabolite to a node and each reaction to a directed edge labeled with the enzyme that catalyzes it.
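The mapping described above (metabolites as nodes, reactions as directed edges labeled with their enzyme) can be sketched as follows. The first three steps of glycolysis are used as an example. The function name `find_pathway` and the breadth-first search used to recover a pathway are illustrative choices, not something prescribed by the review.

```python
from collections import deque

# Directed edges: (substrate, product, catalysing enzyme).
# The first three steps of glycolysis serve as an illustrative example.
reactions = [
    ("glucose", "glucose-6-phosphate", "hexokinase"),
    ("glucose-6-phosphate", "fructose-6-phosphate", "phosphoglucose isomerase"),
    ("fructose-6-phosphate", "fructose-1,6-bisphosphate", "phosphofructokinase"),
]


def find_pathway(reactions, source, target):
    """Breadth-first search for a reaction chain from source to target.

    Returns the list of enzymes along the chain, or None if the target
    cannot be reached by following edges in their direction.
    """
    outgoing = {}
    for substrate, product, enzyme in reactions:
        outgoing.setdefault(substrate, []).append((product, enzyme))
    queue = deque([(source, [])])
    visited = {source}
    while queue:
        metabolite, enzymes = queue.popleft()
        if metabolite == target:
            return enzymes
        for product, enzyme in outgoing.get(metabolite, []):
            if product not in visited:
                visited.add(product)
                queue.append((product, enzymes + [enzyme]))
    return None


enzymes = find_pathway(reactions, "glucose", "fructose-6-phosphate")
```

Because the edges are directed, searching from a product back to its substrate returns `None`, which is the property that distinguishes this representation from an undirected PPI network.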
Drug-Drug Interaction Networks: DDI networks model the interactions between different drugs. A DDI network represents drugs as nodes and drug interactions as edges. Unlike the networks above, DDI networks do not represent biological processes. However, because they are a meaningful representation of knowledge about drug interactions, researchers have become increasingly interested in them. In fact, DDI networks are widely used in polypharmacy research.
3. Learning tasks on graphs
At a high level, learning tasks on graphs fall into node classification, link prediction, graph classification and graph embedding. Each task is explained below.
Node Classification: Node classification is a typical task in biological network analysis, for example predicting the unknown function of a protein from the functions of its neighbors in a PPI network. The input graph contains some labeled nodes, but many nodes have no label, and the goal is to classify the remaining unlabeled nodes. This is usually addressed with semi-supervised learning: the algorithm takes the whole network as input during training and outputs a class for every node, but the loss is computed only on the nodes with ground-truth labels. The trained model is then used to classify the unlabeled nodes.
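The key detail of the semi-supervised setup, that every node receives a prediction but only labeled nodes contribute to the loss, can be sketched as below. The function name `masked_cross_entropy`, the use of `None` to mark unlabeled nodes and the example probabilities are assumptions made for illustration; a real model would produce the probabilities from the graph.

```python
import math


def masked_cross_entropy(probabilities, labels):
    """Mean cross-entropy over labelled nodes only.

    probabilities: one list of class probabilities per node.
    labels: class index per node, or None when the node is unlabelled.
    """
    losses = [
        -math.log(probs[label])
        for probs, label in zip(probabilities, labels)
        if label is not None
    ]
    if not losses:
        raise ValueError("at least one labelled node is required")
    return sum(losses) / len(losses)


# Hypothetical model output for four proteins and two function classes.
predicted = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.5, 0.5]]
known_labels = [0, 1, None, None]  # the last two proteins are unlabelled

loss = masked_cross_entropy(predicted, known_labels)
# Every node still gets a predicted class, labelled or not.
predicted_classes = [probs.index(max(probs)) for probs in predicted]
```

Only the first two rows of `predicted` influence `loss`. The last two rows are the predictions the trained model would report for the unlabeled proteins.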
Link Prediction: Current knowledge of the interactions in biological networks is often incomplete. In a gene regulatory network, for example, it may be unknown which genes regulate the expression of a given gene. Predicting these missing edges is called link prediction. It is a semi-supervised learning problem in which the known links in the graph are used to predict other links that may exist.
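As a point of reference for what "using known links to predict missing ones" means, the sketch below scores every unconnected node pair by the Jaccard similarity of their neighborhoods. This is a classical heuristic baseline swapped in for illustration, not a deep learning method and not one attributed to the review. The node names and the function name `jaccard_link_scores` are placeholders.

```python
from itertools import combinations


def jaccard_link_scores(edges):
    """Score every unconnected node pair by neighbourhood Jaccard similarity."""
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    known = {frozenset(edge) for edge in edges}
    scores = {}
    for u, v in combinations(sorted(neighbours), 2):
        if frozenset((u, v)) in known:
            continue  # only score pairs that are not already linked
        shared = neighbours[u] & neighbours[v]
        union = neighbours[u] | neighbours[v]
        scores[(u, v)] = len(shared) / len(union)
    return scores


# Hypothetical interaction network; the node names are placeholders.
known_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "D")]
scores = jaccard_link_scores(known_edges)
best_pair = max(scores, key=scores.get)
```

A learned link predictor replaces the hand-written similarity with a score computed from node embeddings, but the input (known edges) and the output (a ranking of candidate edges) have the same shape.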
Graph Classification or Regression: When the data consist of many individual networks, for example a dataset of molecular 3D structures, the goal becomes to predict a property of each network, such as a molecule's solubility or toxicity. This task is called graph classification (or regression): it takes a dataset of graphs as input and classifies each graph (or predicts a numeric value for it). It is most commonly posed as a supervised learning problem.
Graph Embedding: Graph embedding seeks a low-dimensional, fixed-size vector representation of a graph, such as a PPI network, or of elements in the network, such as proteins. This is usually achieved through unsupervised learning. Representing a node or graph as a fixed-size vector allows any off-the-shelf machine learning algorithm to be applied to it. Learning graph embeddings is therefore often used as a preprocessing step before applying standard machine learning algorithms to a specific task.
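The property that matters downstream is that graphs of different sizes all map to vectors of the same length. The sketch below illustrates only that property, using a hand-crafted degree histogram as a stand-in for a learned embedding. The function name `degree_histogram_embedding` and the `max_degree` parameter are assumptions for illustration.

```python
def degree_histogram_embedding(nodes, edges, max_degree=3):
    """Map a graph to a fixed-size vector of node-degree frequencies.

    Entry d holds the fraction of nodes with degree d; degrees above
    max_degree are counted in the last entry.
    """
    if not nodes:
        raise ValueError("the graph must contain at least one node")
    degree = {node: 0 for node in nodes}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    counts = [0] * (max_degree + 1)
    for d in degree.values():
        counts[min(d, max_degree)] += 1
    return [count / len(nodes) for count in counts]


# Two graphs of different size: a 3-node path and a 5-node star.
path_vector = degree_histogram_embedding([0, 1, 2], [(0, 1), (1, 2)])
star_vector = degree_histogram_embedding(
    [0, 1, 2, 3, 4], [(0, 1), (0, 2), (0, 3), (0, 4)]
)
```

Both vectors have length `max_degree + 1`, so they can be stacked into one feature matrix and passed to any standard classifier. A learned embedding serves the same role with features fitted to the data.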
4. Applications in biology
Protein structure prediction: Predicting the 3D structure of a protein from its genetic sequence is also known as the protein folding problem. AlphaFold represents a breakthrough approach that set a new baseline for both deep learning and traditional methods. Like other methods, AlphaFold starts from the amino acid sequence as the basis for predicting the 3D structure. This input is combined with additional features gathered from protein databases, and a CNN is used to predict discrete probability distributions over the distances between all pairs of amino acids, as well as probability distributions over torsion angles. Compared with earlier methods, which only predicted whether two residues were in contact, predicting distances and their distributions produces more accurate results. The predicted distances and torsion angles, together with a penalty for overlapping atoms, define a score for the quality of a candidate structure, called the potential. Stochastic gradient descent is then performed on this potential to iteratively improve the structure model. The approach yields very good results and demonstrates the potential of deep learning for challenging bioinformatics problems.
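The last step, minimizing a potential built from predicted distances, can be illustrated with a deliberately tiny toy. The sketch below is not AlphaFold. It places three points on a line (one dimension instead of 3D), replaces the predicted distance distributions with single target distances, omits torsion angles and the overlap penalty, and uses plain deterministic gradient descent. All names and numbers are illustrative assumptions.

```python
def potential(coords, target):
    """Sum of squared differences between current and target pair distances."""
    return sum(
        (abs(coords[i] - coords[j]) - d) ** 2 for (i, j), d in target.items()
    )


def gradient(coords, target):
    """Gradient of the potential with respect to each coordinate."""
    grad = [0.0] * len(coords)
    for (i, j), d in target.items():
        diff = coords[i] - coords[j]
        sign = (diff > 0) - (diff < 0)
        g = 2.0 * (abs(diff) - d) * sign
        grad[i] += g
        grad[j] -= g
    return grad


def minimise(coords, target, learning_rate=0.05, steps=200):
    """Plain gradient descent on the potential, returning the final coordinates."""
    coords = list(coords)
    for _ in range(steps):
        grad = gradient(coords, target)
        coords = [x - learning_rate * g for x, g in zip(coords, grad)]
    return coords


# Hypothetical "predicted" distances between three residues on a line.
target_distances = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 2.0}
start = [0.0, 0.5, 3.0]
final = minimise(start, target_distances)
```

Starting from a poor arrangement, the coordinates move until all three pairwise distances agree with the targets and the potential is close to zero. In the real setting the potential is defined over 3D structures and derived from the network's predicted distributions.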
Disease diagnosis: In the past few years, the use of deep learning for disease diagnosis has attracted great interest in the research community. However, few approaches make use of biological networks. Some researchers have proposed combining spectral clustering with a CNN, integrating gene expression data with PPI networks to predict lung cancer. They tried different configurations of the proposed method to determine which performs best, and evaluated them in terms of accuracy, precision and recall.
In addition, Rhee et al. proposed another example of deep learning on biological networks, this time to classify breast cancer subtypes. Their approach combines a GCN with a relation network (RN) and takes as input a PPI network enriched with gene expression data. The GCN allows the method to learn local graph information, while the RN captures complex patterns among sets of nodes. The outputs of the GCN and the RN are combined to obtain the classification result. The method was compared with support vector machines, random forests, k-nearest neighbors, and multinomial and Gaussian naive Bayes, with performance estimated by Monte Carlo cross-validation. The proposed method outperformed the baselines on all metrics used, suggesting that learning feature representations of PPI networks with a GCN can substantially help capture patterns in gene expression data.
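To show what "learning local graph information" looks like mechanically, the sketch below implements a single graph-convolution step using the widely used propagation rule ReLU(D^-1/2 (A + I) D^-1/2 X W). This is an assumption made for illustration: the summary does not state which GCN variant Rhee et al. used, and their formulation may differ. The tiny two-protein graph, the feature values and the weight matrix are placeholders.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(adjacency, features, weights):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    n = len(adjacency)
    # Add self-loops so that each node keeps its own features.
    a_hat = [
        [adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) for row in a_hat]
    # Symmetric normalisation by node degree.
    norm = [
        [inv_sqrt_deg[i] * a_hat[i][j] * inv_sqrt_deg[j] for j in range(n)]
        for i in range(n)
    ]
    aggregated = matmul(norm, features)  # average over each neighbourhood
    transformed = matmul(aggregated, weights)  # learned linear map
    return [[max(0.0, value) for value in row] for row in transformed]


# Two interacting proteins, one expression feature each (placeholder values).
adjacency = [[0.0, 1.0], [1.0, 0.0]]
features = [[1.0], [3.0]]
weights = [[1.0]]
hidden = gcn_layer(adjacency, features, weights)
```

After one step each protein's feature becomes a normalized mix of its own value and its neighbor's, which is how expression data on a node picks up information from the PPI neighborhood. Stacking layers widens that neighborhood.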
The next two examples apply deep learning to RNA-disease and gene-disease association networks. Some researchers have proposed a method whose input is a graph of associations between diseases and RNAs, called an RNA-disease network. The authors combine a GCN with a graph attention network to capture both global and local structural information in the input, with the aim of predicting RNA-disease associations. To achieve this goal, the authors propose two GCNs and a matrix factorization: disease and gene features and similarity graphs are fed to two parallel GCNs, and the resulting embeddings are combined by inner product to make the prediction.
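Combining two embeddings by inner product can be sketched in a few lines. The embeddings below are made-up placeholders standing in for the outputs of the two parallel GCNs. Squashing the inner product with a sigmoid so that the score lies between 0 and 1 is an assumption added here for readability, not a detail given in the summary.

```python
import math


def association_score(embedding_a, embedding_b):
    """Inner product of two embeddings, mapped to (0, 1) with a sigmoid."""
    dot = sum(a * b for a, b in zip(embedding_a, embedding_b))
    return 1.0 / (1.0 + math.exp(-dot))


# Hypothetical embeddings for one gene and two diseases.
gene = [1.0, 2.0]
related_disease = [1.0, 2.0]
unrelated_disease = [-2.0, 1.0]  # orthogonal to the gene embedding

high = association_score(gene, related_disease)
neutral = association_score(gene, unrelated_disease)
```

Embeddings that point in the same direction give a score near 1, while orthogonal embeddings give 0.5, so training pushes associated pairs toward aligned vectors.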
5. Summary
Although deep learning methods are promising, they have limitations and many open problems. A major one is the lack of interpretability: because deep learning algorithms behave as black boxes, doctors and patients are less likely to trust a model's output when the prediction process is not sufficiently understood. Another is the need for large labeled datasets, since deep neural networks have a large number of hyperparameters that must be tuned.
Despite these challenges, deep learning on graphs remains an active research field and has produced exciting results across bioinformatics disciplines. We therefore look forward to the continued development of deep learning in biological network analysis.