WGCNA analysis basic tutorial summary
2022-07-04 21:20:00 【Scientific research workers strive to move bricks】
Official tutorial site for the WGCNA R package:
WGCNA: R package for performing Weighted Gene Co-expression Network Analysis
Related articles:
WGCNA: an R package for weighted correlation network analysis | BMC Bioinformatics | Full Text
Weighted gene coexpression network analysis
Weighted correlation network analysis (WGCNA)
WGCNA is a network analysis method based on correlation coefficients. It is suited to multi-sample data, and the more samples there are, the more stable the analysis. It is a systems biology approach for describing patterns of gene association across samples.
It can be used to find clusters (modules) of highly correlated genes, to summarize such clusters by the module eigengene or by hub genes within the module, to relate modules to external sample traits (using eigengene network methodology), and to calculate module membership measures. The results can then be used to identify candidate biomarkers or therapeutic targets.
Advantages of WGCNA: ① It makes full use of the available information, using the thousands or roughly ten thousand most variable genes, or all genes, to identify gene sets of interest and test their association with phenotypes; ② it turns the association between thousands of genes and a phenotype into the association between a handful of gene sets and that phenotype, which sidesteps much of the multiple hypothesis testing correction problem.
Co-expression network (weighted gene network): nodes represent genes and edges represent correlations in gene expression. "Weighted" means that the correlation value is raised to a power. The exponent is the soft threshold (power), and the pickSoftThreshold function is used to determine a suitable power. The soft threshold is chosen so that the resulting network better matches the characteristics of a scale-free network. If no suitable power is found, it is usually because some samples differ too much from the others for some reason. Possible fixes: remove those samples, or fall back on an empirical power value.
Edge weights in an unsigned ("undirected") network are calculated as abs(cor(genex, geney)) ^ power; in a signed ("directed") network as ((1 + cor(genex, geney)) / 2) ^ power; and in a signed hybrid network as cor(genex, geney) ^ power if cor > 0, else 0.
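The three edge-weight formulas can be compared side by side with a small sketch. This is plain Python rather than the WGCNA R package; the helper names `pearson` and `edge_weight`, the toy profiles, and the power of 6 are assumptions made for illustration only.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def edge_weight(cor, power, network_type="unsigned"):
    """Turn a correlation into a weighted edge using a soft-threshold power."""
    if network_type == "unsigned":
        return abs(cor) ** power
    if network_type == "signed":
        return ((1 + cor) / 2) ** power
    if network_type == "signed hybrid":
        return cor ** power if cor > 0 else 0.0
    raise ValueError(f"unknown network type: {network_type}")

# Toy profiles (made-up values): gene_b tracks gene_a, gene_c mirrors it.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_b = [2.1, 3.9, 6.2, 8.0, 9.8]
gene_c = [5.0, 4.0, 3.0, 2.0, 1.0]

for label, other in (("a-b", gene_b), ("a-c", gene_c)):
    r = pearson(gene_a, other)
    weights = {t: edge_weight(r, 6, t)
               for t in ("unsigned", "signed", "signed hybrid")}
    print(label, round(r, 3), weights)
```

The negatively correlated pair keeps a strong edge only in the unsigned network, which is why modules there can mix positively and negatively correlated genes.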
Module: a set of highly interconnected genes. In an unsigned network, a module contains highly correlated genes. In a signed network, a module contains highly positively correlated genes.
After genes have been clustered into modules, each module can be analyzed at three levels:
1. Functional enrichment analysis (GO/KEGG/GSEA), to check whether the module's functional characteristics fit the aim of the study;
2. Module-trait association analysis, to find the modules most strongly correlated with the traits of interest;
3. Module-sample association analysis, to find modules that are specifically highly expressed in particular samples.
Connectivity: similar to the concept of degree in a network. The connectivity of a gene is the sum of the edge weights between it and the genes connected to it.
Module eigengene E: the first principal component of a given module, representing the overall gene expression profile of that module.
Intramodular connectivity: the degree of association between a given gene and the other genes in a given module, used to judge which module the gene belongs to.
Module membership: the correlation between a given gene's expression profile and the eigengene of a given module.
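To make the eigengene and module membership definitions concrete, here is a minimal sketch in plain Python. It is not the WGCNA implementation: the function names, the power-iteration shortcut for the first principal component, and the three toy gene profiles are all assumptions chosen for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def standardize(profile):
    """Scale one gene's profile to mean 0 and standard deviation 1."""
    n = len(profile)
    mean = sum(profile) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in profile) / (n - 1))
    return [(v - mean) / sd for v in profile]

def module_eigengene(expr, iterations=200):
    """First principal component across samples; expr holds one list per gene."""
    z = [standardize(g) for g in expr]
    n_samples = len(z[0])
    # Sample-by-sample Gram matrix; its leading eigenvector is the first PC.
    gram = [[sum(g[i] * g[j] for g in z) for j in range(n_samples)]
            for i in range(n_samples)]
    # Power iteration, started from the first gene (a simplification that
    # assumes this gene is not orthogonal to the leading component).
    vec = list(z[0])
    for _ in range(iterations):
        nxt = [sum(gram[i][j] * vec[j] for j in range(n_samples))
               for i in range(n_samples)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    # Orient the eigengene so that it follows the average module profile.
    mean_profile = [sum(g[i] for g in z) / len(z) for i in range(n_samples)]
    if sum(v * m for v, m in zip(vec, mean_profile)) < 0:
        vec = [-v for v in vec]
    return vec

# Toy module (made-up values): three genes measured in six samples.
module_expr = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    [3.0, 5.0, 7.0, 9.0, 11.0, 13.0],
    [1.2, 1.9, 3.3, 3.8, 5.1, 6.2],
]
eigengene = module_eigengene(module_expr)
# Module membership: correlation of each gene with the module eigengene.
membership = [pearson(g, eigengene) for g in module_expr]
print([round(m, 3) for m in membership])
```

Genes with membership close to 1 follow the module's overall profile most closely, which is one common way to shortlist hub gene candidates.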
Adjacency matrix: the matrix of weighted correlation values between genes.
TOM (topological overlap matrix): the adjacency matrix is converted into a topological overlap matrix to reduce noise and spurious correlations. The resulting new distance matrix can be used to build the network or to draw a TOM plot.
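As a rough illustration of the conversion, the sketch below applies the commonly cited unsigned topological overlap formula, TOM_ij = (sum_u a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij), to a made-up 4-gene adjacency matrix. The function name and the toy values are assumptions. In practice the WGCNA R package performs this step.

```python
def topological_overlap(adj):
    """Topological overlap from a symmetric adjacency matrix with values in [0, 1]."""
    n = len(adj)
    # Connectivity: sum of a gene's edge weights to all other genes.
    k = [sum(adj[i][u] for u in range(n) if u != i) for i in range(n)]
    tom = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Strength of the connections that i and j share through third genes.
            shared = sum(adj[i][u] * adj[u][j]
                         for u in range(n) if u != i and u != j)
            tom[i][j] = (shared + adj[i][j]) / (min(k[i], k[j]) + 1 - adj[i][j])
    return tom

# Toy adjacency (made-up values): genes 0-2 are tightly linked, gene 3 is peripheral.
adj = [
    [1.0, 0.8, 0.7, 0.1],
    [0.8, 1.0, 0.6, 0.1],
    [0.7, 0.6, 1.0, 0.2],
    [0.1, 0.1, 0.2, 1.0],
]
tom = topological_overlap(adj)
# The dissimilarity 1 - TOM serves as the distance for hierarchical clustering.
diss = [[1.0 - v for v in row] for row in tom]
for row in tom:
    print([round(v, 3) for v in row])
```

Two genes score a high overlap only when they are directly connected and also share neighbors, which is what dampens isolated, possibly spurious correlations.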
Basic analysis workflow:
Construct the gene co-expression network: use weighted expression correlations.
Identify gene sets: perform hierarchical clustering based on the weighted correlations, then cut the clustering result according to the chosen criteria to obtain gene modules, shown as branches of the clustering tree in different colors.
If phenotype information is available, calculate the correlation between gene modules and phenotypes to identify trait-related modules.
Study the relationships between modules, viewing the interaction network of the modules at the system level.
Select driver genes of interest from key modules, or infer the function of unknown genes from the functions of known genes in the same module.
Export the TOM matrix and draw the correlation plots.
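The module-trait step in the workflow above can be sketched as follows. This is a simplified stand-in written in plain Python: the module names, eigengene values, and traits are invented, and only the correlation itself is computed (no p-values).

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def one_hot(labels):
    """Encode a categorical trait as one 0/1 indicator column per level."""
    return {level: [1.0 if lab == level else 0.0 for lab in labels]
            for level in sorted(set(labels))}

# Hypothetical module eigengenes, one value per sample (six samples).
eigengenes = {
    "blue": [-0.5, -0.4, -0.3, 0.3, 0.4, 0.5],
    "brown": [0.3, -0.4, 0.1, -0.2, 0.4, -0.2],
}
# Hypothetical traits: one continuous, one categorical.
groups = ["control", "control", "control", "treated", "treated", "treated"]
traits = {"weight": [20.0, 22.0, 21.0, 30.0, 31.0, 33.0]}
traits.update({f"group_{lvl}": col for lvl, col in one_hot(groups).items()})

# Module-trait correlation table: rows are modules, columns are traits.
module_trait_cor = {m: {t: pearson(e, v) for t, v in traits.items()}
                    for m, e in eigengenes.items()}
# For each trait, the module with the largest absolute correlation.
best = {t: max(eigengenes, key=lambda m: abs(module_trait_cor[m][t]))
        for t in traits}

for module, row in module_trait_cor.items():
    print(module, {t: round(r, 2) for t, r in row.items()})
print(best)
```

Because each module is summarized by a single eigengene, the table has one row per module rather than one per gene, which is what keeps the number of tests small.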
For hands-on practice, reproduce the analysis from the linked article to get familiar with the workflow.
Input data format:
1、 Gene expression matrix: genes in rows, samples in columns. Use normalized data transformed with DESeq2's varianceStabilizingTransformation or with log2(x+1). If the data come from different batches, remove the batch effect first (as I recall, the last transcriptome training class covered how to do this). If the data show a systematic shift, apply quantile normalization, or use normalizeBetweenArrays().
2、 Trait matrix: traits must be numeric, either continuous values or categorical variables encoded as 0/1.
3、 If no power below 15 for an unsigned network, or below 30 for a signed network, brings the scale-free topology fit R^2 up to 0.8 or the mean connectivity down to 100 or below, some samples probably differ too much from the others. This can be caused by batch effects, sample heterogeneity, or experimental conditions that affect expression too strongly. Plot the sample clustering to check the grouping information, batch information, and processing information, and to see whether there are outlier samples. If the differences really do come from meaningful biological variation, an empirical power value can be used instead.
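The two criteria in point 3 (scale-free fit R^2 and mean connectivity) can be sketched in plain Python. This is a simplified illustration and not the pickSoftThreshold algorithm itself: the binning scheme, the function names, the candidate powers, and the random toy data are all assumptions.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def scale_free_fit(connectivity, n_bins=10):
    """R^2 and slope of log10 p(k) regressed on log10 k, using equal-width bins."""
    lo, hi = min(connectivity), max(connectivity)
    width = (hi - lo) / n_bins
    if width == 0:
        return 0.0, 0.0
    counts, sums = [0] * n_bins, [0.0] * n_bins
    for k in connectivity:
        b = min(int((k - lo) / width), n_bins - 1)
        counts[b] += 1
        sums[b] += k
    xs, ys = [], []
    for c, s in zip(counts, sums):
        if c > 0 and s > 0:
            xs.append(math.log10(s / c))                  # mean k within the bin
            ys.append(math.log10(c / len(connectivity)))  # fraction of genes, p(k)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return 0.0, 0.0
    return sxy ** 2 / (sxx * syy), sxy / sxx

# Toy data (random, for illustration only): 60 genes x 12 samples,
# each gene following a shared driver signal to a varying degree.
rng = random.Random(0)
driver = [rng.gauss(0, 1) for _ in range(12)]
expr = []
for _ in range(60):
    load = rng.random()
    expr.append([load * d + (1 - load) * rng.gauss(0, 1) for d in driver])

n = len(expr)
cor = [[pearson(expr[i], expr[j]) if i != j else 1.0 for j in range(n)]
       for i in range(n)]

results = []
for power in (1, 4, 8, 12):
    # Unsigned adjacency; connectivity is the sum of edge weights to other genes.
    k = [sum(abs(cor[i][j]) ** power for j in range(n) if j != i)
         for i in range(n)]
    r2, slope = scale_free_fit(k)
    mean_k = sum(k) / n
    results.append((power, r2, slope, mean_k))
    print(power, round(r2, 2), round(slope, 2), round(mean_k, 2))
```

With real data, one would look for the lowest power at which R^2 reaches about 0.8 with a negative slope while mean connectivity stays reasonable. A toy set this small is not expected to show a clean scale-free fit.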
Reference:
WGCNA analysis: a simple, comprehensive, up-to-date tutorial – All over the world