How to use R language to draw scatter diagram
2022-06-23 17:53:00 【Playful programming】
Drawing a "symmetric scatter plot" of differential gene expression in R
In transcriptome analysis, once the differentially expressed genes between two groups have been identified, how are they usually presented? A volcano plot is probably the first thing that comes to mind. Volcano plots are indeed the most common choice: they give an overview of the differentially expressed genes based on each gene's fold change and significance (p-value). A volcano plot is essentially a scatter plot in which the horizontal and vertical axes usually show the log2-transformed fold change and the -log10-transformed p-value or adjusted p-value (left panel of the figure below). Scatter plots offer another common way to show differentially expressed genes: the horizontal and vertical axes show the average expression values of the genes in the two groups. This style makes it easier to compare visually how each gene behaves in the two groups.
1 Sample file
The sample file "gene_diff.txt" contains the results of a differential expression analysis for a set of genes. It records which genes are expressed significantly differently between the treatment group (treat) and the control group (control); the criteria are p < 0.01 and |log2 Fold Change| ≥ 1.
In this file, gene_id is the gene name; control and treat are the average expression values of each gene in the two groups; log2FoldChange is the log2-transformed fold change in expression; pvalue is the p-value for the significance of the difference; diff marks the differential genes selected with p < 0.01 and |log2 Fold Change| ≥ 1, where "up" means up-regulated, "down" means down-regulated, and "none" means not differentially expressed.
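The labelling rule described above can be sketched in a few lines of R. This is only a minimal illustration on made-up values, not the pipeline that produced gene_diff.txt: the data frame `toy` and the helper `classify_diff` are hypothetical names, and only the column names and the two thresholds come from the description of the file.

```r
# Hypothetical helper: label genes with the thresholds stated in the text
# (p < 0.01 and |log2FoldChange| >= 1).
classify_diff <- function(log2fc, pvalue, p_cut = 0.01, fc_cut = 1) {
  ifelse(pvalue < p_cut & log2fc >= fc_cut, "up",
         ifelse(pvalue < p_cut & log2fc <= -fc_cut, "down", "none"))
}

# Made-up values, for illustration only
toy <- data.frame(
  gene_id        = c("geneA", "geneB", "geneC", "geneD"),
  log2FoldChange = c(2.3, -1.8, 0.4, 1.5),
  pvalue         = c(0.001, 0.0005, 0.20, 0.03)
)
toy$diff <- classify_diff(toy$log2FoldChange, toy$pvalue)
print(toy)
```

Note that geneD has a large fold change but is still labelled "none", because its p-value does not pass the 0.01 cutoff; both conditions must hold.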
Next, we use the sample file to show how to draw a "symmetric scatter plot" of differential gene expression in R.
2 Data preprocessing
First, preprocess the data.
For example, gene expression values span several orders of magnitude, so apply a logarithmic transformation. The genes are also sorted by whether they are differential genes, so that their points are not hidden behind the points of non-significant genes in the plot; that is, the sort puts the significant genes on the top layer of the plot.
# Read the sample data
express <- read.delim('gene_diff.txt', sep = '\t')
# Apply a log2(x + 1) transformation to the expression values.
# Base 2 keeps the axes consistent with log2FoldChange: a vertical offset of 1
# between treat and control corresponds to roughly a two-fold change.
express$control <- log2(express$control + 1)
express$treat <- log2(express$treat + 1)
# Sort so that significant genes are drawn last (on the top layer)
# and are not hidden behind the points of non-significant genes
express$diff <- factor(express$diff, levels = c('up', 'down', 'none'))
express <- express[order(express$diff, decreasing = TRUE), ]
head(express)  # View the data table after reading and preprocessing
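To see what the sort does, here is a small sketch on made-up rows (the data frame `toy` is hypothetical). ggplot2 draws points in row order, so rows that come later in the data frame end up on top.

```r
# Made-up rows, for illustration only
toy <- data.frame(
  gene_id = c("g1", "g2", "g3", "g4"),
  diff    = c("up", "none", "down", "none")
)

# Same factor levels as in the preprocessing step
toy$diff <- factor(toy$diff, levels = c("up", "down", "none"))

# decreasing = TRUE sorts from the last level to the first,
# so "none" rows come first and "up" rows come last (drawn on top)
toy <- toy[order(toy$diff, decreasing = TRUE), ]
print(toy)
```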
3 Scatter plot of differential genes, colored by regulation status
The preprocessed data can now be plotted.
In the first style, points are colored by whether the gene is up-regulated, down-regulated, or not significant, which makes the differential genes easy to spot. We use ggplot2 to draw the scatter plot.
# Draw a scatter plot; up- and down-regulated genes are distinguished by color
library(ggplot2)
ggplot(express, aes(x = control, y = treat)) +
  geom_point(aes(color = diff), size = 1) +  # Color gene points by up/down regulation
  scale_color_manual(values = c('red', 'gray', 'green4'), limits = c('up', 'none', 'down')) +  # Colors for up, none, and down
  theme_bw() +  # Background adjustment
  labs(x = 'control group', y = 'treat group', color = '') +  # Axis title settings
  geom_abline(intercept = 1, slope = 1, col = 'black', linetype = 'dashed', size = 0.5) +  # These 3 lines add the |log2FC| = 1 threshold lines and the diagonal
  geom_abline(intercept = -1, slope = 1, col = 'black', linetype = 'dashed', size = 0.5) +
  geom_abline(intercept = 0, slope = 1, col = 'black', linetype = 'dashed', size = 0.5)
The two axes represent the control group (control) and the treatment group (treat), and each point shows the average expression value of a gene in the two groups (after log transformation). Relative to the control group, genes up-regulated in the treat group are shown in red and down-regulated genes in green. The outer dashed lines mark the |log2FC| = 1 thresholds, and the middle dashed line is the diagonal where expression is equal in both groups.
In this plot we can easily see the overall distribution of the differential genes and compare the numbers of up- and down-regulated genes.
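How well the dashed lines at intercepts +1 and -1 match the fold-change threshold depends on the base of the log transformation applied to the axes. The following is a minimal numeric check on made-up values (all variable names are hypothetical), in which treat is exactly twice control:

```r
# Made-up raw expression values where treat is exactly 2x control
control_raw <- c(10, 100, 1000)
treat_raw   <- 2 * control_raw

# With a log2(x + 1) transform, the vertical offset approaches 1
offset_log2 <- log2(treat_raw + 1) - log2(control_raw + 1)

# With a natural-log transform, the same fold change gives an offset near log(2)
offset_ln <- log(treat_raw + 1) - log(control_raw + 1)

print(round(offset_log2, 3))
print(round(offset_ln, 3))
```

So lines with intercepts of +1 and -1 line up with |log2FC| = 1 when the expression values are log2-transformed. With a natural-log transform, the matching intercepts would be about +log(2) and -log(2). In both cases the match is approximate for low-expression genes because of the +1 pseudocount.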
4 Scatter plot of differential genes, colored by p-value
The plot above shows no p-value information. Another option is to let the color represent the p-value, which produces a color gradient in the plot. The plot is again drawn with ggplot2; compared with the code above, only the color assignment differs.
# Scatter plot with a color gradient by p-value
ggplot(express, aes(x = control, y = treat)) +
  geom_point(aes(color = pvalue), size = 0.8) +  # Color gene points by p-value
  scale_color_gradient2(low = 'red', mid = 'darkgoldenrod2', high = 'royalblue2', midpoint = 0.5) +  # Gradient color assignment
  theme_bw() +  # Background adjustment
  labs(x = 'control group', y = 'treat group', color = 'p-value') +  # Axis title settings
  geom_abline(intercept = 1, slope = 1, col = 'black', linetype = 'dashed', size = 0.5) +  # These 3 lines add the |log2FC| = 1 threshold lines and the diagonal
  geom_abline(intercept = -1, slope = 1, col = 'black', linetype = 'dashed', size = 0.5) +
  geom_abline(intercept = 0, slope = 1, col = 'black', linetype = 'dashed', size = 0.5)
As in the previous plot, the two axes represent the control group (control) and the treatment group (treat), each point shows the average expression value of a gene in the two groups (after log transformation), and the dashed lines mark the |log2FC| = 1 thresholds.
Unlike the previous plot, the points are now colored by p-value, with a gradient running from blue (not significant) to red (significant). It is easy to see that the larger the difference in expression between the two groups, the smaller the p-value; the two trends agree. This style emphasizes the relationship between fold change and p-value.
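One possible variation, not part of the workflow above: on a linear p-value scale, all genes with p < 0.01 get almost the same color, so it can help to color by -log10(p) instead, as a volcano plot does on its vertical axis. The sketch below uses simulated data rather than gene_diff.txt; the data frame `toy` and the column `neg_log10_p` are hypothetical names.

```r
library(ggplot2)

# Simulated data for illustration only; column names follow the sample file
set.seed(1)
toy <- data.frame(
  control = runif(200, 0, 10),
  pvalue  = runif(200, 1e-6, 1)
)
toy$treat <- toy$control + rnorm(200, sd = 1)

# Larger values mean more significant genes
toy$neg_log10_p <- -log10(toy$pvalue)

p <- ggplot(toy, aes(x = control, y = treat)) +
  geom_point(aes(color = neg_log10_p), size = 0.8) +
  scale_color_gradient(low = 'royalblue2', high = 'red') +  # Blue = not significant, red = significant
  theme_bw() +
  labs(x = 'control group', y = 'treat group', color = '-log10(p)')
print(p)
```

With real data, very small p-values can dominate the color range, so the scale limits may need adjusting.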