How to Get the Hierarchy/Annotation Information for KEGG Pathways
2022-07-28 07:28:00 【TOP生物信息】
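I run enrichment analyses all the time, but until now I only knew that GO terms are organized in a top-down hierarchy. Today I learned that KEGG pathways are grouped into a similar hierarchy.
It started when I decided to refresh the figure I use to present my KEGG enrichment results. The kind of plot I had always drawn was looking a bit plain.
Then I happened to find this figure, which looks quite nice (mostly because of its fresh color scheme).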

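The figure itself is not hard to draw (I will tidy up the code and share it when I have time), but I had never come across this "kegg pathway annotation" before! I checked the official website and found it there, but there was no way to download it.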

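Even after asking for help on WeChat Moments, it took a lot of effort to solve this problem. Several people were very generous with their help along the way, and I am grateful to them!
Below is a netdisk link to the KEGG pathway annotation file I compiled. The code and data I used are in there too, so anyone who wants to improve their KEGG enrichment plots can take a look.
Link: https://pan.baidu.com/s/18pwYZGGZSuk2_LGW8_nQFQ Extraction code: ihyn (PS: if you find it useful, please give the first article in this issue a like. Thanks!)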

Here is how I put the file together.
Open this page in a browser (https://www.genome.jp/kegg/pathway.html) and view the page source (usually via the right-click menu). You will see this:

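Next, copy the source and paste it into a text file, kegg_html.txt. Delete roughly the first 188 lines and the last 17 lines (these clearly contain nothing useful) and save the result as kegg_html_copy.txt. Then run my script pre.R, listed below, to get the final table.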
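The hand trimming could probably be scripted as well. The sketch below is one possible way and is not part of pre.R: `trim_kegg_source` is a hypothetical helper, and it assumes that the useful part of the source starts at the first top-level heading of the form `>1. ` and ends at the last line where a tag is directly followed by a digit. The `demo` lines are made up to imitate the shape that the regular expressions in pre.R expect; the real page source may look different, so check the result by eye.

```r
# Hypothetical helper (not part of pre.R): keep the lines from the first
# top-level heading through the last line that looks like a heading or entry.
trim_kegg_source <- function(lines) {
  is_candidate <- grepl(">[0-9]", lines)          # same filter pre.R applies first
  first_heading <- grep(">[0-9]\\. ", lines)[1]   # first "N. Title" heading
  if (is.na(first_heading)) {
    stop("no top-level heading found; the page layout may have changed")
  }
  last_candidate <- max(which(is_candidate))
  lines[first_heading:last_candidate]
}

# Made-up lines that only imitate the assumed layout of the page source
demo <- c(
  "<title>KEGG PATHWAY Database</title>",
  "<h4>1. Metabolism</h4>",
  "<b>1.1 Carbohydrate metabolism</b>",
  "<dt>00010</dt><dd><a href=\"/pathway/map00010\">Glycolysis / Gluconeogenesis</a></dd>",
  "<div class=\"footer\">Kanehisa Laboratories</div>"
)
trimmed <- trim_kegg_source(demo)
print(trimmed)

# Real usage (assumes kegg_html.txt holds the pasted page source):
# writeLines(trim_kegg_source(readLines("kegg_html.txt")), "kegg_html_copy.txt")
```

Whichever way the file is trimmed, kegg_html_copy.txt is the input that pre.R reads.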
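library(tidyverse)
# Read the trimmed page source, one element per line
tmp1 <- readLines("kegg_html_copy.txt")
# Keep only lines where a tag is directly followed by a digit:
# category headings ("1. ", "1.1 ") and pathway entries (five-digit numbers)
tmp2 <- tmp1[str_detect(tmp1, ">[0-9]")]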
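### Walk through the kept lines, remembering the current category at each level
big_anno <- ""    # current top-level category (lines like ">1. Metabolism<")
small_anno <- ""  # current subcategory (lines like ">1.1 Carbohydrate metabolism<")
final_lines <- c()
for (li in seq_along(tmp2)) {
  if (str_detect(tmp2[li], ">[0-9]\\. ")) {
    # Top-level heading: drop the leading "N. " and everything from the next "<"
    tmp_anno <- str_extract(tmp2[li], ">.*<")
    tmp_anno <- str_split(tmp_anno, "\\. ")[[1]][2]
    tmp_anno <- str_split(tmp_anno, "<")[[1]][1]
    big_anno <- tmp_anno
  } else if (str_detect(tmp2[li], ">[0-9]\\.[0-9]{1,2} ")) {
    # Subcategory heading: drop the leading "N.M " and everything from the next "<"
    tmp_anno2 <- str_extract(tmp2[li], ">.*<")
    tmp_anno2 <- str_replace(tmp_anno2, "^>[0-9]\\.[0-9]{1,2} ", "")
    tmp_anno2 <- str_split(tmp_anno2, "<")[[1]][1]
    small_anno <- tmp_anno2
  } else if (str_detect(tmp2[li], ">[0-9]{5}")) {
    # Pathway entry: element1 is the five-digit pathway number
    element1 <- str_extract(tmp2[li], ">[0-9]{5}")
    element1 <- str_split(element1, ">")[[1]][2]
    if (!str_detect(tmp2[li], "hsa\\+pathogen")) {
      # element2: identifier taken from the link, e.g. "map" plus five digits
      element2 <- str_extract(tmp2[li], "pathway\\/[a-zA-Z]{2,4}[0-9]{5}")
      element2 <- str_split(element2, "\\/")[[1]][2]
      # element3: pathway name, the link text after the identifier
      element3 <- str_extract(tmp2[li], "pathway.*?<")  # non-greedy match
      element3 <- str_extract(element3, ">.*<")
      element3 <- str_split(element3, ">")[[1]][2]
      element3 <- str_split(element3, "<")[[1]][1]
    } else {
      # Entries linked through "hsa+pathogen" have no plain pathway identifier
      element2 <- "organism:hsa+pathogen"
      element3 <- str_extract(tmp2[li], "hsa\\+pathogen.*?<")  # non-greedy match
      element3 <- str_extract(element3, ">.*<")
      element3 <- str_split(element3, ">")[[1]][2]
      element3 <- str_split(element3, "<")[[1]][1]
    }
    # One ";"-separated record per pathway, tagged with the current categories
    tmp_line <- paste(as.character(element1), element2, element3, big_anno, small_anno, sep = ";")
    final_lines <- append(final_lines, tmp_line)
  } else {
    # Anything unrecognized is printed so it can be checked by eye
    print(tmp2[li])
  }
}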
### Split each "a;b;c;d;e" record into five columns
final_df <- as.data.frame(str_split_fixed(final_lines, ";", 5))
colnames(final_df) <- c("ID", "Pathway Identifier", "Pathway", "big annotation", "small annotation")
### Write the table to Excel (the xlsx package needs a working Java installation)
library(xlsx)
write.xlsx(final_df, file = "kegg_info.xlsx", col.names = TRUE, row.names = FALSE)
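Once the table exists, one way to use it is to attach the two category levels to an enrichment result before plotting. The following is only a minimal sketch under stated assumptions: `kegg_info` is a toy stand-in for the table above with simplified column names, and `enrich` is a hypothetical enrichment result whose `pathway_id` and `p_adjust` columns are invented for illustration. It assumes pathway IDs have the usual form of an organism prefix followed by the five-digit number.

```r
# Toy stand-in for the annotation table (simplified, hypothetical column names)
kegg_info <- data.frame(
  ID = c("00010", "04110"),
  big_annotation = c("Metabolism", "Cellular Processes"),
  small_annotation = c("Carbohydrate metabolism", "Cell growth and death"),
  stringsAsFactors = FALSE
)

# Hypothetical enrichment result; the values are invented for illustration
enrich <- data.frame(
  pathway_id = c("hsa04110", "hsa00010"),
  p_adjust = c(0.001, 0.03),
  stringsAsFactors = FALSE
)

# Strip the organism prefix so both tables share the five-digit number
enrich$ID <- sub("^[a-z]+", "", enrich$pathway_id)

# Left join: every enriched pathway keeps its row, with categories attached
annotated <- merge(enrich, kegg_info, by = "ID", all.x = TRUE)

# Group pathways by top-level category, most significant first within a group
annotated <- annotated[order(annotated$big_annotation, annotated$p_adjust), ]
print(annotated)
```

Keeping the ID as text matters here, because reading it as a number would drop the leading zeros in values such as 00010.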