How to obtain the hierarchy/annotation information for KEGG pathways
2022-07-25 20:32:00 【黄思源】
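I run enrichment analyses all the time, but until now I only knew that GO terms have a top-down hierarchy. Today I learned that KEGG pathways are organised in a similar layered hierarchy.
It started when I wanted to update the figure I use to present my KEGG enrichment results. I had always drawn this kind of plot, which looks rather plain.
Then I happened to find this figure, which looks quite nice (mainly because the color scheme is so fresh).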

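The figure itself is not hard to draw (I will tidy up the code and share it when I have time), but I had never seen this "kegg pathway annotation" before! I checked the official website: the information is there, but it cannot be downloaded.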

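I asked for help on WeChat Moments, and even then it took a lot of effort to solve the problem. Several friends were very generous with their help along the way, and I thank them!
Below is the netdisk link to the KEGG pathway annotation file I compiled. The code and data I used are in there too, so take a look if you want to improve your KEGG enrichment plots.
Link: https://pan.baidu.com/s/18pwYZGGZSuk2_LGW8_nQFQ Extraction code: ihyn (PS: if you find it useful, please give the first article in this issue's push a like. Thank you!)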

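Here is how I put the file together.
Open this page in a browser (https://www.genome.jp/kegg/pathway.html), then view the page source (usually via the right-click menu). The source contains the full pathway listing.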
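As an optional shortcut, the page source can also be saved from R instead of from the browser. The sketch below is only an illustration and not part of the original workflow: `fetch_kegg_source()` and `keep_numbered_lines()` are hypothetical helper names, the sample lines are made up to imitate the patterns the script looks for, and the layout of the live page may differ from the line counts quoted in this post.

```r
# Hypothetical helper: save the raw HTML of the KEGG pathway page to a file.
# It needs network access, so it is only defined here and never called.
fetch_kegg_source <- function(out_file = "kegg_html.txt",
                              page = "https://www.genome.jp/kegg/pathway.html") {
  con <- url(page)
  on.exit(close(con))
  html_lines <- readLines(con, warn = FALSE)
  writeLines(html_lines, out_file)
  invisible(html_lines)
}

# Hypothetical helper: keep only lines where a digit directly follows ">",
# which is the same test the script applies in its first loop.
keep_numbered_lines <- function(html_lines) {
  html_lines[grepl(">[0-9]", html_lines)]
}

# Made-up lines (not copied from the real page) that imitate a top-level
# heading, a sub-heading, and a pathway entry, plus two irrelevant lines.
sample_lines <- c(
  "<html>",
  "<h4>1. Metabolism</h4>",
  "<b>1.1 Carbohydrate metabolism</b>",
  "<dt>00010</dt><dd><a href=\"/pathway/map00010\">Glycolysis / Gluconeogenesis</a></dd>",
  "</html>"
)

kept <- keep_numbered_lines(sample_lines)
print(kept)
```

Calling `fetch_kegg_source()` would write kegg_html.txt to the working directory. The manual trimming of the head and tail of the file may still be needed afterwards, since this sketch does not check which header or footer lines happen to match the filter.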

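Then copy and paste the page source into a text file, kegg_html.txt, and delete roughly the first 188 lines and the last 17 lines (these clearly contain nothing useful) to get kegg_html_copy.txt. Then run my script, pre.R, to get the final table: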
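library(tidyverse)

# Read the trimmed page source and keep only the lines where a digit
# directly follows ">" (category headings and pathway entries).
tmp1 <- readLines("kegg_html_copy.txt")
tmp2 <- c()
for (li in seq_along(tmp1)) {
  if (str_detect(tmp1[li], ">[0-9]")) {
    tmp2 <- append(tmp2, tmp1[li])
  }
}
###
big_anno <- ""    # current top-level category (headings like "1. ...")
small_anno <- ""  # current sub-category (headings like "1.1 ...")
final_lines <- c()
for (li in seq_along(tmp2)) {
  if (str_detect(tmp2[li], ">[0-9]\\. ")) {
    # Top-level heading: keep the text between "N. " and the next "<"
    tmp_anno <- str_extract(tmp2[li], ">.*<")
    tmp_anno <- str_split(tmp_anno, "\\. ")[[1]][2]
    tmp_anno <- str_split(tmp_anno, "<")[[1]][1]
    big_anno <- tmp_anno
  } else if (str_detect(tmp2[li], ">[0-9]\\.[0-9]{1,2} ")) {
    # Sub-heading: strip the leading "N.M " and keep the text before "<"
    tmp_anno2 <- str_extract(tmp2[li], ">.*<")
    tmp_anno2 <- str_replace(tmp_anno2, "^>[0-9]\\.[0-9]{1,2} ", "")
    tmp_anno2 <- str_split(tmp_anno2, "<")[[1]][1]
    small_anno <- tmp_anno2
  } else if (str_detect(tmp2[li], ">[0-9]{5}")) {
    # Pathway entry: element1 is the five-digit pathway number
    element1 <- str_extract(tmp2[li], ">[0-9]{5}")
    element1 <- str_split(element1, ">")[[1]][2]
    if (!str_detect(tmp2[li], "hsa\\+pathogen")) {
      # element2: identifier taken from the link, e.g. "pathway/xxx00000"
      element2 <- str_extract(tmp2[li], "pathway\\/[a-zA-Z]{2,4}[0-9]{5}")
      element2 <- str_split(element2, "\\/")[[1]][2]
      # element3: pathway name, i.e. the link text
      element3 <- str_extract(tmp2[li], "pathway.*?<")  # non-greedy match
      element3 <- str_extract(element3, ">.*<")
      element3 <- str_split(element3, ">")[[1]][2]
      element3 <- str_split(element3, "<")[[1]][1]
    } else {
      # Entries linked via "hsa+pathogen" get a fixed identifier
      element2 <- "organism:hsa+pathogen"
      element3 <- str_extract(tmp2[li], "hsa\\+pathogen.*?<")  # non-greedy match
      element3 <- str_extract(element3, ">.*<")
      element3 <- str_split(element3, ">")[[1]][2]
      element3 <- str_split(element3, "<")[[1]][1]
    }
    # One record per pathway, fields joined with ";"
    tmp_line <- paste(as.character(element1), element2, element3,
                      big_anno, small_anno, sep = ";")
    final_lines <- append(final_lines, tmp_line)
  } else {
    # Print any line that matched none of the patterns, for manual checking
    print(tmp2[li])
  }
}
###
# Split each ";"-joined record into five fields and reshape the result
# so that each pathway is one row.
final_df <- as.data.frame(final_lines)
colnames(final_df) <- "V1"
final_df <- final_df %>%
  apply(1, function(x) {
    as.data.frame(str_split(x, ";")[[1]])
  })
final_df <- as.data.frame(final_df)
final_df <- as.data.frame(t(final_df))
rownames(final_df) <- NULL
colnames(final_df) <- NULL
colnames(final_df) <- c("ID", "Pathway Identifier", "Pathway", "big annotion", "small annotion")
###
# Write the table to an Excel file (the xlsx package requires Java)
library(xlsx)
write.xlsx(final_df, file = "kegg_info.xlsx", col.names = TRUE, row.names = FALSE)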
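The reshaping at the end of the script (apply, then transpose) can also be written more directly. Here is a minimal base R sketch, assuming every record has exactly five non-empty fields and that no field itself contains a ";". `lines_to_table()` is a hypothetical helper name, and the two sample records are illustrative rather than taken from the script's real output.

```r
# Hypothetical helper: turn ";"-joined records into a five-column data frame.
lines_to_table <- function(final_lines) {
  parts <- strsplit(final_lines, ";", fixed = TRUE)
  # Fail early if a record does not split into exactly five fields,
  # e.g. because a pathway name contains ";" or the last field is empty.
  stopifnot(all(lengths(parts) == 5))
  out <- as.data.frame(do.call(rbind, parts), stringsAsFactors = FALSE)
  # Same column names as in the script above
  colnames(out) <- c("ID", "Pathway Identifier", "Pathway",
                     "big annotion", "small annotion")
  out
}

# Illustrative records in the same field order the script produces.
demo_lines <- c(
  "00010;map00010;Glycolysis / Gluconeogenesis;Metabolism;Carbohydrate metabolism",
  "03010;map03010;Ribosome;Genetic Information Processing;Translation"
)

demo_table <- lines_to_table(demo_lines)
print(demo_table)
```

The length check is there because `strsplit()` drops a trailing empty field, so a record with an empty last field would otherwise end up misaligned without any warning.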