Batch-downloading protein .pdb files by UniProt ID / PDB ID
2022-08-01 19:52:00 【李划水员】
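1. Batch-downloading protein .pdb files by UniProt ID
A single UniProt ID may map to several NCBI IDs, but AlphaFold provides one predicted-structure PDB file per UniProt ID, so the .pdb files can be fetched in batch with the following code: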
import requests
# requests.get(..., verify=False) triggers an InsecureRequestWarning.
# It does not affect the download, but it is noisy; these two lines silence it:
import urllib3
urllib3.disable_warnings()

headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:95.0) Gecko/20100101 Firefox/95.0'}

def read_file(file_name):
    """Collect the IDs from the header lines (those starting with '>')."""
    pro_swissProt = []
    with open(file_name, 'r') as fp:
        for line in fp:
            if line.startswith('>'):  # header line: drop '>' and the trailing newline
                pro_swissProt.append(line[1:].strip())
    return pro_swissProt

file1 = '/AD/all1.csv'
ID = read_file(file1)
not_exist_list = []
for j, i in enumerate(ID, start=1):
    print(j)
    print(i)
    # 'model_v1' is the AlphaFold DB model version used when this was written;
    # newer database releases may use a different version suffix.
    url = 'https://alphafold.ebi.ac.uk/files/AF-' + i + '-F1-model_v1' + '.pdb'
    print(url)
    r = requests.get(url, headers=headers, verify=False)
    # A missing entry comes back as an error page whose second character is '?'
    # (e.g. '<?xml ...') instead of a PDB file, so skip it rather than save it.
    if r.status_code != 200 or r.text[1:2] == '?':
        print(i + ' has no pdb file.')
        not_exist_list.append(i)
        continue
    with open('/AD/Information/PDB/' + i + '.pdb', 'w') as files:
        for lines in r.text.splitlines():
            files.write(lines)
            files.write('\n')

# Print the proteins whose .pdb file was not found. These can be looked up
# manually on the website, since some may have been missed.
print(not_exist_list)
print(len(not_exist_list))
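The loop above mixes three concerns: extracting the ID from a header line, building the AlphaFold URL, and deciding whether the response is really a PDB file. As a rough sketch, these can be pulled out into small helpers that can be checked without network access. The names `parse_accession`, `alphafold_url`, and `looks_like_pdb` are hypothetical, as is the `version` parameter. The record-name check is an assumption about how a valid PDB response begins, not something the original code relies on.

```python
# Record names that commonly open a PDB-format file (assumed, not exhaustive).
PDB_RECORD_PREFIXES = ("HEADER", "TITLE", "REMARK", "CRYST1", "MODEL", "ATOM")


def parse_accession(header_line):
    """Return the ID from a FASTA header line.

    Handles bare headers ('>Q8BH75') and UniProt-style headers
    ('>sp|Q8BH75|ENTRY_NAME description').
    """
    text = header_line.lstrip('>').strip()
    parts = text.split('|')
    if len(parts) >= 3:
        return parts[1]          # db|accession|entry-name
    return text.split()[0] if text else ''


def alphafold_url(uniprot_id, version=1):
    """Build the download URL in the same pattern as the loop above."""
    return ('https://alphafold.ebi.ac.uk/files/AF-'
            + uniprot_id + '-F1-model_v' + str(version) + '.pdb')


def looks_like_pdb(text):
    """Heuristic: the first non-empty line starts with a known record name."""
    for line in text.splitlines():
        if line.strip():
            return line.startswith(PDB_RECORD_PREFIXES)
    return False
```

With helpers like these, the download loop reduces to calling `alphafold_url(i)` and saving the response only when `looks_like_pdb(r.text)` is true, which also avoids writing error pages to disk.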
Here, file1 is a FASTA-format file (despite the .csv extension) that looks like this:
>Q8BH75
MGYDVTRFQGDVDEDLICPICSGVLEEPVQAPHCEHAFCNACITQWFSQQQTCPVDRSVVTVAHLRPVPRIMRNMLSKLQIACDNAVFGCSAVVRLDNLMSHLSDCEHNPKRPVTCEQGCGLEMPKDELPNHNCIKHLRSVVQQQQSRIAELEKTSAEHKHQLAEQKRDIQLLKAYMRAIRSVNPNLQNLEETIEYNEILEWVNSLQPARVTRWGGMISTPDAVLQAVIKRSLVESGCPASIVNELIENAHERSWPQGLATLETRQMNRRYYENYVAKRIPGKQAVVVMACENQHMGDDMVQEPGLVMIFAHGVEEI
>P06727
MFLKAVVLTLALVAVAGARAEVSADQVATVMWDYFSQLSNNAKEAVEHLQKSELTQQLNALFQDKLGEVNTYAGDLQKKLVPFATELHERLAKDSEKLKEEIGKELEELRARLLPHANEVSQKIGDNLRELQQRLEPYADQLRTQVSTQAEQLRRQLTPYAQRMERVLRENADSLQASLRPHADELKAKIDQNVEELKGRLTPYADEFKVKIDQTVEELRRSLAPYAQDTQEKLNHQLEGLTFQMKKNAEELKARISASAEELRQRLAPLAEDVRGNLRGNTEGLQKSLAELGGHLDQQVEEFRRRVEPYGENFNKALVQQMEQLRQKLGPHAGDVEGHLSFLEKDLRDKVNSFFSTFKEKESQDKTLSLPELEQQQEQQQEQQQEQVQMLAPLES
>Q60770
MAPPVSERGLKSVVWRKIKTAVFDDCRKEGEWKIMLLDEFTTKLLSSCCKMTDLLEEGITVIENIYKNREPVRQMKALYFISPTPKSVDCFLRDFGSKSEKKYKAAYIYFTDFCPDSLFNKIKASCSKSIRRCKEINISFIPQESQVYTLDVPDAFYYCYSPDPSNASRKEVVMEAMAEQIVTVCATLDENPGVRYKSKPLDNASKLAQLVEKKLEDYYKIDEKGLIKGKTQSQLLIIDRGFDPVSTVLHELTFQAMAYDLLPIENDTYKYKTDGKEKEAVLEEDDDLWVRVRHRHIAVVLEEIPKLMKEISSTKKATEGKTSLSALTQLMKKMPHFRKQISKQVVHLNLAEDCMNKFKLNIEKLCKTEQDLALGTDAEGQRVKDSMLVLLPVLLNKNHDNCDKIRAVLLYIFGINGTTEENLDRLIHNVKIEDDSDMIRNWSHLGVPIVPPSQQAKPLRKDRSAEETFQLSRWTPFIKDIMEDAIDNRLDSKEWPYCSRCPAVWNGSGAVSARQKPRTNYLELDRKNGSRLIIFVIGGITYSEMRCAYEVSQAHKSCEVIIGSTHILTPRKLLDDIKMLNKSKDKVSFKDE
>P70452
MRDRTHELRQGDNISDDEDEVRVALVVHSGAARLGSPDDEFFQKVQTIRQTMAKLESKVRELEKQQVTILATPLPEESMKQGLQNLREEIKQLGREVRAQLKAIEPQKEEADENYNSVNTRMKKTQHGVLSQQFVELINKCNSMQSEYREKNVERIRRQLKITNAGMVSDEELEQMLDSGQSEVFVSNILKDTQVTRQALNEISARHSEIQQLERSIRELHEIFTFLATEVEMQGEMINRIEKNILSSADYVERGQEHVKIALENQKKARKKKVMIAICVSVTVLILAVIIGITITVG
>P63044
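MSATAATVPPAAPAGEGGPPAPPPNLTSNRRLQQTQAQVDEVVDIMRVNVDKVLERDQKLSELDDRADALQAGASQFETSAAKLKRKYWWKNLKMMIILGVICAIILIIIIVYFST
2. Fetching .pdb files from RCSB by PDB ID
Replace the URL in the first code listing with:
url = 'http://www.rcsb.org/pdb/files/'+i+'.pdb'
PS: I have recently been learning how to work with DSSP but have made no progress. Does anyone have a Linux installation package and tutorial?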
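For the RCSB variant, a minimal sketch of the swapped-in download step might look like the following. It assumes the `https://files.rcsb.org/download/` endpoint that RCSB currently serves files from, rather than the older `www.rcsb.org/pdb/files/` path shown above, and it uses the standard library's `urllib` in place of `requests` so that it has no dependencies. `rcsb_url`, `fetch_pdb`, and the `out_dir` and `timeout` parameters are hypothetical names, and the ID check only covers classic 4-character PDB IDs.

```python
import os
import urllib.error
import urllib.request


def rcsb_url(pdb_id):
    """Build the download URL for a classic 4-character PDB ID."""
    pdb_id = pdb_id.strip().upper()
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError('not a 4-character PDB ID: ' + repr(pdb_id))
    return 'https://files.rcsb.org/download/' + pdb_id + '.pdb'


def fetch_pdb(pdb_id, out_dir='.', timeout=30):
    """Download one entry; return the saved path, or None if it is unavailable."""
    try:
        with urllib.request.urlopen(rcsb_url(pdb_id), timeout=timeout) as resp:
            data = resp.read()
    except urllib.error.HTTPError:
        # e.g. 404 when the entry does not exist or has no .pdb-format file
        return None
    path = os.path.join(out_dir, pdb_id.strip().upper() + '.pdb')
    with open(path, 'wb') as fh:
        fh.write(data)
    return path
```

A batch run would then call `fetch_pdb(i, out_dir='/AD/Information/PDB')` for each ID and collect the IDs for which it returns `None`, mirroring `not_exist_list` in the first listing.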