Batch get protein .pdb files based on Uniprot ID/PDB ID
2022-08-01 20:00:00 【Paddler Lee】
1. Obtain protein .pdb files in batch according to the Uniprot ID
A single Uniprot ID may correspond to multiple NCBI IDs, but AlphaFold provides one predicted .pdb file per Uniprot ID. The script in this section therefore downloads the .pdb files from AlphaFold in batch, one request per Uniprot ID.
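The idea the script relies on can be isolated first: the AlphaFold file name is built from the Uniprot accession alone, so the download URL follows directly from each FASTA header. A minimal sketch, not part of the original script; the helper names `alphafold_url` and `read_ids` are hypothetical, and the `model_v1` suffix is simply the one used in this post, so the version hosted on the server may differ.

```python
def alphafold_url(uniprot_id, version=1):
    """Build the AlphaFold download URL for one Uniprot accession.

    The pattern mirrors the one used in this post; `version` is an
    assumption and may need to match what the server currently hosts.
    """
    return f"https://alphafold.ebi.ac.uk/files/AF-{uniprot_id}-F1-model_v{version}.pdb"


def read_ids(lines):
    """Collect accessions from FASTA-style header lines such as '>Q8BH75'."""
    return [line[1:].strip() for line in lines if line.startswith(">")]


# Two shortened records in the same layout as file1 (sequences truncated here).
ids = read_ids([">Q8BH75\n", "MGYDVTRFQGD\n", ">P06727\n", "MFLKAVVLTL\n"])
urls = [alphafold_url(i) for i in ids]
print(urls[0])
```

Keeping the URL construction in one small function makes it easy to change the model version in a single place if the download starts failing.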
import pandas as pd
import numpy as np
from Bio import SeqIO
from Bio import PDB
import requests

# requests may emit an InsecureRequestWarning because verify=False is used below.
# The warning does not affect the download, but the following two lines silence it:
import urllib3
urllib3.disable_warnings()

headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:95.0) Gecko/20100101 Firefox/95.0'}

def read_file(file_name):
    # Collect the Uniprot IDs from the header lines of a FASTA-style file
    pro_swissProt = []
    with open(file_name, 'r') as fp:
        for line in fp:
            # startswith checks whether the string begins with the given prefix
            if line.startswith('>'):
                # Drop the leading '>' and the trailing newline
                pro_swissProt.append(line[1:].strip())
    return pro_swissProt

file1 = '/AD/all1.csv'
ID = read_file(file1)
j = 0
not_exist_list = []
for i in ID:
    j = j + 1
    print(j)
    print(i)
    url = 'https://alphafold.ebi.ac.uk/files/AF-'+i+'-F1-model_v1'+'.pdb'
    print(url)
    r = requests.get(url, headers=headers, verify=False)
    with open('/AD/Information/PDB/'+i+'.pdb', 'w') as files:
        r = r.text.splitlines()
        for lines in r:
            files.write(lines)
            files.write('\n')
    # An error response starts with '<?xml' instead of PDB records
    if not r or r[0].startswith('<?'):
        print(i + ': no pdb file.')
        not_exist_list.append(i)

# Print the IDs for which no .pdb file was found; these can be checked
# manually via the URL in case something was missed
print(not_exist_list)
print(len(not_exist_list))

The format of file1 is as follows:
>Q8BH75
MGYDVTRFQGDVDEDLICPICSGVLEEPVQAPHCEHAFCNACITQWFSQQQTCPVDRSVVTVAHLRPVPRIMRNMLSKLQIACDNAVFGCSAVVRLDNLMSHLSDCEHNPKRPVTCEQGCGLEMPKDELPNHNCIKHLRSVVQQQQSRIAELEKTSAEHKHQLAEQKRDIQLLKAYMRAIRSVNPNLQNLEETIEYNEILEWVNSLQPARVTRWGGMISTPDAVLQAVIKRSLVESGCPASIVNELIENAHERSWPQGLATLETRQMNRRYYENYVAKRIPGKQAVVVMACENQHMGDDMVQEPGLVMIFAHGVEEI
>P06727
MFLKAVVLTLALVAVAGARAEVSADQVATVMWDYFSQLSNNAKEAVEHLQKSELTQQLNALFQDKLGEVNTYAGDLQKKLVPFATELHERLAKDSEKLKEEIGKELEELRARLLPHANEVSQKIGDNLRELQQRLEPYADQLRTQVSTQAEQLRRQLTPYAQRMERVLRENADSLQASLRPHADELKAKIDQNVEELKGRLTPYADEFKVKIDQTVEELRRSLAPYAQDTQEKLNHQLEGLTFQMKKNAEELKARISASAEELRQRLAPLAEDVRGNLRGNTEGLQKSLAELGGHLDQQVEEFRRRVEPYGENFNKALVQQMEQLRQKLGPHAGDVEGHLSFLEKDLRDKVNSFFSTFKEKESQDKTLSLPELEQQQEQQQEQQQEQVQMLAPLES
>Q60770
MAPPVSERGLKSVVWRKIKTAVFDDCRKEGEWKIMLLDEFTTKLLSSCCKMTDLLEEGITVIENIYKNREPVRQMKALYFISPTPKSVDCFLRDFGSKSEKKYKAAYIYFTDFCPDSLFNKIKASCSKSIRRCKEINISFIPQESQVYTLDVPDAFYYCYSPDPSNASRKEVVMEAMAEQIVTVCATLDENPGVRYKSKPLDNASKLAQLVEKKLEDYYKIDEKGLIKGKTQSQLLIIDRGFDPVSTVLHELTFQAMAYDLLPIENDTYKYKTDGKEKEAVLEEDDDLWVRVRHRHIAVVLEEIPKLMKEISSTKKATEGKTSLSALTQLMKKMPHFRKQISKQVVHLNLAEDCMNKFKLNIEKLCKTEQDLALGTDAEGQRVKDSMLVLLPVLLNKNHDNCDKIRAVLLYIFGINGTTEENLDRLIHNVKIEDDSDMIRNWSHLGVPIVPPSQQAKPLRKDRSAEETFQLSRWTPFIKDIMEDAIDNRLDSKEWPYCSRCPAVWNGSGAVSARQKPRTNYLELDRKNGSRLIIFVIGGITYSEMRCAYEVSQAHKSCEVIIGSTHILTPRKLLDDIKMLNKSKDKVSFKDE
>P70452
MRDRTHELRQGDNISDDEDEVRVALVVHSGAARLGSPDDEFFQKVQTIRQTMAKLESKVRELEKQQVTILATPLPEESMKQGLQNLREEIKQLGREVRAQLKAIEPQKEEADENYNSVNTRMKKTQHGVLSQQFVELINKCNSMQSEYREKNVERIRRQLKITNAGMVSDEELEQMLDSGQSEVFVSNILKDTQVTRQALNEISARHSEIQQLERSIRELHEIFTFLATEVEMQGEMINRIEKNILSSADYVERGQEHVKIALENQKKARKKKVMIAICVSVTVLILAVIIGITITVG
>P63044
MSATAATVPPAAPAGEGGPPAPPPNLTSNRRLQQTQAQVDEVVDIMRVNVDKVLERDQKLSELDDRADALQAGASQFETSAAKLKRKYWWKNLKMMIILGVICAIILIIIIVYFST

2. Obtain the pdb file in RCSB according to the PDB ID
Replace the URL of the first piece of code with:
url = 'http://www.rcsb.org/pdb/files/'+i+'.pdb'

PS: I have recently been trying to learn DSSP processing but have made no progress, and I have not found anyone with a Linux installation package or tutorial for it.
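For the RCSB variant in section 2, a failed lookup is easier to detect from the HTTP status and the file contents than from the first characters of the response. The following is a minimal sketch under stated assumptions, not the method used in this post: it uses only the Python standard library, the names `rcsb_url`, `looks_like_pdb` and `download_pdb` are hypothetical, and the `https://files.rcsb.org/download/` base is an assumption about the current RCSB file server (the URL given in this post can be passed in as `base` instead).

```python
import os
import urllib.error
import urllib.request

RCSB_BASE = "https://files.rcsb.org/download/"  # assumed endpoint, not from the post


def rcsb_url(pdb_id, base=RCSB_BASE):
    """Build the download URL for one PDB ID (placeholder example: '1ABC')."""
    return f"{base}{pdb_id}.pdb"


def looks_like_pdb(text):
    """Heuristic check: a PDB file contains records such as HEADER or ATOM."""
    records = ("HEADER", "ATOM", "HETATM")
    return any(line.startswith(records) for line in text.splitlines())


def download_pdb(pdb_id, out_dir, base=RCSB_BASE):
    """Save one .pdb file; return True on success, False if unavailable."""
    try:
        with urllib.request.urlopen(rcsb_url(pdb_id, base), timeout=30) as resp:
            text = resp.read().decode("utf-8", errors="replace")
    except (urllib.error.URLError, TimeoutError):  # URLError also covers HTTP 404
        return False
    if not looks_like_pdb(text):
        return False
    # Write only after the content passed the check
    with open(os.path.join(out_dir, pdb_id + ".pdb"), "w") as fh:
        fh.write(text)
    return True
```

Because `download_pdb` writes the file only after the response passes the check, a missing entry never leaves an error page saved under a `.pdb` name. IDs for which it returns False can be collected into `not_exist_list`, as in the script from section 1.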








