Batch-download protein .pdb files by UniProt ID or PDB ID
2022-08-01 20:00:00 【Paddler Lee】
1. Batch-download protein .pdb files by UniProt ID
A single UniProt ID may correspond to several NCBI IDs, but AlphaFold provides one .pdb file per UniProt ID. The .pdb files can therefore be fetched in batch with the following code:
import requests
import urllib3

# requests.get(..., verify=False) raises an InsecureRequestWarning.
# It does not affect the download, but it clutters the output, so silence it:
urllib3.disable_warnings()

headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:95.0) Gecko/20100101 Firefox/95.0'}

def read_file(file_name):
    """Return the IDs found on the '>' header lines of the input file."""
    pro_swissProt = []
    with open(file_name, 'r') as fp:
        for line in fp:
            if line.startswith('>'):  # header line: '>' followed by the ID
                # strip() also handles a last line with no trailing newline
                pro_swissProt.append(line[1:].strip())
    return pro_swissProt

file1 = '/AD/all1.csv'
ID = read_file(file1)
not_exist_list = []
for j, i in enumerate(ID, start=1):
    print(j)
    print(i)
    # 'v1' was the model version when this was written; adjust it if the URL stops resolving.
    url = 'https://alphafold.ebi.ac.uk/files/AF-' + i + '-F1-model_v1' + '.pdb'
    print(url)
    r = requests.get(url, headers=headers, verify=False)
    lines = r.text.splitlines()
    # A first line starting with '<?' means an XML error page came back instead of a PDB file.
    if not lines or lines[0][1:2] == '?':
        print(i + ' No pdb file.')
        not_exist_list.append(i)
        continue  # do not save the error page as a .pdb file
    with open('/AD/Information/PDB/' + i + '.pdb', 'w') as files:
        for line in lines:
            files.write(line)
            files.write('\n')

# IDs for which no .pdb file was found; check these manually at the URL in case something was missed.
print(not_exist_list)
print(len(not_exist_list))
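The not-found test in the loop looks only at the first line of the response: when the second character is `?`, the body is taken to be an `<?xml ...` error page rather than a structure. As a minimal sketch, that test can be pulled out into a function and tried on strings without touching the network. The name `looks_like_missing` and both sample bodies are illustrative assumptions, not part of the original script or real server output.

```python
def looks_like_missing(response_text):
    """Return True when a response body does not look like a PDB file.

    Hypothetical helper: mirrors the script's test that the second
    character of the first line is '?', as in an '<?xml ...' error page.
    Empty bodies are also treated as missing.
    """
    lines = response_text.splitlines()
    if not lines:
        return True
    # Slicing (instead of indexing) avoids an IndexError on a one-character line.
    return lines[0][1:2] == '?'


# Illustrative inputs, made up for this sketch.
error_body = '<?xml version="1.0" encoding="UTF-8"?>\n<Error></Error>'
pdb_body = 'HEADER    EXAMPLE\nATOM      1  N   MET A   1\nEND'

print(looks_like_missing(error_body))  # True
print(looks_like_missing(pdb_body))    # False
print(looks_like_missing(''))          # True
```

Checking `r.status_code` as well would be a reasonable extra guard, but the sketch stays with the test the script already uses.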
The format of file1 is as follows:
>Q8BH75
MGYDVTRFQGDVDEDLICPICSGVLEEPVQAPHCEHAFCNACITQWFSQQQTCPVDRSVVTVAHLRPVPRIMRNMLSKLQIACDNAVFGCSAVVRLDNLMSHLSDCEHNPKRPVTCEQGCGLEMPKDELPNHNCIKHLRSVVQQQQSRIAELEKTSAEHKHQLAEQKRDIQLLKAYMRAIRSVNPNLQNLEETIEYNEILEWVNSLQPARVTRWGGMISTPDAVLQAVIKRSLVESGCPASIVNELIENAHERSWPQGLATLETRQMNRRYYENYVAKRIPGKQAVVVMACENQHMGDDMVQEPGLVMIFAHGVEEI
>P06727
MFLKAVVLTLALVAVAGARAEVSADQVATVMWDYFSQLSNNAKEAVEHLQKSELTQQLNALFQDKLGEVNTYAGDLQKKLVPFATELHERLAKDSEKLKEEIGKELEELRARLLPHANEVSQKIGDNLRELQQRLEPYADQLRTQVSTQAEQLRRQLTPYAQRMERVLRENADSLQASLRPHADELKAKIDQNVEELKGRLTPYADEFKVKIDQTVEELRRSLAPYAQDTQEKLNHQLEGLTFQMKKNAEELKARISASAEELRQRLAPLAEDVRGNLRGNTEGLQKSLAELGGHLDQQVEEFRRRVEPYGENFNKALVQQMEQLRQKLGPHAGDVEGHLSFLEKDLRDKVNSFFSTFKEKESQDKTLSLPELEQQQEQQQEQQQEQVQMLAPLES
>Q60770
MAPPVSERGLKSVVWRKIKTAVFDDCRKEGEWKIMLLDEFTTKLLSSCCKMTDLLEEGITVIENIYKNREPVRQMKALYFISPTPKSVDCFLRDFGSKSEKKYKAAYIYFTDFCPDSLFNKIKASCSKSIRRCKEINISFIPQESQVYTLDVPDAFYYCYSPDPSNASRKEVVMEAMAEQIVTVCATLDENPGVRYKSKPLDNASKLAQLVEKKLEDYYKIDEKGLIKGKTQSQLLIIDRGFDPVSTVLHELTFQAMAYDLLPIENDTYKYKTDGKEKEAVLEEDDDLWVRVRHRHIAVVLEEIPKLMKEISSTKKATEGKTSLSALTQLMKKMPHFRKQISKQVVHLNLAEDCMNKFKLNIEKLCKTEQDLALGTDAEGQRVKDSMLVLLPVLLNKNHDNCDKIRAVLLYIFGINGTTEENLDRLIHNVKIEDDSDMIRNWSHLGVPIVPPSQQAKPLRKDRSAEETFQLSRWTPFIKDIMEDAIDNRLDSKEWPYCSRCPAVWNGSGAVSARQKPRTNYLELDRKNGSRLIIFVIGGITYSEMRCAYEVSQAHKSCEVIIGSTHILTPRKLLDDIKMLNKSKDKVSFKDE
>P70452
MRDRTHELRQGDNISDDEDEVRVALVVHSGAARLGSPDDEFFQKVQTIRQTMAKLESKVRELEKQQVTILATPLPEESMKQGLQNLREEIKQLGREVRAQLKAIEPQKEEADENYNSVNTRMKKTQHGVLSQQFVELINKCNSMQSEYREKNVERIRRQLKITNAGMVSDEELEQMLDSGQSEVFVSNILKDTQVTRQALNEISARHSEIQQLERSIRELHEIFTFLATEVEMQGEMINRIEKNILSSADYVERGQEHVKIALENQKKARKKKVMIAICVSVTVLILAVIIGITITVG
>P63044
MSATAATVPPAAPAGEGGPPAPPPNLTSNRRLQQTQAQVDEVVDIMRVNVDKVLERDQKLSELDDRADALQAGASQFETSAAKLKRKYWWKNLKMMIILGVICAIILIIIIVYFST
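Only the header lines of this file are used: each ID is whatever follows the `>` on its own line, and the sequence lines are ignored. A small sketch of that extraction is below, reading from an in-memory string so it runs without the real file. The function name `read_ids` is an assumption for illustration, and the sequences are shortened placeholders.

```python
import io


def read_ids(handle):
    """Collect the text after '>' on every header line of a FASTA-style stream."""
    ids = []
    for line in handle:
        if line.startswith('>'):
            # Drop the leading '>' and any surrounding whitespace or newline.
            ids.append(line[1:].strip())
    return ids


# Shortened placeholder sequences; only the header lines matter here.
sample = io.StringIO('>Q8BH75\nMGYDVTRF\n>P06727\nMFLKAVVL\n')
ids = read_ids(sample)
print(ids)  # ['Q8BH75', 'P06727']
```

If a header carried more than the bare ID (for example a description after the accession), the resulting URL would be wrong, so the input is assumed to have ID-only headers as shown above.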
2. Batch-download .pdb files from RCSB by PDB ID
Replace the URL in the code above with the following, so that each ID in the input file is treated as a PDB ID:
url = 'http://www.rcsb.org/pdb/files/'+i+'.pdb'
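Since the two cases differ only in the URL, one way to keep a single download loop is to pick the URL template by source name. The sketch below assumes nothing beyond the two URL patterns given in this article; `URL_TEMPLATES`, `build_url`, and the PDB ID `1ABC` are hypothetical names and placeholders chosen for illustration.

```python
# URL patterns copied from this article; '{id}' marks where the ID is inserted.
URL_TEMPLATES = {
    'alphafold': 'https://alphafold.ebi.ac.uk/files/AF-{id}-F1-model_v1.pdb',
    'rcsb': 'http://www.rcsb.org/pdb/files/{id}.pdb',
}


def build_url(identifier, source):
    """Return the download URL for one ID (hypothetical helper).

    source is 'alphafold' for a UniProt ID or 'rcsb' for a PDB ID.
    """
    if source not in URL_TEMPLATES:
        raise ValueError('unknown source: ' + source)
    return URL_TEMPLATES[source].format(id=identifier)


print(build_url('P06727', 'alphafold'))
print(build_url('1ABC', 'rcsb'))  # '1ABC' is a placeholder, not a checked entry
```

Both patterns reflect the services as they were when the article was written. If either stops responding, the current download address should be taken from the AlphaFold DB or RCSB documentation rather than guessed.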
PS: I have recently been trying to learn DSSP processing, but have made no progress. If anyone has a Linux installation package or a tutorial, please share.