Batch download protein .pdb files by UniProt ID/PDB ID
2022-08-01 20:00:00 【Paddler Lee】
1. Batch-download protein .pdb files by UniProt ID
One UniProt ID may correspond to multiple NCBI IDs, but AlphaFold provides a single predicted-structure .pdb file for each UniProt ID. The .pdb files can therefore be downloaded in batch with the following code:
import requests
# requests.get(..., verify=False) triggers an InsecureRequestWarning.
# It does not affect the download, but to suppress it, add the following two lines:
import urllib3
urllib3.disable_warnings()

headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:95.0) Gecko/20100101 Firefox/95.0'}

def read_file(file_name):
    """Return the IDs taken from the FASTA header lines ('>ID') of file_name."""
    pro_swissProt = []
    with open(file_name, 'r') as fp:
        for line in fp:
            if line.startswith('>'):  # header lines start with '>'
                pro_swissProt.append(line[1:].strip())  # drop '>' and the trailing newline
    return pro_swissProt

file1 = '/AD/all1.csv'
ID = read_file(file1)
not_exist_list = []
for j, i in enumerate(ID, start=1):
    print(j)
    print(i)
    # model_v1 was the file version when this was written; newer AlphaFold DB releases use a higher version number
    url = 'https://alphafold.ebi.ac.uk/files/AF-' + i + '-F1-model_v1' + '.pdb'
    print(url)
    r = requests.get(url, headers=headers, verify=False)
    lines = r.text.splitlines()
    # A missing entry comes back as an XML error page whose first line starts with '<?',
    # so check the status code and the first line before writing anything
    if r.status_code != 200 or not lines or lines[0].startswith('<?'):
        print(i + ': no pdb file.')
        not_exist_list.append(i)
        continue
    with open('/AD/Information/PDB/' + i + '.pdb', 'w') as files:
        for line in lines:
            files.write(line + '\n')

# Print the proteins whose .pdb file was not found; check them manually at the URL in case some were missed
print(not_exist_list)
print(len(not_exist_list))

file1 is a FASTA file (despite its .csv extension) in the following format:
>Q8BH75
MGYDVTRFQGDVDEDLICPICSGVLEEPVQAPHCEHAFCNACITQWFSQQQTCPVDRSVVTVAHLRPVPRIMRNMLSKLQIACDNAVFGCSAVVRLDNLMSHLSDCEHNPKRPVTCEQGCGLEMPKDELPNHNCIKHLRSVVQQQQSRIAELEKTSAEHKHQLAEQKRDIQLLKAYMRAIRSVNPNLQNLEETIEYNEILEWVNSLQPARVTRWGGMISTPDAVLQAVIKRSLVESGCPASIVNELIENAHERSWPQGLATLETRQMNRRYYENYVAKRIPGKQAVVVMACENQHMGDDMVQEPGLVMIFAHGVEEI
>P06727
MFLKAVVLTLALVAVAGARAEVSADQVATVMWDYFSQLSNNAKEAVEHLQKSELTQQLNALFQDKLGEVNTYAGDLQKKLVPFATELHERLAKDSEKLKEEIGKELEELRARLLPHANEVSQKIGDNLRELQQRLEPYADQLRTQVSTQAEQLRRQLTPYAQRMERVLRENADSLQASLRPHADELKAKIDQNVEELKGRLTPYADEFKVKIDQTVEELRRSLAPYAQDTQEKLNHQLEGLTFQMKKNAEELKARISASAEELRQRLAPLAEDVRGNLRGNTEGLQKSLAELGGHLDQQVEEFRRRVEPYGENFNKALVQQMEQLRQKLGPHAGDVEGHLSFLEKDLRDKVNSFFSTFKEKESQDKTLSLPELEQQQEQQQEQQQEQVQMLAPLES
>Q60770
MAPPVSERGLKSVVWRKIKTAVFDDCRKEGEWKIMLLDEFTTKLLSSCCKMTDLLEEGITVIENIYKNREPVRQMKALYFISPTPKSVDCFLRDFGSKSEKKYKAAYIYFTDFCPDSLFNKIKASCSKSIRRCKEINISFIPQESQVYTLDVPDAFYYCYSPDPSNASRKEVVMEAMAEQIVTVCATLDENPGVRYKSKPLDNASKLAQLVEKKLEDYYKIDEKGLIKGKTQSQLLIIDRGFDPVSTVLHELTFQAMAYDLLPIENDTYKYKTDGKEKEAVLEEDDDLWVRVRHRHIAVVLEEIPKLMKEISSTKKATEGKTSLSALTQLMKKMPHFRKQISKQVVHLNLAEDCMNKFKLNIEKLCKTEQDLALGTDAEGQRVKDSMLVLLPVLLNKNHDNCDKIRAVLLYIFGINGTTEENLDRLIHNVKIEDDSDMIRNWSHLGVPIVPPSQQAKPLRKDRSAEETFQLSRWTPFIKDIMEDAIDNRLDSKEWPYCSRCPAVWNGSGAVSARQKPRTNYLELDRKNGSRLIIFVIGGITYSEMRCAYEVSQAHKSCEVIIGSTHILTPRKLLDDIKMLNKSKDKVSFKDE
>P70452
MRDRTHELRQGDNISDDEDEVRVALVVHSGAARLGSPDDEFFQKVQTIRQTMAKLESKVRELEKQQVTILATPLPEESMKQGLQNLREEIKQLGREVRAQLKAIEPQKEEADENYNSVNTRMKKTQHGVLSQQFVELINKCNSMQSEYREKNVERIRRQLKITNAGMVSDEELEQMLDSGQSEVFVSNILKDTQVTRQALNEISARHSEIQQLERSIRELHEIFTFLATEVEMQGEMINRIEKNILSSADYVERGQEHVKIALENQKKARKKKVMIAICVSVTVLILAVIIGITITVG
>P63044
MSATAATVPPAAPAGEGGPPAPPPNLTSNRRLQQTQAQVDEVVDIMRVNVDKVLERDQKLSELDDRADALQAGASQFETSAAKLKRKYWWKNLKMMIILGVICAIILIIIIVYFST

2. Obtain the .pdb file from RCSB by PDB ID
Replace the URL in the first code block with the following (i is now a PDB ID):
url = 'http://www.rcsb.org/pdb/files/' + i + '.pdb'

PS: I have recently been learning DSSP processing but have made no progress so far. Does anyone have a Linux installation package and a tutorial for it?
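The two sections differ only in the download URL, so one way to organize the code is a single function parameterized by a URL template. The sketch below is a hedged rewrite under stated assumptions, not the author's code: the names fasta_ids, looks_like_pdb and download_all are hypothetical; the RCSB template uses the files.rcsb.org/download endpoint rather than the www.rcsb.org/pdb/files path above; and a missing entry is detected from the HTTP status code instead of the first characters of the response. The AlphaFold template keeps the model_v1 suffix from the first section, which newer AlphaFold DB releases have changed.

```python
import time
from pathlib import Path

import requests

# URL templates (assumptions): the AlphaFold one repeats the pattern from the
# first section (its version suffix may be outdated); the RCSB one is the
# files.rcsb.org download endpoint.
ALPHAFOLD_URL = "https://alphafold.ebi.ac.uk/files/AF-{id}-F1-model_v1.pdb"
RCSB_URL = "https://files.rcsb.org/download/{id}.pdb"


def fasta_ids(lines):
    """Return the first token of every FASTA header line ('>ID ...').

    `lines` can be any iterable of strings, e.g. an open file object.
    """
    ids = []
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:  # skip a bare '>' line
                ids.append(tokens[0])
    return ids


def looks_like_pdb(status_code, text):
    """Return True if a response looks like a PDB file rather than an error page."""
    if status_code != 200 or not text:
        return False
    # Extra guard in case an HTML/XML error page arrives with status 200:
    # PDB records start with names such as HEADER, REMARK or ATOM, never '<'.
    return not text.lstrip().startswith("<")


def download_all(ids, url_template, out_dir, pause=0.2):
    """Save each ID to out_dir/<ID>.pdb and return the IDs that could not be fetched."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    missing = []
    with requests.Session() as session:  # reuse one connection for all requests
        for n, pid in enumerate(ids, start=1):
            url = url_template.format(id=pid)
            try:
                r = session.get(url, timeout=30)
            except requests.RequestException as exc:
                print(f"{n}: {pid} request failed ({exc})")
                missing.append(pid)
                continue
            if looks_like_pdb(r.status_code, r.text):
                (out / f"{pid}.pdb").write_text(r.text)
            else:
                print(f"{n}: {pid} has no .pdb file (HTTP {r.status_code})")
                missing.append(pid)
            time.sleep(pause)  # short pause between requests to avoid hammering the server
    return missing
```

For example, opening /AD/all1.csv, passing the file object to fasta_ids, and calling download_all(ids, ALPHAFOLD_URL, '/AD/Information/PDB') repeats the first section; passing RCSB_URL with a list of PDB IDs repeats the second. Some very large RCSB entries are distributed only in PDBx/mmCIF format, so they will show up in the returned list; their files can be fetched by changing the .pdb extension in the template to .cif.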