Batch download of protein .pdb files by UniProt ID or PDB ID
2022-08-01 20:00:00 【Paddler Lee】
1. Obtain protein .pdb files in batch according to the UniProt ID
A UniProt ID may correspond to multiple NCBI IDs, but a unique .pdb file can be obtained for it from AlphaFold, so the .pdb files are fetched in batch with the following code:
import requests
import urllib3

# verify=False below triggers an InsecureRequestWarning on every request.
# It does not affect the download, but it clutters the output; this line silences it.
urllib3.disable_warnings()

headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:95.0) Gecko/20100101 Firefox/95.0'}

def read_file(file_name):
    # Collect the ID from every FASTA header line (a line that starts with '>')
    pro_swissProt = []
    with open(file_name, 'r') as fp:
        for line in fp:
            if line.startswith('>'):
                pro_swissProt.append(line[1:].strip())
    return pro_swissProt

file1 = '/AD/all1.csv'
ID = read_file(file1)
not_exist_list = []
for j, i in enumerate(ID, start=1):
    print(j)
    print(i)
    # The version suffix (v1 here) changes between AlphaFold DB releases; adjust it if needed
    url = 'https://alphafold.ebi.ac.uk/files/AF-' + i + '-F1-model_v1' + '.pdb'
    print(url)
    r = requests.get(url, headers=headers, verify=False)
    lines = r.text.splitlines()
    with open('/AD/Information/PDB/' + i + '.pdb', 'w') as files:
        for line in lines:
            files.write(line)
            files.write('\n')
    # A response whose first line begins like '<?xml' is treated as "no .pdb file"
    if lines and lines[0][1:2] == '?':
        print(i + ' No pdb file.')
        not_exist_list.append(i)

# IDs whose .pdb file was not found; check these manually at the URL in case of omissions
print(not_exist_list)
print(len(not_exist_list))

The format of file1 is as follows:
>Q8BH75
MGYDVTRFQGDVDEDLICPICSGVLEEPVQAPHCEHAFCNACITQWFSQQQTCPVDRSVVTVAHLRPVPRIMRNMLSKLQIACDNAVFGCSAVVRLDNLMSHLSDCEHNPKRPVTCEQGCGLEMPKDELPNHNCIKHLRSVVQQQQSRIAELEKTSAEHKHQLAEQKRDIQLLKAYMRAIRSVNPNLQNLEETIEYNEILEWVNSLQPARVTRWGGMISTPDAVLQAVIKRSLVESGCPASIVNELIENAHERSWPQGLATLETRQMNRRYYENYVAKRIPGKQAVVVMACENQHMGDDMVQEPGLVMIFAHGVEEI
>P06727
MFLKAVVLTLALVAVAGARAEVSADQVATVMWDYFSQLSNNAKEAVEHLQKSELTQQLNALFQDKLGEVNTYAGDLQKKLVPFATELHERLAKDSEKLKEEIGKELEELRARLLPHANEVSQKIGDNLRELQQRLEPYADQLRTQVSTQAEQLRRQLTPYAQRMERVLRENADSLQASLRPHADELKAKIDQNVEELKGRLTPYADEFKVKIDQTVEELRRSLAPYAQDTQEKLNHQLEGLTFQMKKNAEELKARISASAEELRQRLAPLAEDVRGNLRGNTEGLQKSLAELGGHLDQQVEEFRRRVEPYGENFNKALVQQMEQLRQKLGPHAGDVEGHLSFLEKDLRDKVNSFFSTFKEKESQDKTLSLPELEQQQEQQQEQQQEQVQMLAPLES
>Q60770
MAPPVSERGLKSVVWRKIKTAVFDDCRKEGEWKIMLLDEFTTKLLSSCCKMTDLLEEGITVIENIYKNREPVRQMKALYFISPTPKSVDCFLRDFGSKSEKKYKAAYIYFTDFCPDSLFNKIKASCSKSIRRCKEINISFIPQESQVYTLDVPDAFYYCYSPDPSNASRKEVVMEAMAEQIVTVCATLDENPGVRYKSKPLDNASKLAQLVEKKLEDYYKIDEKGLIKGKTQSQLLIIDRGFDPVSTVLHELTFQAMAYDLLPIENDTYKYKTDGKEKEAVLEEDDDLWVRVRHRHIAVVLEEIPKLMKEISSTKKATEGKTSLSALTQLMKKMPHFRKQISKQVVHLNLAEDCMNKFKLNIEKLCKTEQDLALGTDAEGQRVKDSMLVLLPVLLNKNHDNCDKIRAVLLYIFGINGTTEENLDRLIHNVKIEDDSDMIRNWSHLGVPIVPPSQQAKPLRKDRSAEETFQLSRWTPFIKDIMEDAIDNRLDSKEWPYCSRCPAVWNGSGAVSARQKPRTNYLELDRKNGSRLIIFVIGGITYSEMRCAYEVSQAHKSCEVIIGSTHILTPRKLLDDIKMLNKSKDKVSFKDE
>P70452
MRDRTHELRQGDNISDDEDEVRVALVVHSGAARLGSPDDEFFQKVQTIRQTMAKLESKVRELEKQQVTILATPLPEESMKQGLQNLREEIKQLGREVRAQLKAIEPQKEEADENYNSVNTRMKKTQHGVLSQQFVELINKCNSMQSEYREKNVERIRRQLKITNAGMVSDEELEQMLDSGQSEVFVSNILKDTQVTRQALNEISARHSEIQQLERSIRELHEIFTFLATEVEMQGEMINRIEKNILSSADYVERGQEHVKIALENQKKARKKKVMIAICVSVTVLILAVIIGITITVG
>P63044
MSATAATVPPAAPAGEGGPPAPPPNLTSNRRLQQTQAQVDEVVDIMRVNVDKVLERDQKLSELDDRADALQAGASQFETSAAKLKRKYWWKNLKMMIILGVICAIILIIIIVYFST

2. Obtain the pdb file in RCSB according to the PDB ID
Replace the URL of the first piece of code with:
url = 'http://www.rcsb.org/pdb/files/' + i + '.pdb'

PS: I have recently been trying to learn DSSP processing but have made no progress, and I have not yet found a Linux installation package or tutorial for it.
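The two download variants above differ only in the URL, so they could share a few small helpers. The following is a minimal sketch, not the author's script: the function names (`read_fasta_ids`, `alphafold_url`, `rcsb_url`, `download`) are hypothetical, it uses the standard-library `urllib` instead of `requests`, and it assumes the server answers a missing entry with an HTTP error status rather than relying on the first line of the response. The URL patterns are copied from the article.

```python
import urllib.error
import urllib.request
from pathlib import Path


def read_fasta_ids(lines):
    """Return the text after '>' on every FASTA header line."""
    return [line[1:].strip() for line in lines if line.startswith(">")]


def alphafold_url(uniprot_id, version="v1"):
    # Same pattern as the article's first script; the version suffix is a parameter
    return f"https://alphafold.ebi.ac.uk/files/AF-{uniprot_id}-F1-model_{version}.pdb"


def rcsb_url(pdb_id):
    # Same pattern as the article's RCSB variant
    return f"http://www.rcsb.org/pdb/files/{pdb_id}.pdb"


def download(url, dest, timeout=30):
    """Save url to dest. Return False, without writing a file, on an HTTP error.

    Assumption: a missing entry is reported with an HTTP error status (e.g. 404).
    Other network failures are not caught and will raise.
    """
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            data = response.read()
    except urllib.error.HTTPError:
        return False
    Path(dest).write_bytes(data)
    return True
```

With these helpers, a list such as `[i for i in ids if not download(alphafold_url(i), i + '.pdb')]` would collect the IDs for which no file was saved, and swapping `alphafold_url` for `rcsb_url` would cover the PDB ID case.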