RDKit II: screening small molecules with 2D pharmacophores in RDKit
2022-07-29 03:24:00 【Order anything】
First, some background on pharmacophore screening.
There are two main approaches to computer-aided drug design:
1. Receptor-based drug design;
2. Ligand-based drug design.
The crystal structures of many proteins are still unknown, especially those of membrane proteins, whose hydrophobic nature makes them difficult to purify and crystallize. For targets without a known crystal structure, pharmacophore-based drug design can be used when many structurally similar ligands are available.
Next, what is a pharmacophore? Textbooks put it this way: a drug target must have specific binding sites for the drug, so compounds that are active against the same target must share structural characteristics. The largest set of features common to these compounds is defined as the pharmacophore. IUPAC defines a pharmacophore as the ensemble of steric and electronic features that is necessary to "ensure the optimal supramolecular interactions with a specific biological target structure and to trigger (or to block) its biological response".
RDKit is an open-source cheminformatics toolkit whose development Novartis has contributed to. Its core is written in C++, and it bundles most of the common cheminformatics methods and tools.
In RDKit, pharmacophore information is represented as a molecular fingerprint, and the pharmacophore features themselves are encoded as SMARTS patterns. For details, see this paper:
https://pubs.acs.org/doi/abs/10.1021/ci7003253
The structural information is stored as triples that cover all possible combinations: (number of points, feature patterns, distance bins). Each combination corresponds to one bit, and together the bits form a fixed-length pharmacophore fingerprint.
Note: the same atom can be assigned to several feature types.
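To get a feel for why the fingerprint has a fixed length, the triples described above can be counted by hand. The sketch below is a simplified illustration in plain Python, not RDKit's own bookkeeping: the helper name `count_triples` is hypothetical, and it lets every pairwise distance fall into any bin, so it gives an upper bound. RDKit's signature factory can additionally discard three-point bin combinations that violate the triangle inequality, which makes its fingerprints somewhat shorter.

```python
from itertools import combinations_with_replacement


def count_triples(n_feature_types, n_bins, min_points=2, max_points=3):
    """Upper bound on the number of (points, feature pattern, distance bins) triples.

    Simplification: every pairwise distance may fall into any bin, so
    geometrically impossible three-point combinations are not pruned.
    """
    total = 0
    for n_points in range(min_points, max_points + 1):
        # Unordered feature patterns, repeats allowed (e.g. Donor-Donor).
        n_patterns = sum(
            1 for _ in combinations_with_replacement(range(n_feature_types), n_points)
        )
        # One distance for every pair of points in the pattern.
        n_distances = n_points * (n_points - 1) // 2
        total += n_patterns * n_bins ** n_distances
    return total


# Three feature families (e.g. Donor, Acceptor, Hydrophobe) and three distance bins.
two_point_bits = count_triples(3, 3, min_points=2, max_points=2)
all_bits = count_triples(3, 3)
print(two_point_bits, all_bits)
```

The count grows quickly with the number of feature families and bins, which is one reason to skip families that are irrelevant to the problem at hand.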
# Overall idea
# A 2D pharmacophore search with RDKit can be split into two stages:
# 1. Stage one: generate the pharmacophore fingerprint of the target molecule (it encodes the pharmacophore features and the distances between them)
# 2. Stage two: generate pharmacophore fingerprints for the molecules to be screened, compute their similarity to the target molecule, and select molecules above a threshold
# The code is as follows:
# Import the required packages
import os
from rdkit import Chem
from rdkit.Chem import ChemicalFeatures
from rdkit import RDConfig
from rdkit.Chem import Draw
from rdkit.Chem import AllChem
from rdkit.Chem.Pharm2D.SigFactory import SigFactory
from rdkit.Chem.Pharm2D import Generate, Gobbi_Pharm2D
# The following code explores BaseFeatures.fdef
# Locate RDKit's built-in pharmacophore feature definition file
fdefName = os.path.join(RDConfig.RDDataDir,'BaseFeatures.fdef')
# Instantiate feature factory
factory = ChemicalFeatures.BuildFeatureFactory(fdefName)
# Number of feature definitions: 27
print(f'factory.GetNumFeatureDefs():{factory.GetNumFeatureDefs()}')
# Number of feature families: 8
print(f'len(factory.GetFeatureFamilies()):{len(factory.GetFeatureFamilies())}')
# Names of the feature families
print(f'factory.GetFeatureFamilies():{factory.GetFeatureFamilies()}')
# Feature types, returned as the keys of a dict
print(f'factory.GetFeatureDefs().keys():{factory.GetFeatureDefs().keys()}')
# Build a DataFrame of the pharmacophore feature types
import pandas as pd
family_df = pd.DataFrame(columns=['family', 'definition'])
for k, v in factory.GetFeatureDefs().items():
    # Keys have the form 'Family.Type'; read the family from the key itself,
    # because a substring test would also match 'Hydrophobe' inside 'LumpedHydrophobe'
    fam = k.split('.')[0]
    family_df.loc[k] = [fam, v]
family_df
# There are 8 pharmacophore families in total; for our molecules the most relevant are Donor, Acceptor and Hydrophobe
# Count the feature types in each family
family_df['family'].value_counts()
# Acceptor and Donor each contain a single type with no further subdivision; Hydrophobe contains two types
# Print each definition stored in the dictionary
for k, v in factory.GetFeatureDefs().items():
    print(k, ':', v)
# Each pharmacophore feature is defined by a SMARTS pattern
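The keys of the feature-definition dictionary have the form Family.Type, so grouping them by family is a matter of splitting on the dot. The sketch below uses a small hand-written dictionary in place of the real one: the keys are meant to resemble entries of BaseFeatures.fdef, the SMARTS values are replaced by placeholders, and the helper `family_of` is hypothetical. It also shows why a plain substring test is risky when one family name is contained in another.

```python
from collections import Counter

# Illustrative subset of keys in the style of GetFeatureDefs(); SMARTS omitted.
feature_defs = {
    'Donor.SingleAtomDonor': '<SMARTS>',
    'Acceptor.SingleAtomAcceptor': '<SMARTS>',
    'Hydrophobe.ThreeWayAttach': '<SMARTS>',
    'Hydrophobe.ChainTwoWayAttach': '<SMARTS>',
    'LumpedHydrophobe.RH6_6': '<SMARTS>',
}


def family_of(key):
    """Return the family part of a 'Family.Type' key."""
    return key.split('.', 1)[0]


# Number of feature types per family.
family_counts = Counter(family_of(k) for k in feature_defs)

# A substring test also catches 'LumpedHydrophobe', which is a different family.
substring_hits = [k for k in feature_defs if 'Hydrophobe' in k]

print(dict(family_counts))
print(substring_hits)
```

Splitting the key gives two Hydrophobe types here, while the substring test reports three matches.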
'''------------------------------- The next part is the formal process -----------------------------'''
''' The first level : Generate pharmacophore fingerprint code of the target molecule '''
# Read in the target molecule
from rdkit.Chem.Pharm2D import Generate
mol1 = Chem.MolFromSmiles('NC(NCCC[[email protected]](N)C(NCC(N[[email protected]@H](CC(O)=O)C(N[[email protected]@H](CC(C)C)C(O)=O)=O)=O)=O)=N')
Draw.MolToImageFile(mol1,"/Users/lenovo/RGDXXL.jpg")
# Use the feature factory to find the features of the molecule
feats = factory.GetFeaturesForMol(mol1)
print(len(feats))
# Each feature found carries its family (e.g. Acceptor, Donor), its type, and the IDs of the atoms it covers
for f in feats:
    print(
        f.GetFamily(),   # feature family
        f.GetType(),     # feature type
        f.GetAtomIds()   # atoms the feature maps to
    )
# Build a fingerprint (signature) factory from the feature factory; it holds the fingerprint parameters
sigFactory = SigFactory(
    factory,            # feature factory
    useCounts=False,    # default False: ignore how often a pattern occurs and generate a SparseBitVect
    minPointCount=2,    # default 2: minimum number of pharmacophore points in a pattern
    maxPointCount=3,    # default 3: maximum number of pharmacophore points in a pattern
)
# Split the topological distance into bins
sigFactory.SetBins([(0, 2), (2, 5), (5, 8)])
# Skip the feature families that do not need to be considered
sigFactory.skipFeats = ['PosIonizable', 'NegIonizable', 'ZnBinder', 'LumpedHydrophobe', 'Aromatic']
# Init() must be called again after every parameter change
sigFactory.Init()
# Check the fingerprint length
sigFactory.GetSigSize()
# Generate the fingerprint of the target molecule and inspect it
fps = Generate.Gen2DFingerprint(mol1, sigFactory)
print(f'len(fps):{len(fps)}')
print(f'fps.GetNumOnBits():{fps.GetNumOnBits()}')
# The features and feature distances that each bit stands for can be looked up through the signature factory
print(list(fps.GetOnBits()))
print(sigFactory.GetBitDescription(1))
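The distance bins are what turn an exact topological distance (a number of bonds) into a coarse label. The following plain-Python sketch shows the idea with the same bins as above. It assumes half-open intervals, where the lower bound is included and the upper bound is not; the helper `bin_index` is hypothetical and is not an RDKit function.

```python
def bin_index(distance, bins):
    """Return the index of the bin that contains distance, or None if no bin does.

    Assumes half-open intervals [low, high).
    """
    for idx, (low, high) in enumerate(bins):
        if low <= distance < high:
            return idx
    return None


bins = [(0, 2), (2, 5), (5, 8)]

# Map a few topological distances to their bin index.
binned = {d: bin_index(d, bins) for d in (1, 2, 4, 5, 7, 8)}
print(binned)
```

Under this assumption a feature pair that is 8 or more bonds apart falls outside every bin, so widening the last bin is the way to capture longer-range patterns.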
''' The second level : Generate pharmacophore fingerprint codes for the molecules to be screened , Calculate the similarity with the target molecule , Set the threshold to select the appropriate molecule '''
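# Similarity function: fingerprint the candidate molecule and compare it with the target fingerprint
from rdkit import DataStructs
def similarityMeasure(fps, mol):
    fps2 = Generate.Gen2DFingerprint(mol, sigFactory)
    similarityPos = DataStructs.FingerprintSimilarity(fps, fps2, metric=DataStructs.TanimotoSimilarity)
    if similarityPos >= 0.65:
        # Molecules built from SMILES have no '_Name' property, so fall back to an empty name
        name = mol.GetProp('_Name') if mol.HasProp('_Name') else ''
        print(name, Chem.MolToSmiles(mol), similarityPos)
    return similarityPos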
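# Read in the data: SMILES strings in the first column of an Excel sheet without a header row
suppl = pd.read_excel('generate_molecules.xlsx', header=None)
suppl_list = suppl[0].tolist()
suppl_end = [Chem.MolFromSmiles(x) for x in suppl_list]
# Keep the molecules whose similarity reaches the threshold
pos_x = []   # similarity scores of the hits
entry = []   # 1-based positions of the hits in the input list
for i, mol in enumerate(suppl_end, start=1):
    if mol is None:   # SMILES that RDKit could not parse
        continue
    pos = similarityMeasure(fps, mol)
    print(f'i:{i},pos:{pos}')
    if pos >= 0.5:
        pos_x.append(pos)
        entry.append(i)
print(pos_x, entry)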
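The Tanimoto similarity used for the screening is the number of bits set in both fingerprints divided by the number of bits set in at least one of them. The sketch below reproduces the threshold filter in plain Python, with fingerprints represented as sets of on-bit indices. The bit sets are made-up toy data, the function names `tanimoto` and `screen` are hypothetical, and returning 0.0 for two empty fingerprints is a convention chosen here for illustration.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto similarity of two fingerprints given as collections of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


def screen(query_bits, library, threshold=0.5):
    """Return (1-based position, score) for every library entry at or above the threshold."""
    hits = []
    for idx, bits in enumerate(library, start=1):
        score = tanimoto(query_bits, bits)
        if score >= threshold:
            hits.append((idx, score))
    return hits


# Toy data: on-bit indices of a query fingerprint and three candidates.
query = {1, 4, 7, 9}
library = [{1, 4, 7, 9}, {1, 4, 8}, {2, 3}]

hits = screen(query, library, threshold=0.5)
print(hits)
```

The second candidate shares two of five distinct bits with the query, so it scores 0.4 and is filtered out at a threshold of 0.5. Lowering the threshold trades precision for a larger hit list.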
Sometimes the built-in pharmacophore definitions are not enough, and a specific problem calls for custom pharmacophore features. For that you need to know the SMARTS syntax and the syntax of fdef files:
Daylight Theory: SMARTS - A Language for Describing Molecular Patterns
The RDKit Book (The RDKit 2022.03.1 documentation)
Here is a blog post that deals with this kind of problem: