RDKit II: Screening small molecules by 2D pharmacophore with RDKit
2022-07-29 03:24:00 【Order anything】
First, some background on pharmacophore screening:
There are two main approaches to computer-aided drug design:
1. Receptor-based drug design;
2. Ligand-based drug design.
The crystal structures of many proteins are still unknown. This is especially true of membrane proteins, whose hydrophobic nature makes them difficult to purify and crystallize. For a target without a crystal structure, pharmacophore-based drug design can be used when many structurally similar ligands are available.
So what is a pharmacophore? The textbook description runs as follows: the target of a drug must contain specific sites that bind the drug, so compounds that are active against the same target must share structural features. The features these compounds most commonly share are defined as the pharmacophore. IUPAC defines a pharmacophore as the ensemble of steric and electronic features needed to ensure the optimal interactions with a specific biological target and to trigger its biological response.
RDKit is an open-source cheminformatics toolkit, much of whose development has been supported by Novartis. Its core is written in C++, and it bundles most of the common cheminformatics methods and tools.
In RDKit, pharmacophore information is represented as a molecular fingerprint, and the features themselves are defined as SMARTS patterns. For details, see this paper:
https://pubs.acs.org/doi/abs/10.1021/ci7003253
The structural information is stored as triples that cover all possible combinations: (number of points, feature patterns, distance bins). Each combination corresponds to one bit, and together the bits form a fixed-length pharmacophore fingerprint.
Note: the same atom can be assigned to several feature types.
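To make the triple encoding concrete, here is a toy enumeration in plain Python. It is a simplified sketch rather than RDKit's actual bit layout: the helper `enumerate_two_point_bits` and the three-family subset are assumptions made for illustration, and only 2-point pharmacophores are enumerated.

```python
from itertools import combinations_with_replacement

def enumerate_two_point_bits(families, bins):
    """List every (family pair, distance bin) combination; each one is one bit."""
    bits = []
    for pair in combinations_with_replacement(sorted(families), 2):
        for dist_bin in bins:
            bits.append((pair, dist_bin))
    return bits

families = ['Donor', 'Acceptor', 'Hydrophobe']  # illustrative subset of families
bins = [(0, 2), (2, 5), (5, 8)]                  # topological distance bins
bits = enumerate_two_point_bits(families, bins)

print(len(bits))  # 6 unordered family pairs x 3 bins = 18
for index, (pair, dist_bin) in enumerate(bits[:4]):
    print(index, pair, dist_bin)
```

A molecule's fingerprint then has a bit switched on for each combination that actually occurs in it. 3-point pharmacophores extend the same idea to triples of families with three pairwise distance bins.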

# Overall approach
# A 2D pharmacophore search with RDKit can be divided into two levels:
# 1. Level 1: generate the pharmacophore fingerprint of the target molecule (it encodes both the features and the distances between them)
# 2. Level 2: generate pharmacophore fingerprints for the molecules to be screened, compute their similarity to the target molecule, and keep the molecules above a threshold
# The code is as follows:
# Import the required packages
import os
from rdkit import Chem
from rdkit.Chem import ChemicalFeatures
from rdkit import RDConfig
from rdkit.Chem import Draw
from rdkit.Chem import AllChem
from rdkit.Chem.Pharm2D.SigFactory import SigFactory
from rdkit.Chem.Pharm2D import Generate, Gobbi_Pharm2D
# The following section explores BaseFeatures.fdef
# Load RDKit's built-in pharmacophore feature definition file
fdefName = os.path.join(RDConfig.RDDataDir,'BaseFeatures.fdef')
# Instantiate feature factory
factory = ChemicalFeatures.BuildFeatureFactory(fdefName)
# Number of feature definitions: 27
print(f'factory.GetNumFeatureDefs():{factory.GetNumFeatureDefs()}')
# Number of feature families: 8
print(f'len(factory.GetFeatureFamilies()):{len(factory.GetFeatureFamilies())}')
# Names of the feature families
print(f'factory.GetFeatureFamilies():{factory.GetFeatureFamilies()}')
# Feature types, returned as the keys of a dict
print(f'factory.GetFeatureDefs().keys():{factory.GetFeatureDefs().keys()}')
# Build a DataFrame of the feature definitions and the family each belongs to
import pandas as pd
family_df = pd.DataFrame(columns=['family', 'definition'])
for k, v in factory.GetFeatureDefs().items():
    # Keys look like 'Family.FeatureType', so the family is the part before the dot
    # (a substring test would confuse 'Hydrophobe' with 'LumpedHydrophobe')
    family_df.loc[k] = [k.split('.')[0], v]
family_df
# There are 8 pharmacophore families in total; the ones that matter most for our molecules are Donor, Acceptor and Hydrophobe
# Count the feature types within each family
family_df['family'].value_counts()
# Acceptor and Donor each contain a single feature type with no further subdivision; Hydrophobe contains two
# Print the full definition of each feature type in the dictionary
for k, v in factory.GetFeatureDefs().items():
    print(k, ':', v)
# Each pharmacophore feature is defined as a SMARTS pattern
'''------------------------------- The main workflow starts here -----------------------------'''
''' Level 1: generate the pharmacophore fingerprint of the target molecule '''
# Read in the target molecule
from rdkit.Chem.Pharm2D import Generate
mol1 = Chem.MolFromSmiles('NC(NCCC[C@H](N)C(NCC(N[C@@H](CC(O)=O)C(N[C@@H](CC(C)C)C(O)=O)=O)=O)=O)=N')
Draw.MolToImageFile(mol1,"/Users/lenovo/RGDXXL.jpg")
# Search the molecule for features with the feature factory
feats = factory.GetFeaturesForMol(mol1)
print(len(feats))
# Each feature found carries its family (e.g. Acceptor, Donor), its type, and the indices of the atoms it matches
for f in feats:
    print(
        f.GetFamily(),   # feature family
        f.GetType(),     # feature type
        f.GetAtomIds()   # atoms the feature maps to
    )
# Build a signature (fingerprint) factory from the feature factory; it holds the fingerprint parameters
# The call below only documents the main parameters; its result is discarded
SigFactory(
    factory,            # feature factory
    useCounts = False,  # default False: ignore how often each bit occurs and generate a SparseBitVect
    minPointCount = 2,  # default 2: minimum number of pharmacophore points per fingerprint bit
    maxPointCount = 3,  # default 3: maximum number of pharmacophore points per fingerprint bit
)
sigFactory = SigFactory(factory, minPointCount = 2, maxPointCount = 3)
# Split the topological distance into bins
sigFactory.SetBins([(0, 2), (2, 5), (5, 8)])
# Skip the feature families that are not of interest
sigFactory.skipFeats = ['PosIonizable','NegIonizable','ZnBinder','LumpedHydrophobe','Aromatic']
# The factory must be re-initialized after every parameter change
sigFactory.Init()
# Check the fingerprint length
sigFactory.GetSigSize()
# Generate the molecular fingerprint and inspect it
fps = Generate.Gen2DFingerprint(mol1,sigFactory)
print(f'len(fps):{len(fps)}')
print(f'fps.GetNumOnBits():{fps.GetNumOnBits()}')
# The features and feature distance matrix that each bit represents can be looked up through the signature factory
print(list(fps.GetOnBits()))
print(sigFactory.GetBitDescription(1))
''' Level 2: generate pharmacophore fingerprints for the molecules to be screened, compute their similarity to the target molecule, and keep the molecules above a threshold '''
# Similarity function
from rdkit import DataStructs
def similarityMeasure(fps, mol):
    # Fingerprint the candidate with the same signature factory as the target
    fps2 = Generate.Gen2DFingerprint(mol, sigFactory)
    similarityPos = DataStructs.FingerprintSimilarity(fps, fps2, metric=DataStructs.TanimotoSimilarity)
    if similarityPos >= 0.65:
        # Molecules built from SMILES have no '_Name' property, so fall back to an empty name
        name = mol.GetProp('_Name') if mol.HasProp('_Name') else ''
        print(name, Chem.MolToSmiles(mol), similarityPos)
    return similarityPos
# Read in the data
suppl = pd.read_excel('generate_molecules.xlsx', header=None)
suppl_list = suppl[0].tolist()
suppl_end = [Chem.MolFromSmiles(x) for x in suppl_list]
# Filter the molecules by similarity threshold
pos_x = []
entry = []
i = 0
for mol in suppl_end:
    i += 1
    if mol is None:  # skip SMILES that RDKit could not parse
        continue
    pos = similarityMeasure(fps, mol)
    print(f'i:{i},pos:{pos}')
    if pos >= 0.5:
        pos_x.append(pos)
        entry.append(i)
print(pos_x, entry)
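The screening loop above can also be packaged as a small reusable function. The version below is a minimal sketch under a few assumptions: the name `screen_smiles` is made up for illustration, the example SMILES are arbitrary, and to stay self-contained it uses RDKit's predefined `Gobbi_Pharm2D.factory` instead of the custom `sigFactory` built from BaseFeatures.fdef. The 0.5 threshold is the one used in the loop above.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem.Pharm2D import Generate, Gobbi_Pharm2D

def screen_smiles(query_smiles, candidate_smiles, threshold=0.5):
    """Return (smiles, similarity) pairs for candidates at or above the threshold."""
    sig_factory = Gobbi_Pharm2D.factory  # predefined 2D pharmacophore signature factory
    query = Chem.MolFromSmiles(query_smiles)
    if query is None:
        raise ValueError(f'Could not parse query SMILES: {query_smiles}')
    query_fp = Generate.Gen2DFingerprint(query, sig_factory)

    hits = []
    for smi in candidate_smiles:
        mol = Chem.MolFromSmiles(smi)
        if mol is None:  # unparsable SMILES: skip instead of crashing
            continue
        fp = Generate.Gen2DFingerprint(mol, sig_factory)
        sim = DataStructs.TanimotoSimilarity(query_fp, fp)
        if sim >= threshold:
            hits.append((smi, sim))
    # Most similar candidates first
    return sorted(hits, key=lambda hit: hit[1], reverse=True)

hits = screen_smiles('NCCCC(=O)O', ['NCCCC(=O)O', 'not_a_smiles', 'CCCCCC'])
for smi, sim in hits:
    print(smi, round(sim, 3))
```

Returning the hits instead of printing them inside the similarity function keeps a single threshold in one place and makes the result easy to sort or write back to a spreadsheet.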
Sometimes the built-in pharmacophore definitions are not enough, and a specific problem calls for a custom pharmacophore scheme. For that you need to read up on SMARTS syntax and on the syntax of fdef files:
Daylight Theory: SMARTS - A Language for Describing Molecular Patterns
The RDKit Book — The RDKit 2022.03.1 documentation
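As a rough illustration of what a custom definition looks like, the sketch below builds a feature factory from an fdef string instead of a file. The two feature definitions (`HalogenAtom` and `Hydroxyl`) and the test molecule are hypothetical examples, not part of BaseFeatures.fdef.

```python
from rdkit import Chem
from rdkit.Chem import ChemicalFeatures

# Hypothetical custom feature definitions, written in fdef syntax:
# each feature has a name, a SMARTS pattern, a family and per-atom weights
custom_fdef = """
DefineFeature HalogenAtom [F,Cl,Br,I]
  Family Halogen
  Weights 1.0
EndFeature

DefineFeature Hydroxyl [OX2H1]
  Family Donor
  Weights 1.0
EndFeature
"""

custom_factory = ChemicalFeatures.BuildFeatureFactoryFromString(custom_fdef)
print(custom_factory.GetFeatureFamilies())

# 3-chloropropan-1-ol: atom 0 is the hydroxyl oxygen, atom 3 is the chlorine
mol = Chem.MolFromSmiles('OCCCl')
custom_feats = custom_factory.GetFeaturesForMol(mol)
for feat in custom_feats:
    print(feat.GetFamily(), feat.GetType(), feat.GetAtomIds())
```

A factory built this way can be passed to `SigFactory` in the same way as the one built from BaseFeatures.fdef.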
Here is a blog post that deals with this kind of problem: