
RDKit II: Screening small molecules with RDKit 2D pharmacophore fingerprints

2022-07-29 03:24:00 【Order anything】

First, some background on pharmacophore screening:

There are two main approaches to computer-aided drug design:

1. Receptor-based drug design;

2. Ligand-based drug design.

The crystal structures of many proteins are still unknown, especially membrane proteins, whose hydrophobic nature makes them hard to purify and crystallize. When a target's crystal structure is unknown but many structurally similar ligands are available, pharmacophore-based drug design can be used instead.

Next, what is a pharmacophore? A textbook description: a drug target must have specific sites that bind the drug, so compounds that are active on the same target must share structural characteristics, and the most common characteristics shared by these compounds are defined as the pharmacophore. IUPAC defines a pharmacophore as "the ensemble of steric and electronic features that is necessary to ensure the optimal supramolecular interactions with a specific biological target structure and to trigger (or to block) its biological response".

RDKit is an open-source cheminformatics toolkit, developed with major support from Novartis. Its core is written in C++, and it integrates most common cheminformatics methods and tools.

In RDKit, pharmacophore information is represented as a molecular fingerprint, with the features themselves defined by SMARTS patterns. For details, see this paper:

https://pubs.acs.org/doi/abs/10.1021/ci7003253

The structural information is stored as tuples that cover every possible combination of (number of points, atom feature types, distance bins). Each combination corresponds to one bit, which gives a fixed-length pharmacophore fingerprint.

Note: the same atom can be assigned to several atom types.
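To make the tuple idea concrete, here is a toy 2-point fingerprint in plain Python. It is a sketch under stated assumptions, not RDKit's actual bit layout: RDKit's SigFactory also handles 3-point combinations and uses its own bit ordering. The feature families, the half-open distance bins, and the helper names enumerate_signatures, bin_index and encode_pairs are all hypothetical. Every (feature type, feature type, distance bin) combination is given a fixed bit index up front, so the fingerprint length is known in advance, and an atom carrying two types contributes a bit for each.

```python
from itertools import combinations, combinations_with_replacement

# Hypothetical feature families and topological-distance bins, assumed half-open [lo, hi).
FAMILIES = ["Acceptor", "Donor", "Hydrophobe"]
BINS = [(0, 2), (2, 5), (5, 8)]


def enumerate_signatures(families, bins):
    """Assign a fixed bit index to every (sorted type pair, distance bin) combination."""
    sigs = {}
    for pair in combinations_with_replacement(sorted(families), 2):
        for b in range(len(bins)):
            sigs[(pair, b)] = len(sigs)
    return sigs


def bin_index(distance, bins):
    """Index of the bin containing distance, or None if it falls outside every bin."""
    for i, (lo, hi) in enumerate(bins):
        if lo <= distance < hi:
            return i
    return None


def encode_pairs(atom_types, dist, sigs, bins):
    """Toy 2-point fingerprint returned as a set of on-bit indices.

    atom_types: {atom_idx: set of feature types}; one atom may carry several types.
    dist: {(i, j): topological distance} with i < j.
    """
    on_bits = set()
    for i, j in combinations(sorted(atom_types), 2):
        b = bin_index(dist[(i, j)], bins)
        if b is None:  # pair too far apart for any bin: contributes nothing
            continue
        for t1 in atom_types[i]:
            for t2 in atom_types[j]:
                on_bits.add(sigs[(tuple(sorted((t1, t2))), b)])
    return on_bits


sigs = enumerate_signatures(FAMILIES, BINS)
print(len(sigs))  # 6 type pairs x 3 bins = 18
# Atom 0 is both a donor and an acceptor (e.g. a hydroxyl O); atom 3 is hydrophobic.
bits = encode_pairs({0: {"Donor", "Acceptor"}, 3: {"Hydrophobe"}}, {(0, 3): 3}, sigs, BINS)
print(sorted(bits))  # [7, 13]: Acceptor-Hydrophobe and Donor-Hydrophobe in bin (2, 5)
```

The single atom pair (0, 3) sets two bits because atom 0 has two feature types, which is exactly why the same atom combination can contribute several fingerprint bits.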

# Overall approach
# Searching for 2D pharmacophore matches with RDKit works on two levels:
# 1. Level 1: generate the pharmacophore fingerprint of the target molecule (it encodes both the pharmacophore features and the distances between them)
# 2. Level 2: generate pharmacophore fingerprints for the molecules to be screened, compute their similarity to the target, and keep the molecules above a chosen threshold
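The two levels can be sketched without RDKit by treating each fingerprint as a set of on-bit indices. The Tanimoto coefficient below, shared / (a + b - shared), is the same metric the RDKit code requests through DataStructs.TanimotoSimilarity; the bit sets and the helper names tanimoto and screen are hypothetical, and returning 0.0 for two empty fingerprints is a convention chosen here, not necessarily RDKit's.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen for this sketch
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def screen(target_bits, candidates, threshold=0.5):
    """Return (1-based row number, similarity) for candidates at or above threshold, best first."""
    hits = []
    for row, bits in enumerate(candidates, start=1):
        score = tanimoto(target_bits, bits)
        if score >= threshold:
            hits.append((row, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Level 1: the target's fingerprint (a toy on-bit set, not a real RDKit fingerprint).
target = {1, 4, 7, 9}
# Level 2: candidate fingerprints, one per input row.
candidates = [{1, 4, 7, 9}, {1, 4, 7, 8}, {2, 3}]
print(screen(target, candidates))  # [(1, 1.0), (2, 0.6)]
```

Row 2 shares 3 of the 5 distinct bits, so its score is 3/5 = 0.6; row 3 shares none and is dropped.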

# The code:

# Import the required packages
import os

from rdkit import Chem, DataStructs, RDConfig
from rdkit.Chem import AllChem, ChemicalFeatures, Draw
from rdkit.Chem.Pharm2D import Generate, Gobbi_Pharm2D
from rdkit.Chem.Pharm2D.SigFactory import SigFactory


# First, get familiar with RDKit's built-in feature definition file, BaseFeatures.fdef
fdefName = os.path.join(RDConfig.RDDataDir, 'BaseFeatures.fdef')
# Build the feature factory from the definition file
factory = ChemicalFeatures.BuildFeatureFactory(fdefName)
# Number of feature definitions (27 in the author's run)
print(f'factory.GetNumFeatureDefs(): {factory.GetNumFeatureDefs()}')
# Number of feature families (8 in the author's run)
print(f'len(factory.GetFeatureFamilies()): {len(factory.GetFeatureFamilies())}')
# Names of the feature families
print(f'factory.GetFeatureFamilies(): {factory.GetFeatureFamilies()}')
# The feature definitions are returned as a dict; list its keys
print(f'factory.GetFeatureDefs().keys(): {factory.GetFeatureDefs().keys()}')


# Build a DataFrame that maps each feature definition to its family
import pandas as pd
family_df = pd.DataFrame(columns=['family', 'definition'])

for k, v in factory.GetFeatureDefs().items():
    # Keys have the form 'Family.Name', so the family is the part before the first dot
    family_df.loc[k] = [k.split('.', 1)[0], v]

family_df
# 8 feature families in total; for our molecules we mainly care about Donor, Acceptor and Hydrophobe


# Count the feature definitions in each family
family_df['family'].value_counts()
# Acceptor and Donor each have a single definition, while Hydrophobe has two
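Assigning families by substring matching is fragile, because 'Hydrophobe' is also a substring of 'LumpedHydrophobe', so the result depends on the order in which family names are checked. A minimal sketch, assuming the definition keys have the form 'Family.Name' (as in 'Donor.SingleAtomDonor'), that counts definitions per family by splitting on the first dot instead. The sample dict is hypothetical and its SMARTS values are placeholders; the helper name family_of is illustrative.

```python
from collections import Counter


def family_of(def_key):
    """Family part of a feature-definition key such as 'Donor.SingleAtomDonor'."""
    return def_key.split(".", 1)[0]


# Hypothetical sample shaped like factory.GetFeatureDefs(); SMARTS values are placeholders.
sample_defs = {
    "Donor.SingleAtomDonor": "<smarts>",
    "Acceptor.SingleAtomAcceptor": "<smarts>",
    "Hydrophobe.ThreeWayAttach": "<smarts>",
    "Hydrophobe.ChainTwoWayAttach": "<smarts>",
    "LumpedHydrophobe.Nitro2": "<smarts>",
}

counts = Counter(family_of(k) for k in sample_defs)
print(counts["Hydrophobe"], counts["LumpedHydrophobe"])  # 2 1

# Substring matching would also count the LumpedHydrophobe key as a Hydrophobe match.
print(sum("Hydrophobe" in k for k in sample_defs))  # 3
```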


# Print every feature definition in the dictionary
for k, v in factory.GetFeatureDefs().items():
    print(k, ':', v)
# Each feature is encoded as a SMARTS pattern

'''----------------------------- The actual screening workflow starts here -----------------------------'''

''' Level 1: generate the pharmacophore fingerprint of the target molecule '''

# Read in the target molecule (the peptide Arg-Gly-Asp-Leu)
mol1 = Chem.MolFromSmiles('NC(NCCC[C@H](N)C(NCC(N[C@@H](CC(O)=O)C(N[C@@H](CC(C)C)C(O)=O)=O)=O)=O)=N')
# Save a 2D depiction of the molecule (adjust the path for your machine)
Draw.MolToImageFile(mol1, "/Users/lenovo/RGDXXL.jpg")

# Use the feature factory to find the pharmacophore features in the molecule
feats = factory.GetFeaturesForMol(mol1)
print(len(feats))
# Each feature found carries its family (e.g. Acceptor, Donor), its type, and the IDs of the atoms it covers
for f in feats:
    print(
        f.GetFamily(),  # feature family
        f.GetType(),    # feature type
        f.GetAtomIds()  # atoms the feature is mapped to
    )


# Build the signature (fingerprint) factory from the feature factory; it holds the fingerprint parameters
sigFactory = SigFactory(
    factory,          # feature factory
    useCounts=False,  # default False: ignore how often a signature occurs and produce a SparseBitVect
    minPointCount=2,  # default 2: minimum number of pharmacophore points per signature
    maxPointCount=3,  # default 3: maximum number of pharmacophore points per signature
)
# Bin the topological distances into three ranges
sigFactory.SetBins([(0, 2), (2, 5), (5, 8)])
# Skip the feature families we do not want to consider
sigFactory.skipFeats = ['PosIonizable', 'NegIonizable', 'ZnBinder', 'LumpedHydrophobe', 'Aromatic']
# Init() must be called again after every parameter change
sigFactory.Init()
# Check the fingerprint length (number of possible bits)
print(sigFactory.GetSigSize())
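The distances passed to SetBins are topological: the number of bonds on the shortest path between two feature atoms in the 2D molecular graph (in RDKit the full matrix is available from Chem.GetDistanceMatrix(mol)). A plain-Python sketch of that bookkeeping on a hypothetical 5-atom chain, assuming half-open bins [lo, hi) and assuming a pair that falls outside every bin contributes no bit; check the SigFactory source for RDKit's exact boundary handling. The names shortest_path_lengths and bin_of are illustrative, not RDKit API.

```python
from collections import deque

# Hypothetical heavy-atom graph of a 5-atom chain: 0-1-2-3-4.
ADJACENCY = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
BINS = [(0, 2), (2, 5), (5, 8)]  # same ranges as the SetBins call above


def shortest_path_lengths(adj, start):
    """Breadth-first search: number of bonds from start to every reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for nbr in adj[atom]:
            if nbr not in dist:
                dist[nbr] = dist[atom] + 1
                queue.append(nbr)
    return dist


def bin_of(distance, bins):
    """Index of the half-open bin [lo, hi) holding distance, or None (assumed: pair ignored)."""
    for i, (lo, hi) in enumerate(bins):
        if lo <= distance < hi:
            return i
    return None


d = shortest_path_lengths(ADJACENCY, 0)
print(d[4], bin_of(d[4], BINS))  # 4 1
print(bin_of(2, BINS), bin_of(8, BINS))  # 1 None
```

Under these assumptions a distance of exactly 2 lands in the second bin, and atoms 8 or more bonds apart are never paired in the fingerprint.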

# Generate the target's fingerprint and inspect it
fps = Generate.Gen2DFingerprint(mol1, sigFactory)
print(f'len(fps): {len(fps)}')
print(f'fps.GetNumOnBits(): {fps.GetNumOnBits()}')
# The features and distance bins behind each bit can be looked up through the signature factory
on_bits = list(fps.GetOnBits())
print(on_bits)
print(sigFactory.GetBitDescription(on_bits[0]))


''' Level 2: generate pharmacophore fingerprints for the molecules to be screened, compute their similarity to the target, and keep the molecules above a threshold '''


# Similarity between the target fingerprint and one candidate molecule
def similarityMeasure(fps, mol, print_threshold=0.65):
    # Fingerprint the candidate with the same signature factory as the target
    fps2 = Generate.Gen2DFingerprint(mol, sigFactory)
    similarityPos = DataStructs.FingerprintSimilarity(fps, fps2, metric=DataStructs.TanimotoSimilarity)
    # Report especially close matches (molecules built from SMILES have no _Name property)
    if similarityPos >= print_threshold:
        print(Chem.MolToSmiles(mol), similarityPos)
    return similarityPos
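The Tanimoto coefficient compares the on-bits of the two fingerprints. With N_A and N_B on-bits in the target and the candidate, and N_AB bits on in both, a worked example with hypothetical counts shows how the two thresholds in this code (0.65 for printing inside the function, 0.5 for keeping a molecule in the loop below) interact:

```latex
T(A,B) = \frac{N_{AB}}{N_A + N_B - N_{AB}}, \qquad
N_A = 12,\; N_B = 10,\; N_{AB} = 8 \;\Rightarrow\;
T = \frac{8}{12 + 10 - 8} = \frac{8}{14} \approx 0.571
```

Such a candidate would be kept by the 0.5 filter but would not be printed by the 0.65 check.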

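# Read the candidate molecules: SMILES in the first column of the sheet, no header row
suppl = pd.read_excel('generate_molecules.xlsx', header=None)
suppl_list = suppl[0].tolist()
# MolFromSmiles returns None for SMILES it cannot parse
suppl_end = [Chem.MolFromSmiles(x) for x in suppl_list]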


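# Keep the molecules whose similarity to the target reaches the threshold
pos_x = []  # similarity scores of the kept molecules
entry = []  # 1-based row numbers of the kept molecules
for i, mol in enumerate(suppl_end, start=1):
    if mol is None:  # skip SMILES that failed to parse
        continue
    pos = similarityMeasure(fps, mol)
    print(f'i:{i}, pos:{pos}')
    if pos >= 0.5:
        pos_x.append(pos)
        entry.append(i)
print(pos_x, entry)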

Sometimes we need to extend the definition of pharmacophore fingerprints: for a specific problem you may want a custom pharmacophore scheme. For that, you need to read up on SMARTS syntax and the fdef file syntax:

Daylight Theory: SMARTS - A Language for Describing Molecular Patterns

The RDKit Book (RDKit 2022.03.1 documentation)

The following blog post covers this kind of problem:

How to understand and define pharmacophore fingerprints (2D Pharmacophore Fingerprints)? Using RDKit as an example (ZOOEEER's blog, CSDN): https://blog.csdn.net/qq_37364789/article/details/123357365

The post assumes basic familiarity with RDKit. Pharmacophore fingerprints combine chemical and structural information, so in principle they are suited to describing interactions between molecules. The chemical information is a set of SMARTS rules that assign atom types; the structural information is the shortest path between pairs of atoms on the 2D molecular graph, which adds geometric information to the selected atom pairs, triples, quadruples and so on. The post includes a figure (ref: 10.1021/ci7003253) illustrating the triple encoding scheme, and notes that because this implementation allows one atom to be assigned several atom types at once, the same combination of atoms can contribute several fingerprint bits.