
RDKit | Clustering a compound library by Murcko scaffold

2022-06-21 07:33:00 Dazed flounder


Assessment of compound diversity
One approach is to vectorize the compounds with a suitable fingerprint and compute the distances between them. This method is widely used, but it is hard for a human to grasp intuitively what a given distance between two compounds means.
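To make the fingerprint approach concrete, here is a minimal sketch of one common choice of distance, the Tanimoto distance 1 - |A ∩ B| / |A ∪ B| between two binary fingerprints. It assumes each fingerprint is given as a plain Python set of on-bit indices, and the bit indices are invented for illustration; with RDKit one would more typically compute, for example, Morgan fingerprints and compare them with `DataStructs.TanimotoSimilarity`.

```python
def tanimoto_similarity(fp_a: set, fp_b: set) -> float:
    """|A ∩ B| / |A ∪ B| over on-bit indices; defined here as 0.0 if both are empty."""
    union = fp_a | fp_b
    if not union:
        return 0.0
    return len(fp_a & fp_b) / len(union)


def tanimoto_distance(fp_a: set, fp_b: set) -> float:
    """Distance = 1 - similarity, so identical fingerprints are at distance 0."""
    return 1.0 - tanimoto_similarity(fp_a, fp_b)


# Hypothetical on-bits of two compounds (made up for illustration).
fp_1 = {3, 17, 42, 256, 1023}
fp_2 = {3, 17, 99, 256}

print(tanimoto_distance(fp_1, fp_2))  # 3 shared bits out of 6 in the union -> 0.5
```

The result (0.5 here) is a perfectly usable number for an algorithm, but it does not tell a chemist which parts of the two structures differ, which is the intuition problem described above.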

Alternatively, the molecules can be roughly clustered by their Murcko scaffold (the ring systems plus the linkers that connect them, with side chains removed), and the top compound of each cluster taken as a candidate. This approach matches human intuition well, so a chemist can be expected to narrow down the set of compounds naturally just by browsing the list of candidates.
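The selection step can be sketched independently of RDKit. This is a minimal sketch, assuming each compound already has a scaffold SMILES and a hypothetical score (higher is better) that defines what "top" means; the post itself does not say how compounds within a cluster are ranked, and `top_per_scaffold` is a hypothetical helper.

```python
def top_per_scaffold(compounds):
    """compounds: iterable of (compound_id, scaffold_smiles, score).

    Returns {scaffold_smiles: compound_id of the best-scoring member}.
    """
    best = {}  # scaffold SMILES -> (score, compound_id) of the best member so far
    for cid, scaffold, score in compounds:
        if scaffold not in best or score > best[scaffold][0]:
            best[scaffold] = (score, cid)
    return {scaffold: cid for scaffold, (score, cid) in best.items()}


# Hypothetical library: (compound ID, scaffold SMILES, score)
library = [
    ("cpd1", "c1ccccc1", 0.7),
    ("cpd2", "c1ccccc1", 0.9),
    ("cpd3", "c1ccncc1", 0.4),
    ("cpd4", "", 0.8),  # an acyclic compound has an empty scaffold
]
print(top_per_scaffold(library))
```

With the strict `>` comparison, ties go to the compound that appears first in the library.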

Murcko scaffold generation
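A small illustration of what `MurckoScaffold.MurckoScaffoldSmiles` does, using SMILES input instead of molecules read from an SDF file. The example molecules are chosen for illustration only: ibuprofen, a hypothetical methylated diphenylmethane, and hexanoic acid. Side chains are removed while ring systems and the linkers joining them are kept, so ibuprofen reduces to a bare benzene ring and the two-ring molecule keeps its CH2 linker. A molecule without rings has no scaffold at all, so every acyclic compound in a library ends up sharing the same empty scaffold key.

```python
from rdkit import Chem
from rdkit.Chem.Scaffolds import MurckoScaffold

# Ibuprofen: one benzene ring carrying two side chains.
ibuprofen = "CC(C)Cc1ccc(cc1)C(C)C(=O)O"
# Hypothetical example: two benzene rings joined by a CH2 linker, plus a methyl side chain.
linked = "Cc1ccc(Cc2ccccc2)cc1"
# Hexanoic acid: no rings, hence no scaffold.
acyclic = "CCCCCC(=O)O"

for smi in (ibuprofen, linked, acyclic):
    # Strip side chains; keep ring systems and the linkers between them.
    scaffold = MurckoScaffold.MurckoScaffoldSmiles(smiles=smi, includeChirality=False)
    print(f"{smi!r} -> {scaffold!r}")

# For comparison: the canonical SMILES of unsubstituted diphenylmethane.
print(Chem.CanonSmiles("c1ccccc1Cc1ccccc1"))
```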

Clustering a compound library by Murcko scaffold
Import libraries

import numpy as np
from rdkit import Chem
from rdkit.Chem import Draw
from rdkit.Chem.Scaffolds import MurckoScaffold
from rdkit.Chem.Draw import IPythonConsole  # renders molecules and grid images inline in Jupyter

Load data

# Read every record of the multi-molecule SDF file; records RDKit cannot parse come back as None
sdfloader = Chem.SDMolSupplier("Enamine_Premium_processed.sdf")
mol_list = [mol for mol in sdfloader if mol is not None]

# Murcko scaffold of each molecule as SMILES (stereochemistry ignored), then as RDKit molecules for drawing
smi_scaffolds = [MurckoScaffold.MurckoScaffoldSmiles(mol=mol, includeChirality=False) for mol in mol_list]
mol_scaffolds = [Chem.MolFromSmiles(smi_scaffold) for smi_scaffold in smi_scaffolds]

Visualize the original compounds (first nine)

Draw.MolsToGridImage(mol_list[:9], molsPerRow=3, subImgSize=(300,300))


Visualize their scaffolds

Draw.MolsToGridImage(mol_scaffolds[:9], molsPerRow=3, subImgSize=(300,300))

Clustering by Murcko scaffold

scaffolds = {}      # scaffold SMILES -> cluster ID (1, 2, 3, ... in order of first appearance)
clusters_list = []  # cluster ID of each molecule, in the same order as mol_list

idx = 1
for scaffold_smi in smi_scaffolds:  # scaffolds were already computed above, one per molecule
    if scaffold_smi not in scaffolds:
        # First time this scaffold is seen: open a new cluster
        scaffolds[scaffold_smi] = idx
        idx += 1
    clusters_list.append(scaffolds[scaffold_smi])
print("Num of dataset:", len(mol_list))
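Since `clusters_list` is aligned with `mol_list`, a natural next step is to see how large each cluster is. The sketch below is a minimal, RDKit-free version, assuming the scaffold SMILES have already been computed as in `smi_scaffolds`; the six scaffold strings here are made up. It applies the same first-appearance numbering as the loop and then counts cluster sizes with `collections.Counter`.

```python
from collections import Counter

# Made-up scaffold SMILES for six molecules, standing in for smi_scaffolds.
smi_scaffolds = ["c1ccccc1", "c1ccncc1", "c1ccccc1", "", "c1ccccc1", "c1ccncc1"]

# Same rule as the loop: a new scaffold gets the next ID (1, 2, 3, ...).
# setdefault evaluates len() before inserting, so the first key gets 1.
scaffold_ids = {}
for smi in smi_scaffolds:
    scaffold_ids.setdefault(smi, len(scaffold_ids) + 1)
clusters_list = [scaffold_ids[smi] for smi in smi_scaffolds]

# Cluster sizes, largest first.
sizes = Counter(clusters_list)
for cluster_id, size in sizes.most_common():
    print(cluster_id, size)
```

Note that all acyclic molecules share the empty scaffold, so they collapse into a single cluster that can be large in a real library.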

Num of dataset: 128816

print("Num of Murcko scaffolds in dataset:", len(scaffolds))

Take cluster 11 and look at its compounds

clusters_list = np.array(clusters_list)
# Indices of all molecules assigned to cluster 11
idx_c11 = np.where(clusters_list == 11)[0]
mol_list_c11 = [mol_list[i] for i in idx_c11]

Draw.MolsToGridImage(mol_list_c11, molsPerRow=3, subImgSize=(300, 300))
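`np.where` rescans the whole array each time a cluster is inspected. If several clusters are to be browsed, one alternative is to group all molecules by cluster ID in a single pass. A minimal sketch in plain Python, where `group_by_cluster` is a hypothetical helper and the inputs are toy stand-ins for `mol_list` and `clusters_list`:

```python
from collections import defaultdict


def group_by_cluster(items, cluster_ids):
    """Return {cluster_id: [items in that cluster]} in one pass, preserving input order."""
    if len(items) != len(cluster_ids):
        raise ValueError("items and cluster_ids must have the same length")
    groups = defaultdict(list)
    for item, cid in zip(items, cluster_ids):
        groups[cid].append(item)
    return dict(groups)


# Toy stand-ins: in the post these would be mol_list and clusters_list.
molecules = ["mol_a", "mol_b", "mol_c", "mol_d", "mol_e"]
cluster_ids = [11, 3, 11, 7, 11]

groups = group_by_cluster(molecules, cluster_ids)
print(groups[11])  # members of cluster 11, in library order
```

With the real `mol_list` and `clusters_list`, passing `groups[11]` to `Draw.MolsToGridImage` would draw the same molecules as the NumPy version.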


Copyright notice
This article was written by [Dazed flounder]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/172/202206210723401315.html