
RDKit | Fragment decomposition of drug molecules

2022-06-21 07:33:00 Dazed flounder


In some drug discovery settings, cheminformatics works not only with whole drug molecules but also with their drug-like fragments: the fragments are extracted, and those shared by many drug-like molecules are collected for database construction or for training AI models.
Aspirin, for example, can be broken down into four common drug-like fragments: a benzene ring, a carboxyl group, an acetyl group, and a single oxygen atom.

The code below uses RDKit's BRICS algorithm. BRICS chooses the bonds to break from a set of rules based on common chemical reactions, so the resulting fragments are plausible from a synthetic chemistry point of view.
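To make the bond selection concrete, here is a minimal sketch (assuming RDKit is installed) that lists the bonds BRICS would cut in aspirin. It relies on `BRICS.FindBRICSBonds`, which yields pairs of atom indices together with the BRICS environment labels of the two bond ends; the variable names are only illustrative.

```python
from rdkit import Chem
from rdkit.Chem import BRICS

aspirin = Chem.MolFromSmiles('CC(=O)OC1=CC=CC=C1C(O)=O')

# Each item has the form ((atom_idx_1, atom_idx_2), (label_1, label_2)).
brics_bonds = list(BRICS.FindBRICSBonds(aspirin))

for (a1, a2), (lab1, lab2) in brics_bonds:
    sym1 = aspirin.GetAtomWithIdx(a1).GetSymbol()
    sym2 = aspirin.GetAtomWithIdx(a2).GetSymbol()
    print(f"bond {sym1}{a1}-{sym2}{a2}: BRICS environments L{lab1} and L{lab2}")
```

For aspirin this should report three cleavable bonds, which is consistent with the four fragments mentioned above.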

Scheme 1

RDKit now offers a more concise route, shown here as Scheme 1; it is shorter than Scheme 2 below:

from rdkit import Chem
from rdkit.Chem import BRICS

aspirin = Chem.MolFromSmiles('CC(=O)OC1=CC=CC=C1C(O)=O')
# All keyword arguments are written out with their default values.
fragments = BRICS.BRICSDecompose(aspirin, allNodes=None, minFragmentSize=1,
                                 onlyUseReactions=None, silent=True,
                                 keepNonLeafNodes=False, singlePass=False,
                                 returnMols=False)
print(sorted(fragments))
# > output: ['[1*]C(C)=O', '[16*]c1ccccc1[16*]', '[3*]O[3*]', '[6*]C(=O)O']


Explanation of the arguments

  • allNodes: a set of fragment SMILES that have already been seen and should not be processed again. It is relatively awkward to use and is generally left unset;
  • minFragmentSize: the minimum number of heavy atoms a fragment must contain. With a value of 2 in this example, the ether oxygen '[3*]O[3*]' is no longer split off on its own; it stays attached to the benzene ring '[16*]c1ccccc1[16*]', giving '[3*]Oc1ccccc1[16*]';
  • onlyUseReactions: BRICS picks its cleavage sites from a set of reactions, and this argument restricts which of those reactions are used for splitting. Rarely needed;
  • silent: when set to False, information about which reaction is used for each split is printed;
  • keepNonLeafNodes: when set to True, the larger intermediate fragments that have not been completely split are returned as well;
  • singlePass: when set to True, each returned fragment contains at most one cleavage site; for example, instead of '[16*]c1ccccc1[16*]' the results include '[16*]c1ccccc1C(=O)O' and '[3*]OC=O'. This avoids the same fragment being cut by several reactions;
  • returnMols: when set to True, the fragments are returned as RDKit Mol objects instead of SMILES strings.
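The effect of two of these arguments can be checked directly. The following is a small sketch, assuming RDKit is installed, that compares the default call with `keepNonLeafNodes=True` and `returnMols=True`; the variable names are illustrative and no particular output is claimed beyond what the comments state.

```python
from rdkit import Chem
from rdkit.Chem import BRICS

aspirin = Chem.MolFromSmiles('CC(=O)OC1=CC=CC=C1C(O)=O')

# Default call: SMILES strings of the fully split (leaf) fragments.
leaves = set(BRICS.BRICSDecompose(aspirin))

# keepNonLeafNodes=True also keeps the intermediates, including the input molecule.
all_nodes = set(BRICS.BRICSDecompose(aspirin, keepNonLeafNodes=True))

# returnMols=True gives RDKit Mol objects instead of SMILES strings.
leaf_mols = list(BRICS.BRICSDecompose(aspirin, returnMols=True))

print(sorted(leaves))
print(sorted(all_nodes - leaves))   # intermediates that are not leaves
print([m.GetNumAtoms() for m in leaf_mols])
```

Each call builds its own result from scratch here because `allNodes` is left at its default; passing one shared set across calls would make later calls skip molecules that were already seen.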

Scheme 2

This is more involved than Scheme 1, but it exposes more of the details and lets you work with them.

from rdkit import Chem
from rdkit.Chem import BRICS

def mol_to_smiles(mol):
    # Helper missing from the original listing; assumed to be a plain SMILES conversion.
    return Chem.MolToSmiles(mol)

def fragment_recursive(mol, frags):
    try:
        bonds = list(BRICS.FindBRICSBonds(mol))
        if len(bonds) == 0:
            # No BRICS bond left: the remainder is the last fragment.
            frags.append(mol_to_smiles(mol))
            return frags
        idxs, labs = list(zip(*bonds))
        bond_idxs = []
        for a1, a2 in idxs:
            bond = mol.GetBondBetweenAtoms(a1, a2)
            bond_idxs.append(bond.GetIdx())
        # Break only one bond per call: the one with the lowest bond index.
        bond_idxs.sort()
        broken = Chem.FragmentOnBonds(mol,
                                      bondIndices=[bond_idxs[0]],
                                      dummyLabels=[(0, 0)])
        head, tail = Chem.GetMolFrags(broken, asMols=True)
        frags.append(mol_to_smiles(head))
        return fragment_recursive(tail, frags)
    except Exception as e:
        # Report the problem and return what has been collected so far.
        print(e)
        return frags

aspirin = Chem.MolFromSmiles('CC(=O)OC1=CC=CC=C1C(O)=O')
fragments = fragment_recursive(aspirin, [])
print(fragments)

# > output: ['*C(C)=O', '*O*', '*c1ccccc1*', '*C(=O)O']

As you can see, the output fragments keep the sites where aspirin was cut; each site is marked with the wildcard atom symbol *.
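The two schemes differ only in how the cut sites are written: Scheme 1 numbers each dummy atom (for example `[16*]`), which in RDKit's BRICS output is the BRICS environment type stored as an isotope label, while Scheme 2 writes a bare `*`. As a rough sketch using only plain string handling on the outputs quoted above, the labels can be read off or stripped as follows. The helper names `brics_labels` and `strip_labels` are made up for this example, and text substitution is not guaranteed to give canonical SMILES in general.

```python
import re

# Scheme 1 output as quoted in this article.
scheme1 = ['[1*]C(C)=O', '[16*]c1ccccc1[16*]', '[3*]O[3*]', '[6*]C(=O)O']

# Matches a numbered dummy atom such as [16*] and captures the number.
LABELLED_DUMMY = re.compile(r"\[(\d+)\*\]")

def brics_labels(smiles):
    """Return the BRICS environment numbers attached to the dummy atoms."""
    return [int(n) for n in LABELLED_DUMMY.findall(smiles)]

def strip_labels(smiles):
    """Replace every numbered dummy atom with a bare wildcard."""
    return LABELLED_DUMMY.sub("*", smiles)

labels = {s: brics_labels(s) for s in scheme1}
stripped = [strip_labels(s) for s in scheme1]

print(labels)
print(stripped)
```

For these four fragments the stripped strings coincide with the Scheme 2 output. For arbitrary molecules it would be safer to reset the isotope on the dummy atoms of an RDKit Mol and regenerate the SMILES.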
