RDKit I: using RDKit to screen small molecules by structural features

2022-07-29 03:25:00 【Order anything】

I have recently been working on a small-molecule screening project that involves a number of processing methods. In later posts I will go through the problems I ran into and their solutions one by one.

First, a brief introduction to RDKit. RDKit is an open-source Python package for working with small chemical molecules. It originated at Rational Discovery and was later developed further with support from Novartis; its core is written in C++. The source code and documentation are on GitHub at the following address:

https://github.com/rdkit/rdkit

To install RDKit in an Anaconda or Miniconda environment:

conda install -yq -c rdkit rdkit

Here is a brief introduction to SMILES. SMILES is essentially a way of encoding the two-dimensional structure of a small molecule as a string. With RDKit, a structure drawn in ChemDraw and saved in .mol format, or the spatial coordinates stored in an SDF file, can be converted to SMILES, which can then be fed into machine learning and deep learning models.
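As a small illustration of the conversion described above, the sketch below parses a SMILES string, writes it back out in canonical form, and round-trips it through a MOL block (the text content of a .mol file). The ethanol example and the variable names are my own and are not part of the original project.

```python
from rdkit import Chem

# Parse a SMILES string into an RDKit molecule (returns None if parsing fails)
mol = Chem.MolFromSmiles('OCC')  # ethanol, written starting from the oxygen

# Write the molecule back out as canonical SMILES
canonical = Chem.MolToSmiles(mol)
print(canonical)

# Round trip through the MOL block format (the text content of a .mol file)
mol_block = Chem.MolToMolBlock(mol)
mol_from_block = Chem.MolFromMolBlock(mol_block)
smiles_from_block = Chem.MolToSmiles(mol_from_block)
print(smiles_from_block)
```

For files on disk, Chem.MolFromMolFile reads a single .mol file and Chem.SDMolSupplier iterates over the molecules in an SDF file; each molecule can then be passed to Chem.MolToSmiles in the same way.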

Now let's get to the point:

From one million small molecules, keep only those with fewer than 2 NO2 groups, fewer than 3 Cl atoms, fewer than 2 Br atoms, fewer than 6 F atoms, and fewer than 5 aromatic rings.

Solving this requires two RDKit modules. Only their simplest usage is covered here; if you are interested, see the package documentation or the source code:

1. rdkit.Chem.Lipinski

Lipinski's rule of five is a commonly used set of constraints for small-molecule drugs. The Lipinski module in RDKit computes a variety of related parameters: given a molecule's SMILES, you can compute HeavyAtomCount, NumAromaticRings, and so on. The most commonly used are NumAromaticRings, NumHAcceptors, NumHDonors, and NumRotatableBonds.
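A minimal sketch of calling these descriptor functions on a single molecule is shown here. Aspirin is used purely as an illustrative input; it is my own choice and does not come from the screening data.

```python
from rdkit import Chem
from rdkit.Chem import Lipinski

# Aspirin, used here only as an illustrative molecule
mol = Chem.MolFromSmiles('CC(=O)Oc1ccccc1C(=O)O')

# Each Lipinski function takes a molecule object and returns an integer
descriptors = {
    'HeavyAtomCount': Lipinski.HeavyAtomCount(mol),
    'NumAromaticRings': Lipinski.NumAromaticRings(mol),
    'NumHDonors': Lipinski.NumHDonors(mol),
    'NumHAcceptors': Lipinski.NumHAcceptors(mol),
    'NumRotatableBonds': Lipinski.NumRotatableBonds(mol),
}
for name, value in descriptors.items():
    print(name, value)
```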

2. rdkit.Chem

The rdkit.Chem package contains functions for manipulating small-molecule objects, including atom operations, bond operations, ring operations, and pharmacophore search.

Here we mainly use atom operations, which include the following functions:

Iterate over the atoms: m.GetAtoms()
Get the atom index: GetIdx()
Get the atomic number: GetAtomicNum()
Get the element symbol: GetSymbol()
Get the degree, i.e. the number of bonded neighbors (depends on whether hydrogens are explicit): GetDegree()
Get the total degree (independent of whether hydrogens are explicit): GetTotalDegree()
Get the formal charge: GetFormalCharge()
Get the hybridization: GetHybridization()
Get the explicit valence: GetExplicitValence()
Get the implicit valence: GetImplicitValence()
Get the total valence: GetTotalValence()
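To make the atom operations listed above concrete, here is a short sketch that loops over the atoms of ethanol and collects a few of these properties. The molecule and the rows list are illustrative choices of mine.

```python
from rdkit import Chem

m = Chem.MolFromSmiles('CCO')  # ethanol; hydrogens are implicit

# Collect one tuple of properties per atom
rows = []
for atom in m.GetAtoms():
    rows.append((
        atom.GetIdx(),                 # position of the atom in the molecule
        atom.GetSymbol(),              # element symbol
        atom.GetAtomicNum(),           # atomic number
        atom.GetDegree(),              # explicit neighbors only
        atom.GetTotalDegree(),         # neighbors including implicit hydrogens
        atom.GetFormalCharge(),
        str(atom.GetHybridization()),
    ))

for row in rows:
    print(row)
```

The difference between GetDegree() and GetTotalDegree() shows up on every atom here, because the hydrogens are implicit and only the total degree counts them.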

There is a very well-written Zhihu article worth reading:

RDKit | Molecular basics and pharmacophore search (Zhihu): https://zhuanlan.zhihu.com/p/143111689

Before starting on the code, let me describe the format of the input data, because I have found that for veteran biochemists who came to RDKit late it helps to spell things out. The input is the molecules' SMILES codes: put the SMILES of all the molecules to be screened in a csv or txt file, with smiles as the column name.
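For readers who want to see what such an input file looks like, the sketch below builds a tiny example in memory and reads it with pandas. The three molecules are made up for illustration, and io.StringIO stands in for a real file on disk.

```python
import io

import pandas as pd

# Hypothetical file content: one SMILES per row under a 'smiles' header
csv_text = """smiles
CCO
c1ccccc1[N+](=O)[O-]
FC(F)(F)c1ccc(Cl)cc1
"""

# io.StringIO stands in for a file on disk such as 'smiles.csv'
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```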

Without further ado, here is the code:

import pandas as pd
from rdkit import Chem
from rdkit.Chem import Lipinski

#  Read in the data 
df = pd.read_csv('smiles.csv')

# Screen by number of aromatic rings: keep molecules with fewer than 5
# Note: Chem.MolFromSmiles returns None for an invalid SMILES, so this assumes every row parses
df['NumAromaticRings'] = df['smiles'].apply(lambda x: Lipinski.NumAromaticRings(Chem.MolFromSmiles(x)))
# This step can also be run in parallel with parallel_apply, depending on your needs
df = df.drop(df[df.NumAromaticRings >= 5].index)

# Count F, Cl, and Br atoms in each molecule
m = [Chem.MolFromSmiles(x) for x in df.smiles.tolist()]

def count_element(mol, symbol):
    # Number of atoms in mol whose element symbol equals symbol
    return [atom.GetSymbol() for atom in mol.GetAtoms()].count(symbol)

# Fluorine count per molecule
num_F = [count_element(mol, 'F') for mol in m]

# Chlorine count per molecule
num_Cl = [count_element(mol, 'Cl') for mol in m]

# Bromine count per molecule
num_Br = [count_element(mol, 'Br') for mol in m]

# Print the maximum F, Cl, and Br counts as a check before filtering
print(f'max_F:{max(num_F)};max_Cl:{max(num_Cl)};max_Br:{max(num_Br)}')

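# Filter by the element counts
transform = {'num_F': num_F, 'num_Cl': num_Cl, 'num_Br': num_Br}
atom_nums = pd.DataFrame(transform)
# Reset df's index first: rows were dropped above, so its index has gaps
# and would otherwise not line up with atom_nums
df = pd.concat([df.reset_index(drop=True), atom_nums], axis=1)
df.info()
df = df.drop(df[df.num_F >= 6].index)
df = df.drop(df[df.num_Cl >= 3].index)
df = df.drop(df[df.num_Br >= 2].index)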

# Screen NO2 groups by searching the SMILES string
# Note: this only finds nitro groups written exactly as [N+](=O)[O-]
nitro = '[N+](=O)[O-]'
dff = df[df['smiles'].str.contains(pat=nitro, regex=False)]
# Build a dataframe of the NO2-containing SMILES and their NO2 counts
smiles = dff.smiles.tolist()
num_NO2 = [x.count(nitro) for x in smiles]
transform2 = {'smiles': smiles, 'num_NO2': num_NO2}
FG_nums = pd.DataFrame(transform2)
# Keep only molecules with 2 or more NO2 groups, then take the difference with df on smiles
FG_nums = FG_nums[FG_nums.num_NO2 >= 2]
df2 = pd.concat([df, FG_nums])  # DataFrame.append was removed in pandas 2.0
df2 = df2.drop_duplicates(subset=['smiles'], keep=False)

#  Save the final result 
df2.to_csv('result.csv',header = True, index = False)
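The script above counts NO2 groups by string search, which depends on how each SMILES happens to be written. As an alternative sketch (not the approach used in the script), the same five criteria can be checked per molecule with a SMARTS substructure match. The names NITRO, count_symbol, and passes_screen are hypothetical helpers introduced here, and the SMARTS pattern assumes nitro groups are in the charge-separated form.

```python
from rdkit import Chem
from rdkit.Chem import Lipinski

# SMARTS pattern for a charge-separated nitro group (assumed representation)
NITRO = Chem.MolFromSmarts('[N+](=O)[O-]')

def count_symbol(mol, symbol):
    """Count atoms in mol with the given element symbol."""
    return sum(1 for atom in mol.GetAtoms() if atom.GetSymbol() == symbol)

def passes_screen(smiles):
    """Return True if the molecule meets all five criteria, False otherwise."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # unparsable SMILES are rejected
        return False
    return (
        len(mol.GetSubstructMatches(NITRO)) < 2
        and count_symbol(mol, 'Cl') < 3
        and count_symbol(mol, 'Br') < 2
        and count_symbol(mol, 'F') < 6
        and Lipinski.NumAromaticRings(mol) < 5
    )

# Nitrobenzene written with the nitro group first, then 1,4-dinitrobenzene
print(passes_screen('[O-][N+](=O)c1ccccc1'))
print(passes_screen('[O-][N+](=O)c1ccc(cc1)[N+]([O-])=O'))
```

With a dataframe like the one in the script, this could be applied as df[df['smiles'].apply(passes_screen)]. Because the match is done on the parsed structure, a nitro group is counted regardless of the atom order in the SMILES string.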

I post one article a week. Next week's topic: clustering with RDKit.

Copyright notice
This article was written by [Order anything]. Please include a link to the original when reposting.
https://yzsam.com/2022/196/202207130553300312.html
