Interpretation of the paper: "i4mC-Deep: intelligent prediction of N4-methylcytosine sites using a deep learning method with chemical properties"
2022-07-23 12:22:00 【Windy Street】
i4mC-Deep: An Intelligent Predictor of N4-Methylcytosine Sites Using a Deep Learning Approach with Chemical Properties
Article link: https://www.mdpi.com/2073-4425/12/8/1117
DOI: https://doi.org/10.3390/genes12081117
Journal: Genes (CAS Zone 3)
Impact factor: 4.096
Published: 23 July 2021
Web server: http://nsclbio.jbnu.ac.kr/tools/i4mC-Deep/
Supplementary materials: https://www.mdpi.com/article/10.3390/genes12081117/s1
Code and data: https://github.com/waleed551/i4mC-Deep
1. Overview of the article
N4-methylcytosine (4mC) is an epigenetic modification of DNA. 4mC plays an important role in DNA repair and replication, protects host DNA from degradation, and regulates DNA expression. Current experimental techniques for detecting it are expensive and laborious. Traditional machine learning methods rely on hand-crafted features, whereas newer methods save time and computational cost by learning the features automatically. In this study, the authors propose i4mC-Deep, an intelligent predictor based on a convolutional neural network (CNN) that predicts 4mC modification sites in DNA samples. Nucleotide chemical properties and nucleotide density are extracted from the DNA sequences and used as the CNN input. The proposed method outperforms several state-of-the-art predictors. When i4mC-Deep was used to analyze Geoalkalibacter subterraneus DNA, accuracy (ACC) improved by 3.9% and the Matthews correlation coefficient (MCC) improved by 10.5% compared with a conventional predictor.
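ACC and MCC are both computed from the confusion matrix (true/false positives and negatives). The minimal Python sketch below uses the standard textbook formulas; the counts in the usage line are made up for illustration and are not results from the paper.

```python
import math

def acc_mcc(tp, tn, fp, fn):
    """Return (accuracy, Matthews correlation coefficient) from confusion-matrix counts."""
    total = tp + tn + fp + fn
    acc = (tp + tn) / total
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is reported as 0 when any marginal sum is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, mcc

# Hypothetical counts, not taken from the paper.
acc, mcc = acc_mcc(tp=80, tn=70, fp=30, fn=20)
print(f"ACC={acc:.3f}, MCC={mcc:.3f}")  # ACC=0.750, MCC=0.503
```

MCC uses all four cells of the confusion matrix, which is why it is often reported alongside ACC for binary site prediction.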
2. Background
Recently, several computational tools have been developed to identify 4mC sites, including iDNA4mC, 4mCPred, 4mCPred-SVM and SOMM4mC. All of these tools are based on machine learning with hand-crafted features. iDNA4mC uses nucleotide chemical properties and nucleotide frequency as the feature vector, combined with a support vector machine (SVM), to detect 4mC sites. 4mCPred and 4mCPred-SVM also use SVMs but with different features: 4mCPred uses two encodings, position-specific trinucleotide propensity (PSTNP) and the electron-ion interaction pseudopotential of trinucleotides, while 4mCPred-SVM combines four features to predict 4mC sites: k-mer frequency, mononucleotide binary encoding, dinucleotide binary encoding and local position-specific dinucleotide frequency. SOMM4mC uses classical first- and second-order Markov models to predict 4mC sites and performs better than the other tools mentioned above. In addition, 4mCCNN and DeepTorrent are based on deep learning: 4mCCNN uses one-hot encoded input with a convolutional neural network, and DeepTorrent combines four feature-extraction methods with convolutional and LSTM layers. These earlier deep learning models have complex architectures, which increases the number of parameters and the computational cost.
In this study, the authors use a convolutional neural network (CNN) to build an accurate and efficient computational tool. The CNN consists of convolution, batch normalization, flatten, dropout and fully connected (dense) layers; the convolution layers automatically extract important features from the encoded DNA sequences. The input DNA sequences are encoded with nucleotide chemical properties (NCP) and nucleotide density (ND), batch normalization and dropout are used to control overfitting, and finally the fully connected layer classifies each sequence as a 4mC or non-4mC site. i4mC-Deep was evaluated with 10-fold cross-validation and outperformed previous tools. The architecture of i4mC-Deep is shown in Figure 1. The authors also developed a free online web server.
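10-fold cross-validation splits the data into ten parts, trains on nine and tests on the remaining one, rotating through all ten, then averages the scores. The sketch below is a plain (unstratified) k-fold split in standard-library Python; the post does not say whether the authors stratified the folds by class, and `train_and_score` is a hypothetical callback standing in for "train the model on these indices and return a metric".

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for plain k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

def cross_validate(n_samples, train_and_score, k=10, seed=0):
    """Average the score returned by the hypothetical train_and_score over k folds."""
    scores = [train_and_score(tr, te) for tr, te in k_fold_indices(n_samples, k, seed)]
    return sum(scores) / len(scores)
```

Every sample lands in exactly one test fold, so each sequence is predicted once by a model that never saw it during training. A stratified variant would apply the same split separately to the positive and negative indices.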
2. Data
Datasets play a key role in developing efficient and reliable computational tools. The authors use data from six prokaryotic and eukaryotic species: Caenorhabditis elegans, Drosophila melanogaster, Arabidopsis thaliana, Escherichia coli, Geoalkalibacter subterraneus and Geobacter pickeringii. These datasets were built from the MethSMRT database. The six benchmark datasets contain 1554, 1769, 1978, 388, 906 and 569 positive samples, respectively, each with the same number of negative samples. Every sequence in the six datasets is 41 nucleotides long with a cytosine at the center.
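To make "41 nucleotides with a central cytosine" concrete: such a window has 20 nucleotides on each side of the cytosine. The post does not describe how the windows were cut from the MethSMRT data or how positive/negative labels were assigned, so the helper below (`cytosine_windows`, a hypothetical name) only illustrates the window geometry on a raw sequence.

```python
def cytosine_windows(genome, flank=20):
    """Return (position, window) for every C that has `flank` bases on both sides.

    With flank=20 each window is 2 * 20 + 1 = 41 nt and the C sits at index 20.
    """
    windows = []
    for i, base in enumerate(genome):
        if base == "C" and i - flank >= 0 and i + flank < len(genome):
            windows.append((i, genome[i - flank:i + flank + 1]))
    return windows
```

Cytosines too close to either end of the sequence are skipped, since they cannot be centered in a full 41-nt window.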
3. Method
3.1 Feature encoding
- Nucleotide chemical properties (NCP)

- Nucleotide density (ND)
Encodes the frequency information of each nucleotide in the DNA sequence.
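The two encodings can be sketched as follows. The NCP table and the ND formula below are the standard definitions used by earlier 4mC predictors such as iDNA4mC: each base maps to three binary chemical properties, and the density at position i is the number of times that base has occurred in the first i positions divided by i. I am assuming that i4mC-Deep concatenates them per position, giving a 41 × 4 input matrix for a 41-nt sequence; `encode_ncp_nd` is a hypothetical helper name.

```python
# Standard NCP vectors. Column 1: purine ring (A, G); column 2: amino group (A, C);
# column 3: weak hydrogen bonding (A, T).
NCP = {
    "A": (1, 1, 1),
    "C": (0, 1, 0),
    "G": (1, 0, 0),
    "T": (0, 0, 1),
}

def encode_ncp_nd(seq):
    """Encode a DNA sequence as an L x 4 list: three NCP values plus the ND value."""
    seq = seq.upper()
    counts = {base: 0 for base in NCP}
    features = []
    for i, base in enumerate(seq, start=1):
        counts[base] += 1                  # raises KeyError for non-ACGT symbols such as N
        density = counts[base] / i         # frequency of this base within seq[:i]
        features.append([*NCP[base], density])
    return features

rows = encode_ncp_nd("CATGCGGTCAT")
print(rows[4])  # [0, 1, 0, 0.4]: the second C at position 5 has density 2/5
```

NCP says what each base is chemically, while ND adds a position-dependent value, so identical bases at different positions get different feature rows.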
3.2 Model
Hyperparameter search ranges:
Best parameters: two convolution layers, each with 8 filters, a kernel size of 3 and "same" padding; a dropout rate of 0.3.
L2 regularization and dropout are applied to keep the network from overfitting. The model is trained with the Adam optimizer at a learning rate of 0.001; the best batch size is 32, and the number of epochs is set to 200 with early stopping.
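To make the reported architecture concrete, here is a NumPy forward pass with random, untrained weights. The post gives the layer types and the best hyperparameters, but several details are my assumptions and are labeled as such: ReLU activation, the order Conv → ReLU → BatchNorm for each convolution layer, no pooling, and a single sigmoid output unit. Training (L2 penalty, Adam at lr 0.001, batch size 32, 200 epochs with early stopping) is not shown; in Keras those would correspond to `regularizers.l2`, `optimizers.Adam(learning_rate=0.001)` and `callbacks.EarlyStopping`.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d_same(x, w, b):
    """1D convolution (cross-correlation, as in Keras Conv1D), stride 1, zero 'same' padding.

    x: (length, in_channels); w: (kernel, in_channels, filters); b: (filters,).
    Assumes an odd kernel size so the padding is symmetric.
    """
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.stack([np.tensordot(xp[i:i + k], w, axes=([0, 1], [0, 1]))
                    for i in range(x.shape[0])])
    return out + b

def build_params(in_ch=4, filters=8, kernel=3, n_conv=2, length=41):
    """Random weights for 2 x (Conv1D(8, 3, 'same') + BatchNorm) + Dense(1)."""
    params = {}
    c = in_ch
    for layer in range(n_conv):
        params[f"conv{layer}_w"] = rng.normal(0, 0.1, (kernel, c, filters))
        params[f"conv{layer}_b"] = np.zeros(filters)
        # Batch norm: gamma/beta are trainable, moving mean/var are not.
        params[f"bn{layer}_gamma"] = np.ones(filters)
        params[f"bn{layer}_beta"] = np.zeros(filters)
        params[f"bn{layer}_mean"] = np.zeros(filters)
        params[f"bn{layer}_var"] = np.ones(filters)
        c = filters
    params["dense_w"] = rng.normal(0, 0.1, (length * filters, 1))
    params["dense_b"] = np.zeros(1)
    return params

def forward(x, params, n_conv=2, eps=1e-3):
    """Inference-mode forward pass; eps matches the Keras BatchNormalization default."""
    h = x
    for layer in range(n_conv):
        h = conv1d_same(h, params[f"conv{layer}_w"], params[f"conv{layer}_b"])
        h = np.maximum(h, 0.0)  # ReLU (assumed activation)
        h = (h - params[f"bn{layer}_mean"]) / np.sqrt(params[f"bn{layer}_var"] + eps)
        h = h * params[f"bn{layer}_gamma"] + params[f"bn{layer}_beta"]
    h = h.reshape(-1)  # Flatten: 41 positions x 8 filters = 328 values
    # Dropout(0.3) only acts during training; at inference it is the identity.
    logit = h @ params["dense_w"] + params["dense_b"]
    return 1.0 / (1.0 + np.exp(-logit[0]))  # predicted probability of a 4mC site

params = build_params()
x = rng.random((41, 4))  # stand-in for an NCP+ND-encoded 41-nt sequence
print(forward(x, params))
print(sum(p.size for p in params.values()))  # 697
```

Under these assumptions the whole network has 697 parameters (32 of them non-trainable batch-norm statistics); the real count depends on details the post does not give, such as the output layer size.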
4. Results
4.1 Comparison with other state-of-the-art methods



4.2 Sequence analysis
t-SNE visualization:
Heat map from the in silico mutagenesis analysis:

Effect of mutations on the predicted probability:
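The heat map and the mutation-effect plots come from in silico mutagenesis: each position of a sequence is changed to every other base, and the change in the predicted 4mC probability is recorded. The post does not spell out the authors' exact procedure, so this is a generic sketch. `predict` is any function mapping a sequence to a probability, and `toy_predict` is a made-up stand-in (GC fraction), not i4mC-Deep.

```python
BASES = "ACGT"

def in_silico_mutagenesis(seq, predict):
    """Return an L x 4 matrix of predict(mutant) - predict(reference).

    Entry [i][j] is the change in predicted probability when position i is
    replaced by BASES[j]; the reference base itself gets 0.0.
    """
    ref_score = predict(seq)
    matrix = []
    for i, ref_base in enumerate(seq):
        row = []
        for base in BASES:
            if base == ref_base:
                row.append(0.0)
            else:
                mutant = seq[:i] + base + seq[i + 1:]
                row.append(predict(mutant) - ref_score)
        matrix.append(row)
    return matrix

def toy_predict(seq):
    """Hypothetical stand-in for a trained model: the GC fraction of the sequence."""
    return sum(b in "GC" for b in seq) / len(seq)

effects = in_silico_mutagenesis("ACGT", toy_predict)
```

Plotting the L × 4 matrix with positions on one axis and bases on the other gives a heat map of the kind described above; large positive or negative cells mark positions where a single substitution changes the prediction the most.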





5. Web server
Link: http://nsclbio.jbnu.ac.kr/tools/i4mC-Deep/
