(5) Introduction to R language bioinformatics -- ORF and sequence analysis
2022-07-06 12:21:00 【EricFrenzy】
Note: This blog shares my personal learning experience. Please forgive any inaccuracies!
Introduction to the concept
In the human body, for a gene on the DNA to be expressed, the DNA containing that gene is first transcribed into pre-mRNA, which is then processed into mature mRNA. Ribosomes use the mRNA to synthesize proteins, which in turn control how the organism responds. On the mRNA, every three bases form a codon, and each codon corresponds to one amino acid (or to a stop signal). The following figure shows the table of codons and their amino acids:
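As a small illustration of the codon-to-amino-acid mapping, here is a minimal R sketch. The names `codon_table` and `translate_mrna` are hypothetical and not part of the original post, and the table lists only a handful of codons from the standard genetic code rather than all 64.

```r
# Hypothetical lookup table: a few codons from the standard genetic code.
codon_table <- c(
  AUG = "Met",   # also the start codon
  UUU = "Phe",
  GGC = "Gly",
  GCU = "Ala",
  UAA = "Stop", UAG = "Stop", UGA = "Stop"
)

# Split an mRNA string into consecutive codons and look each one up.
translate_mrna <- function(mrna) {
  n_codons <- nchar(mrna) %/% 3                    # drop a trailing incomplete codon
  if (n_codons == 0) return(character(0))
  starts <- seq(1, by = 3, length.out = n_codons)  # first base of each codon
  codons <- substring(mrna, starts, starts + 2)
  unname(codon_table[codons])                      # NA for codons missing from the table
}

print(translate_mrna("AUGUUUGGCGCUUAA"))
# "Met" "Phe" "Gly" "Ala" "Stop"
```

A complete implementation would carry the full codon table. The short table here is only meant to show the three-bases-per-amino-acid reading.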
To synthesize a normal protein, the mRNA sequence needs a start codon (marked "start") at one end and a stop codon (marked "stop") at the other. DNA, however, contains many such start and stop codons, which produce many different sequence combinations. To find every combination in the DNA that could be used to make a protein, we search for open reading frames (ORF, Open Reading Frame), that is, all sequences with the potential to encode a protein.
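To see why the same DNA gives different codon sequences depending on where reading begins, the sketch below splits a toy sequence into codons for each of the three forward reading frames. The function name `codons_in_frame` and the example sequence are assumptions made for illustration only.

```r
# Split a DNA string into codons, starting at position `frame` (1, 2, or 3).
codons_in_frame <- function(dna, frame) {
  last_start <- nchar(dna) - 2               # last position where a full codon fits
  if (last_start < frame) return(character(0))
  starts <- seq(frame, last_start, by = 3)
  substring(dna, starts, starts + 2)
}

dna <- "catgaaataggc"   # toy sequence, far shorter than a real gene
for (frame in 1:3) {
  cat("frame", frame, ":", codons_in_frame(dna, frame), "\n")
}
# frame 1 : cat gaa ata ggc
# frame 2 : atg aaa tag
# frame 3 : tga aat agg
```

Only frame 2 contains a start codon (atg) followed in the same frame by a stop codon (tag), so only that frame holds a candidate ORF in this toy example.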
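Code implementation of the ORF search
The procedure for finding ORFs in R is as follows: locate every start codon and every stop codon, then pair each start codon with the first downstream stop codon in the same reading frame, skipping pairs whose stop codon has already been used or whose length is under 300 bases.
Here is the code: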
findORF <- function(seq){
  # seq is the DNA sequence as a vector of single lowercase bases; it must run 5' to 3'
  findStartCodons <- function(seq){
    # Find the positions of all start codons
    startcodons <- numeric(0)  # Create an empty vector
    k <- 1
    for(i in seq_len(max(length(seq) - 5, 0))){
      # Index by the first base of the codon; the last five positions are skipped
      # because too little sequence remains for a stop codon to follow
      if(seq[i] == "a" && seq[i+1] == "t" && seq[i+2] == "g"){
        # ATG is the start codon
        startcodons[k] <- i  # Record the position
        k <- k + 1  # Advance the index
      }
    }
    return(startcodons)  # Return the result
  }
  findStopCodons <- function(seq){
    # Find the positions of all stop codons
    stopcodons <- numeric(0)  # Create an empty vector
    k <- 1
    for(i in seq_len(max(length(seq) - 2, 0))){
      # Index by the first base of the codon
      if((seq[i] == "t" && seq[i+1] == "a" && seq[i+2] == "a") || (seq[i] == "t" && seq[i+1] == "a" && seq[i+2] == "g") || (seq[i] == "t" && seq[i+1] == "g" && seq[i+2] == "a")){
        # TAA, TAG and TGA are the stop codons
        stopcodons[k] <- i  # Record the position
        k <- k + 1  # Advance the index
      }
    }
    return(stopcodons)  # Return the result
  }
  startcodon <- findStartCodons(seq)  # Find all start codons
  stopcodon <- findStopCodons(seq)  # Find all stop codons
  usedStop <- numeric(0)  # Record stop codons already used
  ORFs <- character(0)  # Record valid open reading frames
  k <- 1
  for(i in startcodon){
    # Loop over all start codons
    for(j in stopcodon){
      # Loop over all stop codons
      if((j-i)%%3==0 && j > i){
        # Same reading frame: the stop codon is downstream at a distance that is a multiple of 3
        if(j %in% usedStop){
          # This stop codon has already been used
          break  # Leave the inner loop and move on to the next start codon
        }else if(j-i < 300){
          # The sequence between the two codons is too short
          break  # Same as above
        }else{
          ORFs[k] <- paste(i, "to", j)  # Build a string such as "1 to 3001"
          usedStop[k] <- j  # Record the used stop codon
          k <- k + 1  # Advance the index
          break  # Leave the inner loop and move on to the next start codon
        }
      }
    }
  }
  return(ORFs)  # Return the result
}
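The function above indexes its argument one base at a time and compares against lowercase letters, so a plain DNA string has to be split into a lowercase character vector first. The sketch below shows one way to prepare such input, plus a vectorized way to list codon positions as an alternative to the explicit loops. `to_base_vector` and `find_codon_positions` are hypothetical helper names, not part of the original code.

```r
# Convert a DNA string into a vector of single lowercase bases,
# the form that code indexing seq[i] against "a", "t", "g" expects.
to_base_vector <- function(dna_string) {
  strsplit(tolower(dna_string), "")[[1]]
}

# Vectorized lookup: return the first-base position of every occurrence
# of any codon in `codons` (all three reading frames are scanned).
find_codon_positions <- function(bases, codons) {
  n <- length(bases)
  if (n < 3) return(integer(0))
  triplets <- paste0(bases[1:(n - 2)], bases[2:(n - 1)], bases[3:n])
  which(triplets %in% codons)
}

bases <- to_base_vector("CATGAAATAGGC")
print(find_codon_positions(bases, "atg"))                   # start codons: 2
print(find_codon_positions(bases, c("taa", "tag", "tga")))  # stop codons: 3 8
```

In this toy sequence the stop codon at position 8 is in frame with the start codon at position 2, since (8 - 2) %% 3 is 0, while the one at position 3 is not. Because the distance is far below the 300-base minimum, findORF would return an empty result for this input.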
This ORF search algorithm is relatively simple and fast, but its accuracy is correspondingly lower. A more accurate algorithm is available on the NCBI official website.
Conclusion
Once the ORFs are found, they can be compared with known sequences in a database to predict useful information such as the composition and function of the genes in this species. Next time we will introduce Needleman-Wunsch, a global sequence alignment algorithm. Coming soon! Questions and ideas are welcome in the comments!