(5) Introduction to R language bioinformatics -- ORF and sequence analysis
2022-07-06 12:21:00 【EricFrenzy】
Note: This blog shares my personal learning experience. Please forgive any inaccuracies!
Concept introduction
In the human body, to express a gene on DNA, the DNA of that gene is first transcribed into pre-mRNA, which is further processed into mature mRNA. The mRNA is then used by ribosomes to synthesize proteins, which in turn control how the organism responds. On the mRNA, every three bases form a codon, and each codon corresponds to one amino acid. The following figure shows the table of codons and their amino acids:
[Figure: codon and amino acid table]
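As a small illustration of reading bases three at a time, the sketch below splits a short sequence into codons and looks a few of them up. The sequence, the helper name `splitCodons`, and the four-entry lookup table are made up for this example (a full codon table has 64 entries), and DNA letters are used (t instead of u) to match the code later in this post.

```r
# Hypothetical helper: split a DNA string into codons, starting at the first base
splitCodons <- function(dnaString) {
  n <- nchar(dnaString) %/% 3                # number of complete codons
  starts <- seq(1, by = 3, length.out = n)   # first base of each codon
  substring(dnaString, starts, starts + 2)
}

# A deliberately tiny subset of the standard codon table (DNA letters)
codonTable <- c(atg = "Met", ttt = "Phe", gga = "Gly", taa = "Stop")

codons <- splitCodons("atgtttggataa")
print(codons)                      # "atg" "ttt" "gga" "taa"
print(unname(codonTable[codons]))  # "Met" "Phe" "Gly" "Stop"
```

Any bases left over after the last complete codon are simply ignored by this helper.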
To synthesize a normal protein, the mRNA sequence needs a start codon (marked start) at one end and a stop codon (marked stop) at the other. However, DNA contains many such start and stop codons, which produce many different sequence combinations. To find all the sequences in DNA that have the potential to encode a protein, we search for open reading frames (ORF, Open Reading Frame).
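To make the idea of a reading frame concrete, here is a minimal sketch that reads the same made-up sequence from three different starting offsets. The helper name `readFrame` and the example sequence are assumptions for illustration only; the sequence is stored one base per element, which is the layout the code in this post expects.

```r
# Made-up example sequence, one lowercase base per vector element
dna <- strsplit("catgaaatagc", "")[[1]]

# Hypothetical helper: read codons starting at a given offset (1, 2 or 3)
readFrame <- function(dna, offset) {
  if (offset > length(dna) - 2) return(character(0))  # no complete codon left
  starts <- seq(offset, length(dna) - 2, by = 3)
  vapply(starts, function(s) paste(dna[s:(s + 2)], collapse = ""), character(1))
}

frame1 <- readFrame(dna, 1)   # "cat" "gaa" "ata"
frame2 <- readFrame(dna, 2)   # "atg" "aaa" "tag"
frame3 <- readFrame(dna, 3)   # "tga" "aat" "agc"

# The start codon at position 2 and the stop codon at position 8 share a frame
inFrame <- (8 - 2) %% 3 == 0  # TRUE
```

Only the second frame contains a start codon (atg) followed by a stop codon (tag) in the same frame. The `(j - i) %% 3 == 0` test in the code below checks exactly this relationship between a start position `i` and a stop position `j`.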
Code implementation of the ORF search
The program flow for finding ORFs in R is as follows:
[Figure: flowchart of the ORF search]
Here is the code:
findORF <- function(seq){
  # seq: DNA sequence as a vector of lowercase single bases; it must run 5' to 3'
  findStartCodons <- function(seq){
    # Return the positions of all start codons
    startcodons <- numeric(0) # Create an empty vector
    k <- 1
    for(i in seq_len(max(length(seq)-5, 0))){
      # i is the first base of the codon; the last five positions are skipped
      # because too little sequence remains (seq_len avoids looping when seq is short)
      if(seq[i] == "a" && seq[i+1] == "t" && seq[i+2] == "g"){
        # ATG is the start codon
        startcodons[k] <- i # Record the position
        k <- k + 1 # Advance the index
      }
    }
    return(startcodons)
  }
  findStopCodons <- function(seq){
    # Return the positions of all stop codons
    stopcodons <- numeric(0) # Create an empty vector
    k <- 1
    for(i in seq_len(max(length(seq)-2, 0))){
      # i is the first base of the codon
      if((seq[i] == "t" && seq[i+1] == "a" && seq[i+2] == "a") || (seq[i] == "t" && seq[i+1] == "a" && seq[i+2] == "g") || (seq[i] == "t" && seq[i+1] == "g" && seq[i+2] == "a")){
        # TAA, TAG and TGA are the stop codons
        stopcodons[k] <- i # Record the position
        k <- k + 1 # Advance the index
      }
    }
    return(stopcodons)
  }
  startcodon <- findStartCodons(seq) # Find all start codons
  stopcodon <- findStopCodons(seq) # Find all stop codons
  usedStop <- numeric(0) # Stop codons that already close a recorded ORF
  ORFs <- character(0) # Valid open reading frames
  k <- 1
  for(i in startcodon){
    # Traverse all start codons
    for(j in stopcodon){
      # Traverse all stop codons (in ascending position)
      if((j-i)%%3==0 && j > i){
        # Same reading frame: the stop lies downstream at a distance that is a multiple of 3
        if(j %in% usedStop){
          # This stop codon already closes an ORF from an earlier start codon
          break # Leave the inner loop and move on to the next start codon
        }else if(j-i < 300){
          # The sequence between the two codons is too short
          break # Same as above
        }else{
          ORFs[k] <- paste(i, "to", j) # Build a string such as "1 to 3001"; j is the first base of the stop codon
          usedStop[k] <- j # Mark this stop codon as used
          k <- k + 1 # Advance the index
          break # Leave the inner loop and move on to the next start codon
        }
      }
    }
  }
  return(ORFs)
}
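The two helper loops above compare one position at a time. As an alternative, here is a sketch of a vectorized lookup that builds every overlapping triplet at once. The name `findCodonPositions` is hypothetical and not part of the original code; the sketch assumes the same input layout (one lowercase base per element).

```r
# Hypothetical vectorized helper: positions where any of the given codons begins
findCodonPositions <- function(dna, codons) {
  n <- length(dna)
  if (n < 3) return(integer(0))
  # Build every overlapping triplet: bases i, i+1, i+2 for i = 1 .. n-2
  triplets <- paste0(dna[1:(n - 2)], dna[2:(n - 1)], dna[3:n])
  which(triplets %in% codons)
}

# Made-up example sequence, one base per element
dna <- strsplit("catgaaatagc", "")[[1]]
startPos <- findCodonPositions(dna, "atg")                  # 2
stopPos <- findCodonPositions(dna, c("taa", "tag", "tga"))  # 3 8
```

Unlike `findStartCodons`, this sketch also reports start codons within the last five positions, so the two are not drop-in equivalents without an extra filter.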
This ORF search algorithm is relatively simple and fast, but its accuracy is correspondingly lower. A more accurate algorithm is available on the NCBI official website.
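One limitation follows directly from the code above: `findORF` reads only the strand it is given, in the 5' to 3' direction. A minimal sketch of building the reverse complement, so that the same search could also be run on the opposite strand, might look like this. The helper name `reverseComplement` and the example sequence are assumptions for illustration, and only the lowercase bases a, c, g and t are handled.

```r
# Hypothetical helper: reverse complement of a DNA vector (one lowercase base per element)
reverseComplement <- function(dna) {
  # chartr swaps each base for its complement; rev restores 5' to 3' order
  rev(chartr("acgt", "tgca", dna))
}

dna <- strsplit("atggcctaa", "")[[1]]
rc <- reverseComplement(dna)
rcString <- paste(rc, collapse = "")   # "ttaggccat"
```

Positions found on the reverse complement count from the other end of the sequence, so position `p` there corresponds to position `length(dna) - p + 1` on the original strand.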
Conclusion
After finding the ORFs, you can compare them with known sequences in a database to predict useful information such as the composition and function of the genes in this species. Next time we will introduce the Needleman-Wunsch global sequence alignment algorithm. Stay tuned! Any questions or ideas are welcome in the comments!