(5) Introduction to R language bioinformatics -- ORF and sequence analysis
2022-07-06 12:21:00 【EricFrenzy】
Note: this blog shares my personal learning experience; please forgive any inaccuracies!
Concept overview
In the human body, expressing a gene on the DNA starts with transcription: the DNA containing the gene is transcribed into pre-mRNA, which is further processed into mature mRNA. Ribosomes then use the mRNA to synthesize proteins, which in turn control the organism's responses. On the mRNA, every three bases form a codon, and each codon corresponds to one amino acid. The figure shows the table of codons and their corresponding amino acids:
[Figure: table of codons and their corresponding amino acids]
To synthesize a normal protein, the coding sequence on the mRNA must begin with a start codon (marked "start" in the codon table) and end with a stop codon (marked "stop"). DNA, however, contains many such start and stop codons, which produce many different candidate sequences. To find every stretch of DNA that has the potential to encode a protein, we search for open reading frames (ORF, Open Reading Frame).
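To make the idea of reading frames concrete, here is a minimal sketch that splits a short DNA string into codons in each of the three forward reading frames. The helper name split_codons and the toy sequence are illustrative assumptions, not part of the original post.

```r
# Hypothetical helper: split a DNA string into codons for a given frame (1, 2 or 3).
split_codons <- function(dna, frame = 1) {
  bases <- strsplit(tolower(dna), "")[[1]]
  # Not enough bases left for even one codon in this frame
  if (length(bases) < frame + 2) return(character(0))
  bases <- bases[frame:length(bases)]   # drop the bases before the frame start
  n_codons <- length(bases) %/% 3       # an incomplete trailing codon is ignored
  starts <- seq(1, by = 3, length.out = n_codons)
  vapply(starts,
         function(s) paste(bases[s:(s + 2)], collapse = ""),
         character(1))
}

toy <- "catgaaataggc"   # made-up sequence for illustration
for (f in 1:3) {
  cat("frame", f, ":", split_codons(toy, f), "\n")
}
```

In this toy sequence only frame 2 reads atg aaa tag, that is, a start codon followed in the same frame by a stop codon. The same bases read in frame 1 or frame 3 do not form such a pair, which is why the reading frame has to be checked when pairing start and stop codons.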
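Code implementation of the ORF search
The procedure for finding ORFs in R is as follows: find the positions of all start codons and all stop codons; then, for each start codon, take the first downstream stop codon in the same reading frame, and record the pair as an ORF unless that stop codon has already been used or the span is shorter than 300 bases.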
Here is the specific code :
findORF <- function(seq){
  # seq is the DNA sequence; note that it must be given in the 5' to 3' direction.
  # The seq[i] == "a" comparisons below imply a character vector with one lowercase base per element.
  findStartCodons <- function(seq){
    # Find the positions of all start codons
    startcodons <- numeric(0) # Create an empty vector
    k <- 1
    for(i in seq_len(max(length(seq)-5, 0))){
      # i is the position of the first base of the codon. The last five positions are skipped,
      # because a start codon there leaves no room for a stop codon.
      # seq_len() avoids the backwards range 1:0 when the sequence is shorter than six bases.
      if(seq[i] == "a" && seq[i+1] == "t" && seq[i+2] == "g"){
        # ATG is the start codon
        startcodons[k] <- i # Record the position
        k <- k + 1 # Advance the index
      }
    }
    return(startcodons) # Return the result
  }
  findStopCodons <- function(seq){
    # Find the positions of all stop codons
    stopcodons <- numeric(0) # Create an empty vector
    k <- 1
    for(i in seq_len(max(length(seq)-2, 0))){
      # i is the position of the first base of the codon
      if((seq[i] == "t" && seq[i+1] == "a" && seq[i+2] == "a") ||
         (seq[i] == "t" && seq[i+1] == "a" && seq[i+2] == "g") ||
         (seq[i] == "t" && seq[i+1] == "g" && seq[i+2] == "a")){
        # TAA, TAG and TGA are the stop codons
        stopcodons[k] <- i # Record the position
        k <- k + 1 # Advance the index
      }
    }
    return(stopcodons) # Return the result
  }
  startcodon <- findStartCodons(seq) # Find all start codons
  stopcodon <- findStopCodons(seq) # Find all stop codons
  usedStop <- numeric(0) # Stop codons already assigned to an ORF
  ORFs <- character(0) # Valid open reading frames
  k <- 1
  for(i in startcodon){
    # Traverse all start codons
    for(j in stopcodon){
      # Traverse all stop codons in ascending order of position
      if((j-i)%%3==0 && j > i){
        # Same reading frame: the stop codon lies downstream at a distance that is a multiple of 3
        if(j %in% usedStop){
          # This stop codon already closes an ORF from an earlier start codon
          break # Leave the inner loop and move on to the next start codon
        }else if(j-i < 300){
          # The sequence between the two codons is too short
          break # Same as above
        }else{
          ORFs[k] <- paste(i, "to", j) # Record as a string such as "1 to 3001"; j is the first base of the stop codon
          usedStop[k] <- j # Mark the stop codon as used
          k <- k + 1 # Advance the index
          break # Leave the inner loop and move on to the next start codon
        }
      }
    }
  }
  return(ORFs) # Return the result
}
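The function above compares seq[i] with single lowercase letters, so it appears to expect a character vector with one base per element rather than one long string. A minimal sketch of preparing such input from a plain string follows; to_base_vector is a hypothetical helper name, and the short sequence is made up for illustration.

```r
# Hypothetical helper: turn a DNA string into a lowercase character vector
# with one base per element, the form the seq[i] == "a" comparisons rely on.
to_base_vector <- function(dna) {
  cleaned <- gsub("[[:space:]]", "", dna)        # drop spaces and line breaks
  bases <- strsplit(tolower(cleaned), "")[[1]]   # one character per element
  if (!all(bases %in% c("a", "c", "g", "t", "n"))) {
    stop("sequence contains characters other than a, c, g, t, n")
  }
  bases
}

bases <- to_base_vector("ATGAAA TAG")
print(bases)
# If findORF from the listing above has been defined in the session:
# orfs <- findORF(bases)
```

With a toy sequence this short, findORF would return an empty result, because ORFs spanning fewer than 300 bases are discarded.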
This ORF search algorithm is relatively simple and fast, but its accuracy is correspondingly lower. A more accurate tool is available on the NCBI official website.
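One concrete limitation is that the code only scans the strand it is given, while genes can lie on either strand of the DNA. A minimal sketch of building the reverse complement, so that the other strand can be scanned as well, is shown below; reverse_complement is a hypothetical helper name, and the input is assumed to be a lowercase one-base-per-element vector.

```r
# Hypothetical helper: reverse complement of a lowercase base vector.
# Bases other than a, c, g, t (for example "n") are left unchanged.
reverse_complement <- function(bases) {
  chartr("acgt", "tgca", rev(bases))
}

forward <- c("a", "t", "g", "c", "c", "a")   # made-up example
reverse <- reverse_complement(forward)
print(reverse)
# If findORF has been defined in the session, both strands could be scanned:
# findORF(forward); findORF(reverse)
```

Positions reported for the reverse complement are counted from its own 5' end, so they would need to be converted before being compared with positions on the original strand.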
Conclusion
Once the ORFs have been found, they can be compared with known sequences in a database to predict useful information such as the composition and function of the genes in this species. Next time we will introduce Needleman-Wunsch, a global sequence alignment algorithm. Stay tuned! Questions and ideas are welcome in the comments.