(5) Introduction to Bioinformatics in R -- ORFs and Sequence Analysis
2022-07-06 12:21:00 【EricFrenzy】
Note: This blog shares my personal learning experience. Please forgive any mistakes!
Concepts
In the human body, a gene on DNA is expressed as follows. The gene's DNA is transcribed into pre-mRNA, which is further processed into mature mRNA. Ribosomes then use the mRNA as a template to synthesize proteins, which carry out the organism's functions. On mRNA, every three bases form a codon, and each codon corresponds to one amino acid (or to a stop signal). The following figure shows the table of codons and their amino acids:
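Splitting a sequence into codons and looking each one up in that table can be sketched in R as below. This is only an illustration under stated assumptions: the sequence is made up, bases are written as lowercase DNA letters (t rather than u), and `codonTable` is a hypothetical excerpt of the standard genetic code, not the full 64-codon table.

```r
# Made-up example sequence (5' to 3', lowercase DNA letters)
dna <- "atgaaatttggctaa"

# Cut the string into non-overlapping triplets, starting at base 1
starts <- seq(1, nchar(dna) - 2, by = 3)
codons <- substring(dna, starts, starts + 2)

# Hypothetical excerpt of the standard genetic code (one-letter amino acid codes);
# "*" marks a stop codon. The real table has 64 entries.
codonTable <- c(atg = "M", aaa = "K", ttt = "F", ggc = "G",
                taa = "*", tag = "*", tga = "*")

aminoAcids <- unname(codonTable[codons])
print(codons)      # "atg" "aaa" "ttt" "ggc" "taa"
print(aminoAcids)  # "M" "K" "F" "G" "*"
```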
To synthesize a normal protein, the coding region of the mRNA must begin with a start codon (marked "start" in the table, AUG) and end with a stop codon (marked "stop": UAA, UAG or UGA); on DNA these read ATG and TAA, TAG, TGA. However, DNA contains many such start and stop codons, which can be combined into many different candidate sequences. To find all stretches of DNA that could potentially encode a protein, we look for open reading frames (ORF, Open Reading Frame): a stretch that begins with a start codon and continues, three bases at a time, up to the first stop codon in the same reading frame.
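Whether a start codon and a stop codon lie in the same reading frame is simple position arithmetic: it is exactly the `(j - i) %% 3 == 0` test used in the ORF-finding code later in this post. A minimal sketch on a made-up sequence, where `frameOf` is a hypothetical helper name:

```r
# Made-up example sequence, 5' to 3'
dna <- "ccatgaaatgtaagg"
bases <- strsplit(dna, "")[[1]]
n <- length(bases)

# Triplet beginning at every position 1..n-2 (overlapping, so all frames are covered)
triplets <- paste0(bases[1:(n - 2)], bases[2:(n - 1)], bases[3:n])
startPos <- which(triplets == "atg")                    # ATG
stopPos  <- which(triplets %in% c("taa", "tag", "tga"))  # TAA, TAG, TGA

# Hypothetical helper: reading frame (1, 2 or 3) of a codon starting at position p
frameOf <- function(p) (p - 1) %% 3 + 1

startPos           # 3 8
stopPos            # 4 11
frameOf(startPos)  # 3 2
frameOf(stopPos)   # 1 2
# Only the ATG at 8 and the TAA at 11 share a frame: (11 - 8) %% 3 == 0
```

The ATG at position 3 has no in-frame stop codon downstream, even though a TGA overlaps it at position 4: that TGA is read in a different frame.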
Code for finding ORFs
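The procedure for finding ORFs in R is as follows: find the positions of all start codons (ATG); find the positions of all stop codons (TAA, TAG, TGA); then, for each start codon, take the first stop codon downstream of it in the same reading frame, and keep the pair as an ORF only if that stop codon has not already been used by another ORF and begins at least 300 bases after the start codon.
Here is the code: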
```r
findORF <- function(seq){
  # seq: DNA sequence of one strand, which must run 5' to 3'.
  # Accepts a character vector of single bases, e.g. c("a", "t", "g", ...),
  # or a single string, which is split into bases here.
  if(length(seq) == 1) seq <- strsplit(seq, "")[[1]]
  seq <- tolower(seq)  # the comparisons below use lowercase bases

  findStartCodons <- function(seq){
    # Find the positions of all start codons
    startcodons <- numeric(0)  # empty result vector
    k <- 1
    # Positions are those of a codon's first base. The last five positions are
    # skipped: a start codon there leaves no room for a stop codon after it.
    # seq_len() also avoids 1:0 looping over c(1, 0) for very short input.
    for(i in seq_len(max(0, length(seq) - 5))){
      if(seq[i] == "a" && seq[i+1] == "t" && seq[i+2] == "g"){
        # ATG is the start codon
        startcodons[k] <- i  # record the position
        k <- k + 1           # next result index
      }
    }
    return(startcodons)
  }

  findStopCodons <- function(seq){
    # Find the positions of all stop codons
    stopcodons <- numeric(0)  # empty result vector
    k <- 1
    for(i in seq_len(max(0, length(seq) - 2))){
      # Positions are those of a codon's first base
      codon <- paste0(seq[i], seq[i+1], seq[i+2])
      if(codon %in% c("taa", "tag", "tga")){
        # TAA, TAG and TGA are the stop codons
        stopcodons[k] <- i  # record the position
        k <- k + 1          # next result index
      }
    }
    return(stopcodons)
  }

  startcodon <- findStartCodons(seq)  # all start codon positions
  stopcodon <- findStopCodons(seq)    # all stop codon positions, in ascending order
  usedStop <- numeric(0)              # stop codons already assigned to an ORF
  ORFs <- character(0)                # accepted open reading frames
  k <- 1
  for(i in startcodon){
    # Try every start codon
    for(j in stopcodon){
      # Scan the stop codons from left to right
      if((j - i) %% 3 == 0 && j > i){
        # First downstream stop codon in the same reading frame,
        # i.e. the distance between the two codons is a multiple of 3
        if(j %in% usedStop){
          # This stop already ends an ORF from an upstream start codon,
          # so this start is nested inside it: move to the next start codon
          break
        }else if(j - i < 300){
          # Too short: fewer than 300 bases (100 codons) before the stop codon
          break
        }else{
          ORFs[k] <- paste(i, "to", j)  # e.g. "1 to 3001"
          usedStop[k] <- j              # mark this stop codon as used
          k <- k + 1                    # next result index
          break                         # move to the next start codon
        }
      }
    }
  }
  return(ORFs)
}
```
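The findORF code above compares bases one at a time in nested loops. As a hedged sketch, the same rules (first in-frame stop, skip a stop that is already used, length cutoff) can also be written with vectorized base R. The name `findORFVec` and the `minLen` parameter are our own, not from the original; `minLen` defaults to the 300 bases used above.

```r
# Hypothetical vectorized variant of findORF (same rules, our own naming):
# dna is one string read 5' to 3'; minLen is the minimum distance in bases
# from the start codon to the stop codon (300 in the original code).
findORFVec <- function(dna, minLen = 300){
  bases <- strsplit(tolower(dna), "")[[1]]
  n <- length(bases)
  if(n < 6) return(character(0))  # too short to hold a start and a stop codon

  # Triplet beginning at every position, then the positions of ATG and of stops
  triplets <- paste0(bases[1:(n - 2)], bases[2:(n - 1)], bases[3:n])
  starts <- which(triplets == "atg")
  stops  <- which(triplets %in% c("taa", "tag", "tga"))

  usedStop <- integer(0)
  ORFs <- character(0)
  for(i in starts){
    # First stop codon downstream of i in the same reading frame
    inFrame <- stops[stops > i & (stops - i) %% 3 == 0]
    if(length(inFrame) == 0) next
    j <- inFrame[1]
    # Same filters as findORF: stop already used, or ORF too short
    if(j %in% usedStop || j - i < minLen) next
    ORFs <- c(ORFs, paste(i, "to", j))
    usedStop <- c(usedStop, j)
  }
  ORFs
}

# Usage on a made-up sequence with a nested, in-frame ATG at position 154
s <- paste0("atg", strrep("aaa", 50), "atg", strrep("aaa", 48), "taa")
findORFVec(s)  # "1 to 301": the nested ATG shares stop 301, so it is skipped
```

Here `which()` on the triplet vector replaces the two scanning functions, and one logical filter on `stops` replaces the inner loop over stop codons.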
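This ORF-finding algorithm is simple and fast, but correspondingly less accurate: it only scans the three reading frames of the strand it is given (not the reverse complement), treats every ATG as a possible start, and uses a fixed 300-base cutoff. A more accurate algorithm is available on the NCBI official website.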
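findORF only searches the strand it is given. A minimal sketch of also covering the other strand, assuming a/c/g/t input: reverse-complement the sequence and run findORF on the result too, e.g. `findORF(strsplit(revComp(dna), "")[[1]])`. `revComp` is a hypothetical helper written with base R only; Bioconductor's Biostrings package provides `reverseComplement()` for the same job.

```r
# Hypothetical helper: reverse complement of a DNA string
revComp <- function(dna){
  comp <- chartr("acgt", "tgca", tolower(dna))        # a<->t, c<->g
  paste(rev(strsplit(comp, "")[[1]]), collapse = "")  # read it backwards
}

revComp("atgc")    # "gcat"
revComp("ttacat")  # "atgtaa": an atg...taa that is only visible on the other strand
```

Position p reported on `revComp(dna)` corresponds to position n - p + 1 on the original strand, where n = `nchar(dna)`; since the minus strand is read in the opposite direction, the codon starting at p covers original positions n - p - 1 to n - p + 1.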
Conclusion
After finding the ORFs, we can compare them with known sequences in a database to predict useful information such as the structure and function of the genes in this species. Next time we will introduce the Needleman-Wunsch global sequence alignment algorithm, so stay tuned! Questions and ideas are welcome in the comments!
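Comparing ORFs with database sequences first requires their nucleotide sequences, while findORF returns only position strings such as "1 to 3001". A minimal sketch of that step, assuming the "i to j" format above, where j is the first base of the stop codon; `extractORFs` is a hypothetical helper name. The resulting strings could then be submitted to a similarity search such as NCBI BLAST.

```r
# Hypothetical helper: turn "i to j" records into nucleotide sequences
extractORFs <- function(dna, ORFs){
  dna <- tolower(dna)
  vapply(ORFs, function(rec){
    pos <- as.integer(strsplit(rec, " to ")[[1]])  # c(i, j)
    substring(dna, pos[1], pos[2] + 2)             # + 2 keeps the whole stop codon
  }, character(1), USE.NAMES = FALSE)
}

extractORFs("ccatgaaatgtaagg", "8 to 11")  # "atgtaa"
```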