K-nucleotide frequencies (KNF) or k-mer frequencies
2022-07-23 12:21:00 【Windy Street】
K-nucleotide frequency (KNF), or k-mer frequency
KNF describes the frequency of every possible polynucleotide of length k in a sequence. If k=2, the frequencies computed are those of the dinucleotides (AA, AT, AG, AC, ..., TT), 4^2 = 16 in total; if k=3, they are those of the trinucleotides (AAA, AAT, AAG, AAC, ..., TTT), 4^3 = 64 in total; and so on. A sequence of length L contains L-k+1 overlapping windows of length k, so the frequency of a given k-mer is its number of occurrences divided by L-k+1.
The k-mer frequency method works the same way.
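To make the definition concrete, here is a minimal sketch that enumerates the 4^k possible k-mers and lists the overlapping windows of a short sequence. The helper names `all_kmers` and `windows` and the toy sequence are made up for illustration and are not part of the methods that follow.

```python
from itertools import product

def all_kmers(k, alphabet="ACGT"):
    """Return every k-mer over the alphabet, in itertools.product order."""
    return ["".join(p) for p in product(alphabet, repeat=k)]

def windows(sequence, k):
    """Return the overlapping length-k windows of a sequence."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

# 4**2 dinucleotides and 4**3 trinucleotides
print(len(all_kmers(2)), len(all_kmers(3)))  # 16 64

toy = "ATGCA"  # made-up example sequence
w = windows(toy, 2)
print(w)  # ['AT', 'TG', 'GC', 'CA']

# Frequency of one dinucleotide = occurrences / number of windows
freq_AT = w.count("AT") / len(w)
print(freq_AT)  # 0.25
```

With a length-5 sequence and k=2 there are 5-2+1 = 4 windows, which is the denominator used in both methods.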
Method 1:
# Enumerate the nucleotide types (all k-mers over the alphabet)
from itertools import product

def nucleotide_type(k):
    z = []
    for i in product('ACGT', repeat=k):  # Cartesian product (ordered selection with replacement)
        z.append(''.join(i))  # join ('A', 'A', 'A') into 'AAA'
    return z

# Frequency of a single k-mer in a sequence
def char_count(sequence, num, k):
    n = 0
    char = nucleotide_type(k)  # all 4**k k-mers, in product order
    for i in range(len(sequence) - k + 1):  # slide a window of length k along the sequence
        if sequence[i:i+k] == char[num]:
            n += 1
    return n / (len(sequence) - k + 1)  # frequency = occurrences / total windows, where total windows = sequence length - k + 1

def feature(seq, k):
    freqs = []
    for i in range(4**k):  # one entry per nucleotide type (16, 64, 256 for di-, tri-, tetranucleotides)
        freqs.append(char_count(seq, i, k))
    return freqs

# Compute the feature vector for each sequence in turn
def Sequence_replacement(sequ, k):
    sequen = [None] * len(sequ)
    for i in range(len(sequ)):
        s = sequ[i]
        sequen[i] = feature(s, k)
    return sequen

# Call with your own data
feature_knf = Sequence_replacement(data, k)  # data: a list of sequences; k: set as needed
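Method 1 rescans the whole sequence once for each of the 4^k k-mers. A possible alternative, sketched here under the assumption that the same output layout is wanted (one frequency per k-mer, in `itertools.product` order), counts all windows in a single pass with `collections.Counter`. The function name `knf_vector` and the two toy sequences are hypothetical.

```python
from collections import Counter
from itertools import product

def knf_vector(sequence, k, alphabet="ACGT"):
    """k-mer frequency vector in itertools.product order, counted in one pass."""
    total = len(sequence) - k + 1  # number of overlapping windows
    if total <= 0:
        # Sequence shorter than k: no windows, so return all zeros
        return [0.0] * (len(alphabet) ** k)
    counts = Counter(sequence[i:i + k] for i in range(total))
    # Windows containing characters outside the alphabet (e.g. 'N') still
    # count toward `total` but have no slot in the output vector.
    return [counts["".join(p)] / total for p in product(alphabet, repeat=k)]

data = ["ACGTAC", "AAAA"]  # hypothetical toy sequences
features = [knf_vector(s, 2) for s in data]
print(len(features[0]))  # 16
print(features[0][1])    # frequency of 'AC' in the first sequence: 0.4
print(features[1][0])    # frequency of 'AA' in the second sequence: 1.0
```

This should give the same numbers as Method 1 for sequences made up only of A, C, G and T, while touching each window once.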
Method 2:
# First, split the data into k-mers
def Kmers_funct(seq, x):
    X = [None] * len(seq)  # not needed if the data is a single sequence
    for i in range(len(seq)):  # not needed if the data is a single sequence
        a = seq[i]
        l = []
        for index in range(len(a) - x + 1):
            t = a[index:index+x]
            if len(t) == x:
                l.append(t)
        X[i] = l
    return X  # one list of k-mers per sequence; adjust the return value to your needs

# Enumerate the nucleotide types (all k-mers over the alphabet)
from itertools import product

def nucleotide_type(k):
    z = []
    for i in product('ACGU', repeat=k):  # Cartesian product; RNA alphabet here, use 'ACGT' for DNA
        z.append(''.join(i))  # join ('A', 'A', 'A') into 'AAA'
    return z

# k-mer frequency module
def Kmers_frequency(seq, x):
    X = []
    char = nucleotide_type(x)  # all 4**x k-mers, in product order
    for i in range(len(seq)):
        s = seq[i]  # list of k-mers for one sequence, as produced by Kmers_funct
        frequence = []
        for a in char:
            number = s.count(a)  # number of k-mers in the list equal to a
            char_frequence = number / len(s)  # len(s) is already the number of windows (sequence length - x + 1)
            frequence.append(char_frequence)
        X.append(frequence)
    return X

# Split the data into k-mers
feature_kmer = Kmers_funct(data, k)
# Pass the k-mer lists to the k-mer frequency module
feature_kmer_frequency = Kmers_frequency(feature_kmer, k)
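Method 2 splits each sequence into a list of k-mers before counting. One reason this matters: Python's `str.count` only counts non-overlapping matches, so calling it on the raw sequence string can undercount repeated k-mers. The short sketch below illustrates the difference on a made-up sequence; `overlapping_count` is a hypothetical helper, not part of the original code.

```python
def overlapping_count(sequence, kmer):
    """Count occurrences of kmer in sequence, allowing overlaps."""
    k = len(kmer)
    return sum(1 for i in range(len(sequence) - k + 1) if sequence[i:i + k] == kmer)

seq = "AAAA"  # made-up example sequence

# str.count skips past each match, so overlapping occurrences are missed
print(seq.count("AA"))               # 2

# Sliding-window count sees all three windows: positions 0, 1 and 2
print(overlapping_count(seq, "AA"))  # 3

# Counting in the list of k-mers gives the same overlapping result
kmers = [seq[i:i + 2] for i in range(len(seq) - 2 + 1)]
print(kmers.count("AA"))             # 3
```

Counting on the k-mer list therefore matches the sliding-window definition of frequency, whereas counting on the raw string does not.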