K-nucleotide frequencies (KNF) or k-mer frequencies

2022-07-23 12:21:00 【Windy Street】

KNF describes the frequencies of all possible polynucleotides of length k in a sequence. If k=2, the frequencies computed are those of the dinucleotides (AA, AT, AG, AC, ..., TT), 4²=16 types in total; if k=3, they are those of the trinucleotides (AAA, AAT, AAG, AAC, ..., TTT), 4³=64 types in total; and so on.
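
As a small worked example of the definition above, the sketch below enumerates every possible k-mer and computes the frequencies for one toy sequence. The sequence `ATGCAT` and the helper name `kmer_types` are made up for illustration and are not part of the original post.

```python
from itertools import product

def kmer_types(k, alphabet="ACGT"):
    """Return all len(alphabet)**k possible k-mers, in product order."""
    return ["".join(p) for p in product(alphabet, repeat=k)]

seq = "ATGCAT"  # hypothetical toy sequence
k = 2
types = kmer_types(k)

# Slide a window of length k along the sequence: AT, TG, GC, CA, AT
windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]

# Frequency = occurrences of the k-mer / total number of windows
freqs = {t: windows.count(t) / len(windows) for t in types}

print(len(types))   # 16 dinucleotide types
print(freqs["AT"])  # AT fills 2 of the 5 windows
```

Because every window is exactly one of the 4^k types, the frequencies of a sequence containing only A, C, G and T sum to 1.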

The k-mer frequency is computed in the same way.

Method 1:

# Enumerate all nucleotide k-mer types
from itertools import product
def nucleotide_type(k):
    z = []
    for i in product('ACGT', repeat = k):  # Cartesian product (ordered selection with replacement)
        z.append(''.join(i))  # join ('A', 'A', 'A') into 'AAA'
    return z

# Frequency of one k-mer type in one sequence
def char_count(sequence,num,k):
    n = 0
    char = nucleotide_type(k)  # list of all k-mer types
    for i in range(len(sequence)-k+1):   # slide a window of length k along the sequence
        if sequence[i:i+k] == char[num]:
            n += 1
    return n/(len(sequence)-k+1)  # frequency = occurrences / total windows, where total windows = sequence length - k + 1

def feature(seq,k):
    freqs = []  # named freqs so the built-in list is not shadowed
    for i in range(4**k):   # one pass per k-mer type (16, 64, 256 passes for di-, tri-, tetranucleotides)
        freqs.append(char_count(seq,i,k))
    return freqs

# Compute the feature vector for each sequence in turn
def Sequence_replacement(sequ,k):
    sequen = [None]*len(sequ)
    for i in range(len(sequ)):
        s = sequ[i]
        sequen[i] = feature(s,k)
    return sequen

# Call with actual data
feature_knf = Sequence_replacement(data,k)  # data is a list of sequences; set k as needed
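
Method 1 rescans each sequence once per k-mer type (4^k passes) and rebuilds the type list on every call. As an alternative, the sketch below counts all windows in a single pass with `collections.Counter` and should give the same vectors in the same order. The names `knf_vector` and `knf_matrix` and the sample `data` are hypothetical, and returning zeros for a sequence shorter than k is an assumption of this sketch, not something the original code does.

```python
from collections import Counter
from itertools import product

def knf_vector(sequence, k, alphabet="ACGT"):
    """Frequencies of all len(alphabet)**k k-mers in one sequence."""
    total = len(sequence) - k + 1  # number of sliding windows
    if total <= 0:
        # Assumption: a sequence shorter than k yields an all-zero vector
        return [0.0] * (len(alphabet) ** k)
    # Count every window once; a missing key in a Counter reads as 0
    counts = Counter(sequence[i:i + k] for i in range(total))
    return [counts["".join(p)] / total for p in product(alphabet, repeat=k)]

def knf_matrix(sequences, k, alphabet="ACGT"):
    """One frequency vector per sequence."""
    return [knf_vector(s, k, alphabet) for s in sequences]

data = ["ATGCAT", "AAAA"]  # hypothetical sample sequences
features = knf_matrix(data, 2)
print(len(features), len(features[0]))  # 2 sequences, 16 features each
print(features[1][0])                   # frequency of AA in "AAAA"
```

As in Method 1, a window containing a character outside the alphabet (for example N) still counts toward the total but matches no k-mer type.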

Method 2:

# First, split the data into k-mers
def Kmers_funct(seq,x):
    X = [None]*len(seq)    # not needed if the data is a single sequence
    for i in range(len(seq)):  # not needed if the data is a single sequence
        a = seq[i]
        l=[]
        for index in range(len(a)-x+1):  # slide a window of length x along the sequence
            t=a[index:index+x]
            if (len(t))==x:
                l.append(t)
        X[i] = l
    return X  # adjust the return value to your needs, or simply return X

# Enumerate all nucleotide k-mer types
from itertools import product
def nucleotide_type(k):
    z = []
    # 'ACGU' is the RNA alphabet; use 'ACGT' for DNA, as in Method 1
    for i in product('ACGU', repeat = k):  # Cartesian product (ordered selection with replacement)
        z.append(''.join(i))  # join ('A', 'A', 'A') into 'AAA'
    return z

# k-mer frequency module
def Kmers_frequency(seq,x):
    X = []
    char = nucleotide_type(x)  # list of all k-mer types
    for i in range(len(seq)):
        s = seq[i]  # list of k-mers for one sequence, as returned by Kmers_funct
        frequence = []
        for a in char:
            number = s.count(a)  # occurrences of this k-mer type in the list
            char_frequence = number/len(s)  # len(s) is already the number of windows (sequence length - x + 1)
            frequence.append(char_frequence)
        X.append(frequence)
    return X

# Split the data into k-mers
feature_kmer = Kmers_funct(data,k)
# Pass the k-mer lists to the frequency module
feature_kmer_frequency = Kmers_frequency(feature_kmer,k)
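
Method 2 counts on the list of k-mers rather than on the raw string. One reason this matters is that `str.count` only counts non-overlapping matches, while `list.count` on the sliding windows counts every window. The toy sequence below is made up to show the difference.

```python
seq = "AAAA"  # hypothetical toy sequence
k = 2

# str.count resumes searching after each match, so overlapping hits are skipped
non_overlapping = seq.count("AA")

# list.count compares whole elements, so every sliding window is counted
windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
overlapping = windows.count("AA")

print(non_overlapping)  # 2
print(overlapping)      # 3
```

Calling the frequency step directly on raw strings would therefore undercount k-mers that overlap themselves, such as AA or ATA.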

Copyright notice
This article was written by [Windy Street]. When reposting, please include a link to the original. Thank you.
https://yzsam.com/2022/204/202207230539029148.html
