
K-nucleotide frequencies (KNF) or k-mer frequencies

2022-07-23 12:21:00 Windy Street


KNF describes the frequencies of all possible polynucleotides of length k in a sequence. If k = 2, the frequencies computed are those of the dinucleotides (AA, AT, AG, AC, ..., TT), of which there are 4^2 = 16. If k = 3, they are those of the trinucleotides (AAA, AAT, AAG, AAC, ..., TTT), of which there are 4^3 = 64, and so on.

The k-mer frequency is computed in the same way.
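As a small worked example of the definition above, the following sketch enumerates the 16 possible dinucleotides and computes their frequencies for a short sequence. The sequence "ATGCA" is made up for illustration and is not from the original article.

```python
from itertools import product

k = 2
sequence = "ATGCA"  # made-up example sequence (assumption, for illustration only)

# All 4**k possible k-mers over the DNA alphabet, in lexicographic order
kmers = ["".join(p) for p in product("ACGT", repeat=k)]

# Overlapping windows of length k: there are len(sequence) - k + 1 of them
windows = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

# Frequency of each k-mer = occurrences among the windows / number of windows
knf = [windows.count(kmer) / len(windows) for kmer in kmers]

print(len(kmers))                # 16
print(windows)                   # ['AT', 'TG', 'GC', 'CA']
print(knf[kmers.index("AT")])    # 0.25
```

Each of the four windows occurs once, so four entries of the vector are 0.25 and the remaining twelve are 0.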

Method 1:

# Enumerate every possible k-mer (all length-k combinations of the four bases)
from itertools import product
def nucleotide_type(k):
    z = []
    for i in product('ACGT', repeat = k):  # Cartesian product (ordered selection with replacement)
        z.append(''.join(i))  # join the tuple ('A', 'A', 'A') into the string 'AAA'
    return z
# Frequency of a single k-mer (selected by index num) in a sequence
def char_count(sequence,num,k):
    n = 0
    char = nucleotide_type(k)  # list of all possible k-mers
    for i in range(len(sequence)-k+1):   # slide a window of length k along the sequence and count matches
        if sequence[i:i+k] == char[num]:
            n += 1
    return n/(len(sequence)-k+1)  # frequency = occurrences / number of windows, where number of windows = sequence length - k + 1
def feature(seq,k):
    freqs = []  # renamed from list to avoid shadowing the built-in
    for i in range(4**k):   # one entry per k-mer type (16, 64, 256 iterations for k = 2, 3, 4)
        freqs.append(char_count(seq,i,k))
    return freqs
# Compute the feature vector for each sequence in turn
def Sequence_replacement(sequ,k):
    sequen = [None]*len(sequ)
    for i in range(len(sequ)):
        s = sequ[i]
        sequen[i] = feature(s,k)
    return sequen
# Call with actual data
feature_knf = Sequence_replacement(data,k)  # data is a list of sequences; set k as needed
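Method 1 rescans the whole sequence once per k-mer type and rebuilds the k-mer list on every call, which gets slow as k grows. The sketch below is one possible single-pass variant using `collections.Counter`. The names `knf_vector` and `knf_matrix` and the choice to return an all-zero vector for sequences shorter than k are assumptions made for this example, not part of the original code.

```python
from collections import Counter
from itertools import product

def knf_vector(sequence, k, alphabet="ACGT"):
    """Return the k-mer frequencies of one sequence, ordered like product(alphabet, repeat=k)."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    total = len(sequence) - k + 1  # number of overlapping windows
    if total <= 0:
        # Assumption: sequences shorter than k yield an all-zero vector
        return [0.0] * len(kmers)
    # Count every window in a single pass over the sequence
    counts = Counter(sequence[i:i + k] for i in range(total))
    return [counts[kmer] / total for kmer in kmers]

def knf_matrix(sequences, k, alphabet="ACGT"):
    """Return one frequency vector per input sequence."""
    return [knf_vector(s, k, alphabet) for s in sequences]

example = ["ATGCATGC", "AAAA"]  # made-up sequences for illustration
matrix = knf_matrix(example, 2)
print(len(matrix), len(matrix[0]))  # 2 16
print(matrix[1][0])                 # 1.0, since every window of "AAAA" is "AA"
```

The vectors use the same k-mer order as `nucleotide_type`, so the result should be comparable with `feature_knf` for sequences that contain only A, C, G and T.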

Method 2:

# First, split each sequence into its overlapping k-mers
def Kmers_funct(seq,x): 
    X = [None]*len(seq)    # not needed if the data is a single sequence
    for i in range(len(seq)):  # this loop is not needed if the data is a single sequence
        a = seq[i]
        t=0
        l=[]
        for index in range(len(a)-x+1):
            t=a[index:index+x]
            if (len(t))==x:
                l.append(t)
        X[i] = l
    return X  # one list of k-mers per input sequence; convert to another type here if needed
# Enumerate every possible k-mer (all length-k combinations of the four bases)
from itertools import product
def nucleotide_type(k):
    z = []
    for i in product('ACGU', repeat = k):  # Cartesian product (ordered selection with replacement); 'ACGU' is the RNA alphabet, use 'ACGT' for DNA
        z.append(''.join(i))  # join the tuple ('A', 'A', 'A') into the string 'AAA'
    return z
# K-mer frequency module
def Kmers_frequency(seq,x):
    X = []
    char = nucleotide_type(x)  # list of all possible k-mers
    for i in range(len(seq)):
        s = seq[i]  # list of k-mers for one sequence, as produced by Kmers_funct
        frequence = []
        for a in char:
            number = s.count(a)  # number of times this k-mer occurs in the list
            char_frequence = number/len(s)  # frequency; len(s) already equals sequence length - x + 1
            frequence.append(char_frequence)
        X.append(frequence)
    return X
# Split the data into k-mers
feature_kmer = Kmers_funct(data,k)
# Pass the generated k-mer lists to the K-mer frequency module
feature_kmer_frequency = Kmers_frequency(feature_kmer,k)
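One detail worth checking when counting k-mers with `count`: on a plain string, Python's `str.count` counts non-overlapping matches, whereas `list.count` on a list of k-mer windows counts every window. The short sketch below illustrates the difference on a made-up sequence. It suggests why the frequency step should receive the k-mer lists rather than the raw strings.

```python
sequence = "AAAA"  # made-up example sequence (assumption, for illustration only)
k = 2

# str.count scans left to right and skips past each match, so matches cannot overlap
non_overlapping = sequence.count("AA")

# Splitting into windows first keeps every overlapping occurrence
windows = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
overlapping = windows.count("AA")

print(non_overlapping)  # 2
print(overlapping)      # 3
```

With windows, the counts across all k-mers sum to the number of windows, so the resulting frequencies sum to 1.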

Copyright notice
This article was written by [Windy Street]. When reposting, please include a link to the original. Thank you.
https://yzsam.com/2022/204/202207230539029148.html