
Hidden Markov model (HMM) learning notes

2022-07-07 07:12:00 Wsyoneself

  1. Markov chain: at any time t, the value of the observation variable depends only on the state variable at that time; it is independent of the state variables at other times and of the other observation variables. Likewise, the current state value depends only on the state at the previous moment and on nothing else.
  2. From these Markov assumptions, the joint probability distribution of the HMM can be derived (a standard form of this factorization is written out after this list).
  3. A model must contain parameters; the essence of machine learning is to find the set of parameters that makes the model fit the data best.
  4. Parameters of an HMM: the state transition probabilities (the probability of moving from the current state to the next state) and the output observation probabilities (the probability of emitting an observed value from the current state), together with the initial state distribution.
  5. Three basic problems:
    1. Probability calculation (evaluation): given the model and an observation sequence, compute the probability that the observation sequence occurs.
    2. Learning: given an observation sequence, estimate the model parameters so that the probability of the observation sequence under the model is maximized.
    3. Prediction (decoding): given the model and an observation sequence O, find the hidden state sequence I that maximizes the conditional probability P(I|O).
  6. Algorithms:
    1. Forward algorithm: given the hidden Markov model, the forward probability is the probability of the observation sequence up to time t together with the state at time t being qi. It is computed starting from t = 1 and, by the Markov assumptions, can be obtained iteratively. (My understanding: to move from time t to time t+1, multiply each state's forward probability by the corresponding state transition probability and sum; because the result must also account for the observed value, we additionally multiply by the observation probability at the new time step. A sketch is given after this list.)
    2. Backward algorithm: given the hidden Markov model, the backward probability is the probability of the observation sequence from time t+1 to T, given that the state at time t is qi. The backward probability at the last time step is set to 1, and the conditional probabilities are then computed recursively backwards.
    3. Baum-Welch algorithm: if the sample data have no labels, the training data contain only the observation sequence O while the corresponding state sequence I is unknown; the hidden Markov model is then a probabilistic model with latent variables.

      The essence of parameter learning is still EM. The basic idea of EM is to plug the current parameter estimate into the (expected) likelihood function, maximize it (usually by taking the derivative and setting it to zero) to obtain a new parameter estimate, and repeat until convergence (a one-iteration sketch appears after this list).

    4. Viterbi algorithm: a dynamic programming algorithm used to find the hidden state sequence (the Viterbi path) that most likely explains a given sequence of observed events (a sketch is given after this list).

    5. Generalization:

      1. Given the model and the observations, the forward algorithm can compute the probability of observing a specific sequence under the model; the Viterbi algorithm can compute the most likely internal state sequence; and the Baum-Welch algorithm can be used to train the HMM. When there is enough training data, Baum-Welch is used to estimate the HMM's state transition probabilities and observation probability functions, after which the Viterbi algorithm can compute the most likely phoneme sequence behind each input utterance. If the amount of data is limited, smaller HMMs are often trained first to recognize individual monophones or triphones, and these small HMMs are then strung together to recognize continuous speech.

      2. For speech synthesis, given a string of phonemes, the database is searched for the set of small HMMs that best matches this phoneme string; they are strung together into one long HMM that represents the whole sentence. From this combined HMM, the sequence of speech parameters most likely to be observed is computed, and what remains is to generate speech from that parameter sequence. This is a simplified view of the whole process. The main problem is that the speech parameters generated this way are discontinuous, because the states of an HMM are discrete. To solve this, Keiichi Tokuda borrowed the dynamic parameters widely used in speech recognition (the first and second derivatives of the parameters) and introduced them into parameter generation for speech synthesis, which greatly improved the coherence of the generated speech. The key is to use hidden-state information, such as grammar and word-usage habits, to infer the output with the highest probability.

  7. When the model parameters are known and the observation sequence is O, the hidden states can take any values; summing, over all possible state sequences, the probability of producing O gives the probability of the observation sequence.
  8. For supervised learning, the state transition probabilities and output observation probabilities can be estimated directly from the data by counting (for word segmentation, the observation sequence corresponds to the text of the sentence and the hidden states correspond to the label of each character in the sentence; a counting sketch appears after this list).
  9. In speech recognition, the observation sequence is the speech signal and the hidden states are the text; the job of speech recognition is to convert speech into the corresponding words.
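
For reference, the joint probability distribution mentioned in item 2 follows directly from the two assumptions in item 1. Writing the hidden state sequence as I = (i1, ..., iT), the observation sequence as O = (o1, ..., oT), and the model parameters as λ, one standard form is:

```latex
P(O, I \mid \lambda)
  = P(i_1)\, P(o_1 \mid i_1) \prod_{t=2}^{T} P(i_t \mid i_{t-1})\, P(o_t \mid i_t)
```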
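
The following is a minimal NumPy sketch of the forward and backward algorithms from items 6.1 and 6.2, which also illustrates the total-probability statement in item 7. The names A (state transition probabilities), B (output observation probabilities) and pi (initial state distribution) are my own conventions rather than something from the notes, and the toy numbers are made up; a real implementation would also need scaling or log-space arithmetic to avoid underflow.

```python
import numpy as np

def forward(A, B, pi, obs):
    """alpha[t, i] = P(o_1..o_t, state_t = q_i)."""
    T, N = len(obs), A.shape[0]
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                       # initialize at the first time step
    for t in range(1, T):
        # sum over previous states via the transition matrix, then emit the current observation
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha

def backward(A, B, obs):
    """beta[t, i] = P(o_{t+1}..o_T | state_t = q_i)."""
    T, N = len(obs), A.shape[0]
    beta = np.zeros((T, N))
    beta[-1] = 1.0                                     # probability at the last time step set to 1
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta

# Toy example: 2 hidden states, 2 observation symbols
A  = np.array([[0.7, 0.3], [0.4, 0.6]])               # state transition probabilities
B  = np.array([[0.9, 0.1], [0.2, 0.8]])               # output observation probabilities
pi = np.array([0.6, 0.4])                             # initial state distribution
obs = [0, 1, 0]

alpha = forward(A, B, pi, obs)
beta  = backward(A, B, obs)
print(alpha[-1].sum())                                 # P(O) from the forward pass
print((pi * B[:, obs[0]] * beta[0]).sum())             # the same P(O) from the backward pass
```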
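
For the unlabeled case in item 6.3, below is a sketch of a single EM (Baum-Welch) iteration. It reuses numpy and the forward and backward functions from the sketch above; the update formulas are the standard re-estimation formulas, not something taken from the original notes, and in practice the step is repeated until the likelihood stops improving.

```python
def baum_welch_step(A, B, pi, obs):
    """One EM iteration: E-step via forward/backward, M-step by normalized expected counts."""
    T, N = len(obs), A.shape[0]
    alpha, beta = forward(A, B, pi, obs), backward(A, B, obs)
    likelihood = alpha[-1].sum()                       # P(O | current parameters)

    gamma = alpha * beta / likelihood                  # gamma[t, i] = P(state_t = q_i | O)
    xi = np.zeros((T - 1, N, N))                       # xi[t, i, j] = P(state_t = q_i, state_{t+1} = q_j | O)
    for t in range(T - 1):
        xi[t] = alpha[t][:, None] * A * B[:, obs[t + 1]] * beta[t + 1] / likelihood

    new_pi = gamma[0]                                  # re-estimated initial distribution
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros_like(B)
    for k in range(B.shape[1]):                        # re-estimate emission probabilities per symbol
        new_B[:, k] = gamma[np.array(obs) == k].sum(axis=0) / gamma.sum(axis=0)
    return new_A, new_B, new_pi, likelihood
```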
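
Item 6.4: a minimal dynamic-programming sketch of Viterbi decoding. It continues from the sketches above (same numpy import and the same toy A, B, pi); the back-pointer array psi is an implementation detail introduced here for illustration, not something named in the notes.

```python
def viterbi(A, B, pi, obs):
    """Return the most probable hidden state sequence for obs."""
    T, N = len(obs), A.shape[0]
    delta = np.zeros((T, N))                           # best path probability ending in each state
    psi = np.zeros((T, N), dtype=int)                  # back-pointers to the best previous state
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A             # scores[i, j]: come from state i, go to state j
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, obs[t]]
    # trace the best path back from the final time step
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1]

print(viterbi(A, B, pi, [0, 1, 0]))                    # prints [0, 1, 0] under the toy parameters above
```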
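
Finally, item 8 says that with labeled data the parameters are just normalized counts. A hypothetical sketch for a word-segmentation-style tagging task (the B/M/E/S tag set is the common convention; the two toy sentences are made up for illustration):

```python
from collections import Counter, defaultdict

# Hypothetical labeled data: each sentence is a list of (character, tag) pairs.
labeled = [
    [("我", "S"), ("爱", "S"), ("北", "B"), ("京", "E")],
    [("天", "B"), ("气", "E"), ("好", "S")],
]

init_counts = Counter()                   # first tag of each sentence
trans_counts = defaultdict(Counter)       # tag -> next tag
emit_counts = defaultdict(Counter)        # tag -> emitted character

for sent in labeled:
    tags = [t for _, t in sent]
    init_counts[tags[0]] += 1
    for ch, tag in sent:
        emit_counts[tag][ch] += 1
    for prev, cur in zip(tags, tags[1:]):
        trans_counts[prev][cur] += 1

def normalize(counter):
    total = sum(counter.values())
    return {k: v / total for k, v in counter.items()}

pi = normalize(init_counts)                                   # initial state distribution
A  = {tag: normalize(c) for tag, c in trans_counts.items()}   # state transition probabilities
B  = {tag: normalize(c) for tag, c in emit_counts.items()}    # output observation probabilities
print(pi, A["B"], sep="\n")
```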

References:

How is the hidden Markov model (HMM-based approach) applied in speech synthesis? - Zhihu (zhihu.com)

    
    
