ML - natural language processing - Basics
2022-07-25 15:24:00 【sword_csdn】
References
Huawei Cloud Academy
https://www.cnblogs.com/pinard/p/7160330.html
Language model
A language model is an abstract model of a language built on objective linguistic facts: it assigns a probability to every word sequence, so that candidate sentences can be compared. Consider the following problems:
(1) Machine translation: among candidate translations of the same source, the fluent one should score higher, e.g. P(I have a dream) > P(I a dream have)
(2) Spelling correction: P(about fifteen minutes from) > P(about fifteen minuets from)
(3) Speech recognition: among acoustically similar transcriptions, the fluent one should score higher, e.g. P(recognize speech) > P(wreck a nice beach)
(4) Pinyin-to-character conversion: P(what are you doing now | nixianzaiganshenme) > P(what are you doing in Xi'an | nixianzaiganshenme)
If we formalize the problem above, the probability of a sentence $w_1 w_2 \ldots w_T$ decomposes by the chain rule:

$$P(w_1, w_2, \ldots, w_T) = \prod_{t=1}^{T} P(w_t \mid w_1, \ldots, w_{t-1})$$
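For instance, the first example above decomposes as (an illustrative expansion, not from the original post):

$$P(\text{I have a dream}) = P(\text{I}) \cdot P(\text{have} \mid \text{I}) \cdot P(\text{a} \mid \text{I have}) \cdot P(\text{dream} \mid \text{I have a})$$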
Neural network language model
A neural network language model (NNLM) learns distributed word representations (embeddings) and uses a neural network to estimate $P(w_t \mid w_1, \ldots, w_{t-1})$; recurrent variants can, in principle, condition on the entire preceding context rather than a fixed window.
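As a minimal sketch of a feed-forward NNLM in PyTorch, in the spirit of Bengio et al. (2003); the class name, layer sizes, and hyperparameters below are illustrative assumptions, not taken from the original post:

```python
import torch
import torch.nn as nn

class NNLM(nn.Module):
    """Feed-forward NNLM sketch: embed a fixed-size context, predict the next word."""
    def __init__(self, vocab_size, embed_dim=64, context=3, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)   # word embeddings
        self.fc1 = nn.Linear(context * embed_dim, hidden)  # hidden layer
        self.fc2 = nn.Linear(hidden, vocab_size)           # output scores

    def forward(self, ctx_ids):             # ctx_ids: (batch, context)
        e = self.embed(ctx_ids).flatten(1)  # concatenate context embeddings
        h = torch.tanh(self.fc1(e))
        return self.fc2(h)                  # logits over the next word

model = NNLM(vocab_size=10_000)
logits = model(torch.randint(0, 10_000, (2, 3)))  # 2 samples, 3 context words
probs = torch.softmax(logits, dim=-1)             # P(w_t | previous 3 words)
```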


N-gram language model
An n-gram model estimates the conditional probability under a Markov assumption: the influence of words at distance greater than or equal to n is ignored, so each word is conditioned only on the previous n−1 words. Estimating this n-gram conditional probability as a ratio of frequency counts gives:

$$P(w_t \mid w_{t-n+1}, \ldots, w_{t-1}) = \frac{\mathrm{count}(w_{t-n+1}, \ldots, w_{t-1}, w_t)}{\mathrm{count}(w_{t-n+1}, \ldots, w_{t-1})}$$
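As a concrete sketch of this counting approach, here is a minimal maximum-likelihood bigram (n = 2) estimator in Python; the toy corpus is invented for illustration:

```python
from collections import defaultdict

# Toy corpus (invented for illustration); real models need far more data.
corpus = [["I", "have", "a", "dream"],
          ["I", "have", "a", "cat"]]

bigram_count = defaultdict(int)   # count(w1 w2)
context_count = defaultdict(int)  # count(w1)

for sent in corpus:
    for w1, w2 in zip(sent, sent[1:]):
        bigram_count[(w1, w2)] += 1
        context_count[w1] += 1

def cond_prob(w2, w1):
    """MLE estimate of P(w2 | w1) = count(w1 w2) / count(w1)."""
    if context_count[w1] == 0:
        return 0.0
    return bigram_count[(w1, w2)] / context_count[w1]

print(cond_prob("a", "have"))   # 1.0  ("have" is always followed by "a")
print(cond_prob("dream", "a"))  # 0.5  ("a" is followed by "dream" or "cat")
```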
Relationship between the NN language model and the statistical language model
Similarity: both treat a sentence as a sequence of words and compute the probability of that sentence.
Differences:
(1) How the probability is computed: N-gram relies on the Markov assumption and conditions only on the previous n−1 words; the NNLM can take the context of the whole sentence into account.
(2) How the model is trained: N-gram parameters are maximum-likelihood estimates obtained from word counts; the NNLM is trained with neural-network optimization methods (e.g. for an RNN-based model).
(3) A recurrent neural network can store context of arbitrary length in its hidden state, and is not restricted to the fixed window of the N-gram model.
Text vectorization
Text vectorization represents text as a set of vectors that capture its semantics. Commonly used vectorization algorithms include one-hot, TF-IDF, word2vec (CBOW, Skip-gram), and doc2vec/str2vec (DM, DBOW).
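As a quick sketch of one of these schemes, TF-IDF with scikit-learn; the two toy documents are invented for illustration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["the cat sat on the mat",
        "the dog sat on the log"]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)         # sparse (2, vocab_size) TF-IDF matrix
print(vectorizer.get_feature_names_out())  # learned vocabulary
print(X.toarray().round(2))                # one weighted vector per document
```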
word2vec - CBOW Model
The training input of the CBOW model is the word vectors of the context words around a target word, and the output is the word vector of that target word.
For example, with a context window of size 4 and the target word "Learning" (whose word vector is the desired output), the context consists of 8 words, 4 before and 4 after, and these 8 words are the model's input. Because CBOW uses a bag-of-words assumption, all 8 context words are treated equally: their distance to the target word is ignored, as long as they fall within the window.
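A minimal sketch of CBOW training with gensim's Word2Vec, where sg=0 selects CBOW; the toy corpus and hyperparameters are illustrative assumptions:

```python
from gensim.models import Word2Vec

# Tiny tokenized corpus (illustrative); real training needs a large corpus.
sentences = [["we", "are", "studying", "machine", "learning", "every", "day"],
             ["deep", "learning", "is", "a", "branch", "of", "machine", "learning"]]

# sg=0 selects CBOW: the 2*window context words predict the centre word.
cbow = Word2Vec(sentences, vector_size=100, window=4, min_count=1, sg=0)

print(cbow.wv["learning"][:5])           # learned word vector (first 5 dims)
print(cbow.wv.most_similar("learning"))  # nearest neighbours in vector space
```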
word2vec - Skip-gram Model
The Skip-gram model is the reverse of CBOW: the input is the word vector of the target word, and the output is the word vectors of its context words. In the example above, with a context size of 4, the target word "Learning" is the input, and the 8 context words are the output.
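With gensim, switching to Skip-gram is a one-parameter change, sg=1 (again an illustrative sketch):

```python
from gensim.models import Word2Vec

sentences = [["we", "are", "studying", "machine", "learning", "every", "day"],
             ["deep", "learning", "is", "a", "branch", "of", "machine", "learning"]]

# sg=1 selects Skip-gram: the centre word predicts each of its context words.
skipgram = Word2Vec(sentences, vector_size=100, window=4, min_count=1, sg=1)
print(skipgram.wv.most_similar("machine"))
```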
doc2vec - DM Model

Each paragraph is represented as a vector, a column of a matrix D, and each word is represented as a vector, a column of a matrix W. The paragraph vector and the context word vectors are averaged or concatenated to predict the next word in the context.
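A minimal sketch of the DM model with gensim's Doc2Vec, where dm=1 selects PV-DM; the documents, tags, and parameters are illustrative assumptions:

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

docs = [TaggedDocument(words=["machine", "learning", "is", "fun"], tags=["d0"]),
        TaggedDocument(words=["natural", "language", "processing", "basics"], tags=["d1"])]

# dm=1 selects Distributed Memory: the paragraph vector is combined with the
# context word vectors to predict the next word.
dm = Doc2Vec(docs, vector_size=50, window=2, min_count=1, dm=1, epochs=40)

print(dm.dv["d0"][:5])                           # paragraph vector of doc d0
print(dm.infer_vector(["machine", "learning"]))  # vector for unseen text
```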
doc2vec - DBOW Model

At each iteration of stochastic gradient descent, the model samples a text window, then randomly samples a word from that window, forming a classification task: predict that word given the paragraph vector. This model is analogous to Skip-gram.
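And the DBOW variant, selected with dm=0 in gensim's Doc2Vec (again an illustrative sketch):

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

docs = [TaggedDocument(words=["machine", "learning", "is", "fun"], tags=["d0"]),
        TaggedDocument(words=["natural", "language", "processing", "basics"], tags=["d1"])]

# dm=0 selects DBOW: the paragraph vector alone predicts words randomly
# sampled from the document, analogous to Skip-gram.
dbow = Doc2Vec(docs, vector_size=50, min_count=1, dm=0, epochs=40)
print(dbow.dv["d1"][:5])
```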