
Discriminative Models: the Log-Linear Model, a Framework for Building Discriminative Models

2022-07-06 10:30:00 HadesZ~

The Log-Linear Model is a framework for constructing discriminative model algorithms. It does not refer to one particular model; it refers to a whole family of models.

1. Definition

Suppose the model considers $J$ kinds of features, $j = 1, 2, \cdots, J$. Let $w_j$ denote the model's parameter for the $j$-th kind of feature, whose value is estimated during training. Let $F_j(X, y)$ denote the feature function of the $j$-th kind of feature; it expresses some relationship between the input $X$ and the label $y$, and its value serves as the $j$-th feature used for prediction. Let $Z(X, W)$ denote the normalization coefficient over the model's feature scores, called the normalization term or partition function. Under these conditions, the objective function of the model can be written as:
$$P(y \mid X; W) = \frac{\exp\left[\sum_{j=1}^{J} w_j F_j(X, y)\right]}{Z(X, W)} \tag{1}$$

In this model, the feature function of each feature is specified manually, and different choices of feature functions yield different kinds of models, so constructing feature functions is a feature-engineering process. When the feature functions are specified by hand, we are in a machine-learning setting; when they are produced by an automatic feature-learning mechanism, we are in a deep-learning setting. For instance, on a hypothetical sentiment-classification task, a hand-crafted feature function might fire only when a particular word co-occurs with a particular label, as in the sketch below.
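A minimal sketch of such a hand-crafted feature function; the words, labels, and function name are illustrative, not from the original article:

```python
def feature_good_positive(X, y):
    """F_j(X, y): fires (returns 1.0) when the input text contains
    the word 'good' AND the candidate label is 'positive'."""
    return 1.0 if ("good" in X and y == "positive") else 0.0

print(feature_good_positive(["good", "movie"], "positive"))  # 1.0
print(feature_good_positive(["good", "movie"], "negative"))  # 0.0
```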

$Z(X, W)$ equals the sum of the numerator over all possible label categories, i.e. $Z(X,W) = \sum_{i=1}^{C} \exp\left[\sum_{j=1}^{J} w_j F_j(X, y = c_i)\right]$. Its role is to normalize the numerator so that the resulting fraction satisfies the properties of a conditional probability.
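Putting equation (1) and the partition function together, a minimal Python sketch of a generic log-linear predictor might look like this (assuming the caller supplies the weights, feature functions, and label set; all names are illustrative):

```python
import math

def log_linear_prob(X, y, weights, feature_fns, labels):
    """P(y | X; W) per equation (1): exp(sum_j w_j * F_j(X, y)),
    normalized by the partition function Z(X, W)."""
    def score(label):
        # unnormalized score exp[sum_j w_j * F_j(X, label)]
        return math.exp(sum(w * f(X, label)
                            for w, f in zip(weights, feature_fns)))
    Z = sum(score(c) for c in labels)  # Z(X, W): sum over all labels
    return score(y) / Z

# Toy usage: one feature function, weight 2.0, two candidate labels.
fns = [lambda X, y: 1.0 if ("good" in X and y == "positive") else 0.0]
p = log_linear_prob(["good", "movie"], "positive", [2.0], fns,
                    ["positive", "negative"])
print(p)  # exp(2) / (exp(2) + exp(0)) ≈ 0.881
```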

2. Deriving the logistic regression model

Let the set of all possible labels be $\{c_1, c_2, \cdots, c_C\}$, and let the input $X$ be a $d$-dimensional vector $X = (x_1, x_2, \cdots, x_d)$. Now define the feature functions as $F_{j+d(i-1)}(X, y) = x_j \cdot I(y = c_i)$ for $i = 1, \cdots, C$ and $j = 1, \cdots, d$, where $I(y = c_i)$ is an indicator function whose value is 1 when $y = c_i$ and 0 otherwise. The objective function of the model then becomes:

$$P(y = c_i \mid X; W) = \frac{\exp\left[\sum_{j=1+d(i-1)}^{d+d(i-1)} w_j x_{j-d(i-1)}\right]}{\sum_{k=1}^{C} \exp\left[\sum_{j=1+d(k-1)}^{d+d(k-1)} w_j x_{j-d(k-1)}\right]} \tag{2}$$

Here each model parameter $w_j \in \mathbb{R}$, and the parameter vector is $W = (w_1, w_2, \cdots, w_d, w_{d+1}, \cdots, w_{2d}, \cdots, w_{1+d(C-1)}, \cdots, w_{d+d(C-1)}) \in \mathbb{R}^{Cd}$. Writing the sub-vector $(w_{1+d(i-1)}, \cdots, w_{d+d(i-1)})$ as $W_i$, the parameter vector can be rewritten as $W = (W_1, W_2, \cdots, W_C)$. Substituting this into equation (2), the objective function of the model can be abbreviated as:

$$P(y = c_i \mid X; W) = \frac{\exp\left[W_i^T \cdot X\right]}{\sum_{k=1}^{C} \exp\left[W_k^T \cdot X\right]} \tag{3}$$
Clearly, equation (3) is equivalent to $P(y \mid X; W) = \mathrm{Softmax}(W^T X)$. Thus, starting from the Log-Linear Model, we have derived the multi-class logistic regression model (Multinomial Logistic Regression).
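As a quick numeric check of equation (3), the following sketch stacks the class sub-vectors $W_i$ into a $C \times d$ matrix and applies the softmax (the numbers are made up; the max-subtraction is a standard numerical-stability trick, not part of the derivation):

```python
import numpy as np

def softmax_logreg(W, X):
    """Equation (3): P(y = c_i | X; W) = exp(W_i^T X) / sum_k exp(W_k^T X)."""
    scores = W @ X              # one score W_i^T X per class
    scores -= scores.max()      # stabilize exp() without changing the result
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum()

W = np.array([[0.5, -1.0],     # C = 3 classes, d = 2 features
              [1.5,  0.2],
              [-0.3,  0.8]])
X = np.array([1.0, 2.0])
print(softmax_logreg(W, X))    # three probabilities summing to 1
```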

3. Deriving the CRF model

Similarly, let $\bar{X}$ be an observable feature sequence of length $T$ and $\bar{y}$ its corresponding label sequence. If we define $F_j(\bar{X}, \bar{y}) = \sum_{t=2}^{T} f_j(y_{t-1}, y_t, \bar{X}, t)$, we obtain the objective function of the linear-chain CRF model:

$$P(\bar{y} \mid \bar{X}; W) = \frac{1}{Z(\bar{X}, W)} \exp\left[\sum_{j=1}^{J} w_j \sum_{t=2}^{T} f_j(y_{t-1}, y_t, \bar{X}, t)\right] \tag{4}$$
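For tiny label sets, equation (4) can be evaluated directly by enumerating every possible label sequence to compute $Z(\bar{X}, W)$. The brute-force sketch below is for illustration only (real CRF implementations use the forward algorithm instead, and all names here are illustrative):

```python
import math
from itertools import product

def crf_prob(y_bar, X_bar, weights, feature_fns, labels):
    """Equation (4) by brute force: exponential in T, illustration only."""
    T = len(X_bar)
    def score(seq):
        # exp[sum_j w_j * sum_{t=2}^{T} f_j(y_{t-1}, y_t, X_bar, t)]
        # (0-based indices: positions 1..T-1 correspond to t = 2..T)
        return math.exp(sum(
            w * sum(f(seq[t - 1], seq[t], X_bar, t) for t in range(1, T))
            for w, f in zip(weights, feature_fns)))
    Z = sum(score(seq) for seq in product(labels, repeat=T))  # Z(X_bar, W)
    return score(tuple(y_bar)) / Z

# Toy usage: one transition feature that fires on a "B" -> "I" transition.
fns = [lambda yp, yt, X, t: 1.0 if (yp == "B" and yt == "I") else 0.0]
print(crf_prob(("B", "I", "I"), ["w1", "w2", "w3"], [1.0], fns, ("B", "I")))
```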

The detailed derivation can be found in the author's related articles.


Copyright notice
This article was created by [HadesZ~]; when reposting, please include a link to the original. Thanks.
https://yzsam.com/2022/02/202202131712502301.html