Discriminative models: the log-linear model, a framework for building discriminative models

2022-07-06 10:30:00 【HadesZ~】

The log-linear model is a framework for building discriminative model algorithms. It does not refer to one particular model but to a family of models.

1. Definition

Suppose the model considers $J$ features when making a prediction, indexed by $j = 1, 2, \cdots, J$. Let $w_j$ be the model parameter for the $j$-th feature; its value is estimated during training. Let $F_j(X, y)$ be the feature function of the $j$-th feature: it expresses some relationship between the input $X$ and the label $y$, and its value is the $j$-th feature used for prediction. Let $Z(X, W)$ be the normalization coefficient of the model's scores, called the normalization term or partition function. Under these conditions, the objective function of the model can be written as:
$P(y|\ X;W) = \frac{\exp[\sum_{j=1}^{J} w_jF_j(X, y)]}{Z(X,W)} \tag{1}$

In this model, the feature function of each feature has to be specified, and different choices of feature functions yield different kinds of models, so constructing feature functions is a feature-engineering process. When the feature functions are designed by hand, the result is a classical machine learning process; when they are produced by an automatic feature-learning mechanism, it is a deep learning process.

$Z(X, W)$ equals the sum of the numerator over all possible values of the label, namely $\sum_{i=1}^{C}\exp[\sum_{j=1}^{J} w_jF_j(X, y = c_i)]$, where $c_1, \cdots, c_C$ are the $C$ possible labels. Its role is to normalize the numerator so that the fraction is a valid conditional probability: non-negative and summing to 1 over the labels.
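The definition above can be sketched in a few lines of NumPy. This is only an illustration under assumptions: the function name `log_linear_proba`, the two labels, the three hand-written feature functions, and the weights are all hypothetical and chosen just to make Eq. (1) concrete.

```python
import numpy as np

def log_linear_proba(x, w, feature_fns, labels):
    """Return P(y | x; w) for every candidate label, following Eq. (1)."""
    # Score of each label: sum_j w_j * F_j(x, y)
    scores = np.array([
        sum(w_j * f(x, y) for w_j, f in zip(w, feature_fns))
        for y in labels
    ])
    # Subtracting the max score before exponentiating avoids overflow;
    # the shift cancels between numerator and denominator.
    exp_scores = np.exp(scores - scores.max())
    z = exp_scores.sum()  # partition function Z(X, W), up to the shared shift
    return exp_scores / z

# Hypothetical toy setup (not from the original article).
labels = ["pos", "neg"]
feature_fns = [
    lambda x, y: x[0] if y == "pos" else 0.0,  # fires only for label "pos"
    lambda x, y: x[1] if y == "neg" else 0.0,  # fires only for label "neg"
    lambda x, y: 1.0 if y == "pos" else 0.0,   # bias-like feature for "pos"
]
w = np.array([1.5, 0.5, -0.2])
x = np.array([2.0, 1.0])

proba = log_linear_proba(x, w, feature_fns, labels)
print(dict(zip(labels, proba)))
```

Swapping in a different list of feature functions changes the model without touching the normalization code, which is the sense in which the log-linear model is a framework rather than a single model.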

2. Deriving the logistic regression model

Let the set of all possible labels be $\{c_1, c_2, \cdots, c_C\}$, and let the input $X$ be a vector of length $d$, $X=(x_1, x_2, \cdots, x_d)$. Define one feature function for each pair of class and input component, so that $J = Cd$: for $j = m + d(i-1)$ with $1 \le m \le d$ and $1 \le i \le C$, let $F_j(X, y) = x_m \cdot I(y=c_i)$, where $I(y=c_i)$ is the indicator function, equal to 1 when $y=c_i$ and 0 otherwise. The objective function of the model then becomes:

$P(y=c_i|\ X;W) = \frac{ \exp \left[ \sum_{j=1 + d(i-1)}^{d+d(i-1)} w_jx_{j-d(i-1)} \right] }{ \sum_{k=1}^{C} \exp \left[ \sum_{j=1 + d(k-1)}^{d+d(k-1)} w_jx_{j-d(k-1)} \right] } \tag{2}$

Here the model parameters satisfy $W \in \mathbb{R}^{Cd}$, with parameter vector $(w_1, w_2, \cdots, w_d, w_{d+1}, \cdots, w_{2d}, \cdots, w_{1+d(C-1)}, \cdots, w_{d+d(C-1)})$. Denote the sub-vector $(w_{1 + d(i-1)}, \cdots, w_{d + d(i-1)})$ by $W_{i}$, so that the parameter vector can be rewritten as $W=(W_{1}, W_{2}, \cdots, W_{C})$. Substituting this into Eq. (2), the objective function of the model simplifies to:

$P(y=c_i|\ X;W) = \frac{ \exp [W_{i}^T \cdot X] }{ \sum_{k=1}^{C} \exp [W_{k}^T \cdot X] } \tag{3}$
Clearly, Eq. (3) is equivalent to $P(y|\ X;W) = \mathrm{Softmax}(W^TX)$, where $W$ is viewed as the $d \times C$ matrix whose columns are $W_1, \cdots, W_C$. Thus we have derived the multiclass logistic regression model (Multinomial Logistic Regression) from the log-linear model.
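The equivalence between the flat-index form of Eq. (2) and the softmax form of Eq. (3) can be checked numerically. The following is a minimal sketch, not the author's code: the helper names `softmax` and `log_linear_indicator`, the sizes `C = 3` and `d = 4`, and the random parameters are assumptions made for illustration.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(z - z.max())
    return e / e.sum()

def log_linear_indicator(x, w_flat, num_classes):
    """Eq. (2): flat parameter vector of length C*d with indicator features."""
    d = x.shape[0]
    scores = np.empty(num_classes)
    for i in range(num_classes):            # 0-based class index
        block = w_flat[i * d:(i + 1) * d]   # sub-vector W_i of the flat vector
        scores[i] = block @ x               # only class i's block contributes
    return softmax(scores)

rng = np.random.default_rng(0)
C, d = 3, 4                                 # hypothetical sizes
w_flat = rng.normal(size=C * d)
x = rng.normal(size=d)

# View the flat vector as a d x C matrix whose columns are W_1, ..., W_C.
W = w_flat.reshape(C, d).T
p_eq2 = log_linear_indicator(x, w_flat, C)
p_eq3 = softmax(W.T @ x)
print(np.allclose(p_eq2, p_eq3))
```

The reshape makes explicit that the indicator features only partition one long parameter vector into per-class blocks; no parameters are added or shared between classes.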

3. Deriving the CRF model

Similarly, let $\bar{X}$ be an observable feature sequence of length $T$, and let $\bar{y}$ be its corresponding label sequence. If we set $F_j(\bar{X}, \bar{y}) = \sum_{t=2}^{T} f_t(y_{t-1}, y_t, \bar{X}, t)$, we obtain the objective function of the linear-chain CRF model: