Discriminative models: the log-linear model, a framework for building discriminative models
2022-07-06 10:30:00 【HadesZ~】
The log-linear model is a framework for constructing discriminative models. It does not refer to one particular model but to a whole family of models.
1. Definition
Suppose the model makes its prediction from $J$ features, indexed by $j = 1, 2, \cdots, J$. $w_j$ is the model parameter (weight) for the $j$-th feature; its value is estimated during training. $F_j(X, y)$ is the feature function of the $j$-th feature: it encodes some relationship between the input $X$ and the label $y$, and its value is the $j$-th feature used for prediction. $Z(X, W)$ is the normalization coefficient of the model's scores, called the normalization term or partition function. Under these definitions, the model's conditional probability can be written as:
$$P(y \mid X; W) = \frac{\exp\left[\sum_{j=1}^{J} w_j F_j(X, y)\right]}{Z(X, W)} \tag{1}$$
In this model the feature function of each feature has to be specified, and different choices of feature functions yield different models, so designing the feature functions is a feature engineering process. When the feature functions are written by hand, this is a classical machine learning process; when they are produced by an automatic feature-learning mechanism, it is a deep learning process.
$Z(X, W)$ equals the sum of the numerator over all possible values of the label, namely $Z(X, W) = \sum_{i=1}^{C} \exp\left[\sum_{j=1}^{J} w_j F_j(X, y = c_i)\right]$, where $C$ is the number of possible labels. Its role is to normalize the numerator so that the fraction is a valid conditional probability: nonnegative and summing to 1 over the labels.
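As a rough illustration of equation (1), the sketch below evaluates a log-linear model for an arbitrary list of feature functions. The function name `log_linear_proba`, the toy labels, the two feature functions and the weights are all made up for this example; they are not from the original post.

```python
import math

def log_linear_proba(x, labels, feature_fns, weights):
    """Return {label: P(label | x; weights)} for a log-linear model.

    feature_fns is a list of J functions F_j(x, y) -> float and
    weights is the matching list of J parameters w_j.
    """
    # Unnormalized log-score of each label: sum_j w_j * F_j(x, y)
    scores = {
        y: sum(w * f(x, y) for w, f in zip(weights, feature_fns))
        for y in labels
    }
    # Subtract the max score before exponentiating for numerical stability;
    # the common factor cancels between the numerator and Z.
    m = max(scores.values())
    numerators = {y: math.exp(s - m) for y, s in scores.items()}
    z = sum(numerators.values())  # partition function, up to the factor exp(m)
    return {y: n / z for y, n in numerators.items()}


# Hypothetical toy task: decide whether a word is a "NAME" or "OTHER".
feature_fns = [
    lambda x, y: 1.0 if x[0].isupper() and y == "NAME" else 0.0,
    lambda x, y: 1.0 if x.endswith("ing") and y == "OTHER" else 0.0,
]
weights = [2.0, 1.5]  # assumed values; in practice these are learned

probs = log_linear_proba("Alice", ["NAME", "OTHER"], feature_fns, weights)
print(probs)
```

Swapping in a different list of feature functions changes the model without touching the probability computation, which is the sense in which the log-linear model is a framework.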
2. Deriving the logistic regression model
Let the set of all possible labels be $\{c_1, c_2, \cdots, c_C\}$, and let the input $X$ be a vector of length $d$, $X = (x_1, x_2, \cdots, x_d)$. Define $J = dC$ feature functions, one for each pair of input component and label: for $j = k + d(i-1)$ with $k = 1, \cdots, d$ and $i = 1, \cdots, C$, let $F_j(X, y) = x_k \cdot I(y = c_i)$, where $I(y = c_i)$ is the indicator function, equal to 1 when $y = c_i$ and 0 otherwise. The model then becomes:
$$P(y = c_i \mid X; W) = \frac{\exp\left[\sum_{j=1+d(i-1)}^{d+d(i-1)} w_j x_{j-d(i-1)}\right]}{\sum_{k=1}^{C} \exp\left[\sum_{j=1+d(k-1)}^{d+d(k-1)} w_j x_{j-d(k-1)}\right]} \tag{2}$$
Here each parameter $w_j \in R$, and the parameter vector $W = (w_1, w_2, \cdots, w_d, w_{d+1}, \cdots, w_{2d}, \cdots, w_{1+d(C-1)}, \cdots, w_{d+d(C-1)}) \in R^{dC}$. Denote the sub-vector $(w_{1+d(i-1)}, \cdots, w_{d+d(i-1)})$ of the parameter vector by $W_i$, so that the parameter vector can be rewritten as $W = (W_1, W_2, \cdots, W_C)$. Substituting this into equation (2), the model can be abbreviated as:
$$P(y = c_i \mid X; W) = \frac{\exp\left[W_i^T \cdot X\right]}{\sum_{k=1}^{C} \exp\left[W_k^T \cdot X\right]} \tag{3}$$
Clearly, equation (3) is equivalent to $P(y \mid X; W) = \mathrm{Softmax}(W^T X)$, where $W$ is read as the $d \times C$ matrix whose columns are $W_1, \cdots, W_C$. We have thus derived the multiclass logistic regression model (Multinomial Logistic Regression) from the log-linear model.
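The equivalence between equations (2) and (3) can be checked numerically. The sketch below computes the class probabilities once through the flat list of $dC$ indicator-gated features and once as a per-class dot product followed by softmax. The function names, the input vector and the weights are illustrative assumptions, and indices are 0-based in the code.

```python
import math

def softmax(scores):
    m = max(scores)  # shift by the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def flat_log_linear(x, w_flat, num_classes):
    """Equation (2): J = d*C gated features with one flat weight vector."""
    d = len(x)
    scores = []
    for i in range(num_classes):            # candidate label (0-based)
        s = 0.0
        for j in range(d * num_classes):    # every feature index
            k, cls = j % d, j // d          # feature j pairs x[k] with class cls
            f_j = x[k] if cls == i else 0.0  # x_k * I(y = c_cls)
            s += w_flat[j] * f_j
        scores.append(s)
    return softmax(scores)

def softmax_regression(x, w_rows):
    """Equation (3): one weight vector W_i per class."""
    return softmax([sum(w * xk for w, xk in zip(row, x)) for row in w_rows])

x = [1.0, -2.0, 0.5]                        # d = 3
w_flat = [0.2, -0.1, 0.4, 0.0, 0.3, -0.5]   # C = 2, assumed weights
w_rows = [w_flat[0:3], w_flat[3:6]]         # W_1 and W_2 as slices of W

p_flat = flat_log_linear(x, w_flat, num_classes=2)
p_soft = softmax_regression(x, w_rows)
print(p_flat, p_soft)
```

The indicator simply zeroes out every weight that belongs to a different class, so only the block $W_i$ contributes to the score of label $c_i$.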
3. Deriving the CRF model
Similarly, let $\bar{X}$ be an observable feature sequence of length $T$ and $\bar{y}$ its corresponding label sequence. If each feature function is a sum of a local feature function $f_j$ over the positions of the sequence, $F_j(\bar{X}, \bar{y}) = \sum_{t=2}^{T} f_j(y_{t-1}, y_t, \bar{X}, t)$, then substituting into equation (1) gives the linear-chain CRF model:
$$P(\bar{y} \mid \bar{X}; W) = \frac{1}{Z(\bar{X}, W)} \exp\left[\sum_{j=1}^{J} w_j \sum_{t=2}^{T} f_j(y_{t-1}, y_t, \bar{X}, t)\right] \tag{4}$$
Here $Z(\bar{X}, W)$ sums the numerator over all possible label sequences of length $T$.
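To make the sequence case concrete, here is a minimal sketch of a linear-chain CRF on a toy input. It assumes the standard form in which each global feature is a weighted sum of a local feature function over positions $t = 2, \cdots, T$ (positions 1 to $T-1$ in the code's 0-based indexing). The tag set, the two local feature functions and the weights are invented for illustration. The partition function is computed by brute-force enumeration, which is only feasible for very short sequences; practical implementations use the forward algorithm instead.

```python
import math
from itertools import product

LABELS = ["B", "I"]  # hypothetical tag set

# Hypothetical local feature functions f_j(y_prev, y_cur, x_seq, t), t 0-based
def f_same_tag(y_prev, y_cur, x_seq, t):
    return 1.0 if y_prev == y_cur else 0.0

def f_upper_is_b(y_prev, y_cur, x_seq, t):
    return 1.0 if x_seq[t][0].isupper() and y_cur == "B" else 0.0

LOCAL_FEATURES = [f_same_tag, f_upper_is_b]
WEIGHTS = [0.5, 1.2]  # assumed values; in practice these are learned

def sequence_score(y_seq, x_seq):
    """sum_j w_j * F_j, where F_j sums the local feature f_j over positions."""
    total = 0.0
    for w, f in zip(WEIGHTS, LOCAL_FEATURES):
        total += w * sum(
            f(y_seq[t - 1], y_seq[t], x_seq, t) for t in range(1, len(x_seq))
        )
    return total

def crf_proba(y_seq, x_seq):
    """P(y_seq | x_seq; W), with Z obtained by enumerating all label sequences."""
    z = sum(
        math.exp(sequence_score(cand, x_seq))
        for cand in product(LABELS, repeat=len(x_seq))
    )
    return math.exp(sequence_score(y_seq, x_seq)) / z

x_seq = ["alice", "Smith", "runs"]
all_seqs = list(product(LABELS, repeat=len(x_seq)))
total = sum(crf_proba(cand, x_seq) for cand in all_seqs)
best = max(all_seqs, key=lambda cand: crf_proba(cand, x_seq))
print(total, best)
```

The only difference from the classification case is that the "label" is now an entire sequence, so the normalization runs over $|\text{labels}|^T$ candidates rather than $C$ classes.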