当前位置:网站首页>Watermelon book chapter 3 - linear model learning notes
Watermelon book chapter 3 - linear model learning notes
2022-07-27 07:07:00 【Dr. J】
1. The basic form of linear model
- Definition : Given a dataset D, The samples are d Attributes ,
It's No i Values of attributes , So the linear model can learn a function of linear combination of attributes through such a combination to predict :

In vector form, that's
, In determining w and b after , You can get the final model .
2. Linear regression
Univariate linear regression ( Single attribute )
- Property value conversion : If there is an ordered relationship between the values of discrete attributes , That is, the relationship between values can be found by sorting , Then you can directly convert the value of this attribute into a continuous value , For example, tall, short, fat and thin , Total assets and other attributes ; If you cannot find the numerical relationship by sorting , You need to convert the attribute value into the form of numeric vector , For example, classify melons ,“ watermelon ”,“ pumpkin ”,“ cucumber ” Can be converted to (0,0,1),(0,1,0),(1,0,0)
- Parameters w and b The solution of : The core idea is to minimize the loss function
Specific solution method , Minimize the mean square error :

The mathematical method used is the least square method , That is, trying to find a straight line , Minimize the sum of Euclidean distances from all samples to a straight line :
- First pair w and b To derive separately

- Equal the above two equations to 0 Then set up the equations , To solve the , available :


Multiple linear regression
- The form of the objective function to be learned is :

- The optimization goal is similar to monism , And the mathematical method is also the least square method
summary
The structure of linear model is simple , The solution is also easy to understand , And it has many changes , For example, the logistic regression model in the next chapter
3. Logical regression ( It is called logarithmic probability regression )
- principle : Suppose that the real value of the sample changes on the exponential scale , Then the logarithm of the output marker can be used as the approximation target of the linear model :

The above formula is still linear regression in form , But the prediction result is a nonlinear mapping from input space to output space

Activation function ( The connection function ): Activation function g(x) It is a function that approximates the predicted value and the real value , above lnx Is a kind of activation function , Thrill The basic properties of a living function need to be monotonically differentiable and always smooth . Therefore, the principle of logical regression classification is to find a suitable activation function to make a more effective connection between the real value and the predicted value
2. Choice of activation function of logistic regression
- Because the real value of the second category is only 0,1 Two options , So the goal is to find an activation function to map the value obtained by linear regression to 0,1 value , As shown in the figure below

According to the requirement that the activation function is monotonically differentiable and smooth enough , Select the logarithmic probability function as
, The final image effect is shown in the above figure , Logistic regression is to use the value of the linear model to approximate the logarithmic probability 
3. The advantages of logical regression
- The classification problem can be modeled without first assuming the distribution of data
- While predicting the category, the approximate probability is also calculated
- It can be solved directly by numerical optimization algorithm
summary
Logistic regression has many advantages and the modeling process is simple , See Chapter 3 of watermelon book for specific optimization process
4. Linear discriminant analysis (LDA)
- The core idea : Given a training set , Use some methods to project the data in the training set onto a straight line , The specific task is to make the distance between the projections of similar sample points as close as possible , The projections of different sample points are as far as possible , Here's the picture :

- Definition : The projection of the center of the two types of samples on a straight line is
and
, The covariance of the two sample points is 
and
, Intra class divergence matrix and inter class divergence matrix

- Optimization objectives : Make the covariance between the same species as small as possible , Make the projection points between different classes as far away as possible , So you get LDA The goal of optimization , It's written in :

perhaps

- computing method : Using Lagrange operator method , It can be solved w Value
5. Multi category learning
- The main idea : Convert multi classification into two classification , Train a classifier for each binary classification task , Then use these classifiers to predict at the same time , Compare the prediction accuracy of each classifier . The main data set splitting methods include one to many , one-on-one , Many to many
- One to one and one to many thoughts : One on one is to N Two species are paired , And then it explains N(N-1)/2 Two sub tasks ; One to many is to treat a class as positive , Others are negative , Training out N A classifier , The specific implementation process is shown in the following figure :

- Advantages and disadvantages of one-to-one and one to many : More classifiers need to be trained one-on-one , It will consume more storage overhead , But one-on-one training only uses two types of data at a time , And one to many, every training needs to use all the data , This will also lead to a lot of time overhead , So when there are many categories , The time cost of one-on-one is often smaller
6. Category imbalance
- It mainly refers to the classification task , There are great differences in the number of training samples in different categories , such as 1000 Samples , Yes 999 One is positive , Only one is the opposite , Then the learned model cannot predict the counterexample , It loses its predictive value
- But this phenomenon can be used in anomaly detection , For example, train a model to detect the quality of aircraft engines through a large number of good samples , In this way, it can learn many characteristics of good samples , Prefer good samples , So if you enter a bad test sample , Then this model will get an abnormal prediction result , At this time, it can be judged that there is a problem with the aircraft engine
边栏推荐
- Pytorch uses data_ Prefetcher improves data reading speed
- Jest single test style problem [identity obj proxy] NPM package
- AI: play games in your spare time - earn it a small goal - [Alibaba security × ICDM 2022] large scale e-commerce map of risk commodity inspection competition
- VIVO应用市场APP上架总结
- EasyRecovery14数据恢复软件官方功能简介
- Unittest suite and runner
- Analysis of strong tennis cup 2021 PWN competition -- babypwn
- 采用QT进行OpenGL开发(一)绘制平面图形
- 网易云信亮相 GIAC 全球互联网架构大会,解密新一代音视频架构在元宇宙场景的实践...
- Significance of NVIDIA SMI parameters
猜你喜欢

Music website management system based on SSM

Derivative, partial derivative and gradient

肽核酸PNA-多肽PNA-TPP|Glt-Ala-Ala-Pro-Leu-pNA|Suc-Ala-Pro-pNA|Suc-AAPL-pNA|Suc-AAPM-pNA

Web configuration software for industrial control is more efficient than configuration software

PNA peptide nucleic acid modified peptide suc Tyr Leu Val PNA | suc ala Pro Phe PNA 11

基于SSM实现的校园新闻发布管理系统

Boostrap

Auto encoder (AE), denoising auto encoder (DAE), variable auto encoder (VAE) differences

MangoDB

DNA修饰贵金属纳米颗粒|脱氧核糖核酸DNA修饰纳米金(科研级)
随机推荐
DNA修饰贵金属纳米颗粒|脱氧核糖核酸DNA修饰纳米金(科研级)
MySQL的基本语句(1)—增删改查
EasyRecovery14数据恢复软件官方功能简介
Matlab drawing (ultra detailed)
Details of cross entropy loss function in pytorch
Alibaba cloud SMS authentication third-party interface (fast use)
Auto encoder (AE), denoising auto encoder (DAE), variable auto encoder (VAE) differences
Numpy array and image conversion
Unittest suite and runner
Keras OCR instance test
Introduction to the official functions of easyrecovery14 data recovery software
Book borrowing management system based on SSM
AI:业余时间打比赛—挣它个小小目标—【阿里安全×ICDM 2022】大规模电商图上的风险商品检测比赛
DataScience:数据生成之在原始数据上添加小量噪声(可自定义噪声)进而实现构造新数据(dataframe格式数据存储案例)
Student achievement management system based on SSM
Express framework
银行业客户体验管理现状与优化策略分析
Shell编程的规范和变量
Jest single test style problem [identity obj proxy] NPM package
Interpretation of deepsort source code (IV)
It's No i Values of attributes , So the linear model can learn a function of linear combination of attributes through such a combination to predict :
and
, The covariance of the two sample points is 