Understanding L1 Regularization and L2 Regularization [Easy to Understand]
2022-07-27 21:44:00 【Full stack programmer webmaster】
Hello everyone, nice to see you again. I'm your friend Quan Jun.
1. The Occam's Razor Principle
Among all possible models, we should prefer one that explains the data well and is also simple. From a Bayesian point of view, the regularization term corresponds to a prior probability over models: we can assume that complex models have a small prior probability while simple models have a large prior probability.
2. The Regularization Term
2.1 What is regularization?
Regularization is the implementation of the structural risk minimization strategy: a regularization term (also called a penalty term) is added to the empirical risk. There are two common regularization terms, L1 regularization and L2 regularization, i.e. the L1 norm and the L2 norm. For a linear regression model, the model with L1 regularization is called Lasso regression, and the model with L2 regularization is called Ridge regression.
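For concreteness, here is a minimal sketch (my own illustration, assuming scikit-learn and NumPy are available) of both models on toy data; `alpha` is each model's regularization strength:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Toy data: 100 samples, 10 features, only the first 2 actually matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

# L1-regularized linear regression (Lasso).
lasso = Lasso(alpha=0.1).fit(X, y)
# L2-regularized linear regression (Ridge).
ridge = Ridge(alpha=0.1).fit(X, y)

print("Lasso coefficients:", np.round(lasso.coef_, 3))  # many exact zeros
print("Ridge coefficients:", np.round(ridge.coef_, 3))  # small but nonzero
```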
2.2 The relationship between the regularization term and model complexity
The regularization term is generally a monotonically increasing function of model complexity: the more complex the model, the larger the regularization term.
Generally speaking, supervised learning can be seen as minimizing the following objective function:

$$\min_{f} \; \frac{1}{N}\sum_{i=1}^{N} L\big(y_i, f(x_i)\big) + \lambda J(f)$$

The first term in the formula is the empirical risk, i.e. the average loss of the model f(x) over the training data set; the second term is the regularization term J(f), which constrains the model to be simpler, with λ ≥ 0 controlling the trade-off between the two.
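As a sketch of this objective for a linear model (the squared loss and the function below are my own illustrative choices, not from the original post):

```python
import numpy as np

def regularized_objective(w, X, y, lam, penalty="l2"):
    """Empirical risk (mean squared loss) plus a regularization term."""
    residuals = X @ w - y
    empirical_risk = np.mean(residuals ** 2)   # first term: average loss
    if penalty == "l1":
        reg = np.sum(np.abs(w))                # L1 norm of the weights
    else:
        reg = np.sum(w ** 2)                   # squared L2 norm
    return empirical_risk + lam * reg
```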
3. The L1 Norm
3.1 Concept: the L1 norm is the sum of the absolute values of the elements of a vector.
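A quick NumPy check (my own example):

```python
import numpy as np

w = np.array([3.0, -4.0, 0.0, 1.0])
print(np.sum(np.abs(w)))         # 8.0, the L1 norm computed by hand
print(np.linalg.norm(w, ord=1))  # 8.0, the same value via NumPy
```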
3.2 Why does the L1 norm make the weights sparse?
Any regularizer that is non-differentiable at wi = 0 and can be decomposed into a sum over the individual weights can induce sparsity. Intuitively, the kink of |w| at zero makes it optimal to set a weight exactly to zero whenever its contribution to reducing the loss is small.
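One standard way to see this, shown below as my own illustration, is the soft-thresholding operator: it is the exact minimizer of ½(w − z)² + λ|w|, the one-dimensional Lasso subproblem, and it maps small inputs exactly to zero:

```python
import numpy as np

def soft_threshold(z, lam):
    """Minimizer of 0.5*(w - z)**2 + lam*abs(w): shrinks z toward 0,
    and returns exactly 0 whenever |z| <= lam."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

print(soft_threshold(np.array([-2.0, -0.3, 0.1, 1.5]), lam=0.5))
# [-1.5 -0.   0.   1. ]  -> small inputs are mapped exactly to zero
```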
3.3 What are the benefits of sparse parameters ?
(1) Feature selection
Regularization that makes the parameters sparse enables automatic feature selection. In feature engineering, most elements (features) of xi are typically unrelated to the label yi. If we take these irrelevant features into account when minimizing the objective function, we may achieve a smaller training error, but on new samples this useless information interferes with prediction. Sparse regularization sets the weights of these useless features to 0, effectively removing them.
(2) Interpretability
Setting the weights of irrelevant features to 0 makes the model easier to interpret. For example, suppose y is the probability of suffering from a certain disease and the collected data x is 1000-dimensional; our task is to find out how these 1000 factors affect the probability of the disease. Suppose we have a regression model y = w1*x1 + w2*x2 + … + w1000*x1000 + b. If, after training, only a few elements of w* are non-zero, say 5, then these 5 weights carry the key information about the disease: whether one suffers from it is related to only these 5 features, which makes the problem much easier to analyze.
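The sketch below mirrors this example with synthetic data (the dataset generator, alpha, and sizes are my own illustrative choices):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# 200 samples, 1000 features, but only 5 actually influence the target.
X, y = make_regression(n_samples=200, n_features=1000,
                       n_informative=5, noise=1.0, random_state=0)

lasso = Lasso(alpha=1.0, max_iter=10_000).fit(X, y)
nonzero = np.flatnonzero(lasso.coef_)
print("non-zero weights:", len(nonzero))    # close to 5
print("selected feature indices:", nonzero)
```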
4. The L2 Norm
4.1 Concept: the L2 norm is the square root of the sum of the squares of the elements of a vector.
The regularization term can take different forms. For regression problems, where the loss function is the squared loss, a common regularization term is the L2 norm of the parameter vector.
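A quick NumPy check of the definition (my own example):

```python
import numpy as np

w = np.array([3.0, -4.0, 0.0, 1.0])
print(np.sqrt(np.sum(w ** 2)))  # 5.099..., the L2 norm by hand
print(np.linalg.norm(w))        # same value; ord=2 is NumPy's default
```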
4.2 Why does the L2 norm prevent overfitting?
[Figure: linear regression fits — left: underfitting; middle: a good fit; right: overfitting]
Making the L2 regularization term ||w||2 small forces every element of w to be small, close to 0 (whereas the L1 norm drives some weights exactly to 0). Smaller parameters mean a simpler model, and a simpler model is less likely to overfit. Looking at the fitting diagram above: constraining some parameters to be very small limits the influence of some polynomial components, which is effectively like reducing the number of variables.
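As an illustration (a minimal sketch; the degree-15 polynomial and alpha value are my own hypothetical choices), fitting the same flexible model with and without an L2 penalty shows the shrinkage:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20).reshape(-1, 1)
y = np.sin(2 * np.pi * x).ravel() + rng.normal(scale=0.2, size=20)

# A degree-15 polynomial is flexible enough to overfit 20 noisy points.
plain = make_pipeline(PolynomialFeatures(15, include_bias=False),
                      LinearRegression()).fit(x, y)
ridge = make_pipeline(PolynomialFeatures(15, include_bias=False),
                      Ridge(alpha=1e-3)).fit(x, y)

print("max |coef|, no penalty:", np.abs(plain[-1].coef_).max())  # huge
print("max |coef|, with L2:   ", np.abs(ridge[-1].coef_).max())  # much smaller
```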