Dive into Deep Learning (PyTorch version) exercise solutions - 3.1 Linear Regression

2022-07-03 10:20:00 【Innocent^_^】

The first and third questions are open-ended, and there are many angles from which to think about and answer them. I hope this reference helps you in your studies; your corrections are also of great benefit to me.

1. Suppose we have some data $x_1, \ldots, x_n \in \mathbb{R}$. Our goal is to find a constant $b$ that minimizes $\sum_{i}(x_i-b)^2$.
（1） Find the analytic solution for the optimal value of $b$.
（2） How do this problem and its solution relate to the normal distribution?
2. Derive the analytic solution to the linear regression optimization problem with squared error. To simplify the problem, the bias $b$ can be omitted (we can do this by adding a column of all ones to $X$).
（1） Write out the optimization problem in matrix and vector notation (treat all the data as a single matrix and all the target values as a single vector).
（2） Compute the gradient of the loss with respect to $w$.
（3） Find the analytic solution by setting the gradient to 0 and solving the matrix equation.
（4） When might this be better than using stochastic gradient descent? When might this method break down?
3. Assume that the noise model governing the additive noise $\epsilon$ is the exponential distribution, that is, $p(\epsilon)=\frac{1}{2}\exp(-|\epsilon|)$.

$\qquad$ （1） Write out the negative log-likelihood of the data under the model, $-\log P(\mathbf{y} \mid \mathbf{X})$.

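$\qquad$ Solution: Let $w=(b,w_1,\dots,w_d)$ and $x_i=(1,x_{i1},\dots,x_{id})$, where $d$ is the number of features, $n$ is the number of examples, and $wx_i$ denotes the inner product of $w$ and $x_i$.

Then $P(y_i \mid x_i)=\frac{1}{2}e^{-|y_i-wx_i|} \Rightarrow -\log\prod_{i=1}^{n}P(y_i \mid x_i) = -\sum_{i=1}^{n}\log\left(\frac{1}{2}e^{-|y_i-wx_i|}\right)$

$\qquad \quad \ \, = -\sum_{i=1}^{n}\left(-|y_i-wx_i|+\log\frac{1}{2}\right)=n\log 2+\sum_{i=1}^{n}|y_i-wx_i|$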
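
As a sanity check on the simplification above, the following minimal sketch evaluates the negative log-likelihood term by term from the density $\frac{1}{2}e^{-|\epsilon|}$ and compares it with $n\log 2+\sum_{i}|y_i-wx_i|$. The toy data, the weight vector, and the function names are hypothetical and chosen only for illustration.

```python
import math

def laplace_nll(w, xs, ys):
    """Negative log-likelihood summed term by term from p(eps) = 0.5 * exp(-|eps|)."""
    total = 0.0
    for x, y in zip(xs, ys):
        pred = sum(wj * xj for wj, xj in zip(w, x))
        density = 0.5 * math.exp(-abs(y - pred))
        total -= math.log(density)
    return total

def simplified_nll(w, xs, ys):
    """The simplified expression: n * log(2) + sum of absolute residuals."""
    residuals = [y - sum(wj * xj for wj, xj in zip(w, x)) for x, y in zip(xs, ys)]
    return len(ys) * math.log(2) + sum(abs(r) for r in residuals)

# Hypothetical toy data: each x_i starts with 1.0 so that w[0] plays the role of b.
xs = [(1.0, 0.5), (1.0, -1.0), (1.0, 2.0)]
ys = [1.2, -0.3, 2.9]
w = (0.1, 1.5)

direct = laplace_nll(w, xs, ys)
simplified = simplified_nll(w, xs, ys)
print(direct, simplified)  # the two values should agree up to rounding
```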

$\qquad$ （2） Can you find a closed-form solution?

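$\qquad$ Solution: The optimization goal is to minimize $-\log P(\mathbf{y} \mid \mathbf{X})$, i.e. the result above. Only the sum of absolute values depends on $w$, so we need to minimize $\sum_{i=1}^{n}|y_i-wx_i|$. This sum is non-negative and reaches its lower bound $0$ only when every residual vanishes, that is, when $y_i-wx_i=0$ for all $i$, which in particular gives $\sum_{i=1}^{n}(y_i-wx_i)=0$.

If $y$ is a function of a single variable $x$ (with no bias term), the last equation gives $w=\frac{y_1+y_2+\dots+y_n}{x_1+x_2+\dots+x_n}$. In the multivariate case the conditions read $y_i-w^Tx_i=0$ for every $i$, i.e. $Y-w^TX=0$, so we get $w=(YX^{-1})^T$ when $X$ is square and invertible. These expressions are exact only if the data can be fitted with zero residuals. For noisy data that is generally impossible: the objective is a least absolute deviations problem, which is not differentiable wherever a residual is zero and has no closed-form solution in general, so it has to be solved numerically. (For the constant-only model $y=b$, the minimizer is the median of the $y_i$.)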
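
For noisy data, the ratio-of-sums value need not minimize the sum of absolute residuals. The sketch below illustrates this for the one-variable model $y=wx$ without bias. It relies on the fact that the loss is convex and piecewise linear in $w$ with kinks at $y_i/x_i$, so a minimizer can be found among those kinks. The data and names are hypothetical.

```python
def lad_loss(w, xs, ys):
    """Sum of absolute residuals for the one-variable model y = w * x (no bias)."""
    return sum(abs(y - w * x) for x, y in zip(xs, ys))

# Hypothetical toy data; the last point is an outlier.
xs = [1.0, 2.0, 3.0, 5.0]
ys = [1.1, 1.9, 3.2, 10.0]

# Candidate obtained by dropping the absolute values: sum(y) / sum(x).
ratio_of_sums = sum(ys) / sum(xs)

# The loss is convex and piecewise linear in w, with kinks at y_i / x_i,
# so it is enough to compare the loss at those kinks.
kinks = [y / x for x, y in zip(xs, ys)]
best_w = min(kinks, key=lambda w: lad_loss(w, xs, ys))

print(f"ratio of sums: w = {ratio_of_sums:.4f}, loss = {lad_loss(ratio_of_sums, xs, ys):.4f}")
print(f"best kink:     w = {best_w:.4f}, loss = {lad_loss(best_w, xs, ys):.4f}")
```

On this data the best kink gives a strictly smaller loss than the ratio of sums, because the absolute-value objective is much less sensitive to the outlier.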

$\qquad$ （3） Suggest a stochastic gradient descent algorithm to solve this problem. What could possibly go wrong? (Hint: what happens near the stationary point as we keep updating the parameters?) Can you fix this?

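$\qquad$ Solution: The loss is the L1 loss, which is not differentiable where a residual is zero. Away from that point, its derivative with respect to $w$ on a sampled example is $-\operatorname{sign}(y_i-wx_i)\,x_i$, so a stochastic gradient step is $w \leftarrow w+\eta\,\operatorname{sign}(y_i-wx_i)\,x_i$ with learning rate $\eta$. The magnitude of this gradient does not shrink as the residual shrinks, so near the stationary point the loss can be very small while the gradient is still large. The updates then tend to oscillate and converge poorly. One fix is to change the optimization objective to the smooth L1 loss (see the answer posted by user "Yang_Liu" on October 21 in the book's forum).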
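
The oscillation can be reproduced with a minimal sketch under simplifying assumptions: noise-free data generated by $y=2x$, a single weight without bias, and a constant learning rate. The function names, the data, and the choice `beta=1.0` for the smooth L1 loss are assumptions made for illustration, not part of the original answer.

```python
import random

def sign(v):
    return (v > 0) - (v < 0)

def l1_grad(w, x, y):
    """Subgradient of |y - w*x| with respect to w."""
    return -sign(y - w * x) * x

def smooth_l1_grad(w, x, y, beta=1.0):
    """Gradient of the smooth L1 loss: quadratic for |r| < beta, linear outside."""
    r = y - w * x
    if abs(r) < beta:
        return -(r / beta) * x
    return -sign(r) * x

def sgd(grad_fn, xs, ys, lr=0.15, steps=500, seed=0):
    """Plain SGD with a constant learning rate; returns every iterate of w."""
    rng = random.Random(seed)
    w = 0.0
    history = []
    for _ in range(steps):
        i = rng.randrange(len(xs))  # sample one example per step
        w -= lr * grad_fn(w, xs[i], ys[i])
        history.append(w)
    return history

def tail_spread(history, k=100):
    """Range of w over the last k steps: near zero if the iterates have settled."""
    tail = history[-k:]
    return max(tail) - min(tail)

# Hypothetical noise-free data from y = 2x, so the true weight is 2.
xs = [1.0, 2.0, 3.0]
ys = [2.0 * x for x in xs]

l1_hist = sgd(l1_grad, xs, ys)
smooth_hist = sgd(smooth_l1_grad, xs, ys)
l1_spread = tail_spread(l1_hist)
smooth_spread = tail_spread(smooth_hist)
print("L1 spread:", l1_spread, "smooth L1 spread:", smooth_spread)
```

With the L1 loss every step has size `lr * |x|` no matter how close `w` is to 2, so the iterates keep jumping around the solution. With the smooth L1 loss the step shrinks with the residual and `w` settles at 2. With noisy data a constant learning rate leaves some jitter under either loss, and decaying the learning rate is another common remedy.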

Copyright notice
This article was written by [Innocent^_^]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/02/202202150538194913.html
