2021 Li Hongyi machine learning (1): basic concepts

2022-07-05 02:38:00 Three ears 01


Study notes for the 2021 Li Hongyi machine learning course on Bilibili, kept for later review.

1 Basic concepts

Machine learning is ultimately about finding a function.

1.1 Different function categories

  • Regression — the output is a numeric value
  • Classification — the output is one of several classes; the machine answers a multiple-choice question
  • Structured Learning — the output is a structured object (an image, an article); the machine learns to create

1.2 How to find the function (Training)

  • First, write down a function with unknown parameters;
  • Second, define a loss, a function of the parameters (e.g. MAE, mean absolute error, or MSE, mean squared error);
  • Finally, optimize: find the parameters that minimize the loss, using gradient descent (a minimal sketch follows this list):
    1) Randomly pick initial parameter values;
    2) Compute $\left.\frac{\partial L}{\partial w}\right|_{w=w^{0}}$, then step against the gradient, with step size $lr \times \left.\frac{\partial L}{\partial w}\right|_{w=w^{0}}$;
    3) Update the parameters.
    This method has a notable drawback: it usually finds a local minimum, while what we want is the global minimum.
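To make the three steps concrete, here is a minimal gradient-descent sketch for a linear model $y = b + wx$ with an MSE loss; the toy data, learning rate, and iteration count are all invented for illustration.

```python
import numpy as np

# Toy data, invented purely for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])

# Step 1: a function with unknown parameters: y_hat = b + w * x.
w, b = 0.0, 0.0   # arbitrary initial values (step 1 of gradient descent)
lr = 0.01         # learning rate: scales the step size

for step in range(1000):
    y_hat = b + w * x
    # Step 2: MSE loss L = mean((y_hat - y)^2).
    # Step 3: compute dL/dw and dL/db, then step against the gradient.
    grad_w = np.mean(2.0 * (y_hat - y) * x)
    grad_b = np.mean(2.0 * (y_hat - y))
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b)  # converges toward the least-squares line for the toy data
```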

1.3 Model

A linear model has a serious limitation: it cannot represent piecewise-linear or curved relationships. This limitation is called model bias, so the model needs to be made more flexible.
How to improve: Piecewise Linear Curves.
Combining many such piecewise segments can fit a curve.

1.3.1 sigmoid

A sigmoid function $y=c\,\frac{1}{1+e^{-(b+wx_{1})}}=c\,\operatorname{sigmoid}(b+wx_{1})$ can be used to fit one blue piecewise segment, and summing many of them gives the full model:
$$y=b+\sum_{i} c_{i} \operatorname{sigmoid}\Big(b_{i}+\sum_{j} w_{ij} x_{j}\Big)$$
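A sketch of this model in numpy (all shapes and parameter values here are invented for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def model(x, b, c, B, W):
    """y = b + sum_i c_i * sigmoid(b_i + sum_j w_ij * x_j)

    x: (n_features,) input vector
    b: scalar bias
    c: (n_sigmoids,) weight on each sigmoid
    B: (n_sigmoids,) per-sigmoid biases b_i
    W: (n_sigmoids, n_features) weights w_ij
    """
    return b + c @ sigmoid(B + W @ x)

# Invented shapes and values purely for illustration.
rng = np.random.default_rng(0)
x = rng.normal(size=3)
y = model(x, b=0.5, c=rng.normal(size=4),
          B=rng.normal(size=4), W=rng.normal(size=(4, 3)))
print(y)
```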
All the unknown parameters in this model are collected together and denoted by a single vector $\theta$.
In principle we could use all of $\theta$ and the entire dataset for each gradient-descent update, but the amount of data is too large, so we split the data into small batches and update once per batch.
The number of parameter updates per pass over the data (one epoch) depends on the total amount of data and the batch size; a sketch follows.
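A sketch of that relationship (all numbers invented): with $N$ examples and batch size $B$, one epoch performs $\lceil N/B \rceil$ updates.

```python
import numpy as np

N, batch_size = 10_000, 64  # invented numbers
updates_per_epoch = (N + batch_size - 1) // batch_size  # ceil(N/B) = 157
print(updates_per_epoch)

indices = np.arange(N)
rng = np.random.default_rng(0)
for epoch in range(3):
    rng.shuffle(indices)  # reshuffle the data each epoch
    for start in range(0, N, batch_size):
        batch = indices[start:start + batch_size]
        # compute the loss and its gradient on this batch only, then take
        # one gradient-descent step; going through all batches once = 1 epoch
```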

1.3.2 ReLU

The sigmoid above is the soft version, a smooth curve. In fact, two ReLUs can be combined to form a hard sigmoid, the piecewise-linear version (a sketch of this construction follows the formula below):
The sigmoid formula above then becomes
$$y=b+\sum_{i} c_{i} \max\Big(0,\; b_{i}+\sum_{j} w_{ij} x_{j}\Big)$$
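A small sketch of the two-ReLU construction (the corner positions $\pm 1$ are invented): subtracting a shifted copy of a ReLU cancels its slope past the second corner, leaving the flat-ramp-flat hard-sigmoid shape.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

x = np.linspace(-3, 3, 7)
# First ReLU turns on at x = -1; the subtracted, shifted ReLU cancels
# the slope after x = 1: flat at 0, linear ramp on [-1, 1], flat at 2.
hard_sigmoid = relu(x + 1) - relu(x - 1)
print(np.column_stack([x, hard_sigmoid]))
```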

1.3.3 The sigmoid computation can be applied several times in succession

Stacking many such layers gives what is called a Neural Network; later this was rebranded as Deep Learning, where deep simply means many hidden layers. A forward-pass sketch follows.
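A sketch of the stacked forward pass (layer sizes and random weights invented for illustration): the output of each sigmoid layer becomes the input of the next.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, layers):
    """Apply each (W, b) layer in turn: a = sigmoid(W @ a + b)."""
    a = x
    for W, b in layers:
        a = sigmoid(W @ a + b)
    return a

# Invented layer sizes 3 -> 4 -> 4 -> 1, with random weights.
rng = np.random.default_rng(0)
sizes = [3, 4, 4, 1]
layers = [(rng.normal(size=(m, n)), rng.normal(size=m))
          for n, m in zip(sizes[:-1], sizes[1:])]
print(forward(rng.normal(size=3), layers))
```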
