
2021 Li Hongyi machine learning (1): basic concepts

2022-07-05 02:38:00 【Three ears 01】


1 Basic concepts

Learning notes on Li Hongyi's 2021 machine learning course on Bilibili, kept for later reference.


Machine learning is ultimately about finding a function .

1.1 Types of functions

Regression: the output is a scalar (a numeric value)
Classification: given a set of options (classes), the function outputs the correct one, like answering a multiple-choice question
Structured Learning: the output is something with structure (e.g., drawing an image or writing an article), i.e. teaching the machine to create things

1.2 How to find the function (training)

First, write down a function with unknown parameters;
Second, define the loss $L$, a function of the parameters that measures how good a given set of parameter values is (e.g., MAE, mean absolute error; MSE, mean squared error);
Finally, optimize: find the parameters that minimize the loss, using gradient descent:
1) Randomly pick an initial value $w^{0}$ for the parameter;
2) Compute $\left.\frac{\partial L}{\partial w}\right|_{w=w^{0}}$, then step against the gradient (downhill); the step size is $lr\times\left.\frac{\partial L}{\partial w}\right|_{w=w^{0}}$, where the learning rate $lr$ is a hyperparameter set by hand;
3) Update the parameter, $w^{1}=w^{0}-lr\times\left.\frac{\partial L}{\partial w}\right|_{w=w^{0}}$, and repeat steps 2 and 3.
This method has a major drawback: it usually ends up in a local minimum, whereas what we want is the global minimum.
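The three training steps can be sketched in plain Python for the linear model $y = b + wx$. This is a minimal illustration, not the course's code: the toy data (generated from $y = 2x + 1$), the learning rate 0.01, the 2000 steps and all function names are assumptions chosen for the example. Gradients of the MSE loss are written out by hand.

```python
import random


def mse_loss(w, b, xs, ys):
    """Mean squared error between predictions b + w*x and labels y."""
    return sum((b + w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def mae_loss(w, b, xs, ys):
    """Mean absolute error, the other loss mentioned above."""
    return sum(abs(b + w * x - y) for x, y in zip(xs, ys)) / len(xs)


def gradients(w, b, xs, ys):
    """Partial derivatives dL/dw and dL/db of the MSE loss."""
    n = len(xs)
    errors = [b + w * x - y for x, y in zip(xs, ys)]
    dL_dw = sum(2 * e * x for e, x in zip(errors, xs)) / n
    dL_db = sum(2 * e for e in errors) / n
    return dL_dw, dL_db


def gradient_descent(xs, ys, lr=0.01, steps=2000, seed=0):
    rng = random.Random(seed)
    # Step 1: randomly pick the initial values w^0 and b^0.
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    for _ in range(steps):
        # Step 2: compute the gradient at the current parameters.
        dL_dw, dL_db = gradients(w, b, xs, ys)
        # Step 3: move against the gradient by lr * gradient.
        w -= lr * dL_dw
        b -= lr * dL_db
    return w, b


# Hypothetical noise-free toy data from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]
w, b = gradient_descent(xs, ys)
print(round(w, 3), round(b, 3), mse_loss(w, b, xs, ys))
```

Because this loss is a convex quadratic in $(w, b)$, the run reaches the true minimum; the local-minimum problem only appears with more complex models.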

1.3 Model

A linear model has a severe limitation: it can only represent a straight line, not a polyline or a curve. This limitation, which comes from the model itself, is called model bias, so the model must be made more flexible.
How to improve it: piecewise linear curves.
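Any piecewise linear curve can be written as a constant plus a sum of such hard-sigmoid (blue) pieces, and with enough pieces a smooth curve can also be approximated as closely as needed.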
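The idea "constant plus a sum of blue pieces" can be checked with a small sketch. The function names and the knot values below are hypothetical; each piece is an exact hard sigmoid (flat, then a linear ramp, then flat) whose ramp covers one segment of the curve.

```python
def hard_sigmoid(x, start, end, height):
    """0 before `start`, rises linearly to `height` at `end`, flat afterwards."""
    if x <= start:
        return 0.0
    if x >= end:
        return height
    return height * (x - start) / (end - start)


def piecewise_linear(knots):
    """Build f(x) = constant + sum of hard sigmoids passing through `knots`.

    knots: list of (x, y) pairs sorted by x.
    """
    _, y_first = knots[0]
    # One hard sigmoid per segment; its height is the rise of that segment.
    pieces = [(x0, x1, y1 - y0) for (x0, y0), (x1, y1) in zip(knots, knots[1:])]

    def f(x):
        return y_first + sum(hard_sigmoid(x, s, e, h) for s, e, h in pieces)

    return f


knots = [(0.0, 1.0), (1.0, 3.0), (2.0, 2.0), (4.0, 5.0)]
f = piecewise_linear(knots)
print([f(x) for x, _ in knots])  # [1.0, 3.0, 2.0, 5.0]: the curve hits every knot
```

Between knots the sum interpolates linearly, and outside the knot range the curve stays flat, which is exactly the shape a sum of hard sigmoids can produce.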

1.3.1 sigmoid

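The blue hard-sigmoid piece can be approximated with a sigmoid function, $y = c\,\frac{1}{1+e^{-\left(b+w x_{1}\right)}} = c\,\operatorname{sigmoid}(b+wx_1)$, where $w$ changes the slope, $b$ shifts the curve left or right, and $c$ changes its height. Summing many such sigmoids, each with its own $c_i$, $b_i$, $w_{ij}$, and feeding in several features $x_j$, gives a more flexible model:
$y=b+\sum_{i} c_{i} \operatorname{sigmoid}\left(b_{i}+\sum_{j} w_{i j} x_{j}\right)$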
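A minimal pure-Python sketch of the forward pass of this model follows. The names `sigmoid` and `model` and all the numbers are hypothetical; plain lists stand in for the matrix $W$ so the example needs no libraries. The sigmoid is written in a numerically stable form so that very negative inputs do not overflow `math.exp`.

```python
import math


def sigmoid(z):
    """Numerically stable 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def model(x, W, b_vec, c, b):
    """y = b + sum_i c_i * sigmoid(b_i + sum_j W[i][j] * x[j])."""
    y = b
    for w_row, b_i, c_i in zip(W, b_vec, c):
        z = b_i + sum(w_ij * x_j for w_ij, x_j in zip(w_row, x))
        y += c_i * sigmoid(z)
    return y


# Two input features, three sigmoids (hypothetical values).
x = [1.0, 2.0]
W = [[0.5, -0.25], [1.0, 1.0], [0.0, 0.0]]
b_vec = [0.0, -3.0, 0.0]
c = [2.0, 1.0, 4.0]
print(model(x, W, b_vec, c, b=0.5))  # every z is 0, so y = 0.5 + (2 + 1 + 4) * 0.5 = 4.0
```

Making a weight large turns the corresponding sigmoid into a near step, which is how the smooth curve approaches the hard, piecewise shape.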
All the unknown parameters in this model ($w_{ij}$, $b_i$, $c_i$ and $b$) are collected into a single vector, denoted $\theta$.
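Collecting every parameter into one vector $\theta$ can be sketched as a flatten/unflatten pair. The helpers `pack` and `unpack` and the ordering (rows of $W$, then the $b_i$, then the $c_i$, then $b$) are assumptions for illustration; any fixed ordering works. Gradient descent then updates $\theta$ as a whole, $\theta^{1}=\theta^{0}-lr\times\nabla L(\theta^{0})$.

```python
def pack(W, b_vec, c, b):
    """Flatten all unknown parameters into one list theta."""
    return [w for row in W for w in row] + list(b_vec) + list(c) + [b]


def unpack(theta, n_sigmoids, n_features):
    """Inverse of pack: recover W, b_vec, c and b from theta."""
    k = n_sigmoids * n_features
    W = [theta[i * n_features:(i + 1) * n_features] for i in range(n_sigmoids)]
    b_vec = theta[k:k + n_sigmoids]
    c = theta[k + n_sigmoids:k + 2 * n_sigmoids]
    b = theta[k + 2 * n_sigmoids]
    return W, b_vec, c, b


# Three sigmoids, two features (hypothetical values 1..13).
W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
b_vec = [7.0, 8.0, 9.0]
c = [10.0, 11.0, 12.0]
theta = pack(W, b_vec, c, 13.0)
print(len(theta))  # 3 * 2 + 3 + 3 + 1 = 13 parameters
```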

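Gradient descent now operates on $\theta$ as a whole. Computing the loss over all the data for every update is expensive when the dataset is large, so the data are split into small batches: each update computes the loss $L$ on one batch only and updates $\theta$ with that batch's gradient.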

Going through every batch once is called an epoch. The number of updates in one epoch depends on the total amount of data $N$ and the batch size $B$: one epoch contains $N/B$ updates.
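Splitting the data into batches and counting updates can be sketched as below. The names `minibatches` and `count_updates` are hypothetical, reshuffling every epoch is a common choice rather than something stated in the notes, and the actual loss and gradient computation is left as a comment.

```python
import random


def minibatches(data, batch_size, rng):
    """Shuffle the data and yield consecutive batches of at most batch_size items."""
    indices = list(range(len(data)))
    rng.shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [data[i] for i in indices[start:start + batch_size]]


def count_updates(n_examples, batch_size, epochs, seed=0):
    """Count parameter updates: one update per batch, every batch in every epoch."""
    rng = random.Random(seed)
    data = list(range(n_examples))
    updates = 0
    for _ in range(epochs):
        for batch in minibatches(data, batch_size, rng):
            # A real training loop would compute L on `batch` here
            # and update theta with that batch's gradient.
            updates += 1
    return updates


print(count_updates(10000, 10, epochs=1))  # 10000 / 10 = 1000 updates in one epoch
print(count_updates(1000, 100, epochs=3))  # 10 updates per epoch, 30 in total
```

When $N$ is not a multiple of $B$, this sketch keeps the smaller last batch, so an epoch has $\lceil N/B \rceil$ updates.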

1.3.2 ReLU

The function used above is a soft sigmoid, i.e. a smooth curve. Alternatively, two ReLUs (Rectified Linear Units), each of the form $c\max(0, b+wx_1)$, can be combined into a hard sigmoid, i.e. the polyline shape.
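Replacing each sigmoid with ReLUs, the sigmoid formula above becomes $y=b+\sum_{2i} c_{i}\max\left(0, b_{i}+\sum_{j} w_{ij}x_{j}\right)$, where the sum runs over twice as many terms because each hard sigmoid needs two ReLUs.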
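A short sketch shows how two ReLUs with opposite signs form a hard sigmoid, and how the ReLU version of the model is evaluated. The function names and values are hypothetical. The first ReLU starts the ramp at `start`; the second, subtracted, cancels the slope from `end` onward, leaving a flat top at `height`.

```python
def relu(z):
    """ReLU(z) = max(0, z)."""
    return max(0.0, z)


def hard_sigmoid_from_relus(x, start, end, height):
    """c * max(0, x - start) - c * max(0, x - end), with slope c = height / (end - start)."""
    c = height / (end - start)
    return c * relu(x - start) - c * relu(x - end)


def relu_model(x, W, b_vec, c, b):
    """y = b + sum_i c_i * max(0, b_i + sum_j W[i][j] * x[j])."""
    return b + sum(
        c_i * relu(b_i + sum(w * x_j for w, x_j in zip(row, x)))
        for row, b_i, c_i in zip(W, b_vec, c)
    )


# The same hard sigmoid written as two terms of relu_model (one feature):
# 2 * max(0, -1 + x) - 2 * max(0, -3 + x) rises from 0 at x = 1 to 4 at x = 3.
for x in [0.0, 2.0, 5.0]:
    print(hard_sigmoid_from_relus(x, 1.0, 3.0, 4.0),
          relu_model([x], [[1.0], [1.0]], [-1.0, -3.0], [2.0, -2.0], 0.0))
```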

1.3.3 Repeating the sigmoid computation several times

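The outputs of one layer of sigmoids (or ReLUs) can be fed as inputs to another such layer, and this can be repeated many times. Each sigmoid or ReLU is called a neuron, the whole stack is called a neural network (Neural Network), and each intermediate layer is a hidden layer. Later the approach came to be called deep learning, where deep = many hidden layers.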
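Stacking layers can be sketched as a loop that feeds each layer's outputs into the next. This is an illustrative pure-Python forward pass, not the course's implementation: the names `layer` and `deep_model`, the default ReLU activation and all weights are assumptions. Each layer computes $a_i = \text{activation}(b_i + \sum_j w_{ij} x_j)$, and the final output is $y = b + \sum_i c_i a_i$.

```python
import math


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    """Numerically stable 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def layer(x, W, b_vec, activation):
    """One hidden layer: a_i = activation(b_i + sum_j W[i][j] * x[j])."""
    return [activation(b_i + sum(w * x_j for w, x_j in zip(row, x)))
            for row, b_i in zip(W, b_vec)]


def deep_model(x, hidden_layers, c, b, activation=relu):
    """Apply the hidden layers in order, then y = b + sum_i c_i * a_i."""
    a = x
    for W, b_vec in hidden_layers:
        a = layer(a, W, b_vec, activation)  # outputs of this layer feed the next
    return b + sum(c_i * a_i for c_i, a_i in zip(c, a))


# Two hidden layers (3 neurons, then 2 neurons); hypothetical weights.
hidden = [
    ([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.0, 0.0]),
    ([[2.0, 1.0, 1.0], [-1.0, 0.0, 0.0]], [0.5, 0.0]),
]
print(deep_model([1.0, -2.0], hidden, c=[1.0, 3.0], b=0.5))
```

Passing `activation=sigmoid` gives the sigmoid version of the same network; only the nonlinearity changes, not the layer structure.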