
Machine learning related concepts

2022-07-28 14:58:00 Swlaaa

Source: https://www.jianshu.com/p/ddcaeefb5b97

I. Concepts

  1. Fitting, underfitting, overfitting
    • Fitting: how well the model matches the data; ideally it sits between underfitting and overfitting;
    • Underfitting: the model has learned too little;
    • Overfitting: the model has learned too much, memorizing the training data;
  2. Variance, bias
    • Variance: describes how concentrated the data are;
    • Bias: describes how far the results are from the target;
  3. Model ≈ law ≈ equation coefficients ≈ parameter weights (Weight);
    • That is: a model is a set of parameters used to measure the importance of a set of quantities
  4. Machine learning ≈ pattern recognition;
  5. Algorithm ≈ equation;
  6. Fitting ≈ matching;
  7. Training ≈ solving a system of equations;
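The "model ≈ a set of parameter weights" idea above can be sketched in a couple of lines of numpy (an illustrative toy, not from the original post; the weight values are made up):

```python
import numpy as np

# A "model" here is nothing more than a vector of weights.
w = np.array([0.5, -1.2, 3.0])   # hypothetical learned parameters

# "Fitting/matching" a sample means weighting its features and summing.
x = np.array([2.0, 1.0, 0.5])    # one input sample
y_hat = w @ x                     # prediction = weighted sum of the inputs
print(y_hat)                      # 0.5*2 - 1.2*1 + 3.0*0.5 = 1.3
```

Training, in this picture, is just the search for the `w` that makes `y_hat` track the true values.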

II. Machine learning

  1. What is machine learning

    • Skipping the formal definition: in plain terms, it is making machines think and solve problems the way people do. Put concretely, machine learning for programmers means having the machine solve equations to find the optimal set of coefficients (the model); it can also be understood as a class of data-mining algorithms;
  2. The scope of machine learning

    • Machine learning is an interdisciplinary field spanning pattern recognition, data mining, statistical learning, computer vision, speech recognition, natural language processing, and more, each a heavyweight discipline in its own right
  3. Classification of machine learning

    • Supervised learning (the raw data includes labels y)
      • Classification
      • Regression
    • Unsupervised learning (the raw data has no labels y)
      • Clustering
      • Dimensionality reduction
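The supervised/unsupervised split can be seen directly in scikit-learn's API: supervised estimators take labels y in `fit`, unsupervised ones do not. A minimal sketch (the toy data and model choices are my own, not from the post):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression  # supervised: needs y
from sklearn.cluster import KMeans                   # unsupervised: no y

X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
y = np.array([0, 0, 0, 1, 1, 1])                     # labels exist -> supervised

clf = LogisticRegression().fit(X, y)                 # classification: fit(X, y)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)  # clustering: fit(X)

print(clf.predict([[0.05]]))   # predicted label for a new point
print(km.labels_)              # cluster assignments found without any y
```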
  4. Python libraries used

    • numpy: numerical computing framework
    • scipy: scientific computing framework
    • pandas: data-analysis framework, mainly used for tabular data
    • matplotlib: plotting framework
    • scikit-learn: machine-learning framework
    • tensorflow: Google's open-source deep-learning framework
    • keras: open-source deep-learning framework
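A quick sketch of how three of these libraries fit together in a typical workflow (illustrative only; the dataset is synthetic):

```python
import numpy as np                                  # numerical arrays
import pandas as pd                                 # tabular data analysis
from sklearn.linear_model import LinearRegression   # machine-learning framework

# numpy: build a small dataset following y = 2x + 1
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 2 * X.ravel() + 1

# pandas: inspect it as a table
df = pd.DataFrame({"x": X.ravel(), "y": y})
print(df.head())

# scikit-learn: fit a linear model and recover the coefficients
model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)   # ≈ [2.0] 1.0
```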
  5. Understanding the terminology

    • Machine learning: given a set of parameters (w: weight), find an equation such that as x varies, the result y stays as close as possible to the true values. Graphically: find a line whose values are, on average, as close as possible to all the points in the y direction, i.e. the line that minimizes the loss function. The resulting equation is the core goal of machine learning, and we must solve two problems: ① the coefficients of the equation; ② the form of the equation: how many variables it has and of what degree;
      • Loss function of linear regression: J\left ( \theta \right )= \frac{1}{m}\sum_{i=1}^{m}\left (y _{i}-\hat{y_{i}} \right )^2
        • The loss function (Loss Function) is also called the objective function or cost function (Cost Function): the smaller the loss function, the larger the total likelihood, and the more accurate the model;
        • This loss function is a convex function;
        • It is the least-squares / (R)MSE ((Root: square root) Mean Squared Error) loss, i.e. the mean squared error;
        • The formulas here were generated with an external tool; Jianshu's support for mathematical formulas is poor (rendering is patchy; I'll set up my own blog eventually): ① inline formulas are wrapped in $, ② block-level formulas are wrapped in $$;
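The loss function J(θ) above translates directly into code; a minimal sketch with made-up sample values:

```python
import numpy as np

def mse_loss(y_true, y_pred):
    """J(theta) = (1/m) * sum_i (y_i - y_hat_i)^2 -- the mean squared error."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean((y_true - y_pred) ** 2)

y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.5, 5.0, 8.0])

print(mse_loss(y_true, y_pred))            # (0.25 + 0 + 1) / 3
print(np.sqrt(mse_loss(y_true, y_pred)))   # RMSE: the "Root" variant
```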
    • Relationship between the predictions of multiple linear regression and the model: Y = W^{T} \cdot X, or equivalently Y = \theta ^{T}\cdot X
      • W: weight, sometimes written \theta;
    • Algorithm: an equation whose coefficients float within a certain range; the parameters are also called the dimensions that affect the result;
    • Linear regression
      • Linear: a linear transformation, i.e. a first-degree equation;
      • Regression: the overall trend formed by averaging the y-axis values at each point on the x axis;
    • Maximum likelihood estimation: a statistical method for estimating the parameters of the probability density function of a sample set; likelihood: Likelihood; that is: estimate the probability of each of the m samples under the normal distribution; multiplying these together gives the total likelihood of the m samples appearing under that distribution;
      • The total likelihood equals the product of all the probabilities, and we want to maximize it. The probabilities themselves have no direct solution (the actual data are discrete, while the calculus involved works with continuous values), but maximizing the product of probability densities is equivalent to maximizing the product of probabilities, so the product of densities is substituted for the product of probabilities;
      • By the central limit theorem, assuming the samples are independent and the error terms are randomly generated, the errors follow a normal distribution, so the normal distribution is used when computing the total likelihood;
    • Central limit theorem: a result from probability theory stating that the distribution of aggregated random data from the same source asymptotically approaches the normal (Gaussian) distribution; in other words, such data generally converge; the one condition is that the samples are independent;
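The central limit theorem can be checked empirically: means of samples drawn from a decidedly non-normal population still cluster normally around the population mean. A small simulation sketch (the population choice and sample sizes are my own):

```python
import numpy as np

rng = np.random.default_rng(0)

# 10,000 independent samples of size 50 from a uniform (non-normal) population.
sample_means = rng.uniform(0, 1, size=(10000, 50)).mean(axis=1)

# By the CLT the sample means are approximately normal around the population
# mean 0.5, with variance sigma^2 / n = (1/12) / 50 ≈ 0.00167.
print(sample_means.mean())   # close to 0.5
print(sample_means.var())    # close to 1/600
```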
    • Relationship between the true and predicted values: \varepsilon _{i} = y_{i} - \hat{y_{i}}, i.e. \varepsilon _{i} = y_{i} - W^{T} \cdot x_{i}
      • \varepsilon: the errors;
      • y: the true values;
      • \hat{y}: the predicted values;
      • \varepsilon is a random variable; when there are enough samples, by the central limit theorem, it is normally distributed;
    • Probability density function: used to measure the density of probability; each distribution has its corresponding probability density function, for example:
      • Uniform distribution (Uniform Distribution);
      • Normal (Gaussian) distribution (Normal (Gaussian) Distribution); a point on the normal curve at x gives the probability density f\left ( x \right ), not a probability;
      • other distributions ...
      • The probability density function of the normal distribution is: f\left ( x \right ) = \frac{1}{\sigma \sqrt{2\pi }}e^{-\frac{\left ( x - \mu \right )^{2}}{2\sigma ^{2}}};
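The normal density formula above, written as a function (stdlib only; the sanity check at the mean is 1/√(2π) ≈ 0.3989):

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """f(x) = 1 / (sigma * sqrt(2*pi)) * exp(-(x - mu)^2 / (2 * sigma^2))"""
    coeff = 1.0 / (sigma * math.sqrt(2 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

# Density (not probability!) of a standard normal at its mean.
print(normal_pdf(0.0))
```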
    • Assumptions behind the linear-regression loss function: the samples are independent, randomly drawn, and the errors are normally distributed;
    • Ways to solve the linear-regression problem:
      • Analytic method: compute W directly with the formula W = \left ( X^{T} X \right )^{-1} X^{T} y; the value of W gives the coefficients of the linear equation, i.e. the model (Model); not applicable to massive data;
      • Iterative method: the most widely used is gradient descent (GD); gradient descent operates on the loss function, with \theta on the horizontal axis and J\left ( \theta \right ) on the vertical axis;
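The analytic (normal-equation) solution is a one-liner in numpy. A sketch on synthetic data with true coefficients [1, 2] (my own toy example):

```python
import numpy as np

rng = np.random.default_rng(0)

# 100 samples; first column of 1s absorbs the intercept into W.
X = np.c_[np.ones(100), rng.uniform(-1, 1, 100)]
true_w = np.array([1.0, 2.0])
y = X @ true_w + rng.normal(0, 0.01, 100)   # small noise

# Normal equation: W = (X^T X)^{-1} X^T y
W = np.linalg.inv(X.T @ X) @ X.T @ y
print(W)   # ≈ [1.0, 2.0]
```

In practice `np.linalg.lstsq` is preferred over the explicit inverse for numerical stability, and for massive data this O(n³) solve is exactly why iterative methods are used instead.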
    • Chain of reasoning: regression problem → central limit theorem → errors are normally distributed → minimizing the loss function → maximizing the likelihood → maximizing the probability density → maximum probability;
    • Deep learning (DL: Deep Learning) builds on machine learning (ML: Machine Learning) via artificial neural networks (ANN: Artificial Neural Network)
    • Gradient descent (GD):
      • Update formula: \theta ^{\left ( t+1 \right )} = \theta ^{\left ( t \right )} - \eta \cdot g, where \eta is the learning rate, a so-called hyperparameter (hyper parameter), usually small; g is the derivative of the loss function at the current point; fully expanded: \theta _{j}:=\theta _{j}+\eta \cdot \frac{1}{m}\sum_{i=1}^{m}\left ( y^{i}-h_{\theta }\left ( x^{i} \right )\right )x_{j}^{i}
      • Iteration stops once g falls within a threshold (threshold), i.e. when g is approximately 0;
      • Steps of gradient descent:
        1. Pick a \theta value at random;
        2. Compute the gradient g at the current \theta (the derivative at the current point, i.e. the slope there);
        3. Find the next \theta with the update formula: if g is negative, \theta increases; otherwise \theta decreases;
        4. Repeat steps 2 and 3 until the gradient is within the threshold; if the threshold is never reached, the learning rate is too high and the hyperparameter needs adjusting;
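The four steps above can be sketched directly for linear regression (toy data with true coefficients [1, 2]; the learning rate and threshold values are illustrative choices):

```python
import numpy as np

def gradient_descent(X, y, eta=0.1, threshold=1e-6, max_iter=10000):
    """Steps 1-4 above: random theta, compute gradient, step, repeat."""
    rng = np.random.default_rng(0)
    theta = rng.normal(size=X.shape[1])           # step 1: random start
    for _ in range(max_iter):
        g = X.T @ (X @ theta - y) / len(y)        # step 2: gradient of J(theta)
        theta -= eta * g                          # step 3: step against the gradient
        if np.linalg.norm(g) < threshold:         # step 4: stop inside the threshold
            break
    return theta

X = np.c_[np.ones(50), np.linspace(0, 1, 50)]     # intercept column + feature
y = X @ np.array([1.0, 2.0])                      # true coefficients [1, 2]
print(gradient_descent(X, y))                     # ≈ [1.0, 2.0]
```

If `eta` is set too high the iterates overshoot and diverge, which is exactly the "threshold never reached" failure mode described in step 4.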
    • Batch gradient descent (BGD: Batch Gradient Descent):
      • Differentiating the loss function with respect to dimension j gives the gradient: g_{j}=\frac{1}{m}\cdot \left ( x_{j} \right )^{T}\cdot \left ( h_{\theta }\left ( X \right )-y\right )
      • The overall gradient is: g=\frac{1}{m}\cdot X^{T}\cdot \left ( h_{\theta }\left ( X \right )-y\right )
      • As the iterations accumulate with a constant learning rate, the absolute value of the gradient keeps shrinking, so the step size shrinks with it;
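The per-dimension and whole-vector gradient formulas above are the same computation; a quick numerical check on random data (my own sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 100, 3
X = rng.normal(size=(m, n))
theta = rng.normal(size=n)
y = rng.normal(size=m)

# Per-dimension form: g_j = (1/m) * x_j^T (X·theta - y) ...
g_per_dim = np.array([(X[:, j] @ (X @ theta - y)) / m for j in range(n)])

# ... and the vectorized form g = (1/m) * X^T (X·theta - y) agree.
g_full = X.T @ (X @ theta - y) / m
print(np.allclose(g_per_dim, g_full))   # True
```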
    • Mini-batch gradient descent (MBGD: Mini-Batch Gradient Descent):
    • Stochastic gradient descent (SGD: Stochastic Gradient Descent):
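The original post leaves MBGD and SGD unelaborated; as a rough sketch (my own framing, not the author's), the three variants use the same gradient formula and differ only in how many samples feed it per step:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = rng.normal(size=100)
theta = np.zeros(3)

def grad(Xb, yb, theta):
    """Same BGD gradient formula; only the batch it sees changes."""
    return Xb.T @ (Xb @ theta - yb) / len(yb)

g_bgd = grad(X, y, theta)                        # BGD: all m samples
idx = rng.choice(100, size=16, replace=False)
g_mbgd = grad(X[idx], y[idx], theta)             # MBGD: one mini-batch
i = rng.integers(100)
g_sgd = grad(X[i:i+1], y[i:i+1], theta)          # SGD: a single sample
print(g_bgd.shape, g_mbgd.shape, g_sgd.shape)    # all (3,)
```

Smaller batches give noisier but cheaper gradient estimates, which is the usual trade-off between these three methods.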

Copyright notice
This article was created by [Swlaaa]; when reposting, please include a link to the original. Thank you.
https://yzsam.com/2022/196/202207130924317043.html