
Machine learning related concepts

2022-07-28 14:58:00 Swlaaa

Source: https://www.jianshu.com/p/ddcaeefb5b97

I. Concepts

  1. Fitting, underfitting, overfitting
    • Fitting: how well the model matches the data; ideally it sits between underfitting and overfitting;
    • Underfitting: the model has learned too little;
    • Overfitting: the model has learned too much, memorizing the training data;
  2. Variance, bias
    • Variance: describes how concentrated the data are;
    • Bias: describes how far the results are from the target;
  3. Model ≈ law ≈ equation coefficients ≈ parameter weights (Weight);
    • That is: a model is a set of parameters used to measure the importance of a set of quantities
  4. Machine learning ≈ pattern recognition;
  5. Algorithm ≈ equation;
  6. Fitting ≈ matching;
  7. Training ≈ solving a system of equations;
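The "model ≈ a set of parameter weights" idea above can be sketched in a couple of lines of numpy (an illustrative toy, not from the original post; the weight values are made up):

```python
import numpy as np

# A "model" here is nothing more than a vector of weights.
w = np.array([0.5, -1.2, 3.0])   # hypothetical learned parameters

# "Fitting/matching" a sample means weighting its features and summing.
x = np.array([2.0, 1.0, 0.5])    # one input sample
y_hat = w @ x                     # prediction = weighted sum of the inputs
print(y_hat)                      # 0.5*2 - 1.2*1 + 3.0*0.5 = 1.3
```

Training, in this picture, is just the search for the `w` that makes `y_hat` track the true values.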

II. Machine learning

  1. What is machine learning

    • Skipping the formal definition: in plain terms, it is making machines think and solve problems the way people do. Put concretely, machine learning for programmers means having the machine solve equations to find the optimal set of coefficients (the model); it can also be understood as a class of data-mining algorithms;
  2. The scope of machine learning

    • Machine learning is an interdisciplinary field spanning pattern recognition, data mining, statistical learning, computer vision, speech recognition, natural language processing, and more, each a heavyweight discipline in its own right
  3. Classification of machine learning

    • Supervised learning (the raw data includes labels y)
      • Classification
      • Regression
    • Unsupervised learning (the raw data has no labels y)
      • Clustering
      • Dimensionality reduction
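The supervised/unsupervised split can be seen directly in scikit-learn's API: supervised estimators take labels y in `fit`, unsupervised ones do not. A minimal sketch (the toy data and model choices are my own, not from the post):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression  # supervised: needs y
from sklearn.cluster import KMeans                   # unsupervised: no y

X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
y = np.array([0, 0, 0, 1, 1, 1])                     # labels exist -> supervised

clf = LogisticRegression().fit(X, y)                 # classification: fit(X, y)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)  # clustering: fit(X)

print(clf.predict([[0.05]]))   # predicted label for a new point
print(km.labels_)              # cluster assignments found without any y
```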
  4. Python libraries used

    • numpy: numerical computing framework
    • scipy: scientific computing framework
    • pandas: data-analysis framework, mainly used for tabular data
    • matplotlib: plotting framework
    • scikit-learn: machine-learning framework
    • tensorflow: Google's open-source deep-learning framework
    • keras: open-source deep-learning framework
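A quick sketch of how three of these libraries fit together in a typical workflow (illustrative only; the dataset is synthetic):

```python
import numpy as np                                  # numerical arrays
import pandas as pd                                 # tabular data analysis
from sklearn.linear_model import LinearRegression   # machine-learning framework

# numpy: build a small dataset following y = 2x + 1
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 2 * X.ravel() + 1

# pandas: inspect it as a table
df = pd.DataFrame({"x": X.ravel(), "y": y})
print(df.head())

# scikit-learn: fit a linear model and recover the coefficients
model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)   # ≈ [2.0] 1.0
```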
  5. Understanding the terminology

    • Machine learning: given a set of parameters (w: weight), find an equation such that as x varies, the result y stays as close as possible to the true values. Graphically: find a line whose values are, on average, as close as possible to all the points in the y direction, i.e. the line that minimizes the loss function. The resulting equation is the core goal of machine learning, and we must solve two problems: ① the coefficients of the equation; ② the form of the equation: how many variables it has and of what degree;
      • Loss function of linear regression: J\left ( \theta \right )= \frac{1}{m}\sum_{i=1}^{m}\left (y _{i}-\hat{y_{i}} \right )^2
        • The loss function (Loss Function) is also called the objective function or cost function (Cost Function): the smaller the loss function, the larger the total likelihood, and the more accurate the model;
        • This loss function is a convex function;
        • It is the least-squares / (R)MSE ((Root: square root) Mean Squared Error) loss, i.e. the mean squared error;
        • The formulas here were generated with an external tool; Jianshu's support for mathematical formulas is poor (rendering is patchy; I'll set up my own blog eventually): ① inline formulas are wrapped in $, ② block-level formulas are wrapped in $$;
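The loss function J(θ) above translates directly into code; a minimal sketch with made-up sample values:

```python
import numpy as np

def mse_loss(y_true, y_pred):
    """J(theta) = (1/m) * sum_i (y_i - y_hat_i)^2 -- the mean squared error."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean((y_true - y_pred) ** 2)

y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.5, 5.0, 8.0])

print(mse_loss(y_true, y_pred))            # (0.25 + 0 + 1) / 3
print(np.sqrt(mse_loss(y_true, y_pred)))   # RMSE: the "Root" variant
```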
    • Relationship between the predictions of multiple linear regression and the model: Y = W^{T} \cdot X, or equivalently Y = \theta ^{T}\cdot X
      • W: weight, sometimes written \theta;
    • Algorithm: an equation whose coefficients float within a certain range; the parameters are also called the dimensions that affect the result;
    • Linear regression
      • Linear: a linear transformation, i.e. a first-degree equation;
      • Regression: the overall trend formed by averaging the y-axis values at each point on the x axis;
    • Maximum likelihood estimation: a statistical method for estimating the parameters of the probability density function of a sample set; likelihood: Likelihood; that is: estimate the probability of each of the m samples under the normal distribution; multiplying these together gives the total likelihood of the m samples appearing under that distribution;
      • The total likelihood equals the product of all the probabilities, and we want to maximize it. The probabilities themselves have no direct solution (the actual data are discrete, while the calculus involved works with continuous values), but maximizing the product of probability densities is equivalent to maximizing the product of probabilities, so the product of densities is substituted for the product of probabilities;
      • By the central limit theorem, assuming the samples are independent and the error terms are randomly generated, the errors follow a normal distribution, so the normal distribution is used when computing the total likelihood;
    • Central limit theorem: a result from probability theory stating that the distribution of aggregated random data from the same source asymptotically approaches the normal (Gaussian) distribution; in other words, such data generally converge; the one condition is that the samples are independent;
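The central limit theorem can be checked empirically: means of samples drawn from a decidedly non-normal population still cluster normally around the population mean. A small simulation sketch (the population choice and sample sizes are my own):

```python
import numpy as np

rng = np.random.default_rng(0)

# 10,000 independent samples of size 50 from a uniform (non-normal) population.
sample_means = rng.uniform(0, 1, size=(10000, 50)).mean(axis=1)

# By the CLT the sample means are approximately normal around the population
# mean 0.5, with variance sigma^2 / n = (1/12) / 50 ≈ 0.00167.
print(sample_means.mean())   # close to 0.5
print(sample_means.var())    # close to 1/600
```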
    • Relationship between the true and predicted values: \varepsilon _{i} = y_{i} - \hat{y_{i}}, i.e. \varepsilon _{i} = y_{i} - W^{T} \cdot x_{i}
      • \varepsilon: the errors;
      • y: the true values;
      • \hat{y}: the predicted values;
      • \varepsilon is a random variable; when there are enough samples, by the central limit theorem, it is normally distributed;
    • Probability density function: used to measure the density of probability; each distribution has its corresponding probability density function, for example:
      • Uniform distribution (Uniform Distribution);
      • Normal (Gaussian) distribution (Normal (Gaussian) Distribution); a point on the normal curve at x gives the probability density f\left ( x \right ), not a probability;
      • other distributions ...
      • The probability density function of the normal distribution is: f\left ( x \right ) = \frac{1}{\sigma \sqrt{2\pi }}e^{-\frac{\left ( x - \mu \right )^{2}}{2\sigma ^{2}}};
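The normal density formula above, written as a function (stdlib only; the sanity check at the mean is 1/√(2π) ≈ 0.3989):

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """f(x) = 1 / (sigma * sqrt(2*pi)) * exp(-(x - mu)^2 / (2 * sigma^2))"""
    coeff = 1.0 / (sigma * math.sqrt(2 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

# Density (not probability!) of a standard normal at its mean.
print(normal_pdf(0.0))
```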
    • Assumptions behind the linear-regression loss function: the samples are independent, randomly drawn, and the errors are normally distributed;
    • Ways to solve the linear-regression problem:
      • Analytic method: compute W directly with the formula W = \left ( X^{T} X \right )^{-1} X^{T} y; the value of W gives the coefficients of the linear equation, i.e. the model (Model); not applicable to massive data;
      • Iterative method: the most widely used is gradient descent (GD); gradient descent operates on the loss function, with \theta on the horizontal axis and J\left ( \theta \right ) on the vertical axis;
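The analytic (normal-equation) solution is a one-liner in numpy. A sketch on synthetic data with true coefficients [1, 2] (my own toy example):

```python
import numpy as np

rng = np.random.default_rng(0)

# 100 samples; first column of 1s absorbs the intercept into W.
X = np.c_[np.ones(100), rng.uniform(-1, 1, 100)]
true_w = np.array([1.0, 2.0])
y = X @ true_w + rng.normal(0, 0.01, 100)   # small noise

# Normal equation: W = (X^T X)^{-1} X^T y
W = np.linalg.inv(X.T @ X) @ X.T @ y
print(W)   # ≈ [1.0, 2.0]
```

In practice `np.linalg.lstsq` is preferred over the explicit inverse for numerical stability, and for massive data this O(n³) solve is exactly why iterative methods are used instead.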
    • Chain of reasoning: regression problem → central limit theorem → errors are normally distributed → minimizing the loss function → maximizing the likelihood → maximizing the probability density → maximum probability;
    • Deep learning (DL: Deep Learning) builds on machine learning (ML: Machine Learning) via artificial neural networks (ANN: Artificial Neural Network)
    • Gradient descent (GD):
      • Update formula: \theta ^{\left ( t+1 \right )} = \theta ^{\left ( t \right )} - \eta \cdot g, where \eta is the learning rate, a so-called hyperparameter (hyper parameter), usually small; g is the derivative of the loss function at the current point; fully expanded: \theta _{j}:=\theta _{j}+\eta \cdot \frac{1}{m}\sum_{i=1}^{m}\left ( y^{i}-h_{\theta }\left ( x^{i} \right )\right )x_{j}^{i}
      • Iteration stops once g falls within a threshold (threshold), i.e. when g is approximately 0;
      • Steps of gradient descent:
        1. Pick a \theta value at random;
        2. Compute the gradient g at the current \theta (the derivative at the current point, i.e. the slope there);
        3. Find the next \theta with the update formula: if g is negative, \theta increases; otherwise \theta decreases;
        4. Repeat steps 2 and 3 until the gradient is within the threshold; if the threshold is never reached, the learning rate is too high and the hyperparameter needs adjusting;
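The four steps above can be sketched directly for linear regression (toy data with true coefficients [1, 2]; the learning rate and threshold values are illustrative choices):

```python
import numpy as np

def gradient_descent(X, y, eta=0.1, threshold=1e-6, max_iter=10000):
    """Steps 1-4 above: random theta, compute gradient, step, repeat."""
    rng = np.random.default_rng(0)
    theta = rng.normal(size=X.shape[1])           # step 1: random start
    for _ in range(max_iter):
        g = X.T @ (X @ theta - y) / len(y)        # step 2: gradient of J(theta)
        theta -= eta * g                          # step 3: step against the gradient
        if np.linalg.norm(g) < threshold:         # step 4: stop inside the threshold
            break
    return theta

X = np.c_[np.ones(50), np.linspace(0, 1, 50)]     # intercept column + feature
y = X @ np.array([1.0, 2.0])                      # true coefficients [1, 2]
print(gradient_descent(X, y))                     # ≈ [1.0, 2.0]
```

If `eta` is set too high the iterates overshoot and diverge, which is exactly the "threshold never reached" failure mode described in step 4.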
    • Batch gradient descent (BGD: Batch Gradient Descent):
      • Differentiating the loss function with respect to dimension j gives the gradient: g_{j}=\frac{1}{m}\cdot \left ( x_{j} \right )^{T}\cdot \left ( h_{\theta }\left ( X \right )-y\right )
      • The overall gradient is: g=\frac{1}{m}\cdot X^{T}\cdot \left ( h_{\theta }\left ( X \right )-y\right )
      • As the iterations accumulate with a constant learning rate, the absolute value of the gradient keeps shrinking, so the step size shrinks with it;
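The per-dimension and whole-vector gradient formulas above are the same computation; a quick numerical check on random data (my own sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 100, 3
X = rng.normal(size=(m, n))
theta = rng.normal(size=n)
y = rng.normal(size=m)

# Per-dimension form: g_j = (1/m) * x_j^T (X·theta - y) ...
g_per_dim = np.array([(X[:, j] @ (X @ theta - y)) / m for j in range(n)])

# ... and the vectorized form g = (1/m) * X^T (X·theta - y) agree.
g_full = X.T @ (X @ theta - y) / m
print(np.allclose(g_per_dim, g_full))   # True
```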
    • Mini-batch gradient descent (MBGD: Mini-Batch Gradient Descent):
    • Stochastic gradient descent (SGD: Stochastic Gradient Descent):
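The original post leaves MBGD and SGD unelaborated; as a rough sketch (my own framing, not the author's), the three variants use the same gradient formula and differ only in how many samples feed it per step:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = rng.normal(size=100)
theta = np.zeros(3)

def grad(Xb, yb, theta):
    """Same BGD gradient formula; only the batch it sees changes."""
    return Xb.T @ (Xb @ theta - yb) / len(yb)

g_bgd = grad(X, y, theta)                        # BGD: all m samples
idx = rng.choice(100, size=16, replace=False)
g_mbgd = grad(X[idx], y[idx], theta)             # MBGD: one mini-batch
i = rng.integers(100)
g_sgd = grad(X[i:i+1], y[i:i+1], theta)          # SGD: a single sample
print(g_bgd.shape, g_mbgd.shape, g_sgd.shape)    # all (3,)
```

Smaller batches give noisier but cheaper gradient estimates, which is the usual trade-off between these three methods.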

Copyright notice
This article was created by [Swlaaa]; when reposting, please include a link to the original. Thank you.
https://yzsam.com/2022/196/202207130924317043.html