Machine learning related concepts
2022-07-28 14:58:00 【Swlaaa】
Source: https://www.jianshu.com/p/ddcaeefb5b97
I. Concepts
- Fitting, underfitting, overfitting (see the sketch after this list)
  - Fitting: how well the model matches the data; a good fit sits between underfitting and overfitting;
  - Underfitting: the model has learned too little;
  - Overfitting: the model has learned too much;
- Variance, bias
  - Variance: describes how concentrated the data are;
  - Bias: describes the distance from the target;
- Model ≈ law ≈ equation coefficients ≈ parameter weights (Weight);
  - i.e., a model is a set of parameters that measures the importance of a set of quantities
- Machine learning ≈ pattern recognition;
- Algorithm ≈ equation;
- Fitting ≈ matching;
- Training ≈ solving a set of equations;
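A minimal sketch of under- and overfitting, assuming scikit-learn and numpy are available (the data and degrees are made up for illustration): polynomials of degree 1, 2, and 15 are fit to noisy quadratic data; degree 1 underfits, while degree 15 drives the training error toward zero by memorizing noise.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(-3, 3, 30)).reshape(-1, 1)
y = 0.5 * X.ravel() ** 2 + rng.normal(0, 0.5, 30)  # quadratic signal + noise

for degree in (1, 2, 15):  # underfit, reasonable fit, overfit
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X, y)
    mse = mean_squared_error(y, model.predict(X))
    print(f"degree={degree:2d}  training MSE={mse:.3f}")
```

The training error alone keeps falling as the degree grows; overfitting only shows up when the model is scored on held-out data.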
II. Machine Learning
What is machine learning
- Skipping the formal definition: in plain terms, it is getting machines to think and solve problems the way people do. For programmers, machine learning means having the machine solve equations and find the optimal set of coefficients (the model); it can also be understood as a class of data-mining algorithms;
The scope of machine learning
- Machine learning is an interdisciplinary field spanning pattern recognition, data mining, statistical learning, computer vision, speech recognition, natural language processing, and more; each of these is a heavyweight discipline in its own right
Classification of machine learning (see the sketch after this list)
- Supervised learning (the raw data comes with labels y)
  - Classification
  - Regression
- Unsupervised learning (the raw data has no labels y)
  - Clustering
  - Dimensionality reduction
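A minimal sketch of the two branches, assuming scikit-learn is installed (the iris dataset is just a convenient stand-in): supervised classification uses the labels y, while unsupervised clustering ignores them.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X, y = load_iris(return_X_y=True)

# Supervised: learn a mapping from X to the given labels y.
clf = LogisticRegression(max_iter=1000).fit(X, y)
print("classification accuracy:", clf.score(X, y))

# Unsupervised: group X into 3 clusters without ever looking at y.
km = KMeans(n_clusters=3, n_init=10).fit(X)
print("first 10 cluster assignments:", km.labels_[:10])
```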
Dependent Python libraries (a quick version-check sketch follows this list)
- numpy: numerical computing framework
- scipy: scientific computing framework
- pandas: data analysis framework, mainly used for tabular data
- matplotlib: plotting framework
- scikit-learn: machine learning framework
- tensorflow: Google's open-source deep learning framework
- keras: open-source deep learning framework
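A quick sanity check, assuming the packages above are installed (tensorflow and keras are omitted here because they are heavyweight installs): import each library and print its version.

```python
import numpy as np
import scipy
import pandas as pd
import matplotlib
import sklearn

for name, mod in [("numpy", np), ("scipy", scipy), ("pandas", pd),
                  ("matplotlib", matplotlib), ("scikit-learn", sklearn)]:
    print(f"{name:12s} {mod.__version__}")
```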
Understanding the terminology
- Machine learning: given a set of parameters (w: weight), find an equation such that, as the input x varies, the output y is as close as possible to the true values. Graphically: find a line that, in the y direction, comes as close as possible to the y values of all the points, i.e., the line that minimizes the loss function. That equation is the core goal of machine learning, and obtaining it means solving two problems: ① the coefficients of the equation; ② the form of the equation: how many variables it has and of what degree;
- Loss function of linear regression:
  - The loss function (Loss Function) is also called the objective function or cost function (Cost Function): the smaller the loss function, the greater the total maximum likelihood, and the more accurate the model;
  - The loss function of linear regression is a convex function;
  - Least squares, (R)MSE ((Root: square root) Mean Squared Error), the squared-error (mean squared error) loss function;
  - (Author's note: the formulas here were generated externally; jianshu's support for math formulas is poor and they render patchily, so I may set up my own blog in time. ① Inline formulas are wrapped in `$`; ② block-level formulas are wrapped in `$$`;)
  - The loss function of linear regression: $J(\theta) = \frac{1}{2}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$ (see the numpy sketch after this item);
  - Relationship between the predictions of multiple linear regression and the model: $\hat{y} = h_\theta(x) = \theta^{T}x$, or in matrix form $\hat{y} = X\theta$; $w$: weight, sometimes written as $\theta$;
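A minimal numpy sketch of the prediction and loss formulas above (the function names and toy data are my own, for illustration only):

```python
import numpy as np

def predict(X, theta):
    """The linear model h_theta(x) = theta^T x, in matrix form X @ theta."""
    return X @ theta

def mse_loss(theta, X, y):
    """Mean squared error over m samples: mean((y_hat - y)^2)."""
    residual = predict(X, theta) - y
    return (residual ** 2).mean()

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # first column is the bias term
y = np.array([2.0, 3.0, 4.0])
theta = np.array([1.0, 1.0])  # y = 1 + 1*x fits these points exactly
print(mse_loss(theta, X, y))  # -> 0.0
```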
- Algorithm: an equation whose coefficients float within a certain range; the parameters are also called the dimensions that influence the result;
- Linear regression:
  - Linear: a linear (first-degree) transformation;
  - Regression: the overall trend formed by averaging the y-axis values at each point x on the x-axis;
- Maximum likelihood estimation: a technique from statistics used to estimate the parameters of the probability density function of a sample set; likelihood: Likelihood; i.e., estimate the probability of each of the m samples under the normal distribution; their product is the total likelihood of the m samples appearing under that distribution;
- The total likelihood equals the product of all the probabilities, and we want to make it as large as possible, i.e., maximize the joint probability of observing all the samples. The probabilities themselves cannot be computed directly (the actual data are discrete, while the calculus involved works on continuous values), but finding the maximum of the product of probability densities is equivalent to finding the maximum of the product of probabilities, so the density product is used in place of the probability product;
- According to the central limit theorem, if the samples are assumed independent and the error variable (the error) is generated randomly, then the errors follow a normal distribution; the normal distribution is therefore used when computing the total likelihood;
- Central limit theorem: a result from probability theory stating that aggregates of random data of the same kind are asymptotically normally (Gaussian) distributed, i.e., such data generally converge; the one condition is that the individual samples must be independent (see the numpy experiment after this list);
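A small numpy experiment illustrating the central limit theorem (the distribution and sample sizes are arbitrary choices): means of independent Uniform(0, 1) samples cluster around 0.5 with exactly the standard deviation the theorem predicts.

```python
import numpy as np

rng = np.random.default_rng(42)
# 100,000 experiments, each averaging 30 independent uniform draws.
sample_means = rng.uniform(0, 1, size=(100_000, 30)).mean(axis=1)

# Uniform(0,1) has mean 0.5 and variance 1/12, so means of n=30 draws
# should be approximately Normal(0.5, sqrt((1/12)/30)).
print("empirical mean :", sample_means.mean())
print("empirical std  :", sample_means.std())
print("theoretical std:", (1 / 12 / 30) ** 0.5)
```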
- Relationship between the real value and the predicted value: $y^{(i)} = \theta^{T}x^{(i)} + \varepsilon^{(i)}$, i.e., $\varepsilon^{(i)} = y^{(i)} - \theta^{T}x^{(i)}$
  - $\varepsilon$: the set of errors;
  - $y$: the set of real values;
  - $\theta^{T}x$: the set of predictions;
  - $\varepsilon$ is a random variable; when there are enough samples, by the central limit theorem, it is normally distributed;
- Probability density function: used to measure the relative likelihood of values; each distribution has its corresponding probability density function, for example:
  - Uniform distribution (Uniform Distribution);
  - Normal (Gaussian) distribution (Normal (Gaussian) Distribution); a point on the normal curve gives the probability density of x, not the probability;
  - Other distributions...
- The probability density function of the normal distribution is: $f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right)$;
- Assumptions behind the loss function of linear regression: the samples are independent, randomly drawn, and normally distributed (a short derivation follows);
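A short derivation, under exactly these assumptions, of why maximizing the Gaussian likelihood is the same as minimizing the squared-error loss; it fills in the chain of reasoning summarized below:

```latex
% Each error eps^(i) = y^(i) - theta^T x^(i) is drawn i.i.d. from N(0, sigma^2),
% so the total likelihood is the product of the densities:
L(\theta) = \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi}\,\sigma}
            \exp\!\left(-\frac{\left(y^{(i)} - \theta^{T}x^{(i)}\right)^{2}}{2\sigma^{2}}\right)

% Taking the logarithm turns the product into a sum:
\log L(\theta) = m \log \frac{1}{\sqrt{2\pi}\,\sigma}
               - \frac{1}{2\sigma^{2}} \sum_{i=1}^{m} \left(y^{(i)} - \theta^{T}x^{(i)}\right)^{2}

% The first term does not depend on theta, so maximizing log L(theta)
% is equivalent to minimizing the least-squares loss:
J(\theta) = \frac{1}{2} \sum_{i=1}^{m} \left(y^{(i)} - \theta^{T}x^{(i)}\right)^{2}
```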
- Ways to solve the linear regression problem:
  - Analytic method: use the normal equation directly: $\theta = (X^{T}X)^{-1}X^{T}y$; the computed $\theta$ values are the coefficients of the linear equation, i.e., the model (Model); it is not applicable to massive data (see the numpy sketch after this list);
  - Iterative method (try again and again): the most widely used is gradient descent (GD); gradient descent operates on the loss function, with $\theta$ on the horizontal axis and $J(\theta)$ on the vertical axis;
- The chain of reasoning: regression problem → central limit theorem → the errors are normally distributed → minimizing the loss function ↔ maximizing the total likelihood ↔ maximizing the product of probability densities ↔ maximizing the joint probability;
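A minimal numpy sketch of the analytic solution (the synthetic data is my own; in practice `np.linalg.lstsq` or `pinv` is numerically safer than an explicit inverse):

```python
import numpy as np

rng = np.random.default_rng(0)
m = 100
x = rng.uniform(-1, 1, m)
X = np.column_stack([np.ones(m), x])        # bias column + feature
y = 3.0 + 2.0 * x + rng.normal(0, 0.1, m)   # true model: y = 3 + 2x + noise

# Normal equation: theta = (X^T X)^{-1} X^T y
theta = np.linalg.inv(X.T @ X) @ X.T @ y
print(theta)  # -> approximately [3.0, 2.0]
```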
- Deep learning (DL: Deep Learning) is based on machine learning (ML: Machine Learning) and artificial neural networks (ANN: Artificial Neural Network);
- Gradient descent (GD):
  - Gradient descent update formula: $\theta := \theta - \alpha \cdot \frac{\partial J(\theta)}{\partial \theta}$, where $\alpha$ is the learning rate, a so-called hyperparameter (hyper parameter), usually set to a small value, and $\frac{\partial J(\theta)}{\partial \theta}$ is the derivative of the loss function at the current point; fully expanded per dimension this becomes $\theta_j := \theta_j - \alpha\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$; we stop iterating once the gradient falls within a threshold (threshold), i.e., when it is approximately 0;
  - Steps of gradient descent (see the sketch after this list):
    1. Pick a random $\theta$ value;
    2. Compute the gradient $g$ at the current $\theta$ (the derivative at the current point, i.e., the slope there);
    3. Find the next $\theta$ from the update formula: if $g$ is negative, $\theta$ increases; otherwise $\theta$ decreases;
    4. Repeat steps 2 and 3 until the gradient falls within the threshold; if the threshold is never reached, the learning rate is too large and the hyperparameter needs adjusting;
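A minimal sketch of these steps on a one-dimensional convex loss $J(\theta) = (\theta - 3)^2$, whose gradient is $2(\theta - 3)$ (the loss, learning rate, and threshold are illustrative choices):

```python
def gradient_descent(grad, theta0, alpha=0.1, threshold=1e-6, max_iters=10_000):
    theta = theta0                    # step 1: start from some theta
    for _ in range(max_iters):
        g = grad(theta)               # step 2: gradient at the current theta
        if abs(g) < threshold:        # stop once the gradient is ~0
            break
        theta -= alpha * g            # step 3: theta := theta - alpha * g
    return theta

# Minimizing J(theta) = (theta - 3)^2 should land near theta = 3.
print(gradient_descent(lambda t: 2 * (t - 3), theta0=0.0))
```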
- Batch gradient descent (BGD: Batch Gradient Descent), sketched after this list:
  - Differentiating the loss function with respect to dimension j gives the gradient: $\frac{\partial J(\theta)}{\partial \theta_j} = \sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$
  - The overall update is: $\theta_j := \theta_j - \alpha\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$
  - As the number of iterations grows, with a constant learning rate, the absolute value of the gradient keeps shrinking, so the step size shrinks as well;
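A minimal numpy sketch of BGD for linear regression (synthetic data; the 1/m scaling of the gradient is a common convention not shown in the formula above): every step uses all m samples.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 200
X = np.column_stack([np.ones(m), rng.uniform(-1, 1, m)])
y = X @ np.array([3.0, 2.0]) + rng.normal(0, 0.1, m)

theta = np.zeros(2)
alpha = 0.1
for _ in range(1000):
    grad = X.T @ (X @ theta - y) / m   # full-batch gradient of the MSE loss
    theta -= alpha * grad
print(theta)  # -> approximately [3.0, 2.0]
```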
- Mini-batch gradient descent (MBGD: Mini-Batch Gradient Descent): each step uses a small random batch of samples;
- Stochastic gradient descent (SGD: Stochastic Gradient Descent): each step uses a single random sample (see the sketch below);
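A minimal sketch of mini-batch SGD on the same kind of data (`batch_size = 1` would be plain SGD): each step estimates the gradient from a random subset instead of all m samples, trading noise for speed.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 200
X = np.column_stack([np.ones(m), rng.uniform(-1, 1, m)])
y = X @ np.array([3.0, 2.0]) + rng.normal(0, 0.1, m)

theta = np.zeros(2)
alpha, batch_size = 0.1, 16
for _ in range(2000):
    idx = rng.choice(m, batch_size, replace=False)   # random mini-batch
    Xb, yb = X[idx], y[idx]
    theta -= alpha * Xb.T @ (Xb @ theta - yb) / batch_size
print(theta)  # noisier than BGD, but still -> approximately [3.0, 2.0]
```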