Machine learning related concepts
2022-07-28 14:58:00 【Swlaaa】
Source: https://www.jianshu.com/p/ddcaeefb5b97
I. Concepts
- Fitting, underfitting, overfitting
- Fitting: how well the model matches the data; a good fit lies between underfitting and overfitting;
- Underfitting: the model has learned too little;
- Overfitting: the model has learned too much (it has memorized the training data);
- Variance, bias
- Variance: describes how concentrated the data are;
- Bias: describes how far the results are from the target;
- Model ≈ law ≈ equation coefficients ≈ parameter weights (Weight);
- That is: a model is a set of parameters that measure the importance of a set of quantities
- Machine learning ≈ pattern recognition;
- Algorithm ≈ equation;
- Fitting ≈ matching;
- Training ≈ solving a system of equations;
II. Machine learning
What is machine learning
- Skipping the formal definition: in plain terms, it is getting machines to think and solve problems the way people do. For programmers: machine learning means having the machine solve equations to find the optimal set of coefficients (the model). It can also be understood as a family of data-mining algorithms;
The scope of machine learning
- Machine learning is an interdisciplinary field spanning pattern recognition, data mining, statistical learning, computer vision, speech recognition, natural language processing, and more; each of these is a heavyweight discipline in its own right
Categories of machine learning
- Supervised learning (the raw data includes labels y)
- Classification
- Regression
- Unsupervised learning (the raw data has no labels y)
- Clustering
- Dimensionality reduction
Python library dependencies
- numpy: numerical computing framework
- scipy: scientific computing framework
- pandas: data-analysis framework, mainly for tabular data
- matplotlib: plotting framework
- scikit-learn: machine-learning framework
- tensorflow: Google's open-source deep-learning framework
- keras: open-source deep-learning framework
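As a quick taste of these libraries, here is a minimal sketch that uses numpy to fit a straight line by least squares; the data below are made up purely for illustration:

```python
import numpy as np

# Made-up data from y = 2x + 1 plus a little noise
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.1, size=x.shape)

# Least-squares fit of a degree-1 polynomial (a line); returns [slope, intercept]
w, b = np.polyfit(x, y, deg=1)
print(w, b)  # close to 2 and 1
```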
Understanding the terminology
- Machine learning: given a set of parameters (w: weights), find an equation such that, as the input x varies, the result y comes as close as possible to the true result. Graphically: find a line that stays as close as possible, in the y direction, to the average of all the points' y values, i.e. the line that minimizes the loss function. That equation is the core goal of machine learning; two things need to be solved: ① the coefficients of the equation; ② the power of the equation: how many variables it has and of what degree;
- The loss function of linear regression: $$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$
- Loss function (Loss Function), also known as the objective function or cost function (Cost Function): the smaller the loss function, the greater the total likelihood, and the more accurate the model;
- The loss function of linear regression is a convex function;
- Least squares, (R)MSE ((Root: square root) Mean Squared Error), the squared-error loss function (mean squared error);
- (The formulas here were generated externally; Jianshu's support for mathematical formulas is patchy: ① inline formulas are wrapped in $...$; ② block-level formulas are wrapped in $$...$$.)
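The linear-regression loss above can be computed directly; a minimal sketch in numpy, with a tiny made-up data set (the function name and data are illustrative only):

```python
import numpy as np

def mse_loss(theta, X, y):
    """Loss J(theta) = (1/2m) * sum((X @ theta - y)^2)."""
    m = len(y)
    residuals = X @ theta - y
    return residuals @ residuals / (2 * m)

# Made-up data: y = 3x exactly, so theta = [3] gives zero loss
X = np.array([[1.0], [2.0], [3.0]])
y = np.array([3.0, 6.0, 9.0])
print(mse_loss(np.array([3.0]), X, y))  # 0.0
print(mse_loss(np.array([2.0]), X, y))  # (1 + 4 + 9) / 6 ≈ 2.333
```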
- The relationship between the predicted value of multiple linear regression and the model: $\hat{y} = w^T x$, or $\hat{y} = \theta^T x$; $w$: weight, sometimes written as $\theta$;
- Algorithm: an equation whose coefficients float within a certain range; the parameters are also called the dimensions that influence the result;
- Linear regression:
- Linear: a linear transformation by a first-degree equation;
- Regression: for any point on the x-axis, the overall trend formed by averaging the values on the y-axis there;
- Maximum likelihood estimation: from statistics; used to estimate the parameters of the probability density function of a sample set. Likelihood: Likelihood; that is: estimate the probability of each of the m samples under the normal distribution; multiplied together, these give the total likelihood of the m samples appearing under that distribution;
- The total likelihood equals the product of all the probabilities. We want to maximize the total likelihood, i.e. make the probability of observing all the samples as large as possible. But the probabilities themselves cannot be solved for directly (the actual data are discrete, while the calculus involved works on continuous values); since maximizing the product of probability densities is equivalent to maximizing the product of probabilities, the density product is used in place of the probabilities;
- According to the central limit theorem, assuming the samples are mutually independent and the error variable (the error) is randomly generated, the errors follow a normal distribution; therefore the normal distribution is used when computing the total likelihood;
- Central limit theorem: from probability theory; it says that the distribution of random data from most things of the same kind is asymptotically normal, i.e. Gaussian; in other words, such data generally converge, on one condition: each sample must be independent;
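A toy simulation (all numbers illustrative) shows the central limit theorem at work: means of independent uniform samples, although the uniform distribution itself is far from normal, cluster into a bell shape around the true mean:

```python
import numpy as np

rng = np.random.default_rng(42)

# 10,000 experiments, each averaging 100 independent Uniform(0, 1) draws
sample_means = rng.uniform(0.0, 1.0, size=(10_000, 100)).mean(axis=1)

# By the CLT the means concentrate around 0.5,
# with std ≈ sqrt(1/12) / sqrt(100) ≈ 0.0289
print(sample_means.mean(), sample_means.std())
```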
- The relationship between the true value and the predicted value: $y^{(i)} = \theta^T x^{(i)} + \varepsilon^{(i)}$, i.e. $\varepsilon^{(i)} = y^{(i)} - \theta^T x^{(i)}$;
- $\varepsilon$: the set of errors;
- $y$: the set of true values;
- $\hat{y}$: the set of predicted values;
- $\varepsilon$ is a random variable; when there are enough samples, by the central limit theorem, it is normally distributed;
- Probability density function: used to measure how densely probability is distributed; each distribution has a corresponding probability density function, for example:
- Uniform distribution (Uniform Distribution);
- Normal (Gaussian) distribution (Normal (Gaussian) Distribution): a point on the normal curve gives the probability density at x, not the probability;
- Other distributions...
- The probability density function of the normal distribution is: $$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
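The density formula can be written out directly; a small sketch (function name is illustrative):

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Probability density of N(mu, sigma^2) at x."""
    coeff = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

# Peak of the standard normal curve: 1/sqrt(2*pi) ≈ 0.3989
print(normal_pdf(0.0))
print(normal_pdf(1.0))  # smaller: density falls off away from the mean
```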
- Assumptions behind the loss function of linear regression: the samples are mutually independent, randomly drawn, and the errors are normally distributed;
- Ways to solve the linear regression problem:
- Analytic method: use the formula $\theta = (X^T X)^{-1} X^T y$ directly to compute $\theta$; the value of $\theta$ gives the coefficients of the linear equation, i.e. the model (Model); not applicable to massive data;
- Trial and error: the most widely used method is gradient descent (GD); gradient descent works on the loss function, with $\theta$ on the abscissa and $J(\theta)$ on the ordinate;
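A minimal numpy sketch of the analytic (normal-equation) solution $\theta = (X^T X)^{-1} X^T y$, with made-up data:

```python
import numpy as np

# Made-up data generated from y = 1 + 2x; the first column of X is the bias term
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

# Normal equation: theta = (X^T X)^{-1} X^T y
theta = np.linalg.inv(X.T @ X) @ X.T @ y
print(theta)  # ≈ [1. 2.]
```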
- Line of reasoning: regression problem → central limit theorem → the errors are normally distributed → the loss function is minimal → the maximum likelihood estimate is maximal → the product of probability densities is maximal → the probability is maximal;
- Deep learning (DL: Deep Learning) is artificial neural networks (ANN: Artificial Neural Network) built on top of machine learning (ML: Machine Learning)
- Gradient descent method (GD):
- Gradient descent formula: $\theta^{(t+1)} = \theta^{(t)} - \alpha \cdot \frac{\partial J(\theta)}{\partial \theta}$, where $\alpha$ is the learning rate, a so-called hyperparameter (hyper parameter), usually a small value, and $\frac{\partial J(\theta)}{\partial \theta}$ is the derivative of the loss function at the current point; fully expanded per dimension, the formula is: $\theta_j^{(t+1)} = \theta_j^{(t)} - \alpha \cdot \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$; iteration stops once the gradient falls within a threshold (threshold), i.e. when it is approximately 0;
- Steps of the gradient descent method:
- 1. Pick a random initial value of $\theta$;
- 2. Compute the gradient $\frac{\partial J(\theta)}{\partial \theta}$ at the current $\theta$ (the derivative at the current point, i.e. the slope there);
- 3. Find the next $\theta$ from the gradient descent formula: if the gradient is negative, $\theta$ increases; otherwise $\theta$ decreases;
- 4. Repeat steps 2 and 3 until the gradient is within the threshold; if the threshold can never be reached, the learning rate is too high and the hyperparameter needs adjusting;
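The steps above can be sketched in a few lines of numpy; the data, learning rate, and threshold are all made up for illustration:

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, threshold=1e-8, max_iters=10_000):
    """Minimize J(theta) = (1/2m) * ||X @ theta - y||^2 by gradient descent."""
    m, n = X.shape
    theta = np.zeros(n)                        # step 1: pick a starting theta
    for _ in range(max_iters):
        grad = X.T @ (X @ theta - y) / m       # step 2: gradient at current theta
        theta -= alpha * grad                  # step 3: move against the gradient
        if np.max(np.abs(grad)) < threshold:   # step 4: stop inside the threshold
            break
    return theta

# Made-up data from y = 1 + 2x (first column of X is the bias term)
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
theta = gradient_descent(X, y)
print(theta)  # ≈ [1. 2.]
```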
- Batch gradient descent method (BGD: Batch Gradient Descent):
- Differentiating the loss function, the gradient along dimension $j$ is: $$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$$
- The overall gradient is: $$\nabla_\theta J(\theta) = \frac{1}{m}\,X^T\left(X\theta - y\right)$$
- As the number of iterations grows, with a constant learning rate, the absolute value of the gradient keeps decreasing, so the step size becomes smaller as well;
- Mini-batch gradient descent method (MBGD: Mini-Batch Gradient Descent): each iteration computes the gradient over a small batch of samples rather than all of them;
- Stochastic gradient descent method (SGD: Stochastic Gradient Descent): each iteration computes the gradient from a single randomly chosen sample;