Introduction and basic knowledge of machine learning
2022-07-01 02:55:00 【zhang. yao】
1. Introduction to machine learning
Machine learning: a field of study that gives computers the ability to learn without being explicitly programmed for the problem at hand.
For a class of tasks T and a performance measure P, a computer program is said to learn from experience E if its performance on T, as measured by P, improves with experience E.
2. Common algorithms
2.1 Supervised algorithms
The sample data are labeled with results.
Classification
By underlying principle
- Statistics-based: Bayesian classification
- Rule-based: decision tree algorithms
- Neural-network-based: neural network algorithms
- Distance-based: KNN (k-nearest neighbors)
Common evaluation metrics
- Accuracy: the proportion of predictions that agree with the actual results
- Recall: the proportion of samples actually belonging to a given class that the predictions correctly cover
- F1-score: a statistic for evaluating a classification model as a whole (the harmonic mean of precision and recall); its value lies between 0 and 1
Regression algorithm
2.1.1 KNN Algorithm
k-nearest neighbors (KNN) is one of the simplest classification algorithms. If most of the k samples nearest to a given sample belong to a certain class, the sample is assigned to that class and is assumed to share the characteristics of the samples in that class. KNN can be used not only for classification but also for regression (predicting a specific numeric value).
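The description above can be sketched in a few lines of Python. This is a minimal illustration, not a tuned implementation: it assumes Euclidean distance, a majority vote for classification and a plain average for regression, and the names `knn_predict`, `knn_regress`, `points`, and `labels` are made up for the example.

```python
import math
from collections import Counter


def k_nearest(train_points, train_targets, query, k):
    """Return the targets of the k training points closest to `query`."""
    # Pair each Euclidean distance with the target of that training point.
    pairs = [
        (math.dist(point, query), target)
        for point, target in zip(train_points, train_targets)
    ]
    # Sort on distance only, then keep the k closest neighbours.
    pairs.sort(key=lambda pair: pair[0])
    return [target for _, target in pairs[:k]]


def knn_predict(train_points, train_labels, query, k=3):
    """Classification: majority vote among the k nearest neighbours."""
    votes = Counter(k_nearest(train_points, train_labels, query, k))
    return votes.most_common(1)[0][0]


def knn_regress(train_points, train_values, query, k=3):
    """Regression: average of the k nearest neighbours' values."""
    neighbours = k_nearest(train_points, train_values, query, k)
    return sum(neighbours) / len(neighbours)


# Toy data (hypothetical): two well-separated groups in the plane.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (1.1, 1.0)))  # prints "a"
```

The choice of k and of the distance measure both affect the result; a small k makes the prediction sensitive to noise, while a large k blurs the class boundaries.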

2.1.2 Decision tree algorithm






2.2 Unsupervised algorithms
The sample data are not labeled with results.
2.2.1 Clustering
- Hierarchical clustering
- Density-based clustering
- Partition clustering
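As one concrete illustration of partition clustering, here is a minimal k-means sketch. k-means is not named in the original text; it is used here only as a common example of the partition family. The function names and the toy data are assumptions made for the example, and the initial centroids are passed in explicitly so the run is deterministic.

```python
import math


def assign(points, centroids):
    """Put every point into the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for point in points:
        nearest = min(
            range(len(centroids)),
            key=lambda i: math.dist(point, centroids[i]),
        )
        clusters[nearest].append(point)
    return clusters


def k_means(points, initial_centroids, max_iterations=100):
    """Alternate assignment and centroid update until nothing changes."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iterations):
        clusters = assign(points, centroids)
        # New centroid = coordinate-wise mean of the cluster's members;
        # an empty cluster keeps its previous centroid.
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else old
            for old, members in zip(centroids, clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)


# Toy data (hypothetical): two groups of two points each.
data = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centers, groups = k_means(data, [(0.0, 0.0), (10.0, 0.0)])
print(centers)  # prints [(0.0, 1.0), (10.0, 1.0)]
```

No labels are used anywhere in the procedure, which is what makes it an unsupervised method.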


2.3 Semi-supervised algorithms
Only part of the sample data is labeled with results.

3. Machine learning algorithms in detail
3.1 Machine learning basics
3.1.1 Basic concepts of machine learning
- Input space: the set of all possible input values
- Output space: the set of all possible output values
- Feature: an attribute of a sample
- Feature vector: a vector made up of multiple features
- Feature space: the space in which the feature vectors live
- Hypothesis space: the set of mappings from the input space to the output space
3.1.2 The essence of machine learning
3.1.3 Three elements of machine learning methods
Method = Model + Strategy + Algorithm
- Model: a mapping from the input space to the output space; choosing a model means choosing a suitable hypothesis space
- Strategy: the learning criterion or rule used to pick the optimal model from the many candidates in the hypothesis space

- Loss function: measures the difference between the predicted result and the true result; the smaller its value, the closer the prediction is to the true result. It is usually a non-negative real-valued function. The process of reducing the loss function by various means is called optimization. The loss function is written L(Y, f(x))
- 0-1 loss function: no loss if the predicted value equals the actual value, otherwise a full loss
- Absolute loss function: the absolute value of the difference between the predicted result and the true result
- Squared loss function: the square of the difference between the predicted result and the true result
- Logarithmic loss function: the logarithm is monotonic, so optimizing the logarithm gives the same result as optimizing the original objective, and it turns multiplication into addition
- Exponential loss function: has the useful properties of monotonicity and non-negativity; the closer the prediction is to the correct result, the smaller the loss
- Hinge loss function
- Empirical risk and risk function
- Structural risk
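The loss functions listed above can be written out as short Python functions. This is a sketch under stated assumptions: the exponential and hinge losses use the common convention of labels in {-1, +1} with a real-valued score f(x), the logarithmic loss takes the probability the model assigns to the true class, empirical risk is taken as the average loss over the training samples, and structural risk as empirical risk plus a weighted complexity penalty. The parameter names `lam` and `complexity` are illustrative, not from the original.

```python
import math


def zero_one_loss(y, y_pred):
    """0 if the prediction equals the actual value, 1 otherwise."""
    return 0.0 if y == y_pred else 1.0


def absolute_loss(y, y_pred):
    """|y - f(x)|"""
    return abs(y - y_pred)


def squared_loss(y, y_pred):
    """(y - f(x))^2"""
    return (y - y_pred) ** 2


def log_loss(p_true_class):
    """-log P(y | x), given the probability assigned to the true class."""
    return -math.log(p_true_class)


def exponential_loss(y, score):
    """exp(-y * f(x)) with y in {-1, +1} (assumed convention)."""
    return math.exp(-y * score)


def hinge_loss(y, score):
    """max(0, 1 - y * f(x)) with y in {-1, +1} (assumed convention)."""
    return max(0.0, 1.0 - y * score)


def empirical_risk(loss, targets, predictions):
    """Average loss over the training samples."""
    total = sum(loss(y, p) for y, p in zip(targets, predictions))
    return total / len(targets)


def structural_risk(empirical, lam, complexity):
    """Empirical risk plus a penalty on model complexity, weighted by lam."""
    return empirical + lam * complexity


print(empirical_risk(absolute_loss, [1, 2, 3], [1, 2, 5]))
```

Minimizing empirical risk alone can favor overly complex models; the penalty term in structural risk is the usual way to counter that.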
3.2 Model evaluation and selection
3.2.1 Principles of model selection
3.2.1.1 Basic concepts
- Error: the difference between the model's predicted output and the true value
- Training: the process of learning from known sample data to obtain a model
- Training error: the error of the model on the training set
- Generalization: going from the specific to the general; in machine learning, applying the model to new sample data
- Generalization error: the error of the model on new samples
- Model capacity: the model's ability to fit a variety of functions
- Overfitting: the model performs well on the training samples but poorly on new samples
- Underfitting: the model performs poorly even on the training set
- Model selection: choosing the model with the smallest generalization error
3.2.2 Performance metrics of the model
3.2.3 Model evaluation methods
- Hold-out method: use 80% of the known data set as the training set to train the model and the remaining 20% as the test set to test the trained model. The test error measured on the test set serves as an approximation of the generalization error, and the model with the smaller test error is chosen
- The test set and the training set should be mutually exclusive as far as possible
- The test set and the training set should be independent and identically distributed
- Cross-validation: divide the data set into k mutually exclusive subsets, each obtained by stratified sampling. In each round, one subset serves as the test set and the rest as the training set. After k rounds of training and testing, the results are averaged. This is called k-fold cross-validation. Repeating it p times with different partitions is called p times k-fold cross-validation
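The two evaluation methods above can be sketched as follows. This is a simplified illustration: the hold-out split shuffles with a fixed seed, and the k-fold partition is a plain one without the stratified sampling mentioned above. The names `holdout_split`, `k_fold_indices`, `cross_validate`, and the `evaluate` callback are assumptions made for the example.

```python
import random


def holdout_split(samples, test_ratio=0.2, seed=0):
    """Shuffle the samples and split them into (train, test)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_ratio))
    # The two parts never share a sample, so they are mutually exclusive.
    return shuffled[n_test:], shuffled[:n_test]


def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    # Deal the indices round-robin so fold sizes differ by at most one.
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cross_validate(n_samples, k, evaluate):
    """Average the score returned by `evaluate(train, test)` over k folds."""
    scores = [evaluate(train, test) for train, test in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)


train_set, test_set = holdout_split(range(10))
print(len(train_set), len(test_set))  # prints "8 2"
```

In practice `evaluate` would train a model on the training indices and return its error on the test indices; repeating `cross_validate` p times with different shuffles gives the p times k-fold variant.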
3.2.4 Comparison of model performance
3.2.4.1 Performance measurement of regression algorithm

3.2.4.2 Performance measurement of classification algorithm
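As a minimal sketch of the classification metrics introduced in section 2.1 (accuracy, recall, F1-score), the following functions compute them from lists of true and predicted labels. Precision is included because F1 is the harmonic mean of precision and recall. The function names and the `positive` parameter (the class treated as the positive one) are assumptions made for the example.

```python
def confusion_counts(y_true, y_pred, positive):
    """Count true positives, false positives and false negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp, fp, fn


def accuracy(y_true, y_pred):
    """Share of predictions that equal the actual label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def precision(y_true, y_pred, positive=1):
    """Share of predicted positives that are actually positive."""
    tp, fp, _ = confusion_counts(y_true, y_pred, positive)
    return tp / (tp + fp) if tp + fp else 0.0


def recall(y_true, y_pred, positive=1):
    """Share of actual positives that the predictions cover."""
    tp, _, fn = confusion_counts(y_true, y_pred, positive)
    return tp / (tp + fn) if tp + fn else 0.0


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall, between 0 and 1."""
    p = precision(y_true, y_pred, positive)
    r = recall(y_true, y_pred, positive)
    return 2 * p * r / (p + r) if p + r else 0.0


# Toy labels (hypothetical): 3 positives and 5 negatives.
actual = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0]
print(accuracy(actual, predicted))  # prints 0.75
```

On imbalanced data accuracy alone can look good while recall is poor, which is why the F1-score is often reported alongside it.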