Introduction and basic knowledge of machine learning
2022-07-01 02:55:00 【zhang. yao】
1. Introduction to machine learning
Machine learning: a field of study that gives computers the ability to learn without being explicitly programmed for the problem at hand.
For a class of tasks T and a performance measure P, if a computer program's performance on T, as measured by P, improves with experience E, we say that the program learns from experience E.
2. Common algorithms
2.1 Supervised algorithms
The sample data carries result labels.
Classification
By underlying principle:
- Statistics-based: Bayesian classification
- Rule-based: decision tree algorithm
- Neural-network-based: neural network algorithm
- Distance-based: KNN (K nearest neighbors)
Common evaluation metrics
- Accuracy: the proportion of predictions that match the actual results
- Recall: the proportion of the actual instances of a class that the predictions correctly cover
- F1-Score: a statistic that gives a combined evaluation of a classification model (the harmonic mean of precision and recall); its value lies between 0 and 1
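The three metrics above can be sketched in a few lines of Python. This is a minimal illustration for the binary case, not a reference implementation: the function name `classification_metrics`, the `positive` parameter, and the sample labels are all made up for the example, and precision is computed only because F1 needs it.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1, treating `positive` as the class of interest."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    # Count true positives, false positives and false negatives.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels, used only to exercise the function.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

When a class has no predicted or no actual positives, the sketch returns 0.0 for the affected ratio instead of dividing by zero; other conventions are possible.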
Regression algorithm
2.1.1 KNN algorithm
k-Nearest Neighbour is one of the simplest classification algorithms. If most of the k samples nearest to a given sample belong to a certain category, the sample is considered to belong to that category as well, and to share the characteristics of the samples in that category. KNN can be used not only for classification but also for regression analysis (predicting specific values).
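The idea can be sketched as follows. This is a minimal brute-force version under stated assumptions: Euclidean distance, a plain majority vote for classification, and an unweighted mean for regression. The name `knn_predict`, its parameters, and the toy data are illustrative only.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3, task="classification"):
    """Predict the label (or value) of `query` from its k nearest training samples."""
    if k < 1 or k > len(train_X):
        raise ValueError("k must be between 1 and the number of training samples")
    # Euclidean distance from the query to every training sample.
    distances = [(math.dist(x, query), y) for x, y in zip(train_X, train_y)]
    distances.sort(key=lambda pair: pair[0])
    neighbours = [y for _, y in distances[:k]]
    if task == "classification":
        # Majority vote among the k nearest neighbours.
        return Counter(neighbours).most_common(1)[0][0]
    # Regression: average the neighbours' target values.
    return sum(neighbours) / k


# Toy data: two well-separated clusters (made up for the example).
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_labels = ["a", "a", "a", "b", "b", "b"]
train_values = [1.0, 1.2, 0.8, 5.0, 5.5, 4.5]

label = knn_predict(train_X, train_labels, (1.1, 1.0), k=3)
value = knn_predict(train_X, train_values, (5.0, 5.0), k=3, task="regression")
print(label, value)
```

Computing every distance on each query is fine for small data sets; larger ones usually call for an index structure such as a k-d tree, which this sketch leaves out.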

2.1.2 Decision tree algorithm






2.2 Unsupervised algorithms
The sample data carries no result labels.
2.2.1 Clustering
- Hierarchical clustering
- Density clustering
- Partition clustering


2.3 Semi-supervised algorithms
Only part of the sample data carries result labels.

3. Machine learning algorithms in detail
3.1 Machine learning basics
3.1.1 Basic concepts of machine learning
- Input space: the set of all possible input values
- Output space: the set of all possible output values
- Feature: an attribute of a sample
- Feature vector: a vector made up of multiple features
- Feature space: the space in which the feature vectors live
- Hypothesis space: the set of mappings from the input space to the output space
3.1.2 The essence of machine learning
3.1.3 The three elements of a machine learning method
Method = Model + Strategy + Algorithm
- Model: a mapping from the input space to the output space; choosing a model means choosing a suitable hypothesis space
- Strategy: the learning criterion or rule used to select the optimal model from the many candidates in the hypothesis space

- Loss function: measures the difference between the predicted result and the real result. The smaller its value, the closer the prediction is to the real result. It is usually a non-negative real-valued function, written L(Y, f(x)). The process of reducing the loss function by various means is called optimization.
- 0-1 loss function: no loss if the predicted value equals the actual value; otherwise the loss is total (1)
- Absolute loss function: the absolute value of the difference between the predicted result and the real result
- Square loss function: the square of the difference between the predicted result and the real result
- Logarithmic loss function: the logarithm is monotonic, so the optimum is the same as for the original objective, and taking logarithms turns products into sums
- Exponential loss function: monotonic and non-negative, both useful properties; the closer the prediction is to the correct result, the smaller the loss
- Hinge loss function
- Empirical risk & Risk function
- Structural risk
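The loss functions listed above can be written out directly. The sketch below uses the common textbook forms, which the original text does not spell out, so treat the exact conventions as assumptions: the logarithmic loss takes the probability assigned to the true class, and the exponential and hinge losses assume labels of +1 or -1 with a real-valued score f(x). All function names are illustrative. `empirical_risk` is simply the average loss over a sample; structural risk would add a regularization term to it, which is omitted here.

```python
import math


def zero_one_loss(y, y_hat):
    """0 when the prediction equals the true value, 1 otherwise."""
    return 0.0 if y == y_hat else 1.0


def absolute_loss(y, y_hat):
    """Absolute value of the difference between true and predicted value."""
    return abs(y - y_hat)


def squared_loss(y, y_hat):
    """Square of the difference between true and predicted value."""
    return (y - y_hat) ** 2


def log_loss(p_true_class):
    """Negative log of the probability the model assigned to the true class."""
    return -math.log(p_true_class)


def exponential_loss(y, score):
    """Assumes y is +1 or -1 and score is the real-valued model output f(x)."""
    return math.exp(-y * score)


def hinge_loss(y, score):
    """Assumes y is +1 or -1; zero once the sample is correctly classified with margin >= 1."""
    return max(0.0, 1.0 - y * score)


def empirical_risk(loss, targets, predictions):
    """Average of `loss` over a sample, i.e. the empirical risk under that loss."""
    return sum(loss(y, f) for y, f in zip(targets, predictions)) / len(targets)


# Made-up regression targets and predictions.
targets = [3.0, 5.0, 2.0]
predictions = [2.5, 5.0, 4.0]
print(empirical_risk(absolute_loss, targets, predictions))
print(empirical_risk(squared_loss, targets, predictions))
```

Comparing the two printed values shows how the squared loss penalizes the single large error (2.0 versus 4.0) more heavily than the absolute loss does.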
3.2 Model evaluation and selection
3.2.1 Principles of model selection
3.2.1.1 Basic concepts
- Error: the difference between the model's predicted output and the real value
- Training: the process of obtaining a model by learning from known sample data
- Training error: the model's error on the training set
- Generalization: going from the specific to the general; in machine learning, applying the model to new sample data
- Generalization error: the model's error on new samples
- Model capacity: the model's ability to fit a variety of functions
- Overfitting: the model performs well on the training samples but poorly on new samples
- Underfitting: the model performs poorly even on the training set
- Model selection: choosing the model with the smallest generalization error
3.2.2 Model performance metrics
3.2.3 Model evaluation methods
- Hold-out method: use 80% of the known data set as the training set to train the model, and the remaining 20% as the test set to test the trained model. The test error measured on the test set serves as an approximation of the generalization error, and the model with the smaller test error is chosen.
- The test set and the training set should be mutually exclusive as far as possible
- The test set and the training set should be independent and identically distributed
- Cross-validation: divide the data set into k mutually exclusive subsets, each drawn by stratified sampling. In each round, use one subset as the test set and the rest as the training set. Run k rounds of training and testing and average the results. This is called k-fold cross-validation. Repeating it p times with different partitions is called p-times k-fold cross-validation.
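Both evaluation schemes come down to how sample indices are partitioned, which can be sketched with the standard library alone. The function names, the `seed` parameter, and the `evaluate(train_idx, test_idx)` callback are assumptions for illustration. The stratification is approximate: each class is shuffled and dealt round-robin across the folds.

```python
import random
from collections import defaultdict


def holdout_split(n_samples, test_ratio=0.2, seed=0):
    """Shuffle indices and split them into disjoint training and test index lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_samples * test_ratio))
    return indices[n_test:], indices[:n_test]


def stratified_k_folds(labels, k=5, seed=0):
    """Split sample indices into k disjoint folds with similar class proportions."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for members in by_class.values():
        rng.shuffle(members)
        # Deal each class round-robin so every fold gets a similar share of it.
        for offset, index in enumerate(members):
            folds[offset % k].append(index)
    return folds


def cross_validate(labels, evaluate, k=5, seed=0):
    """Average the score returned by evaluate(train_idx, test_idx) over the k folds."""
    folds = stratified_k_folds(labels, k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        # Every fold except the i-th forms the training set for this round.
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        scores.append(evaluate(train_idx, test_idx))
    return sum(scores) / k


# Made-up labels: 10 samples of class "a" and 5 of class "b".
labels = ["a"] * 10 + ["b"] * 5
train_idx, test_idx = holdout_split(len(labels), test_ratio=0.2)
folds = stratified_k_folds(labels, k=5)
print(len(train_idx), len(test_idx), [len(fold) for fold in folds])
```

In practice `evaluate` would train a model on `train_idx` and return its error on `test_idx`. Calling `cross_validate` p times with different `seed` values and averaging the results gives the p-times k-fold variant.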
3.2.4 Comparison of model performance
3.2.4.1 Performance measures for regression algorithms

3.2.4.2 Performance measures for classification algorithms