Introduction and basic knowledge of machine learning
2022-07-01 02:55:00 【zhang. yao】
1. Introduction to machine learning
Machine learning: a field of study that gives computers the ability to learn without being explicitly programmed for the problem at hand.
For a certain class of tasks T and a performance measure P, if a computer program's performance on T, as measured by P, improves with experience E, then we say the program learns from experience E.
2. Common algorithms
2.1 Supervised algorithms
The sample data carries result labels.
Classification
By underlying principle
- Statistics-based: Bayesian classification
- Rule-based: decision tree algorithm
- Neural-network-based: neural network algorithm
- Distance-based: KNN (k-nearest neighbors)
Common evaluation metrics
- Accuracy: the proportion of predictions that match the actual results
- Recall: the proportion of the actual samples of a given class that the predictions correctly cover
- F1-score: a statistic for comprehensively evaluating a classification model; its value lies between 0 and 1
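The three metrics above can be computed from simple counts. The following is a minimal sketch for a single positive class, not a reference implementation. The function name `classification_metrics` and the sample labels are made up for illustration, and precision is introduced here only because F1 is the harmonic mean of precision and recall.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, recall, f1) for one positive class (illustrative helper)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    pairs = list(zip(y_true, y_pred))
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Precision is not listed in the text; it is needed to compute F1.
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, recall, f1


# Hypothetical labels, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
accuracy, recall, f1 = classification_metrics(y_true, y_pred)
print(accuracy, recall, f1)
```

With these labels there are 3 true positives, 1 false positive, and 1 false negative, so all three metrics come out to 0.75.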
Regression algorithm
2.1.1 KNN algorithm
k-nearest neighbors (KNN) is one of the simplest classification algorithms. If most of the k samples nearest to a given sample belong to a certain class, the sample is assumed to belong to that class as well and to share the characteristics of the samples in that class. KNN can be used not only for classification but also for regression (predicting a specific value).
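The idea can be sketched in a few lines of Python. This is a minimal illustration under assumptions the text does not state: Euclidean distance is used as the distance measure, regression averages the values of the k neighbors, and the function names and sample points are invented for the example.

```python
import math
from collections import Counter


def _k_nearest(train_points, train_targets, query, k):
    """Return the targets of the k training points closest to `query`."""
    if not 1 <= k <= len(train_points):
        raise ValueError("k must be between 1 and the number of training points")
    # Assumption: Euclidean distance; other distance measures work the same way.
    by_distance = sorted(
        zip(train_points, train_targets),
        key=lambda pair: math.dist(pair[0], query),
    )
    return [target for _, target in by_distance[:k]]


def knn_classify(train_points, train_labels, query, k=3):
    """Classification: majority vote among the k nearest neighbors."""
    nearest = _k_nearest(train_points, train_labels, query, k)
    return Counter(nearest).most_common(1)[0][0]


def knn_regress(train_points, train_values, query, k=3):
    """Regression: mean of the k nearest neighbors' values (an assumed choice)."""
    nearest = _k_nearest(train_points, train_values, query, k)
    return sum(nearest) / len(nearest)


# Hypothetical 2-D samples forming two well-separated groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["a", "a", "a", "b", "b", "b"]
values = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]

print(knn_classify(points, labels, (1.1, 1.0)))
print(knn_regress(points, values, (1.1, 1.0)))
```

Sorting every training point for each query is the simplest approach and is fine for small data sets; larger data sets usually call for a spatial index.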

2.1.2 Decision tree algorithm






2.2 Unsupervised algorithms
The sample data carries no result labels.
2.2.1 Clustering
- Hierarchical clustering
- Density clustering
- Partition clustering
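As one concrete instance of partition clustering, the sketch below implements a basic k-means loop. The text does not name k-means, so treat it as an illustrative choice rather than the method the article has in mind. Seeding the centroids with the first k points is an assumption made to keep the run deterministic, and the function name and sample points are invented.

```python
import math


def k_means(points, k, iterations=20):
    """Partition `points` into k clusters; returns (centroids, assignments)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Assumption: the first k points seed the centroids (deterministic, not robust).
    centroids = [tuple(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return centroids, assignments


# Hypothetical unlabeled 2-D samples.
samples = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.2, 4.9), (0.9, 1.1), (4.8, 5.1)]
centroids, assignments = k_means(samples, k=2)
print(assignments)
```

No labels are used anywhere: the grouping comes only from the distances between samples, which is what distinguishes this from the supervised algorithms in 2.1.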


2.3 Semi-supervised algorithms
Only part of the sample data carries result labels.

3. Detailed explanation of machine learning algorithms
3.1 Machine learning basics
3.1.1 Basic concepts of machine learning
- Input space: the set of all possible input values
- Output space: the set of all possible output values
- Feature: an attribute of a sample
- Feature vector: a vector composed of multiple features
- Feature space: the space in which feature vectors live
- Hypothesis space: a set of mappings from the input space to the output space
3.1.2 The essence of machine learning
3.1.3 Three elements of machine learning methods
Method = Model + Strategy + Algorithm
- Model: a mapping from the input space to the output space; choosing a model means choosing a suitable hypothesis space
- Strategy: the learning criterion or rule used to select the optimal model from the many candidates in the hypothesis space

- Loss function: measures the difference between the predicted result and the real result. The smaller its value, the closer the prediction is to the real result. It is usually a non-negative real-valued function, written L(Y, f(x)). The process of reducing the loss function by various means is called optimization
- 0-1 loss function: no loss if the predicted value equals the actual value, otherwise a complete loss
- Absolute loss function: the absolute value of the difference between the predicted result and the real result
- Square loss function: the square of the difference between the predicted result and the real result
- Logarithmic loss function: the logarithm is monotonic, so optimizing the logarithm of an objective gives the same result as optimizing the original objective, and it turns multiplication into addition
- Exponential loss function: monotonic and non-negative; the closer the prediction is to the correct result, the smaller the loss
- Hinge loss function
- Empirical risk & risk function
- Structural risk
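The loss functions listed above can be written down directly. The sketch below uses their common textbook forms, which the text itself does not spell out: the logarithmic loss takes the probability the model assigns to the true class, and the exponential and hinge losses assume labels in {-1, +1} and a real-valued score f(x). Empirical risk is taken as the average loss over the training samples, and structural risk as empirical risk plus a weighted complexity penalty. All function and parameter names (`lam`, `complexity`) are illustrative.

```python
import math


def zero_one_loss(y, y_hat):
    """0 if the prediction equals the actual value, otherwise 1."""
    return 0.0 if y == y_hat else 1.0


def absolute_loss(y, y_hat):
    """|y - f(x)|"""
    return abs(y - y_hat)


def squared_loss(y, y_hat):
    """(y - f(x))^2"""
    return (y - y_hat) ** 2


def log_loss(p_true_class):
    """-log P(y | x), where p_true_class is the predicted probability of the true class."""
    return -math.log(p_true_class)


def exponential_loss(y, score):
    """exp(-y * f(x)), assuming y in {-1, +1} and a real-valued score."""
    return math.exp(-y * score)


def hinge_loss(y, score):
    """max(0, 1 - y * f(x)), assuming y in {-1, +1} and a real-valued score."""
    return max(0.0, 1.0 - y * score)


def empirical_risk(loss, targets, predictions):
    """Average loss over the training samples."""
    return sum(loss(y, f) for y, f in zip(targets, predictions)) / len(targets)


def structural_risk(empirical, complexity, lam):
    """Empirical risk plus a penalty lam * complexity (names are assumptions)."""
    return empirical + lam * complexity


# Hypothetical targets and predictions, for illustration only.
risk = empirical_risk(squared_loss, [1.0, 2.0, 3.0], [1.0, 2.5, 2.0])
print(risk)
```

Every function returns a non-negative value for valid inputs, matching the description of the loss as a non-negative real-valued function.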
3.2 Model evaluation and selection
3.2.1 Principles of model selection
3.2.1.1 Basic concepts
- Error: the difference between the model's predicted output and the real value
- Training: the process of obtaining a model by learning from known sample data
- Training error: the error of the model on the training set
- Generalization: going from the specific to the general; in machine learning, applying the model to new sample data
- Generalization error: the error of the model on new samples
- Model capacity: the model's ability to fit a variety of functions
- Overfitting: the model performs well on the training samples but poorly on new samples
- Underfitting: the model does not perform well even on the training set
- Model selection: choose the model with the smallest generalization error
3.2.2 Performance metrics of the model
3.2.3 Model evaluation methods
- Hold-out method: use 80% of the known data set as the training set to train the model, and the remaining 20% as the test set to test the trained model. The test error measured on the test set serves as an approximation of the generalization error; choose the model with the smaller test error
- The test set and training set should be mutually exclusive as far as possible
- The test set and training set should be independent and identically distributed
- Cross-validation: divide the data set into k mutually exclusive subsets, drawn by stratified sampling. Each time, select one subset as the test set and use the rest as the training set. Run k rounds of training and testing and take the average. This is called k-fold cross-validation. Repeating it p times with different partitions is called p-times k-fold cross-validation
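Both evaluation schemes come down to how the samples are split. The sketch below shows one way to do it in plain Python. It is simplified on purpose: the folds are contiguous index ranges and are not stratified, the `evaluate` callback stands in for whatever training and scoring routine is being used, and all names and the fixed `seed` are assumptions for illustration.

```python
import random


def hold_out_split(samples, test_ratio=0.2, seed=0):
    """Shuffle, then split into mutually exclusive (train, test) lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k mutually exclusive folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def cross_validate(n_samples, k, evaluate):
    """Average the score returned by evaluate(train, test) over the k folds."""
    scores = [evaluate(train, test) for train, test in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)


# Hypothetical data set of 10 samples, identified by index.
train, test = hold_out_split(range(10))
print(len(train), len(test))
for fold_train, fold_test in k_fold_indices(10, 3):
    print(fold_test)
```

For p-times k-fold cross-validation, the indices would be reshuffled with a different seed before each of the p repetitions and the p averages combined.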
3.2.4 Comparison of model performance
3.2.4.1 Performance measurement of regression algorithm

3.2.4.2 Performance measurement of classification algorithm