[Machine learning] - A plain-language introduction and explanation of terms
2022-06-26 22:12:00 【FlyLolo】
Preface
Machine learning and artificial intelligence have always seemed mysterious and lofty to me. I hear about them often, but the mathematical concepts involved kept me from ever putting them into practice.
Still, even if you don't work in the field full time, is it feasible to run some learning programs yourself? There are plenty of frameworks now, and even if that is not enough to get far, it's good to know what they can do.
I started with Zhou Zhihua's book 《Artificial intelligence》. Having read just the beginning, I think it's very good: it starts with examples and gradually goes deeper. The following are my notes.

One. Explaining what machine learning is, step by step, with everyday examples
Start with the example of picking a watermelon: why can we judge a melon to be good and ripe when its color is dark green, its root is curled, and it sounds muffled when knocked?
Because we have eaten and seen a lot of watermelons, we can make a fairly good judgment based on color, root and knock sound.
**Transition to learning from experience:** Similarly, we know from past study that if we work hard, understand the concepts and finish the homework, we will naturally get good results. Why can we make such effective predictions? Because we have accumulated a lot of experience, and by using that experience we can make effective decisions about new situations.
**Moving on to machine learning:** If computer science is the study of "algorithms", then in the same way machine learning can be said to be the study of "learning algorithms".

Two. Learning related terms through the watermelon analogy
1. Learning through a data table
Summarize the watermelon examples in the following table:
| No. | Color | Root | Knock sound |
|---|---|---|---|
| 1 | Dark green | Curled | Muffled |
| 2 | Jet black | Slightly curled | Dull |
| … | … | … | … |
Using the table, we can learn some related terms:
- **Data set:** the set of all records in the table.
- **Instance or sample:** each record describes an event or object (here, a watermelon) and is called an instance or a sample. Sometimes the whole data set is also called a "sample", because it can be seen as one sampling of the sample space; context tells whether "sample" means a single instance or the whole data set.
- **Attribute or feature:** "color", "root" and "knock sound" in the table.
- **Attribute value:** the values that "color", "root" and "knock sound" take in the table.
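The terms above map naturally onto a small data structure. The following is a minimal Python sketch, assuming the two rows of the table; the English attribute names (`color`, `root`, `knock_sound`), the value spellings and the variable names are illustrative choices, not taken from the book.

```python
# A data set is a collection of instances (samples); each instance
# describes one watermelon by its attribute values.
dataset = [
    {"color": "dark green", "root": "curled", "knock_sound": "muffled"},
    {"color": "jet black", "root": "slightly curled", "knock_sound": "dull"},
]

# Attributes (features) are the column names of the table.
attributes = list(dataset[0].keys())

# Attribute values are the entries each attribute takes across the data set.
attribute_values = {
    name: sorted({instance[name] for instance in dataset})
    for name in attributes
}

print(attributes)                # ['color', 'root', 'knock_sound']
print(attribute_values["root"])  # ['curled', 'slightly curled']
```

Each dictionary is one instance, the keys are the attributes, and the collected values per key are the attribute values seen so far.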
2. Recalling the coordinate system
For a single record, take the three attributes "color", "root" and "knock sound" as axes, as shown in the figure below:

Each attribute acts as a coordinate axis, and together they form a three-dimensional coordinate system. Remember coordinate systems? I hope you haven't handed it all back to your teacher.
The space spanned by the attributes is called the **"attribute space"**, **"sample space"** or **"input space"**, that is, the box in the figure.
Of course, a real sample (watermelon) has more than these three attributes; this is just an example. With each attribute as one coordinate axis, the attributes span a d-dimensional space, where d is the number of attributes of a sample.
Every watermelon has its own coordinate position in this space. Since each point in the space corresponds to a coordinate vector, an instance is also called a **"feature vector"**.
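To make the idea of a coordinate position concrete, here is a minimal Python sketch that turns one melon into a point in a three-dimensional attribute space. The value lists in `ATTRIBUTE_VALUES` and the helper `to_feature_vector` are assumptions made for illustration; the index of a value is used as its coordinate along that attribute's axis.

```python
# Assumed value lists for each attribute; the index of a value is its
# coordinate along that attribute's axis.
ATTRIBUTE_VALUES = {
    "color": ["dark green", "jet black", "light white"],
    "root": ["curled", "slightly curled", "stiff"],
    "knock_sound": ["muffled", "dull", "crisp"],
}


def to_feature_vector(melon):
    """Map a melon (dict of attribute -> value) to a point in attribute space."""
    return tuple(
        values.index(melon[name]) for name, values in ATTRIBUTE_VALUES.items()
    )


melon = {"color": "jet black", "root": "slightly curled", "knock_sound": "dull"}
vector = to_feature_vector(melon)

d = len(vector)  # dimensionality = number of attributes
print(vector, d)  # (1, 1, 1) 3
```

The integer coding here is arbitrary: it imposes an order on values such as colors that have no natural order, so it is only meant to show how an instance becomes a vector, not how attributes should be encoded in practice.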
3. Some training-related terms
The process of learning a model from data is called **"learning"** or **"training"**;
it is carried out by executing a learning algorithm. The data used during training is called **"training data"**;
each sample in it is called a **"training sample"**;
the set of training samples is called the **"training set"**.
The learned model corresponds to some underlying regularity in the data, so it is also called a **"hypothesis"**;
the underlying regularity itself is called the **"ground truth"**;
learning is the process of finding or approximating the ground truth. The book sometimes calls a model a **"learner"**, which can be seen as the instantiation of a learning algorithm on given data and a given parameter space.
Training requires more than the attribute information of the samples; it also needs each sample's "result", for example "((color = dark green; root = curled; knock sound = muffled), good melon)". The information about an instance's result, such as "good melon", is called its **"label"**;
an instance that carries label information is called an **"example"**, as shown in the figure below.

If we want to predict a discrete value, such as "good melon" or "bad melon", the learning task is called **"classification"**;
if we want to predict a continuous value, such as a watermelon ripeness of 0.95 or 0.37, the learning task is called **"regression"**.
A task with only two classes is called **"binary classification"**; one class is usually called the **"positive class"** and the other the **"negative class"**;
a task with more than two classes is called **"multi-class classification"**.
The ability of a learned model to apply to new samples is called its **"generalization"** ability. A model with strong generalization ability applies well to the whole sample space.
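The training vocabulary can be tied together in a few lines of code. The sketch below is only an illustration under stated assumptions: the four labeled melons are a hypothetical training set, and the learner is a 1-nearest-neighbor rule with Hamming distance, which is a stand-in chosen for brevity and not a method described in these notes.

```python
# Training set: each example is (instance, label). Hypothetical data.
training_set = [
    (("dark green", "curled", "muffled"), "good"),
    (("jet black", "curled", "muffled"), "good"),
    (("dark green", "stiff", "crisp"), "bad"),
    (("jet black", "slightly curled", "dull"), "bad"),
]


def hamming_distance(a, b):
    """Number of attributes on which two instances differ."""
    return sum(x != y for x, y in zip(a, b))


def train(examples):
    """'Training' for 1-nearest-neighbor simply stores the examples."""
    stored = list(examples)

    def learner(instance):
        # Predict the label of the closest stored training example.
        _, label = min(stored, key=lambda ex: hamming_distance(ex[0], instance))
        return label

    return learner


model = train(training_set)
new_melon = ("dark green", "curled", "dull")  # not in the training set
print(model(new_melon))  # good
```

This is a binary classification task with "good" as the positive class and "bad" as the negative class; applying `model` to `new_melon`, which never appeared in training, is a small test of generalization.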

Three. Hypothesis space
In an axiomatic system in mathematics, deriving theorems from a set of axioms and inference rules is deduction; "learning from examples" is clearly a process of induction, so it is also called "inductive learning".
We can regard learning as a search through the space formed by all hypotheses, as shown in the figure below.

How many hypotheses are possible is a matter of counting the combinations of attribute values. In practice the hypothesis space is often very large, while learning is based on a finite training set, so several hypotheses may be consistent with the training set. The set of hypotheses consistent with the training set is called the **"version space"**.
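The search through the hypothesis space can be sketched by brute force. The following Python example assumes three possible values per attribute and a hypothetical four-melon training set; a hypothesis is a triple in which `"*"` means "any value", and it ignores the degenerate hypothesis that no good melon exists at all. The names `VALUES`, `covers` and `version_space` are illustrative.

```python
from itertools import product

WILDCARD = "*"

# Assumed possible values for each attribute.
VALUES = {
    "color": ["dark green", "jet black", "light white"],
    "root": ["curled", "slightly curled", "stiff"],
    "knock_sound": ["muffled", "dull", "crisp"],
}

# Hypothetical training set: (instance, is_good_melon).
training_set = [
    (("dark green", "curled", "muffled"), True),
    (("jet black", "curled", "muffled"), True),
    (("dark green", "stiff", "crisp"), False),
    (("jet black", "slightly curled", "dull"), False),
]


def covers(hypothesis, instance):
    """A hypothesis calls a melon good if every non-wildcard attribute matches."""
    return all(h == WILDCARD or h == x for h, x in zip(hypothesis, instance))


# Every attribute is either fixed to one value or left as a wildcard.
hypothesis_space = list(
    product(*[values + [WILDCARD] for values in VALUES.values()])
)

# The version space keeps the hypotheses that agree with every training example.
version_space = [
    h
    for h in hypothesis_space
    if all(covers(h, x) == is_good for x, is_good in training_set)
]

print(len(hypothesis_space))  # 64
for h in version_space:
    print(h)
```

With this training set, three hypotheses survive: `("*", "curled", "*")`, `("*", "*", "muffled")` and `("*", "curled", "muffled")`. Enumeration only works because the space is tiny; real hypothesis spaces are far too large to list.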


Four. Inductive bias
For the watermelon version space in Figure 1.2, consider the new melon (color = dark green; root = curled; knock sound = dull). If we use the hypothesis "good melon ↔ (color = *) ∧ (root = curled) ∧ (knock sound = *)", the new melon is judged to be a good melon; if we use either of the other two hypotheses, it is judged not to be good. Given only the training samples in the table, there is no way to decide which of the three hypotheses is "better", and the computer is stumped.
So what can be done? Any effective machine learning algorithm must have an inductive bias (also called an inductive preference); otherwise it will be confused by hypotheses in the hypothesis space that look "equivalent" on the training set, and it cannot produce a definite learning result.
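The disagreement described above is easy to reproduce. In the Python sketch below, the first hypothesis is the one quoted in the text; the other two, and the "prefer the most specific hypothesis" rule, are assumptions made for illustration. A hypothesis is a (color, root, knock sound) triple in which `"*"` matches any value.

```python
WILDCARD = "*"

# Three hypotheses assumed to make up the version space.
version_space = [
    ("*", "curled", "*"),
    ("*", "*", "muffled"),
    ("*", "curled", "muffled"),
]


def covers(hypothesis, instance):
    """A hypothesis calls a melon good if every non-wildcard attribute matches."""
    return all(h == WILDCARD or h == x for h, x in zip(hypothesis, instance))


new_melon = ("dark green", "curled", "dull")

# All three hypotheses fit the training data, yet they disagree here.
predictions = {h: covers(h, new_melon) for h in version_space}
for h, is_good in predictions.items():
    print(h, "->", "good" if is_good else "not good")


def specificity(hypothesis):
    """Number of attributes the hypothesis actually constrains."""
    return sum(value != WILDCARD for value in hypothesis)


# One possible inductive bias: prefer the most specific hypothesis.
chosen = max(version_space, key=specificity)
print(chosen, covers(chosen, new_melon))  # ('*', 'curled', 'muffled') False
```

A different bias, such as preferring the most general hypothesis, could select `("*", "curled", "*")` and call the same melon good. The data alone cannot settle this; the bias does.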

So is there a general principle that guides an algorithm toward the "correct" bias? "Occam's razor" is a common and most basic principle in natural science research: "if more than one hypothesis is consistent with the observations, choose the simplest one".

Summary
Detached from specific problems, talking in the abstract about "which learning algorithm is better" is meaningless, because if all potential problems are considered, all learning algorithms are equally good. To discuss the relative merits of algorithms, we must look at a specific learning problem; a learning algorithm that does well on some problems may be unsatisfactory on others, and whether the algorithm's own inductive bias matches the problem often plays a decisive role.