[Machine Learning] An Introduction in Plain Language, with Explanations of Terms
2022-06-26 22:12:00 【FlyLolo】
Preface
Machine learning and artificial intelligence have always seemed mysterious and lofty to me. I hear about them all the time, but because of all the mathematical concepts involved I never actually put them into practice.
However, even if you don't work in a dedicated machine learning role, is it feasible to run some learning programs yourself? There are plenty of frameworks now, and even if you don't go deep, it is good to know what they can do.
I started with Zhou Zhihua's book *Machine Learning* (《机器学习》). I have only read the beginning, but I find it very good: it starts from examples and goes deeper step by step. Below are my notes.

One、Explaining what machine learning is, step by step, with an everyday example
Start with the example of picking a watermelon: why can we tell that a melon with a dark green color, a curled root and a muffled knock sound is a good, ripe melon?
Because we have eaten and seen many watermelons, we can make a fairly good judgment from characteristics such as color, root and knock sound.
**A transition to learning from experience:** Similarly, we know from our own study experience that if we work hard, understand the concepts and finish the homework, we will naturally get good grades. Why can we make such effective predictions? Because we have accumulated a lot of experience, and by using that experience we can make effective decisions about new situations.
**Going further, to machine learning:** If computer science is the study of "algorithms", then, by analogy, machine learning is the study of "learning algorithms".

Two、Learning the related terms through the watermelon analogy
1. Learning from a data table
The watermelon examples can be summarized in the following table:
| No. | Color | Root | Knock sound |
|---|---|---|---|
| 1 | dark green | curled | muffled |
| 2 | black | slightly curled | dull |
| … | … | … | … |
With the table in mind, here are some related terms:
- **Data set:** the collection of all records in the table.
- **Instance** or **sample**: each record describes one event or object (here, a watermelon) and is called an instance or a sample. Sometimes the whole data set is also called a "sample", because it can be seen as a sampling of the sample space; the context tells you whether "sample" means a single sample or the data set.
- **Attribute** or **feature:** the "Color", "Root" and "Knock sound" columns of the table.
- **Attribute value:** the value that "Color", "Root" or "Knock sound" takes in a given record.
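To make these terms concrete, here is a minimal Python sketch (an illustration, not from the book) that stores the table above as a list of dictionaries: each dictionary is one instance, its keys are the attributes, and its entries are attribute values. The key names such as `knock_sound` and the helper `attribute_values` are hypothetical.

```python
# The watermelon table as a data set: a list of instances.
# Each dict is one instance (sample); the whole list is the data set.
dataset = [
    {"color": "dark green", "root": "curled", "knock_sound": "muffled"},
    {"color": "black", "root": "slightly curled", "knock_sound": "dull"},
]

# The attributes (features) are the keys shared by every instance.
attributes = list(dataset[0].keys())


def attribute_values(data, attribute):
    """Return the value of one attribute for every instance, in table order."""
    return [instance[attribute] for instance in data]


print(len(dataset))                       # 2 instances in the data set
print(attributes)                         # ['color', 'root', 'knock_sound']
print(attribute_values(dataset, "root"))  # ['curled', 'slightly curled']
```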
2. Recall the coordinate system
For a single record, using the three attributes "Color", "Root" and "Knock sound" as axes gives the picture shown in the figure:

Each attribute acts as a coordinate axis, and together they form a three-dimensional coordinate system. Remember coordinate systems? I hope I haven't forgotten everything my teacher taught me.
The space spanned by the attributes is called the **attribute space**, **sample space** or **input space**, i.e., the box in the figure.
Of course, a real sample (watermelon) has far more than these three attributes; this is just an example. With each attribute as one coordinate axis, we get a d-dimensional space, where d is the number of attributes of a sample.
Each watermelon has its own coordinate position in this space. Because each point in the space corresponds to a coordinate vector, we also call an instance a **feature vector**.
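The text does not say how a categorical value like "dark green" becomes a coordinate. One simple assumption, sketched below, is to give each attribute value an integer code, so that every instance maps to a point (a feature vector) in a d = 3 dimensional attribute space. The `ENCODING` table, including the values "light white", "stiff" and "crisp" that do not appear in the table above, is hypothetical; one-hot encoding is a common alternative.

```python
# Hypothetical integer codes for each attribute's possible values.
# Any consistent mapping would do; one-hot encoding is another option.
ENCODING = {
    "color": {"dark green": 0, "black": 1, "light white": 2},
    "root": {"curled": 0, "slightly curled": 1, "stiff": 2},
    "knock_sound": {"muffled": 0, "dull": 1, "crisp": 2},
}

# One coordinate axis per attribute, so the attribute space has d = 3.
ATTRIBUTES = ["color", "root", "knock_sound"]


def to_feature_vector(instance):
    """Map an instance (dict of attribute values) to a point in attribute space."""
    return tuple(ENCODING[a][instance[a]] for a in ATTRIBUTES)


melon = {"color": "black", "root": "slightly curled", "knock_sound": "dull"}
print(to_feature_vector(melon))  # (1, 1, 1)
print(len(ATTRIBUTES))           # 3, the dimension d
```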
3. Terms related to training
The process of learning a model from data is called **learning** or **training**;
it is carried out by running a learning algorithm. The data used during training is called **training data**;
each sample in it is called a **training sample**;
the set of training samples is called the **training set**.
A learned model corresponds to some underlying regularity in the data, so it is also called a **hypothesis**;
the underlying regularity itself is called the **ground truth** (ground-truth);
the learning process aims to find or approximate the ground truth. The book sometimes calls a model a **learner**, which can be seen as an instantiation of a learning algorithm on given data and a given parameter space.
Training needs more than the attribute information of the samples; it also needs the "outcome" of each sample, e.g. "((color = dark green; root = curled; knock sound = muffled), good melon)". Information about an instance's outcome, such as "good melon", is called a **label**;
an instance together with its label is called an **example**, as illustrated in the figure.
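A minimal sketch of these terms in Python, assuming the two rows of the earlier table are labeled "good melon" and "bad melon" respectively (the table itself has no label column). An example is an `(instance, label)` pair, the list of examples is the training set, and a hypothesis is simply a function from instances to labels. The rule inside `hypothesis` is hand-written for illustration, not produced by any real learning algorithm.

```python
# Each example pairs an instance (attribute values) with its label.
# The labels are assumed for illustration.
training_set = [
    ({"color": "dark green", "root": "curled", "knock_sound": "muffled"}, "good melon"),
    ({"color": "black", "root": "slightly curled", "knock_sound": "dull"}, "bad melon"),
]

# Splitting the examples gives the instances and their labels.
instances = [x for x, _ in training_set]
labels = [y for _, y in training_set]


def hypothesis(instance):
    """Hypothetical model: a curled root means a good melon."""
    return "good melon" if instance["root"] == "curled" else "bad melon"


def consistent_with(h, examples):
    """True if hypothesis h reproduces the label of every training example."""
    return all(h(x) == y for x, y in examples)


print(labels)                                     # ['good melon', 'bad melon']
print(consistent_with(hypothesis, training_set))  # True
```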

If we want to predict discrete values, such as "good melon" or "bad melon", the learning task is called **classification**;
if we want to predict continuous values, such as a watermelon ripeness of 0.95 or 0.37, the learning task is called **regression**.
A task with only two classes is called **binary classification**; one class is usually called the **positive class** and the other the **negative class**;
a task involving more than two classes is called **multi-class classification**.
The ability of a learned model to apply to new samples is called its **generalization** ability. A model with strong generalization ability applies well to the whole sample space.
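The sketch below contrasts the two kinds of output and hints at what generalization means in practice. It encodes binary classification with the positive class "good melon" as 1 and the negative class "bad melon" as 0, then measures accuracy on melons the model has not seen. The classifier rule, the ripeness rule and the two new samples with their labels are all hypothetical illustration data.

```python
def classify(instance):
    """Hypothetical binary classifier: positive class (1, good melon) if the
    root is curled, otherwise negative class (0, bad melon)."""
    return 1 if instance["root"] == "curled" else 0


def accuracy(predict, examples):
    """Share of (instance, label) pairs that `predict` gets right."""
    return sum(predict(x) == y for x, y in examples) / len(examples)


# New melons not used for training (illustrative data, assumed labels).
new_samples = [
    ({"color": "black", "root": "curled", "knock_sound": "muffled"}, 1),
    ({"color": "dark green", "root": "stiff", "knock_sound": "crisp"}, 0),
]

# Accuracy on unseen samples is a rough proxy for generalization ability.
print(accuracy(classify, new_samples))  # 1.0


def predict_ripeness(instance):
    """Hypothetical regression model: outputs a continuous ripeness score."""
    return 0.95 if instance["knock_sound"] == "muffled" else 0.37


print(predict_ripeness(new_samples[0][0]))  # 0.95
```

Two hand-picked samples say very little about the whole sample space, of course; the point is only that generalization is judged on data outside the training set.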

Three、Hypothesis space
In a mathematical axiom system, deriving theorems from a set of axioms and inference rules is deduction; "learning from examples" is clearly a process of induction, so it is also called **inductive learning**.
We can view the learning process as a search in the space made up of all hypotheses, as shown in the figure.

How many hypotheses are there? Each attribute can take any one of its values or a wildcard meaning "any value", so the size of the hypothesis space is the product of the number of choices per attribute. In practice we often face a very large hypothesis space, but learning is based on a finite training set, so there may be several hypotheses consistent with the training set, i.e., a consistent "set of hypotheses", which we call the **version space**.
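The search over the hypothesis space can be written out directly. This sketch assumes the setting of the book's watermelon example: hypotheses are conjunctions "(color = ?) ∧ (root = ?) ∧ (knock sound = ?)", where each "?" is a specific value or the wildcard "*", each attribute has three possible values, and the four training samples below are assumed to follow the book's Table 1.1. With 3 values plus "*" per attribute there are 4 × 4 × 4 = 64 conjunctions; the book's count of 65 also includes the empty hypothesis ∅ (no melon is good), which is left out here because it contradicts the positive samples anyway. Filtering the conjunctions against the training set yields the version space.

```python
from itertools import product

# Assumed attribute values, three per attribute; "*" means "any value".
VALUES = {
    "color": ["dark green", "black", "light white"],
    "root": ["curled", "slightly curled", "stiff"],
    "knock_sound": ["muffled", "crisp", "dull"],
}
ATTRIBUTES = list(VALUES)

# Assumed training set: (color, root, knock_sound), is_good_melon.
TRAINING_SET = [
    (("dark green", "curled", "muffled"), True),
    (("black", "curled", "muffled"), True),
    (("dark green", "stiff", "crisp"), False),
    (("black", "slightly curled", "dull"), False),
]


def matches(hypothesis, instance):
    """A conjunctive hypothesis says 'good melon' if every attribute
    equals the required value or the requirement is '*'."""
    return all(h == "*" or h == v for h, v in zip(hypothesis, instance))


# Hypothesis space: every combination of (value or "*") per attribute.
hypothesis_space = list(product(*[VALUES[a] + ["*"] for a in ATTRIBUTES]))

# Version space: hypotheses that reproduce every training label.
version_space = [
    h for h in hypothesis_space
    if all(matches(h, x) == y for x, y in TRAINING_SET)
]

print(len(hypothesis_space))  # 64
for h in version_space:
    print(h)
```

Under these assumptions the version space holds three hypotheses: `('*', 'curled', 'muffled')`, `('*', 'curled', '*')` and `('*', '*', 'muffled')`.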


Four、Inductive preference
For the watermelon version space in Figure 1.2, consider the new melon (color = dark green; root = curled; knock sound = dull). If we use the hypothesis "good melon ↔ (color = *) ∧ (root = curled) ∧ (knock sound = *)", the new melon is judged to be a good melon; with either of the other two hypotheses, it is not. With only the training samples in Table 1.1, it is impossible to tell which of these three hypotheses is "better", and the computer is stuck.
What then? Any effective machine learning algorithm must have its own inductive preference; otherwise it would be confused by hypotheses that look "equivalent" on the training set and could not produce a definite learning result.
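The dilemma can be reproduced in a few lines. This sketch hard-codes the three version-space hypotheses (assumed to be those of Figure 1.2, written as `(color, root, knock_sound)` with "*" as the wildcard) and asks each of them about the new melon (dark green, curled, dull).

```python
# The three hypotheses of the version space, "*" meaning "any value".
VERSION_SPACE = [
    ("*", "curled", "muffled"),
    ("*", "curled", "*"),
    ("*", "*", "muffled"),
]


def predicts_good(hypothesis, instance):
    """True if the conjunctive hypothesis labels the instance a good melon."""
    return all(h == "*" or h == v for h, v in zip(hypothesis, instance))


new_melon = ("dark green", "curled", "dull")
predictions = {h: predicts_good(h, new_melon) for h in VERSION_SPACE}
for h, good in predictions.items():
    print(h, "good" if good else "not good")

# All three fit the training data, yet they disagree on the new melon,
# so the algorithm needs an inductive preference to choose one.
print(len(set(predictions.values())) > 1)  # True
```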

So is there a general principle that guides an algorithm toward the "right" preference? **Occam's razor** is a common and fundamental principle in natural science research: "if multiple hypotheses are consistent with the observations, choose the simplest one."
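Occam's razor only works once "simplest" is defined. As one assumed measure (not prescribed by the text), the sketch below counts how many attributes a hypothesis constrains, i.e. how many entries are not "*", and keeps the hypotheses with the fewest constraints. The version space is hard-coded in the same hypothetical `(color, root, knock_sound)` notation.

```python
# The three hypotheses of the watermelon version space ("*" = any value).
VERSION_SPACE = [
    ("*", "curled", "muffled"),
    ("*", "curled", "*"),
    ("*", "*", "muffled"),
]


def complexity(hypothesis):
    """Assumed simplicity measure: number of constrained (non-'*') attributes."""
    return sum(1 for value in hypothesis if value != "*")


def simplest(hypotheses):
    """Occam's razor under this measure: keep every hypothesis of minimum complexity."""
    lowest = min(complexity(h) for h in hypotheses)
    return [h for h in hypotheses if complexity(h) == lowest]


print(simplest(VERSION_SPACE))
# [('*', 'curled', '*'), ('*', '*', 'muffled')]
```

With this measure two hypotheses tie, so the razor alone does not pick a single answer here; a different notion of "simple" or an extra tiebreak would be needed.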

Summary
Talking about "which learning algorithm is better" in the abstract, detached from any specific problem, is meaningless, because if all potential problems are considered, all learning algorithms are equally good. To discuss the relative merits of algorithms, we must look at a specific learning problem: an algorithm that does well on some problems may be unsatisfactory on others, and whether the algorithm's inductive preference matches the problem often plays a decisive role.