Watermelon Book (Machine Learning) reading notes, Chapter 1: Introduction
2022-07-27 15:19:00 【Honyelchak】
Chapter 1: Introduction
1.1 Introduction
Machine learning as a discipline
Machine learning is the discipline devoted to studying how, by computational means, experience can be used to improve the performance of a system itself.
The main content of machine learning research
Its main subject is algorithms that generate "models" (model) from data on a computer, namely "learning algorithms" (learning algorithm).
With a learning algorithm, we feed it empirical data and it produces a model from that data; when faced with a new situation (for example, an uncut watermelon), the model gives us a corresponding judgment (for example, that it is a good melon).
1.2 Basic terminology
Feature / attribute
An item that reflects the performance or nature of an event or object in some respect, such as "color", "root", or "knock sound", is called an "attribute" (attribute) or a "feature" (feature).
Attribute value
The value taken on an attribute, such as "dark green" or "jet black", is called an "attribute value" (attribute value).
Attribute space / sample space / input space
The space spanned by the attributes is called the "attribute space" (attribute space), "sample space" (sample space), or "input space". For example, taking "color", "root", and "knock sound" as three coordinate axes spans a three-dimensional space for describing watermelons, and every watermelon can be located at its own coordinate point in this space.
Feature vector
Since each point in the sample space corresponds to a coordinate vector, we also call an instance a "feature vector" (feature vector).
Dimensionality of a sample
In general, let D = {x1, x2, …, xm} denote a dataset containing m samples, where each sample is described by d attributes (for example, the watermelon data above uses 3 attributes). Each instance xi = (xi1; xi2; …; xid) is then a vector in the d-dimensional sample space X, i.e., xi ∈ X, where xij is the value of xi on the j-th attribute (for example, the value of the 3rd watermelon on its 2nd attribute is "stiff"). d is called the "dimensionality" (dimensionality) of the sample xi.
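As a minimal sketch of this notation (the attribute values below are illustrative, not copied from the book's data table), the dataset D can be stored as an m × d array, where indexing recovers an instance xi and an attribute value xij:

```python
import numpy as np

# A toy watermelon dataset: each row is one instance x_i, each column one
# attribute (color, root, knock sound), so m = 4 samples and d = 3 attributes.
D = np.array([
    ["dark green", "curled up", "dull"],
    ["jet black",  "curled up", "dull"],
    ["dark green", "stiff",     "crisp"],
    ["pale white", "curled up", "muffled"],
])

m, d = D.shape       # m = 4 samples, dimensionality d = 3
x3 = D[2]            # the 3rd instance x_3 (0-based index 2)
x32 = D[2, 1]        # x_32: value of the 3rd watermelon on its 2nd attribute -> "stiff"
print(m, d, x3, x32)
```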
Training-related terms
The process of learning a model from data is called "learning" (learning) or "training" (training); it is carried out by executing a learning algorithm.
The data used during training is called "training data" (training data); each sample in it is called a "training sample" (training sample); the set of training samples is called the "training set" (training set).
The learned model corresponds to some underlying regularity of the data, so it is also called a "hypothesis" (hypothesis); this underlying regularity itself is called the "ground truth" (ground-truth), and the learning process is an attempt to find or approximate it. This book sometimes calls a model a "learner" (learner), which can be viewed as an instantiation of a learning algorithm on the given data and parameter space.

Label / example
For example, in "((color = dark green; root = curled up; knock sound = dull), good melon)", the information about the instance's "outcome", namely "good melon", is called a "label" (label);
an instance together with its label information is called an "example" (example).
In general, (xi, yi) denotes the i-th example, where yi ∈ Y is the label of the instance xi,
and Y is the set of all labels, also called the "label space" (label space) or "output space".
Classification (binary / multi-class) and regression
If the quantity to predict is a discrete value (e.g., good melon vs. bad melon), the learning task is called "classification" (classification);
if the quantity to predict is a continuous value (e.g., watermelon ripeness, 0.95 or 0.37), the learning task is called "regression" (regression).
In general, a prediction task aims to learn, from a training set {(x1, y1), (x2, y2), …, (xm, ym)}, a mapping f: X → Y from the input space X to the output space Y.
- For binary classification tasks, usually Y = {-1, +1} or {0, 1};
- for multi-class tasks, |Y| > 2;
- for regression tasks, Y = R (R is the set of real numbers).
Binary and multi-class classification
- A task with only two classes is a "binary classification" (binary classification) task; one of the classes is usually called the "positive class" (positive class), and the other the "negative class" (negative class);
- when more than two classes are involved, the task is called "multi-class classification" (multi-class classification).
Testing / test sample
After a model has been learned, the process of using it to make predictions is called "testing" (testing), and a sample being predicted is called a "test sample" (testing sample).
For example, after learning f, for a test instance x we obtain its predicted label y = f(x).
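To make the training and testing vocabulary concrete, here is a minimal sketch using scikit-learn's DecisionTreeClassifier; the integer encoding of attribute values, the toy data, and the choice of a decision tree are all assumptions for illustration (the book does not prescribe a particular algorithm at this point):

```python
from sklearn.tree import DecisionTreeClassifier

# Training set with integer-encoded attributes (an illustrative encoding):
# color: 0=dark green, 1=jet black; root: 0=curled up, 1=stiff;
# knock: 0=dull, 1=crisp. Labels: 1 = good melon, 0 = bad melon.
X_train = [[0, 0, 0],
           [1, 0, 0],
           [0, 1, 1],
           [1, 1, 1]]
y_train = [1, 1, 0, 0]

f = DecisionTreeClassifier().fit(X_train, y_train)  # "training": learn f from the training set

x_test = [[0, 0, 1]]        # a test instance x (a melon not seen during training)
y_pred = f.predict(x_test)  # "testing": obtain the predicted label y = f(x)
print(y_pred)
```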
Clustering
- Dividing the data in the training set into several groups, each group being called a "cluster" (cluster);
- these automatically formed clusters may correspond to some underlying concept partitions, which helps us understand the internal structure of the data and lays a foundation for deeper analysis.
The difference between clustering and classification
Clustering and classification differ in whether the categories are known:
- classification assigns data to known categories based on their features;
- clustering deals with unknown categories, grouping data with similar features together so that several classes emerge (see the sketch below).
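A minimal sketch of clustering with scikit-learn's KMeans; the two-dimensional points are made up for illustration. Note that only features are provided, never labels:

```python
from sklearn.cluster import KMeans

# Unlabeled data: features only -- the algorithm groups similar
# instances into clusters on its own.
X = [[0.10, 0.20], [0.15, 0.22], [0.90, 0.85], [0.88, 0.90]]

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)  # cluster assignment per instance, e.g. [0 0 1 1]
```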
Supervised and unsupervised learning
According to whether the training data carries label information, learning tasks can be roughly divided into two classes: "supervised learning" (supervised learning) and "unsupervised learning" (unsupervised learning).
- Classification and regression are representative of supervised learning;
- clustering is representative of unsupervised learning.
Generalization ability
The ability of a learned model to apply to new samples is called its "generalization" (generalization) ability.
Generally speaking, the more training samples there are, the more information we obtain about D (the underlying distribution), and the more likely learning is to yield a model with strong generalization ability.
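Generalization is usually estimated by evaluating the model on samples that were not used for training. A minimal sketch with a hold-out split (the synthetic dataset and the decision-tree model are assumptions for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data standing in for samples drawn from an underlying distribution D
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Hold out 30% of the samples: accuracy on data never seen during
# training is a rough proxy for generalization ability.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
print(model.score(X_te, y_te))  # accuracy on the held-out samples
```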
1.3 Hypothesis space
Inductive learning
"Learning from examples" is clearly an inductive process, so it is also called "inductive learning" (inductive learning).
Hypothesis space
Combining all possible values of every attribute (including the wildcard "*", meaning "any value is acceptable"), plus the empty hypothesis (that no "good melon" exists at all), forms the hypothesis space.
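As a worked example, following the book's watermelon example where the three attributes take 3, 2, and 2 possible values respectively (the concrete value names below are illustrative), each attribute also admits the wildcard "*", and one empty hypothesis is added. A minimal sketch that enumerates and counts the hypothesis space:

```python
from itertools import product

# Possible values per attribute; "*" is the wildcard meaning "any value
# is acceptable". The counts (3, 2, 2) follow the book's example; the
# concrete value names are illustrative.
values = {
    "color": ["dark green", "jet black", "pale white", "*"],  # 3 values + "*"
    "root":  ["curled up", "stiff", "*"],                     # 2 values + "*"
    "knock": ["dull", "crisp", "*"],                          # 2 values + "*"
}

hypotheses = list(product(*values.values()))  # 4 * 3 * 3 = 36 combinations
size = len(hypotheses) + 1  # + 1 for the empty hypothesis: "no good melon exists"
print(size)                 # 37
```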
Version space
In reality we often face a very large hypothesis space, but learning proceeds from a finite training set; as a result, several hypotheses may be consistent with the training set, i.e., there exists a "set of hypotheses" consistent with the training set, which we call the "version space" (version space).
Version space (from the Baidu Baike entry)
For a "rectangle" hypothesis in two-dimensional space (schematic figure), green plus signs denote positive samples and small red circles denote negative samples.
GB is the maximally general positive hypothesis boundary; SB is the maximally specific positive hypothesis boundary.
The rectangles lying between GB and SB are the hypotheses in the version space; that is, the region enclosed by GB and SB is the version space.
When the generality of hypotheses needs to be ordered, the version space can be represented by the two bounds GB and SB. During learning, the algorithm then only needs to operate on the two representative sets GB and SB.
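Using the same wildcard encoding as the hypothesis-space sketch above, the version space can be computed by keeping exactly those hypotheses consistent with every training example. The two-example training set below is made up for illustration:

```python
from itertools import product

values = [["dark green", "jet black", "*"],   # color
          ["curled up", "stiff", "*"],        # root
          ["dull", "crisp", "*"]]             # knock sound

def matches(h, x):
    """A hypothesis covers an instance if every attribute matches or is '*'."""
    return all(hv == xv or hv == "*" for hv, xv in zip(h, x))

# A tiny, made-up training set: (instance, is_good_melon)
train = [(("dark green", "curled up", "dull"), True),
         (("jet black", "stiff", "crisp"), False)]

# Version space: all hypotheses that classify every training example correctly.
version_space = [h for h in product(*values)
                 if all(matches(h, x) == y for x, y in train)]
for h in version_space:
    print(h)
```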
1.4 Inductive bias
Inductive bias / bias
The preference of a machine learning algorithm for certain types of hypotheses during the learning process is called its "inductive bias" (inductive bias), or "bias" for short.
Typical biases
- "As specific as possible", i.e., applicable to as few situations as possible;
- "as general as possible", i.e., applicable to as many situations as possible.
Why bias matters
Any effective machine learning algorithm must have an inductive bias; otherwise it would be confused by the hypotheses that look "equivalent" on the training set and could not produce a deterministic learning result.
Take the watermelon algorithm as an example: if the algorithm had no bias, then at prediction time it would randomly pick among the hypotheses equivalent on the training set, so for a new melon (one never seen before) the learned model would sometimes say it is good and sometimes say it is bad; such a learning result is obviously meaningless.
Occam's razor
Inductive bias can be seen as the learning algorithm's own heuristic or "values" for selecting hypotheses in a potentially huge hypothesis space.
Is there, then, any general principle to guide an algorithm in establishing the "right" bias?
"Occam's razor" (Occam's razor) is a common and most fundamental principle in natural-science research,
namely: "if there are multiple hypotheses consistent with the observations, choose the simplest one".
Role: it serves as a general principle guiding an algorithm to establish the "right" bias.
Inductive bias corresponds to the learning algorithm's own assumption about "what kind of model is better". In a concrete problem, whether this assumption holds, that is, whether the algorithm's inductive bias matches the problem itself, usually determines directly whether the algorithm can achieve good performance.
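To make "bias" tangible in the wildcard representation used above, one possible formalization of Occam's razor (an illustrative choice, not one prescribed by the book) is to prefer, among the hypotheses in the version space, the one with the most wildcards, i.e., the shortest description:

```python
# A hand-written version space (e.g. the output of the previous sketch).
version_space = [("dark green", "curled up", "dull"),
                 ("dark green", "*", "*"),
                 ("*", "curled up", "*")]

# The bias: deterministically pick the hypothesis with the most wildcards.
# This tie-breaking rule is an illustrative assumption, not the book's rule.
simplest = max(version_space, key=lambda h: h.count("*"))
print(simplest)  # a single, deterministic prediction rule
```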