Watermelon Book (machine learning) reading notes: Chapter 1, Introduction
2022-07-27 15:19:00 【Honyelchak】
Chapter 1: Introduction
1.1 Introduction
What machine learning is
Machine learning is the discipline that studies how to use computational means to improve a system's own performance through experience.
What machine learning research is mainly about
Its main subject is algorithms that generate "models" (model) from data on a computer, i.e., "learning algorithms" (learning algorithm).
Given a learning algorithm, we feed it empirical data and it produces a model based on that data; when a new situation arises (for example, an uncut watermelon), the model gives us a corresponding judgment (for example, that it is a good melon).
1.2 Basic terminology
Feature (feature) / attribute (attribute)
Something that reflects the appearance or nature of an event or object in some respect, such as "color", "root", or "knock sound", is called an "attribute" (attribute) or a "feature" (feature).
Attribute value
The value taken on an attribute, e.g. "dark green" or "pitch black", is called an "attribute value" (attribute value).
Attribute space / sample space / input space
The space spanned by the attributes is called the "attribute space" (attribute space), "sample space" (sample space), or "input space". For example, taking "color", "root", and "knock sound" as three axes spans a three-dimensional space for describing watermelons, and every watermelon can be located at its own coordinate position in this space.
Feature vector
Since every point in the sample space corresponds to a coordinate vector, an instance is also called a "feature vector" (feature vector).
Sample dimensionality
In general, let D = {x1, x2, ..., xm} denote a dataset containing m samples, where each sample is described by d attributes (for example, the watermelon data above uses 3 attributes). Then each sample xi = (xi1; xi2; ...; xid) is a vector in the d-dimensional sample space X, xi ∈ X, where xij is the value of xi on the j-th attribute (for example, the value of the 3rd watermelon on its 2nd attribute is "stiff"), and d is called the "dimensionality" (dimensionality) of the sample xi.
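To make this notation concrete, here is a minimal sketch (Python with NumPy; the integer encoding of the watermelon attributes is a hypothetical choice, not from the book) that stores a dataset D as an m x d array:

```python
import numpy as np

# Hypothetical integer encoding of the three watermelon attributes:
# color:  0 = dark green, 1 = pitch black, 2 = pale white
# root:   0 = curled up,  1 = slightly curled, 2 = stiff
# sound:  0 = muffled,    1 = dull, 2 = crisp
D = np.array([
    [0, 0, 1],   # x1
    [1, 0, 0],   # x2
    [2, 2, 2],   # x3: its value on the 2nd attribute (root) is "stiff"
])

m, d = D.shape        # m samples, each of dimensionality d
x3 = D[2]             # the 3rd sample, a vector in the d-dimensional space X
x32 = D[2, 1]         # x_{3,2}: value of the 3rd sample on the 2nd attribute
print(m, d, x3, x32)  # 3 3 [2 2 2] 2
```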
Training (training) terminology
The process of learning a model from data is called "learning" (learning) or "training" (training), and it is carried out by executing a learning algorithm.
The data used during training is called "training data" (training data); each sample in it is called a "training sample" (training sample); and the set of training samples is called the "training set" (training set).
The learned model corresponds to some underlying regularity of the data and is therefore also called a "hypothesis" (hypothesis); the underlying regularity itself is called the "ground truth" (ground-truth), and learning is the process of finding or approximating it. The book sometimes calls a model a "learner" (learner), which can be viewed as an instantiation of a learning algorithm on given data and parameter space.
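As a minimal sketch of training, the snippet below executes one possible learning algorithm (a decision tree; scikit-learn is assumed to be installed, and the encoded watermelon data is invented for illustration) on a training set to produce a learner:

```python
from sklearn.tree import DecisionTreeClassifier  # one possible learning algorithm

# Hypothetical encoded training set: each row is a training sample
# (color, root, sound); y_train holds the labels (1 = good, 0 = bad melon).
X_train = [[0, 0, 1],
           [1, 0, 0],
           [2, 2, 2],
           [0, 2, 2]]
y_train = [1, 1, 0, 0]

# "Training": executing the learning algorithm on the training data
# produces a model (also called a hypothesis, or a learner).
learner = DecisionTreeClassifier().fit(X_train, y_train)
```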

Label, example
For instance, in "((color = dark green; root = curled up; knock sound = muffled), good melon)", the "good melon" part, the information about the outcome of the sample, is called the "label" (label);
a sample together with its label information is called an "example" (example).
In general, (xi, yi) denotes the i-th example, where yi ∈ Y is the label of the example xi,
and Y is the set of all labels, also called the "label space" (label space) or "output space".
Classification (binary / multi-class) and regression
If the value to be predicted is discrete (e.g. good melon, bad melon), the learning task is called "classification" (classification);
if the value to be predicted is continuous (e.g. watermelon ripeness 0.95, 0.37), the learning task is called "regression" (regression).
In general, the prediction task is to learn, from a training set {(x1, y1), (x2, y2), ..., (xm, ym)}, a mapping f: X -> Y from the input space X to the output space Y (see the sketch after this list).
- For binary classification tasks, usually Y = {-1, +1} or {0, 1};
- for multi-class tasks, |Y| > 2;
- for regression tasks, Y = R (R is the set of real numbers).
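The sketch below contrasts the two task types on the same hypothetical inputs; only the output space Y changes (scikit-learn assumed):

```python
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X = [[0, 0, 1], [1, 0, 0], [2, 2, 2], [0, 2, 2]]  # hypothetical encoded melons

# Classification: Y is discrete, e.g. {0, 1} for bad/good melon.
y_cls = [1, 1, 0, 0]
clf = DecisionTreeClassifier().fit(X, y_cls)

# Regression: Y = R, e.g. ripeness scores such as 0.95 or 0.37.
y_reg = [0.95, 0.82, 0.37, 0.21]
reg = DecisionTreeRegressor().fit(X, y_reg)

print(clf.predict([[1, 0, 1]]))  # a discrete class label
print(reg.predict([[1, 0, 1]]))  # a real-valued prediction
```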
Binary and multi-class classification
- A task with only two classes is a "binary classification" (binary classification) task; one of the classes is usually called the "positive class" (positive class) and the other the "negative class" (negative class).
- When multiple classes are involved, the task is called "multi-class classification" (multi-class classification).
Testing, test sample
After a model has been learned, the process of using it to make predictions is called "testing" (testing), and the samples being predicted are called "test samples" (testing sample).
For example, after learning f, for a test example x we can obtain its predicted label y = f(x).
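A minimal sketch of testing: here f is a hand-written stand-in for a learned model (the rule itself is invented for illustration, not the book's), applied to a test sample to obtain the predicted label y = f(x):

```python
# A hypothetical stand-in for a learned model f: X -> Y.
# Invented rule: a melon is good iff its root is "curled up" (0)
# and its knock sound is "muffled" or "dull" (0 or 1).
def f(x):
    color, root, sound = x
    return 1 if root == 0 and sound in (0, 1) else 0

x_test = (0, 0, 1)  # a test sample: an uncut watermelon we have not seen
y = f(x_test)       # testing: obtain the predicted label y = f(x)
print(y)            # 1 -> predicted to be a good melon
```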
Clustering
- Dividing the training data into several groups, each of which is called a "cluster" (cluster);
- these automatically formed clusters may correspond to some underlying concept partitions, which helps us understand the internal structure of the data and provides a basis for deeper analysis (see the sketch after this list).
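A minimal clustering sketch (k-means via scikit-learn, assumed available; the encoded melons are invented): the algorithm receives no labels and forms the clusters on its own:

```python
from sklearn.cluster import KMeans

# Unlabeled melons (hypothetical encoding); note there is no y here.
X = [[0, 0, 1], [1, 0, 0], [2, 2, 2], [0, 2, 2], [1, 0, 1], [2, 2, 1]]

# Group the data into 2 clusters; the clusters might turn out to
# correspond to an underlying concept such as "ripe" vs "unripe".
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)  # e.g. [0 0 1 1 0 1]: cluster indices, not class labels
```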
The difference between clustering and classification
The difference between clustering and classification is whether the categories are known in advance:
- classification assigns data to known categories based on the data's features;
- clustering starts with unknown categories and groups data with similar features together, so that several categories emerge.
Supervised learning, unsupervised learning
According to whether the training data carries label information, learning tasks can be roughly divided into two broad classes: "supervised learning" (supervised learning) and "unsupervised learning" (unsupervised learning).
- Classification and regression are the representatives of supervised learning;
- clustering is the representative of unsupervised learning.
Generalization ability
The ability of a learned model to apply to new samples is called its "generalization" (generalization) ability.
Generally speaking, the more training samples there are, the more information we obtain about the distribution D, and the more likely it is that learning will produce a model with strong generalization ability.
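Generalization ability is commonly estimated by holding out samples that the learner never sees during training. A minimal sketch on synthetic data (scikit-learn assumed):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data drawn from a fixed distribution (hypothetical setup).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Hold out samples the learner never sees during training.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)
model = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)

# Accuracy on the unseen samples is a proxy for generalization ability.
print("train accuracy:", model.score(X_tr, y_tr))
print("test accuracy: ", model.score(X_te, y_te))
```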
1.3 Hypothesis space
Inductive learning
"Learning from examples" is clearly an inductive process, so it is also called "inductive learning" (inductive learning).
Hypothesis space
All possible combinations of the attributes' values form a set of hypotheses; together with the empty hypothesis (no good melon exists at all), this constitutes the hypothesis space.
Version space
In reality we often face a very large hypothesis space, while the learning process is based on a finite training set. As a result, there may be multiple hypotheses consistent with the training set, i.e., a whole "set of hypotheses" consistent with it, which we call the "version space" (version space).
Version space (Baidu Baike entry)
Consider "rectangle" hypotheses in two-dimensional space (see the overview figure of the entry): green plus signs denote positive samples, and small red circles denote negative samples.
GB is the maximally general positive hypothesis boundary; SB is the maximally specific positive hypothesis boundary.
The rectangles in the region enclosed by GB and SB are the hypotheses in the version space; in other words, the region enclosed by GB and SB is the version space.
When the hypotheses need to be ordered by generality, the version space can be represented by the two bounds GB and SB. During learning, the algorithm then only needs to operate on the two representative sets GB and SB.
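To make the hypothesis-space and version-space definitions concrete, here is a sketch in plain Python (the attribute values follow the book's watermelon example; the tiny training set is invented) that enumerates a conjunctive hypothesis space, with "*" standing for "any value", and filters it down to the version space:

```python
from itertools import product

# Possible values per attribute, plus "*" meaning "any value is fine".
ATTRS = {
    "color": ["dark green", "pitch black", "pale white"],
    "root":  ["curled up", "slightly curled", "stiff"],
    "sound": ["muffled", "dull", "crisp"],
}

# Hypothesis space: every combination of (value or "*") per attribute,
# plus the empty hypothesis (no good melon exists at all).
hypotheses = list(product(*[vals + ["*"] for vals in ATTRS.values()]))
print("hypothesis space size:", len(hypotheses) + 1)  # 4*4*4 + 1 = 65

def covers(h, x):
    """A hypothesis covers a sample if each slot matches or is '*'."""
    return all(hv in ("*", xv) for hv, xv in zip(h, x))

# Hypothetical training set: (sample, is_good_melon).
train = [
    (("dark green", "curled up", "muffled"), True),
    (("pitch black", "stiff", "crisp"), False),
]

# Version space: all hypotheses consistent with every training example.
version_space = [h for h in hypotheses
                 if all(covers(h, x) == y for x, y in train)]
for h in version_space:
    print(h)
```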
1.4 Inductive bias
Inductive bias / bias
The preference a machine learning algorithm shows for certain kinds of hypotheses during learning is called its "inductive bias" (inductive bias), or "bias" for short.
Two typical biases
- Prefer hypotheses that are as specific as possible, i.e., that "apply to as few situations as possible";
- prefer hypotheses that are as general as possible, i.e., that "apply to as many situations as possible".
Why bias matters
Any effective machine learning algorithm must have its own inductive bias; otherwise it will be confounded by hypotheses that look equivalent on the training set and will be unable to produce a definite learning result.
Take the watermelon task as an example: if the algorithm had no bias and, at prediction time, randomly picked among the hypotheses that are equivalent on the training set, then for a new melon (one never seen before) the learned model would sometimes say it is good and sometimes say it is bad; such a learning result is obviously meaningless.
Occam's razor
Inductive bias can be viewed as the learning algorithm's own heuristic, or "set of values", for selecting hypotheses from a potentially huge hypothesis space.
Is there, then, any general principle to guide an algorithm toward the "right" bias?
"Occam's razor" (Occam's razor) is a commonly used, most fundamental principle in natural-science research,
namely: "if multiple hypotheses are consistent with the observations, choose the simplest one".
Role: it serves as a general principle guiding the algorithm to establish the "right" bias.
Inductive bias corresponds to the learning algorithm's own assumption about "what kind of model is better". In concrete practical problems, whether this assumption holds, i.e., whether the algorithm's inductive bias matches the problem itself, most of the time directly determines whether the algorithm can achieve good performance.
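A minimal sketch of an Occam-style bias (NumPy assumed; the data points are invented): a line and a degree-4 polynomial both fit five near-linear training points, but they disagree on a new input, and a "prefer the simpler hypothesis" bias picks the line:

```python
import numpy as np

# Five noisy training points from a roughly linear relationship (invented).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.1, 1.1, 1.9, 3.2, 3.9])

# Two hypotheses that (nearly) agree with the training data:
simple = np.polyfit(x, y, 1)    # a line
wiggly = np.polyfit(x, y, 4)    # a degree-4 polynomial through all 5 points

# Both match the training points closely...
print(np.polyval(simple, x).round(2))
print(np.polyval(wiggly, x).round(2))

# ...but disagree sharply on a new input; Occam's razor prefers the line.
x_new = 6.0
print(np.polyval(simple, x_new), np.polyval(wiggly, x_new))
```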