Watermelon Book (Machine Learning) reading notes, Chapter 1: Introduction
2022-07-27 15:19:00 【Honyelchak】
Chapter 1: Introduction
1.1 Introduction
Machine learning as a discipline
Machine learning is the discipline devoted to studying how, by computational means, experience can be used to improve the performance of a system itself.
The main content of machine learning research
Its main subject is algorithms that generate "models" (model) from data on a computer, namely "learning algorithms" (learning algorithm).
With a learning algorithm, we feed it empirical data and it produces a model from that data; when faced with a new situation (for example, an uncut watermelon), the model gives us a corresponding judgment (for example, that it is a good melon).
1.2 Basic terminology
Feature / attribute
An item that reflects the performance or nature of an event or object in some respect, such as "color", "root", or "knock sound", is called an "attribute" (attribute) or a "feature" (feature).
Attribute value
The value taken on an attribute, such as "dark green" or "jet black", is called an "attribute value" (attribute value).
Attribute space / sample space / input space
The space spanned by the attributes is called the "attribute space" (attribute space), "sample space" (sample space), or "input space". For example, taking "color", "root", and "knock sound" as three coordinate axes spans a three-dimensional space for describing watermelons, and every watermelon can be located at its own coordinate point in this space.
Feature vector
Since each point in the sample space corresponds to a coordinate vector, we also call an instance a "feature vector" (feature vector).
Dimensionality of a sample
In general, let D = {x1, x2, …, xm} denote a dataset containing m samples, where each sample is described by d attributes (for example, the watermelon data above uses 3 attributes). Each instance xi = (xi1; xi2; …; xid) is then a vector in the d-dimensional sample space X, i.e., xi ∈ X, where xij is the value of xi on the j-th attribute (for example, the value of the 3rd watermelon on its 2nd attribute is "stiff"). d is called the "dimensionality" (dimensionality) of the sample xi.
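As a minimal sketch of this notation (the attribute values below are illustrative, not copied from the book's data table), the dataset D can be stored as an m × d array, where indexing recovers an instance xi and an attribute value xij:

```python
import numpy as np

# A toy watermelon dataset: each row is one instance x_i, each column one
# attribute (color, root, knock sound), so m = 4 samples and d = 3 attributes.
D = np.array([
    ["dark green", "curled up", "dull"],
    ["jet black",  "curled up", "dull"],
    ["dark green", "stiff",     "crisp"],
    ["pale white", "curled up", "muffled"],
])

m, d = D.shape       # m = 4 samples, dimensionality d = 3
x3 = D[2]            # the 3rd instance x_3 (0-based index 2)
x32 = D[2, 1]        # x_32: value of the 3rd watermelon on its 2nd attribute -> "stiff"
print(m, d, x3, x32)
```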
Training-related terms
The process of learning a model from data is called "learning" (learning) or "training" (training); it is carried out by executing a learning algorithm.
The data used during training is called "training data" (training data); each sample in it is called a "training sample" (training sample); the set of training samples is called the "training set" (training set).
The learned model corresponds to some underlying regularity of the data, so it is also called a "hypothesis" (hypothesis); this underlying regularity itself is called the "ground truth" (ground-truth), and the learning process is an attempt to find or approximate it. This book sometimes calls a model a "learner" (learner), which can be viewed as an instantiation of a learning algorithm on the given data and parameter space.

Label / example
For example, in "((color = dark green; root = curled up; knock sound = dull), good melon)", the information about the instance's "outcome", namely "good melon", is called a "label" (label);
an instance together with its label information is called an "example" (example).
In general, (xi, yi) denotes the i-th example, where yi ∈ Y is the label of the instance xi,
and Y is the set of all labels, also called the "label space" (label space) or "output space".
Classification (binary / multi-class) and regression
If the quantity to predict is a discrete value (e.g., good melon vs. bad melon), the learning task is called "classification" (classification);
if the quantity to predict is a continuous value (e.g., watermelon ripeness, 0.95 or 0.37), the learning task is called "regression" (regression).
In general, a prediction task aims to learn, from a training set {(x1, y1), (x2, y2), …, (xm, ym)}, a mapping f: X → Y from the input space X to the output space Y.
- For binary classification tasks, usually Y = {-1, +1} or {0, 1};
- for multi-class tasks, |Y| > 2;
- for regression tasks, Y = R (R is the set of real numbers).
Binary and multi-class classification
- A task with only two classes is a "binary classification" (binary classification) task; one of the classes is usually called the "positive class" (positive class), and the other the "negative class" (negative class);
- when more than two classes are involved, the task is called "multi-class classification" (multi-class classification).
Testing / test sample
After a model has been learned, the process of using it to make predictions is called "testing" (testing), and a sample being predicted is called a "test sample" (testing sample).
For example, after learning f, for a test instance x we obtain its predicted label y = f(x).
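To make the training and testing vocabulary concrete, here is a minimal sketch using scikit-learn's DecisionTreeClassifier; the integer encoding of attribute values, the toy data, and the choice of a decision tree are all assumptions for illustration (the book does not prescribe a particular algorithm at this point):

```python
from sklearn.tree import DecisionTreeClassifier

# Training set with integer-encoded attributes (an illustrative encoding):
# color: 0=dark green, 1=jet black; root: 0=curled up, 1=stiff;
# knock: 0=dull, 1=crisp. Labels: 1 = good melon, 0 = bad melon.
X_train = [[0, 0, 0],
           [1, 0, 0],
           [0, 1, 1],
           [1, 1, 1]]
y_train = [1, 1, 0, 0]

f = DecisionTreeClassifier().fit(X_train, y_train)  # "training": learn f from the training set

x_test = [[0, 0, 1]]        # a test instance x (a melon not seen during training)
y_pred = f.predict(x_test)  # "testing": obtain the predicted label y = f(x)
print(y_pred)
```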
Clustering
- Dividing the data in the training set into several groups, each group being called a "cluster" (cluster);
- these automatically formed clusters may correspond to some underlying concept partitions, which helps us understand the internal structure of the data and lays a foundation for deeper analysis.
The difference between clustering and classification
Clustering and classification differ in whether the categories are known:
- classification assigns data to known categories based on their features;
- clustering deals with unknown categories, grouping data with similar features together so that several classes emerge (see the sketch below).
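A minimal sketch of clustering with scikit-learn's KMeans; the two-dimensional points are made up for illustration. Note that only features are provided, never labels:

```python
from sklearn.cluster import KMeans

# Unlabeled data: features only -- the algorithm groups similar
# instances into clusters on its own.
X = [[0.10, 0.20], [0.15, 0.22], [0.90, 0.85], [0.88, 0.90]]

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)  # cluster assignment per instance, e.g. [0 0 1 1]
```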
Supervised and unsupervised learning
According to whether the training data carries label information, learning tasks can be roughly divided into two classes: "supervised learning" (supervised learning) and "unsupervised learning" (unsupervised learning).
- Classification and regression are representative of supervised learning;
- clustering is representative of unsupervised learning.
Generalization ability
The ability of a learned model to apply to new samples is called its "generalization" (generalization) ability.
Generally speaking, the more training samples there are, the more information we obtain about D (the underlying distribution), and the more likely learning is to yield a model with strong generalization ability.
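Generalization is usually estimated by evaluating the model on samples that were not used for training. A minimal sketch with a hold-out split (the synthetic dataset and the decision-tree model are assumptions for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data standing in for samples drawn from an underlying distribution D
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Hold out 30% of the samples: accuracy on data never seen during
# training is a rough proxy for generalization ability.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
print(model.score(X_te, y_te))  # accuracy on the held-out samples
```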
1.3 Hypothesis space
Inductive learning
"Learning from examples" is clearly an inductive process, so it is also called "inductive learning" (inductive learning).
Hypothesis space
Combining all possible values of every attribute (including the wildcard "*", meaning "any value is acceptable"), plus the empty hypothesis (that no "good melon" exists at all), forms the hypothesis space.
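As a worked example, following the book's watermelon example where the three attributes take 3, 2, and 2 possible values respectively (the concrete value names below are illustrative), each attribute also admits the wildcard "*", and one empty hypothesis is added. A minimal sketch that enumerates and counts the hypothesis space:

```python
from itertools import product

# Possible values per attribute; "*" is the wildcard meaning "any value
# is acceptable". The counts (3, 2, 2) follow the book's example; the
# concrete value names are illustrative.
values = {
    "color": ["dark green", "jet black", "pale white", "*"],  # 3 values + "*"
    "root":  ["curled up", "stiff", "*"],                     # 2 values + "*"
    "knock": ["dull", "crisp", "*"],                          # 2 values + "*"
}

hypotheses = list(product(*values.values()))  # 4 * 3 * 3 = 36 combinations
size = len(hypotheses) + 1  # + 1 for the empty hypothesis: "no good melon exists"
print(size)                 # 37
```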
Version space
In reality we often face a very large hypothesis space, but learning proceeds from a finite training set; as a result, several hypotheses may be consistent with the training set, i.e., there exists a "set of hypotheses" consistent with the training set, which we call the "version space" (version space).
Version space (from the Baidu Baike entry)
For a "rectangle" hypothesis in two-dimensional space (schematic figure), green plus signs denote positive samples and small red circles denote negative samples.
GB is the maximally general positive hypothesis boundary; SB is the maximally specific positive hypothesis boundary.
The rectangles lying between GB and SB are the hypotheses in the version space; that is, the region enclosed by GB and SB is the version space.
When the generality of hypotheses needs to be ordered, the version space can be represented by the two bounds GB and SB. During learning, the algorithm then only needs to operate on the two representative sets GB and SB.
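Using the same wildcard encoding as the hypothesis-space sketch above, the version space can be computed by keeping exactly those hypotheses consistent with every training example. The two-example training set below is made up for illustration:

```python
from itertools import product

values = [["dark green", "jet black", "*"],   # color
          ["curled up", "stiff", "*"],        # root
          ["dull", "crisp", "*"]]             # knock sound

def matches(h, x):
    """A hypothesis covers an instance if every attribute matches or is '*'."""
    return all(hv == xv or hv == "*" for hv, xv in zip(h, x))

# A tiny, made-up training set: (instance, is_good_melon)
train = [(("dark green", "curled up", "dull"), True),
         (("jet black", "stiff", "crisp"), False)]

# Version space: all hypotheses that classify every training example correctly.
version_space = [h for h in product(*values)
                 if all(matches(h, x) == y for x, y in train)]
for h in version_space:
    print(h)
```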
1.4 Inductive bias
Inductive bias / bias
The preference of a machine learning algorithm for certain types of hypotheses during the learning process is called its "inductive bias" (inductive bias), or "bias" for short.
Typical biases
- "As specific as possible", i.e., applicable to as few situations as possible;
- "as general as possible", i.e., applicable to as many situations as possible.
Why bias matters
Any effective machine learning algorithm must have an inductive bias; otherwise it would be confused by the hypotheses that look "equivalent" on the training set and could not produce a deterministic learning result.
Take the watermelon algorithm as an example: if the algorithm had no bias, then at prediction time it would randomly pick among the hypotheses equivalent on the training set, so for a new melon (one never seen before) the learned model would sometimes say it is good and sometimes say it is bad; such a learning result is obviously meaningless.
Occam's razor
Inductive bias can be seen as the learning algorithm's own heuristic or "values" for selecting hypotheses in a potentially huge hypothesis space.
Is there, then, any general principle to guide an algorithm in establishing the "right" bias?
"Occam's razor" (Occam's razor) is a common and most fundamental principle in natural-science research,
namely: "if there are multiple hypotheses consistent with the observations, choose the simplest one".
Role: it serves as a general principle guiding an algorithm to establish the "right" bias.
Inductive bias corresponds to the learning algorithm's own assumption about "what kind of model is better". In a concrete problem, whether this assumption holds, that is, whether the algorithm's inductive bias matches the problem itself, usually determines directly whether the algorithm can achieve good performance.
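To make "bias" tangible in the wildcard representation used above, one possible formalization of Occam's razor (an illustrative choice, not one prescribed by the book) is to prefer, among the hypotheses in the version space, the one with the most wildcards, i.e., the shortest description:

```python
# A hand-written version space (e.g. the output of the previous sketch).
version_space = [("dark green", "curled up", "dull"),
                 ("dark green", "*", "*"),
                 ("*", "curled up", "*")]

# The bias: deterministically pick the hypothesis with the most wildcards.
# This tie-breaking rule is an illustrative assumption, not the book's rule.
simplest = max(version_space, key=lambda h: h.count("*"))
print(simplest)  # a single, deterministic prediction rule
```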