[Flower Book Notes] Chapter 01: Introduction
2022-07-28 07:53:00 【Hyacinth's cat redamancy】
Let computers learn from experience and understand the world in terms of a hierarchy of concepts. Gathering knowledge from experience avoids the need for humans to formally specify all the knowledge the computer requires.
The hierarchy of concepts lets the computer learn complicated concepts by building them out of simpler ones.
If we draw a graph showing how these concepts build on each other, the graph is deep, with many layers; for this reason the approach is called deep learning.
Some milestones
IBM's Deep Blue defeated the world chess champion Garry Kasparov in 1997.
Abstract and formal tasks are among the most difficult mental undertakings for a human but are easy for a computer.
The key challenge for AI: how to get this informal knowledge into a computer.
The knowledge base approach: hard-code knowledge about the world in a formal language, and let the computer reason automatically about statements in that language using logical inference rules.
The most famous such project is Cyc (1989). Cyc consists of an inference engine and a database of statements written in the CycL language; the statements are entered under human supervision.
Shortcoming: this is an unwieldy process. It is infeasible to devise formal rules complex enough to accurately describe the world, and the inference engine can run into inconsistencies.
Machine learning: the difficulties faced by systems relying on hard-coded knowledge suggest that AI systems need the ability to acquire their own knowledge,
by extracting patterns from raw data.
The introduction of machine learning allowed computers to tackle problems involving knowledge of the real world and to make decisions that appear subjective.
- Logistic regression can, for example, determine whether to recommend a cesarean delivery.
- Naive Bayes can separate legitimate e-mail from spam.
The performance of simple machine learning algorithms depends heavily on the representation of the data they are given.
Different representations can dramatically affect an algorithm's performance.
Many AI tasks can be solved by first extracting a suitable set of features and then providing those features to a simple machine learning algorithm.
For many tasks, however, it is difficult to know which features should be extracted.
For example, a car's appearance in an image varies with lighting and shadow, so hand-designed features may fail to capture it reliably.
Representation learning: use machine learning to discover not only the mapping from representation to output but the representation itself. Learned representations often perform better than hand-designed ones.
A typical example: the autoencoder.
An autoencoder is the combination of an encoder, which converts the input data into a different representation, and a decoder, which converts the new representation back into the original format.
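Below is a minimal sketch of a linear autoencoder in NumPy, trained by gradient descent on toy data; the sizes, learning rate, and variable names are assumptions for illustration, not code from the book.

```python
# Minimal linear autoencoder sketch (illustrative assumptions throughout).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))        # toy data: 100 samples, 8 features

d_in, d_code = 8, 3                  # compress 8 features into a 3-dim code
W_enc = rng.normal(scale=0.1, size=(d_in, d_code))   # encoder weights
W_dec = rng.normal(scale=0.1, size=(d_code, d_in))   # decoder weights

lr = 0.05
for _ in range(1000):
    H = X @ W_enc                    # encoder: h = f(x)
    X_hat = H @ W_dec                # decoder: x_hat = g(h)
    err = X_hat - X                  # reconstruction error
    # gradients of the mean squared reconstruction loss (up to a constant)
    g_dec = (H.T @ err) / len(X)
    g_enc = (X.T @ (err @ W_dec.T)) / len(X)
    W_dec -= lr * g_dec
    W_enc -= lr * g_enc

print("reconstruction MSE:", float((err ** 2).mean()))
```

A useful code is low-dimensional yet still lets the decoder reconstruct the input well; the training loop trades those two goals off through the reconstruction loss.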
When designing features, or algorithms for learning features, the goal is usually to separate out the factors of variation that explain the observed data.
In many real-world tasks, the difficulty is that several factors of variation influence every observation simultaneously.
Extracting such high-level, abstract features from raw data is therefore very difficult.
Deep learning expresses complex representations in terms of other, simpler representations, which resolves this central problem of representation learning.
The quintessential example of a deep learning model is the feedforward deep network, or multilayer perceptron (MLP).
A multilayer perceptron is just a mathematical function mapping a set of input values to output values, and the function is formed by composing many simpler functions.
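A sketch of that composition (layer sizes and names are assumptions for illustration): the whole network is y = f3(f2(f1(x))), each fi being a simple affine map followed by a nonlinearity.

```python
# An MLP as a composition of simple functions (illustrative sizes).
import numpy as np

def f(x, W, b):
    return np.maximum(0.0, x @ W + b)   # one simple function: affine + ReLU

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 4))             # one input with 4 features

W1, b1 = rng.normal(size=(4, 5)), np.zeros(5)
W2, b2 = rng.normal(size=(5, 3)), np.zeros(3)
W3, b3 = rng.normal(size=(3, 1)), np.zeros(1)

h1 = f(x, W1, b1)        # first simple function:  h1 = f1(x)
h2 = f(h1, W2, b2)       # second simple function: h2 = f2(h1)
y = h2 @ W3 + b3         # output function:        y  = f3(h2)
print(y)                 # the whole MLP is y = f3(f2(f1(x)))
```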
The idea of learning the right representation for the data is one perspective on deep learning; another is that depth allows the computer to learn a multi-step computer program.
There are two main ways of measuring the depth of a model. The first is the number of sequential instructions that must be executed to evaluate the architecture: the length of the longest path through the computational flowchart.
This depth depends on what counts as a single computational step, as the sketch after this list shows:
- Using addition, multiplication, and the logistic sigmoid as the elements, a logistic regression model has depth 3.
- Treating logistic regression itself as the element, the same model has depth 1.
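A tiny worked illustration of the first measure (my own example, not the book's): evaluating sigma(w^T x) takes three element types when multiplication, addition, and the sigmoid each count as one step.

```python
# Counting depth for logistic regression, sigma(w.T x) (illustrative).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([0.5, -1.0])
x = np.array([1.0, 2.0])

products = w * x          # step 1: multiplications
s = products.sum()        # step 2: additions
y = sigmoid(s)            # step 3: logistic sigmoid -> depth 3 with these elements
# If "logistic regression" itself is taken as one element, the depth is 1.
print(y)
```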
The second approach, used by deep probabilistic models, regards the depth of the graph describing how concepts relate to each other as the depth of the model.

What this book offers its readers
- Mathematical tools and machine learning concepts
- The most established deep learning algorithms
- More forward-looking, speculative ideas

Historical trends in deep learning
The many names and changing fortunes of neural networks
Deep learning has gone through three waves of development:
- 1940s-1960s: its embryonic form, known as cybernetics
- 1980s-1990s: known as connectionism
- from 2006: its true revival under the name deep learning
Some of the earliest learning algorithms were intended as computational models of biological learning, i.e., artificial neural networks.
The neural perspective on deep learning is motivated by two main ideas:
- The brain provides a proof by example that intelligent behavior is possible.
- It would be deeply interesting to understand the principles behind the brain and human intelligence.
The modern term "deep learning" goes beyond the neuroscientific perspective of current machine learning models; it appeals to the more general principle of learning multiple levels of composition.
The earliest predecessors of modern deep learning were simple linear models motivated by a neuroscientific perspective.
1950s: the perceptron became the first model that could learn the weights defining categories, given examples of inputs from each category.
The adaptive linear element (ADALINE) simply returned the value of the function f(x) itself to predict a real number, and could also learn to predict these numbers from data.
The training algorithm used to adapt ADALINE's weights was a special case of stochastic gradient descent (SGD).
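A sketch of SGD fitting a linear model, in the spirit of ADALINE's update rule (the data, sizes, and learning rate are assumptions for illustration):

```python
# Stochastic gradient descent on a linear model (ADALINE-style sketch).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=200)    # noisy linear targets

w = np.zeros(3)
lr = 0.05
for _ in range(20):                            # 20 passes over the data
    for i in rng.permutation(len(X)):          # one example at a time
        err = X[i] @ w - y[i]                  # prediction error f(x) - y
        w -= lr * err * X[i]                   # gradient step on squared error
print(w)                                       # approaches true_w
```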
Limitation of linear models: they cannot learn the XOR function.
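A quick demonstration of why (my own example): the best least-squares linear fit to XOR predicts 0.5 for every input, so no linear model can represent it.

```python
# A linear model cannot represent XOR: its optimal fit is constant.
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)        # XOR targets

A = np.hstack([X, np.ones((4, 1))])            # inputs plus a bias column
w, *_ = np.linalg.lstsq(A, y, rcond=None)      # optimal linear weights
print(A @ w)                                   # ~[0.5, 0.5, 0.5, 0.5]
```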
Connectionism, also known as parallel distributed processing
The central idea of connectionism is that intelligent behavior can emerge when a network connects together a large number of simple computational units.
Distributed representation
Idea: each input to the system should be represented by many features, and each feature should participate in the representation of many possible inputs.
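The book illustrates this with three object types (cars, trucks, birds) in three colors: a one-unit-per-combination code needs 9 units, while a distributed code shares 3 + 3 feature units across all inputs. A small sketch (the code itself is my illustration):

```python
# Distributed representation: 3 object features + 3 color features (6 units)
# cover all 9 object/color combinations; a one-hot code would need 9 units.
import itertools
import numpy as np

objects = ["car", "truck", "bird"]
colors = ["red", "green", "blue"]

def distributed(obj, color):
    v = np.zeros(6)
    v[objects.index(obj)] = 1.0        # feature: object identity
    v[3 + colors.index(color)] = 1.0   # feature: color
    return v

for obj, color in itertools.product(objects, colors):
    print(f"{color} {obj}: {distributed(obj, color)}")
```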
Another major accomplishment of the connectionist movement: back-propagation.
The long short-term memory (LSTM) network was introduced for modeling long sequences.
Neural networks called deep belief networks could be trained efficiently using a strategy called greedy layer-wise pretraining.
Increasing dataset sizes
In the early 1900s, statisticians studied datasets using hundreds or thousands of manually compiled measurements (Garson, 1900; Gosset, 1908; Anderson, 1935; Fisher, 1936). From the 1950s through the 1980s, the pioneers of biologically inspired machine learning often worked with small synthetic datasets, such as low-resolution bitmaps of letters, designed to show that neural networks could learn specific functions at low computational cost (Widrow and Hoff, 1960; Rumelhart et al., 1986b).
In the 1980s and 1990s, machine learning became more statistical in nature and began to use larger datasets containing thousands of examples, such as the MNIST dataset of scanned handwritten digits (shown in Figure 1.9 of the book) (LeCun et al., 1998c).
In the first decade of the 21st century, more sophisticated datasets of the same size, such as the CIFAR-10 dataset (Krizhevsky and Hinton, 2009), continued to appear.
Toward the end of that decade and through the following five years, significantly larger datasets, containing tens of thousands to tens of millions of examples, completely changed what was possible with deep learning. These datasets include the public Street View House Numbers dataset (Netzer et al., 2011), various versions of the ImageNet dataset (Deng et al., 2009, 2010a; Russakovsky et al., 2014a), and the Sports-1M dataset (Karpathy et al., 2014). At the top of the book's figure, we see that datasets of translated sentences are typically much larger than the others, such as the IBM dataset built from the Canadian Hansard (Brown et al., 1990) and the WMT 2014 English-French dataset (Schwenk, 2014).

Increasing model size
Initially, the number of connections between neurons in artificial neural networks was limited by hardware capabilities. Today, the number of connections per neuron is mostly a design consideration. Some artificial neural networks have nearly as many connections per neuron as a cat, and it is quite common for others to have as many connections per neuron as smaller mammals like mice. Even the human brain does not have an exorbitant number of connections per neuron.
Since the introduction of hidden units, artificial neural networks have doubled in size roughly every 2.4 years. Milestone models include:
- Perceptron (Rosenblatt, 1958, 1962)
- Adaptive linear element (Widrow and Hoff, 1960)
- Neocognitron (Fukushima, 1980)
- Early back-propagation network (Rumelhart et al., 1986b)
- Recurrent neural network for speech recognition (Robinson and Fallside, 1991)
- Multilayer perceptron for speech recognition (Bengio et al., 1991)
- Mean field sigmoid belief network (Saul et al., 1996)
- LeNet-5 (LeCun et al., 1998c)
- Echo state network (Jaeger and Haas, 2004)
- Deep belief network (Hinton et al., 2006a)
- GPU-accelerated convolutional network (Chellapilla et al., 2006)
- Deep Boltzmann machine (Salakhutdinov and Hinton, 2009a)
- GPU-accelerated deep belief network (Raina et al., 2009a)
- Unsupervised convolutional network (Jarrett et al., 2009b)
- GPU-accelerated multilayer perceptron (Ciresan et al., 2010)
- OMP-1 network (Coates and Ng, 2011)
- Distributed autoencoder (Le et al., 2012)
- Multi-GPU convolutional network (Krizhevsky et al., 2012a)
- COTS HPC unsupervised convolutional network (Coates et al., 2013)
- GoogLeNet (Szegedy et al., 2014a)
Increasing accuracy, complexity, and real-world impact
Once deep networks reached the scale needed to compete in the ImageNet Large Scale Visual Recognition Challenge, they won the competition every year, with ever-lower error rates.
