Wake wrist - Neural networks and deep learning (TensorFlow applications), continuously updated

2022-06-11 23:04:00 Wake wrist

Introduction to machine learning and deep learning

At present, research papers on traditional machine learning make up a relatively small share of the literature, and some people complain that deep learning is mere systems engineering with little mathematical substance. Still, there is no denying that deep learning is very easy to use: it greatly simplifies the algorithm analysis and learning process of traditional machine learning, and, more importantly, on some general-purpose tasks it has reached accuracy that traditional machine learning algorithms could not achieve.

Deep learning has been especially popular in recent years, much as big data was a few years ago. Deep learning is, however, a branch of machine learning, so in this article we look at how the algorithm workflows of machine learning and deep learning differ.

What is machine learning?

Simply put, machine learning is a way of turning unordered data into value. In a broad sense, it is a method that gives machines the ability to learn, so that they can perform functions that direct programming cannot achieve. In practice, machine learning means using data to train a model and then using that model to make predictions.

  • "Training" and "prediction" are the two processes of machine learning, and the "model" is the intermediate output: training produces the model, and the model guides prediction.
  • In the machine learning approach, the computer uses existing data (experience) to obtain a model (the pattern behind being late), and then uses this model to predict the future (will I be late?).
  • Let's compare the process of machine learning with the way humans generalize from past experience.
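To make the training/prediction split concrete, here is a toy version of the lateness example in Python. It is a minimal sketch under invented assumptions, not the author's method: the departure times and outcomes are hypothetical, and "training" is simply a brute-force search for the departure-time threshold that best separates late days from on-time days. That single learned number plays the role of the model.

```python
# Hypothetical history: (minutes after 8:00 that I left home, late? 1 = yes, 0 = no).
history = [(0, 0), (5, 0), (10, 0), (20, 1), (25, 1), (30, 1)]


def train(samples):
    """'Training': find the departure-time threshold that classifies the most samples correctly."""
    best_threshold, best_correct = None, -1
    for threshold, _ in samples:
        correct = sum((t >= threshold) == bool(late) for t, late in samples)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold  # the "model" is this one learned number


def predict(model, departure):
    """'Prediction': apply the learned rule to a new, unseen departure time."""
    return int(departure >= model)


model = train(history)
print(model)               # 20
print(predict(model, 3))   # 0: probably on time
print(predict(model, 22))  # 1: probably late
```

The structure mirrors the bullets: `train` consumes past experience and returns a model, while `predict` needs only the model and a new input.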

The value and importance of machine learning

We focus on the ability of these tools to solve practical problems: extracting patterns from data and using them to predict the future.

  1. Automation (Automatically): machine learning can be viewed as an algorithm that automatically generates algorithms.

  2. Speed (Fast): machine learning saves time. Compared with manual processing, it can analyze sample data and generate an algorithm more quickly.

  3. Accuracy (Accurate): because it is automated, machine learning can use more data and run longer, producing more accurate decisions.

  4. Scale (Scale): machine learning can provide solutions to problems that humans cannot solve on their own.

Examples of machine learning applications

 Classification: image recognition, spam detection
 Regression: stock price forecasting, house price forecasting
 Ranking: click-through rate estimation, recommendation
 Generation: image generation, image style transfer, image caption generation

Machine learning application process
Machine learning algorithm workflow

In essence, machine learning is the study of data science (which may sound a little dry). The main workflow of a machine learning algorithm consists of 1) preparing the data set, 2) exploratory data analysis, 3) data preprocessing, 4) data splitting, 5) modeling with a machine learning algorithm, and 6) choosing the machine learning task; the final step, of course, is evaluating how the algorithm performs on real data.
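The steps above can be sketched end to end in plain Python. Everything here is an illustrative assumption: the data are synthetic (two Gaussian clusters), the task is binary classification, and the model is a simple nearest-centroid classifier chosen for brevity rather than any algorithm named in the text.

```python
import random
import statistics

random.seed(0)

# 1) Data set preparation: a hypothetical synthetic set of (feature, label) pairs.
data = [(random.gauss(2.0, 0.5), 0) for _ in range(50)] + \
       [(random.gauss(5.0, 0.5), 1) for _ in range(50)]

# 2) Exploratory analysis: per-class mean and standard deviation.
for label in (0, 1):
    xs = [x for x, y in data if y == label]
    print(label, round(statistics.mean(xs), 2), round(statistics.stdev(xs), 2))

# 3) Preprocessing: standardize the feature to zero mean and unit variance.
#    (For brevity the statistics come from the whole set; a stricter pipeline
#    would compute them on the training split only.)
values = [x for x, _ in data]
mu, sigma = statistics.mean(values), statistics.pstdev(values)
data = [((x - mu) / sigma, y) for x, y in data]

# 4) Data splitting: shuffle, then hold out 20% of the samples for testing.
random.shuffle(data)
split = int(0.8 * len(data))
train_set, test_set = data[:split], data[split:]

# 5) Modeling: a nearest-centroid classifier, one centroid per class.
# 6) Task: binary classification (label 0 vs. label 1).
centroids = {
    label: statistics.mean(x for x, y in train_set if y == label) for label in (0, 1)
}


def predict(x):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


# Finally: evaluate on the held-out test set.
accuracy = sum(predict(x) == y for x, y in test_set) / len(test_set)
print(f"test accuracy: {accuracy:.2f}")
```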

Common deep learning algorithms

These include: 1. convolutional neural networks, 2. recurrent neural networks, 3. autoencoders, 4. sparse coding, 5. deep belief networks, 6. restricted Boltzmann machines.

Neuron - Logistic regression model

Because neural networks are modeled on the human brain, before discussing specific models we should look at the brain's characteristics from a biological point of view.

According to biological research, the computing unit of the human brain is the neuron. A neuron responds to changes in its environment and passes information on to other neurons. The human brain contains about 86 billion neurons, interconnected into an extremely complex nervous system that is the material basis of human intelligence. So, following the brain's biological structure, we first need a model that simulates a single neuron.

Basic introduction to neurons

The neuron is the smallest structural unit of a neural network; a network is built by combining many neurons. With suitable settings, a single neuron can also act as a logistic regression model.

The input signal comes from external sources or from the outputs of other processing units, and is represented mathematically as a row vector $x = (x_1, x_2, \ldots, x_m)$, where $x_i$ is the excitation level of the $i$-th input and $m$ is the number of inputs.

The weights of the connections into node $k$ are represented as the weight vector $W_k = (w_{k1}, w_{k2}, \ldots, w_{km})$, where $w_{ki}$ is the weight from node $i$ (the $i$-th input) to node $k$, that is, the connection strength between nodes $i$ and $k$.

The computation function does three things: it weights each input signal to determine its strength (weighting), combines the effects of all input signals (summation), and then determines the output (transfer characteristic).

In other words, when a neuron receives input signals from $n$ other neurons, it adds up the inputs according to their weights, and the combined stimulus strength $S$ can be written as:
$$S = w_1x_1 + w_2x_2 + \cdots + w_nx_n = \sum_{i=1}^{n} w_i x_i$$

This sum is not output directly. It is compared with the neuron's threshold and then passed through an activation function to produce the output. This model is called a perceptron, and it can be written as:

$$y = f\left(\sum_{i=1}^{n} w_i x_i - \theta\right)$$

Here $\theta$ is the threshold, $f$ is the activation function, and $y$ is the final output.
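As a quick check of the perceptron formula, the sketch below evaluates $y = f(\sum_i w_i x_i - \theta)$ with a unit step as $f$. The weights and threshold are hypothetical values picked so that the neuron behaves like a logical AND.

```python
def step(z):
    """Unit step activation: output 1 only once the threshold is exceeded."""
    return 1 if z > 0 else 0


def perceptron(x, w, theta, f=step):
    """y = f(sum_i w_i * x_i - theta)."""
    s = sum(wi * xi for wi, xi in zip(w, x))  # combined stimulus S
    return f(s - theta)


# Hypothetical parameters: the neuron fires only when both inputs are on.
w, theta = (1.0, 1.0), 1.5
outputs = [perceptron(x, w, theta) for x in [(0, 0), (0, 1), (1, 0), (1, 1)]]
print(outputs)  # [0, 0, 0, 1]
```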

The neuron's goal

The goal of a neuron is to adjust its weights based on a large number of input/output examples. Suppose we show the neuron a thousand examples of cat and non-cat pictures, and for each example we indicate which features are present and how certain we are that they are. Based on these thousands of images, the neuron decides:

 Which features are important and positive (for example, every cat has a tail, so that weight should be large and positive)
 Which features are unimportant (for example, only a few pictures show 2 eyes, so that weight is very small)
 Which features are important and negative (for example, every picture containing a horn is actually a picture of a unicorn rather than a cat, so that weight should be large and negative)
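The weight adjustment described above can be illustrated with the classic perceptron learning rule, which is one possible choice and not necessarily what the author had in mind. The three binary features (tail, two eyes, horn) and the six labeled examples are invented for illustration; the bias `bias` plays the role of $-\theta$.

```python
# Hypothetical features per picture: (has_tail, has_two_eyes, has_horn); label 1 = cat.
examples = [
    ((1, 1, 0), 1),  # cat
    ((1, 0, 0), 1),  # cat
    ((1, 0, 1), 0),  # unicorn: has a tail, but also a horn
    ((1, 1, 1), 0),  # unicorn
    ((0, 1, 0), 0),  # not a cat
    ((0, 0, 0), 0),  # not a cat
]

weights, bias = [0, 0, 0], 0
for epoch in range(100):
    errors = 0
    for x, y in examples:
        z = sum(w * xi for w, xi in zip(weights, x)) + bias
        pred = 1 if z > 0 else 0
        err = y - pred  # +1, 0 or -1
        if err:
            # Perceptron rule: nudge each weight in the direction that reduces the error.
            weights = [w + err * xi for w, xi in zip(weights, x)]
            bias += err
            errors += 1
    if errors == 0:  # every example is now classified correctly
        break

print(weights, bias)  # [2, 0, -3] -1
```

On this toy data the learned weights match the intuition above: the tail weight is positive, the two-eyes weight ends at zero, and the horn weight is large and negative.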

Neuron - a simple calculation

A neural network is a set of neurons organized in layers. Each neuron performs a mathematical operation: it takes its inputs, multiplies them by its weights, sums them, and passes the result through an activation function to other neurons. A neural network learns how to classify inputs by adjusting its weights based on previous examples.

In short, a neuron multiplies the input values by their weights, adds them up, and then applies the activation function to the sum.
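A small forward pass makes this concrete. The sketch assumes a sigmoid activation and made-up weights for a network with two inputs, two hidden neurons and one output neuron; each neuron multiplies, sums, adds a bias and applies the activation, and a layer is just several neurons reading the same inputs.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias):
    """Multiply inputs by weights, sum them (plus a bias), then apply the activation."""
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(s)


def layer(inputs, weight_rows, biases):
    """A layer is several neurons that all read the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]


# Hypothetical weights for a 2-input -> 2-hidden -> 1-output network.
x = [1.0, 2.0]
hidden = layer(x, [[0.5, 0.75], [1.0, -1.0]], [0.0, 1.0])  # weighted sums: 2.0 and 0.0
output = neuron(hidden, [1.0, 1.0], -1.0)                   # sum: hidden[0] + hidden[1] - 1
print([round(h, 3) for h in hidden], round(output, 3))     # [0.881, 0.5] 0.594
```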

Logistic regression model for binary classification

The distribution function $F(x)$ of the logistic distribution is an S-shaped curve: it grows fastest near the center and slowly at both ends. As $x \to +\infty$, $F(x)$ approaches 1; as $x \to -\infty$, $F(x)$ approaches 0.

The binomial logistic regression model is a classification model represented by the conditional probability distribution $P(Y \mid X)$, which takes the form of a parameterized logistic distribution. Here the random variable $X$ takes real values and the random variable $Y$ takes the value 1 or 0.

Suppose a set of data falls into two classes. What kind of model should we build to separate them?

The linear regression model $z = W^T x + b$ outputs a real value, whereas a binary classification task needs a class label (in binomial logistic regression the positive class is labeled 1 and the negative class 0; the reason is discussed later). We therefore need to convert the real value $z$ into a 0/1 value.
The ideal choice is the unit step function:

$$y = \begin{cases} 0, & z < 0 \\ 0.5, & z = 0 \\ 1, & z > 0 \end{cases}$$
However, the unit step function is discontinuous. We would like a substitute that approximates it closely and is also monotonic and differentiable. The logistic function (also called the sigmoid function) is a commonly used substitute of this kind:

$$y = \frac{1}{1 + e^{-z}}$$
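The following sketch compares the unit step with the sigmoid at a few arbitrary points and includes the sigmoid's derivative $y(1-y)$, which is what makes it usable with gradient-based training.

```python
import math


def unit_step(z):
    """The ideal but discontinuous 0 / 0.5 / 1 step."""
    if z < 0:
        return 0.0
    return 0.5 if z == 0 else 1.0


def sigmoid(z):
    """Logistic (sigmoid) function: smooth, monotonic, values in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_grad(z):
    """Derivative of the sigmoid, dy/dz = y * (1 - y), defined for every z."""
    y = sigmoid(z)
    return y * (1.0 - y)


for z in (-6, -1, 0, 1, 6):
    print(z, unit_step(z), round(sigmoid(z), 4))
```

Far from zero the two functions nearly agree; near zero the sigmoid replaces the jump with a smooth ramp whose slope is largest (0.25) at $z = 0$.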

For a given input instance $x$, the distribution functions above give $P(Y=1 \mid x)$ and $P(Y=0 \mid x)$. Logistic regression compares the two conditional probabilities and assigns $x$ to the class with the larger probability.
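A minimal sketch of that decision rule, assuming the parameters have already been learned (the values of `w` and `b` are hypothetical). Since $P(Y=0 \mid x) = 1 - P(Y=1 \mid x)$, comparing the two probabilities is equivalent to checking the sign of $z = w \cdot x + b$.

```python
import math


def predict_proba(w, b, x):
    """Return (P(Y=1|x), P(Y=0|x)) with P(Y=1|x) = 1 / (1 + exp(-(w.x + b)))."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p1 = 1.0 / (1.0 + math.exp(-z))
    return p1, 1.0 - p1


def classify(w, b, x):
    """Assign x to the class with the larger conditional probability."""
    p1, p0 = predict_proba(w, b, x)
    return 1 if p1 > p0 else 0


w, b = [2.0, -1.0], -0.5           # hypothetical, already-trained parameters
print(classify(w, b, [1.0, 0.5]))  # z = 1.0, P(Y=1|x) is about 0.73, so class 1
print(classify(w, b, [0.0, 1.0]))  # z = -1.5, P(Y=1|x) is about 0.18, so class 0
```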

When a neuron has multiple outputs, $W$ changes from a vector to a matrix, and the output $Wx$ becomes a vector.

In statistics, multinomial logistic regression is a classification method that generalizes logistic regression to multiclass problems. More precisely, it is a model that predicts the probabilities of the different possible outcomes of a categorically distributed dependent variable.

The binomial logistic regression model is a binary classification model used for two-class problems. It can be extended to the multinomial logistic regression model for multiclass problems. Suppose the discrete random variable $Y$ takes values in $\{1, 2, \ldots, K\}$; then the multinomial logistic regression model is:

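$$P(Y=k \mid x) = \frac{\exp(w_k \cdot x)}{1 + \sum_{j=1}^{K-1} \exp(w_j \cdot x)}, \quad k = 1, 2, \ldots, K-1$$

$$P(Y=K \mid x) = \frac{1}{1 + \sum_{j=1}^{K-1} \exp(w_j \cdot x)}$$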

Multinomial logistic regression is also called softmax regression; it generalizes binomial logistic regression to multiclass classification.
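The sketch below puts the multi-output picture and softmax together: a weight matrix `W` turns the input into a score vector $Wx$, and softmax turns the scores into class probabilities. The weights and input are hypothetical. Fixing the last row of `W` to zero gives exactly the $K-1$ weight-vector form of multinomial logistic regression, and with $K = 2$ softmax reduces to the sigmoid.

```python
import math


def matvec(W, x):
    """K outputs: W is a K x m matrix and W x is a length-K score vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(z):
    """P(Y=k|x) = exp(z_k) / sum_j exp(z_j); subtracting max(z) avoids overflow."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical weights for K = 3 classes and m = 2 features.
# The last row is fixed at zero (the K-1 weight-vector parameterization).
W = [[1.0, 0.0],
     [0.0, 1.0],
     [0.0, 0.0]]
x = [2.0, 1.0]

probs = softmax(matvec(W, x))            # scores [2.0, 1.0, 0.0]
predicted = probs.index(max(probs)) + 1  # classes numbered 1..K
print([round(p, 3) for p in probs], predicted)  # [0.665, 0.245, 0.09] 1
```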

Gradient descent and loss functions

What is gradient descent?

First, split "gradient descent" into "gradient" + "descent". The gradient can be understood as a derivative (for functions of several variables, the vector of partial derivatives), so taken together the term means "descending along the derivative". What is that for? The short answer: gradient descent is used to find the value of the independent variable at which a function reaches its minimum (in general, a local minimum).

The function in that sentence is the loss function (cost/loss function); put plainly, the error function.

The loss function takes the algorithm's parameters as its arguments and returns the error. Gradient descent therefore searches for the parameter values at which the error is minimized.
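A one-parameter example shows the mechanics. The loss $(w - 3)^2$ is a hypothetical stand-in for a real error function; each step moves $w$ against the derivative, scaled by a learning rate (0.1 here, an arbitrary choice).

```python
def loss(w):
    """A hypothetical loss whose minimum is at w = 3."""
    return (w - 3.0) ** 2


def grad(w):
    """Derivative of the loss: d/dw (w - 3)^2 = 2 (w - 3)."""
    return 2.0 * (w - 3.0)


w = 0.0              # initial guess for the parameter
learning_rate = 0.1  # step size
for _ in range(100):
    w -= learning_rate * grad(w)  # step against the gradient, i.e. downhill

print(round(w, 6), round(loss(w), 12))  # 3.0 0.0
```

With this learning rate the distance to the minimum shrinks by a factor of 0.8 per step; a much larger rate (above 1.0 for this loss) would overshoot and diverge.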

In machine learning, one class of algorithms produces a curve that fits the existing data so that future data can be predicted; we call this regression.

Another class of algorithms also produces a curve, but this curve divides the points into two groups to perform classification; we call this classification. In both cases the fitted curve does not pass exactly through the existing points, so there is an error between the fitted curve and the true values. We usually measure this error with the value of a loss function: the smaller the loss, the better the fit.

Put simply: the loss function measures the error between the predicted values and the actual values.
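Putting the two ideas together, the sketch below fits a straight line (a regression curve) by gradient descent on the mean squared error, one common choice of loss function. The data are hypothetical and noise-free, generated from $y = 2x + 1$, so we know which parameters it should recover; the learning rate and iteration count are assumptions tuned for this tiny data set.

```python
# Hypothetical data generated by y = 2x + 1 (no noise, so the exact answer is known).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]


def mse(w, b):
    """Mean squared error between the predictions w*x + b and the true values."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


w, b, lr = 0.0, 0.0, 0.05
for _ in range(2000):
    errors = [w * x + b - y for x, y in zip(xs, ys)]
    grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / len(xs)  # dL/dw
    grad_b = 2 * sum(errors) / len(xs)                              # dL/db
    w, b = w - lr * grad_w, b - lr * grad_b

print(round(w, 4), round(b, 4), round(mse(w, b), 8))  # 2.0 1.0 0.0
```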

Introduction to declarative programming

The more declarative an interface, the more work the layer beneath it does, that is, the more powerful that layer is; this usually also means some loss of efficiency. The more imperative an interface, the more control the upper layer has over the lower layer: it can tell the lower layer exactly how to handle things according to its specific needs.

Strictly speaking, this pair of concepts should be called "declarative interfaces" and "imperative interfaces". They are probably called "declarative programming" and "imperative programming" because the distinction is mostly discussed for programming languages, which are one kind of interface.

Of course, you can also treat it as a programming principle: when structuring your own code, split it into layers for readability and make the interfaces between layers as declarative as possible. Then one layer naturally describes what is needed from a human point of view, while another layer implements those needs with computer logic.
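A small Python contrast between the two styles, using an arbitrary list: the imperative version spells out how to loop and accumulate, while the declarative version states what result is wanted and leaves the iteration to `sum` and a generator expression.

```python
numbers = [3, 8, 1, 6, 4, 7]

# Imperative: spell out how to do it (loop, test each element, update an accumulator).
total_imperative = 0
for n in numbers:
    if n % 2 == 0:
        total_imperative += n * n

# Declarative: state what is wanted (the sum of the squares of the even numbers).
total_declarative = sum(n * n for n in numbers if n % 2 == 0)

print(total_imperative, total_declarative)  # 116 116
```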

Data processing and model building

TensorFlow is a Python neural network framework commonly used in deep learning. It is an open-source software library for numerical computation using data flow graphs.
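To illustrate the idea of a data flow graph, here is a toy version in plain Python. It is not TensorFlow's API; it only loosely mirrors the TensorFlow 1.x style in which you first declare a graph of operations (with placeholders for the inputs) and then run it by feeding in values, which is also an instance of the declarative style discussed above. All class and function names are hypothetical.

```python
class Node:
    """A node in a toy data flow graph: an operation plus the nodes feeding into it."""

    def __init__(self, op=None, inputs=(), name=None):
        self.op = op          # function applied to the input values (None for placeholders)
        self.inputs = inputs  # upstream nodes, i.e. the edges of the graph
        self.name = name      # used to look up fed values for placeholders


def placeholder(name):
    return Node(name=name)


def add(a, b):
    return Node(lambda x, y: x + y, (a, b))


def mul(a, b):
    return Node(lambda x, y: x * y, (a, b))


def run(node, feed):
    """Evaluate a node by first evaluating every node it depends on."""
    if node.op is None:
        return feed[node.name]
    return node.op(*(run(n, feed) for n in node.inputs))


# Building the graph only describes y = w * x + b; nothing is computed yet.
x, w, b = placeholder("x"), placeholder("w"), placeholder("b")
y = add(mul(w, x), b)

# Running the graph with concrete inputs performs the computation.
print(run(y, {"x": 3.0, "w": 2.0, "b": 1.0}))  # 7.0
```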

TensorFlow was originally developed by researchers and engineers on the Google Brain team (part of Google's machine intelligence research organization) for machine learning and deep neural network research, but the system is general enough that it is also widely used in other areas of computing.

It is Google's second-generation artificial intelligence learning system, developed on the basis of DistBelief. On November 9, 2015, Google released TensorFlow and announced that it was open source.

Installing TensorFlow with Anaconda

Anaconda official website: https://www.anaconda.com/

Choose a suitable Anaconda installer: go to the Anaconda website and download the version for your operating system. At the time of writing, the website offers For Windows Python 3.9 • 64-Bit Graphical Installer • 594 MB.
Install it like any ordinary software and accept all the defaults, but make sure to tick the checkbox that adds Python 3.9 to the PATH environment variable.

Configure Anaconda: open cmd and switch to a mirror source in China (the Tsinghua University mirror):

conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/

Installing TensorFlow: version 1.15 is recommended. To install a different version, just change the version number in the TensorFlow install command.

Open cmd and first create the environment that TensorFlow 1.15 needs. TensorFlow 1.15 supports only Python 3.5 to 3.7, so the environment uses Python 3.7 (run python -V in cmd to check the version):

conda create -n tensorflow pip python=3.7

Then activate the TensorFlow environment:

activate tensorflow
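Once the environment is active, TensorFlow itself is typically installed with pip, for example `pip install tensorflow==1.15`. The sketch below is one way to check the result from Python: it reports the interpreter version and, only if a `tensorflow` package is found, imports it and reports its version. The function name `environment_report` is hypothetical.

```python
import importlib.util
import sys


def environment_report():
    """Return (Python version, TensorFlow version or None if TensorFlow is not installed)."""
    python_version = "%d.%d" % sys.version_info[:2]
    if importlib.util.find_spec("tensorflow") is None:
        return python_version, None
    import tensorflow as tf  # imported only when it is actually installed
    return python_version, tf.__version__


print(environment_report())
```

Run it with the tensorflow environment activated; if the second value is `None`, the interpreter being used cannot see a TensorFlow installation.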
Copyright notice

This article was written by [Wake wrist]. Please include a link to the original when reposting. Thanks!
https://yzsam.com/2022/162/202206112300319361.html