ML17 Neural network practice
2022-07-29 06:07:00 【19-year-old flower girl】
A hands-on neural network
The dataset contains 50,000 training images and 10,000 test images, but for speed we use only 5,000 training samples and 500 test samples.
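As a minimal sketch of how such a subsample might be taken (the array names X_train, y_train, X_test, y_test are placeholders for the already-loaded data, not code from the original post):

num_training, num_test = 5000, 500
# Keep only the first 5,000 training samples and the first 500 test samples
X_train, y_train = X_train[:num_training], y_train[:num_training]
X_test, y_test = X_test[:num_test], y_test[:num_test]
# Flatten each 3x32x32 image into a 3072-dimensional row vector
X_train = X_train.reshape(num_training, -1)
X_test = X_test.reshape(num_test, -1)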
Initialization
input_dim: the input is a 3×32×32 color image (3072 values after flattening). hidden_dim: the hidden layer has 100 neurons. num_classes: the output covers 10 possible categories. weight_scale: the weights are initialized to small random values. reg: the strength of the regularization penalty.
# Initialize w and b
def __init__(self, input_dim=3*32*32, hidden_dim=100, num_classes=10,
             weight_scale=1e-3, reg=0.0):
    self.params = {}
    self.reg = reg
    self.params['W1'] = weight_scale * np.random.randn(input_dim, hidden_dim)
    self.params['b1'] = np.zeros((1, hidden_dim))
    self.params['W2'] = weight_scale * np.random.randn(hidden_dim, num_classes)
    self.params['b2'] = np.zeros((1, num_classes))
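With the default sizes above, W1 has shape (3072, 100), b1 has shape (1, 100), W2 has shape (100, 10), and b2 has shape (1, 10); the weights start as small Gaussian noise scaled by weight_scale, and the biases start at zero.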
Forward propagation process
Each fully connected layer, followed by its activation function, maps the input toward the output scores.
The data enters the network, passes through a fully connected layer, then through an activation layer (using the ReLU function), and finally the loss value is computed from the output.
First take the initialized w and b, pass x, w, b into the forward functions; this is the forward propagation process, and at the end we obtain the scores.
scores = None
N = X.shape[0]
# Unpack variables from the params dictionary
W1, b1 = self.params['W1'], self.params['b1']
W2, b2 = self.params['W2'], self.params['b2']
# The difference between these two calls: the second one has no activation function
h1, cache1 = affine_relu_forward(X, W1, b1)
out, cache2 = affine_forward(h1, W2, b2)
# Score values
scores = out
Step into the affine_relu_forward function, which computes the output of the hidden layer.
def affine_relu_forward(x, w, b):
    a, fc_cache = affine_forward(x, w, b)
    out, relu_cache = relu_forward(a)
    # Save the intermediate values (the fully connected layer's cache and the ReLU cache)
    # so they can be reused in the back propagation calculation
    cache = (fc_cache, relu_cache)
    return out, cache
Inside affine_forward, the computation is x*w + b (with x flattened into the 2-D matrix x_row):
out = np.dot(x_row, w) + b
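Putting that line in context, here is a minimal sketch of what the full affine_forward function looks like; the reshape to x_row and the (x, w, b) cache layout are inferred from the affine_backward code shown later, so treat this as an illustration rather than the original source:

def affine_forward(x, w, b):
    # Flatten each sample of x from shape (N, d1, ..., d_k) into a row vector, giving (N, D)
    x_row = x.reshape(x.shape[0], -1)
    # Fully connected layer: output has shape (N, M)
    out = np.dot(x_row, w) + b
    # Cache the inputs for the backward pass (matches x, w, b = cache in affine_backward)
    cache = (x, w, b)
    return out, cache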
The relu_forward function performs the ReLU operation, which is simply max(0, x):
def relu_forward(x):
    out = None
    out = ReLU(x)
    cache = x
    return out, cache

def ReLU(x):
    """ReLU non-linearity."""
    return np.maximum(0, x)
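A quick sanity check of these two helpers on a small hand-made array (values chosen only for illustration):

x = np.array([[-1.0,  2.0],
              [ 3.0, -4.0]])
out, cache = relu_forward(x)
# out is [[0., 2.], [3., 0.]]; cache simply stores x for the backward pass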
The softmax function converts the scores into probabilities.
Call softmax_loss:
data_loss, dscores = softmax_loss(scores, y)
def softmax_loss(x, y):
    # Normalize the scores (subtracting the row-wise max for numerical stability)
    probs = np.exp(x - np.max(x, axis=1, keepdims=True))
    probs /= np.sum(probs, axis=1, keepdims=True)
    N = x.shape[0]
    # The loss is -log(probability of the correct class), averaged over the batch
    loss = -np.sum(np.log(probs[np.arange(N), y])) / N
    # Compute the gradient: the correct class gets its probability minus 1, averaged over the batch
    dx = probs.copy()
    dx[np.arange(N), y] -= 1
    dx /= N
    # Return the loss value and the gradient
    return loss, dx
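A small numeric check of softmax_loss (a made-up single-sample batch; numbers rounded):

scores_demo = np.array([[1.0, 2.0, 3.0]])
y_demo = np.array([2])                      # the correct class is index 2
loss_demo, dx_demo = softmax_loss(scores_demo, y_demo)
# probs ≈ [0.090, 0.245, 0.665], so loss ≈ -log(0.665) ≈ 0.41
# dx ≈ [0.090, 0.245, -0.335]: the correct class gets its probability minus 1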
# Regularization penalty term: 0.5 * reg * sum(w^2)
reg_loss = 0.5 * self.reg * np.sum(W1*W1) + 0.5 * self.reg * np.sum(W2*W2)
# Total loss = data loss + regularization penalty
loss = data_loss + reg_loss
With the softmax gradient done, we move on to the gradients of the layer before it: the second layer's W2 and b2.
Call affine_backward first, then affine_relu_backward:
dh1, dW2, db2 = affine_backward(dscores, cache2)
dX, dW1, db1 = affine_relu_backward(dh1, cache1)
For the gradient with respect to x, the local derivative is w, so dx is dout (the gradient passed down from above) multiplied by w, as in ①. The gradient with respect to w is computed as in ②, multiplying x (transposed) by dout. The derivative with respect to b is 1, so db is simply the sum of the gradient passed down from above.
# dout is the gradient passed down from the softmax layer; cache holds what the second layer saved in its forward pass
def affine_backward(dout, cache):
    x, w, b = cache
    dx, dw, db = None, None, None
    # ①
    dx = np.dot(dout, w.T)                     # (N, D)
    # Reshape dx back to the original shape of x
    dx = np.reshape(dx, x.shape)               # (N, d1, ..., d_k)
    x_row = x.reshape(x.shape[0], -1)          # (N, D)
    # ②
    dw = np.dot(x_row.T, dout)                 # (D, M)
    db = np.sum(dout, axis=0, keepdims=True)   # (1, M)
    return dx, dw, db
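A quick shape check of affine_backward with arbitrary example sizes (N is a made-up batch size; D and M match the defaults used above):

N, D, M = 4, 3072, 100
dout_demo = np.random.randn(N, M)
cache_demo = (np.random.randn(N, D), np.random.randn(D, M), np.zeros((1, M)))
dx_demo, dw_demo, db_demo = affine_backward(dout_demo, cache_demo)
# dx_demo.shape == (4, 3072), dw_demo.shape == (3072, 100), db_demo.shape == (1, 100)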
In the affine_relu_backward function, we first back-propagate through the ReLU, then call affine_backward again.
def affine_relu_backward(dout, cache):
    """Backward pass for the affine-relu convenience layer."""
    fc_cache, relu_cache = cache
    da = relu_backward(dout, relu_cache)
    dx, dw, db = affine_backward(da, fc_cache)
    return dx, dw, db
For the ReLU layer, the forward pass is max(0, x), so when x > 0 the derivative is 1 and the incoming gradient passes through unchanged; when x ≤ 0 the derivative is 0, so the gradient is 0 as well.
def relu_backward(dout, cache):
    dx, x = None, cache
    # Copy so the upstream gradient is not modified in place
    dx = dout.copy()
    # Zero the gradient wherever the input was non-positive
    dx[x <= 0] = 0
    return dx
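A tiny check of the masking behaviour (illustrative values only):

x_demo = np.array([[-1.0,  2.0],
                   [ 3.0, -4.0]])
dout_demo = np.ones_like(x_demo)
relu_backward(dout_demo, x_demo)
# returns [[0., 1.], [1., 0.]]: the gradient is zeroed wherever the input was <= 0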
Add the gradient of the regularization penalty to W1 and W2 to complete back propagation:
dW2 += self.reg * W2
dW1 += self.reg * W1
Save the gradient values in the grads dictionary:
grads['W1'] = dW1
grads['b1'] = db1
grads['W2'] = dW2
grads['b2'] = db2
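Finally, a rough sketch of how these pieces could be driven during training; the class name TwoLayerNet, the loss(X, y) method, the data arrays, and the plain SGD update are all assumptions for illustration, not code from the original post:

model = TwoLayerNet(hidden_dim=100, reg=0.0)   # hypothetical class wrapping the code above
learning_rate = 1e-3
for it in range(1000):
    # Sample a random mini-batch of 100 training examples
    idx = np.random.choice(X_train.shape[0], 100)
    loss, grads = model.loss(X_train[idx], y_train[idx])
    # Vanilla SGD update using the gradients stored in grads
    for p in model.params:
        model.params[p] -= learning_rate * grads[p]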