
Implementing a tensorflow-style deep learning framework (similarflow) with numpy

2022-06-12 13:22:00 Kun Li

Related reading:

- SimpleFlow | PytLab (Shao Zhengjiang's personal blog): http://pytlab.github.io/tags/SimpleFlow/
- How to understand the TensorFlow computational graph? (Zhihu): https://zhuanlan.zhihu.com/p/344846077
- PyTorch's Autograd (Zhihu): https://zhuanlan.zhihu.com/p/69294347
- Deep Learning From Scratch I: Computational Graphs (sabinasz.net): https://www.sabinasz.net/deep-learning-from-scratch-i-computational-graphs/
- How to derive the gradients of neural network back-propagation? (Magic_Anthony, CSDN blog): https://blog.csdn.net/magic_anthony/article/details/77531552

Project repository: GitHub - leeguandong/SimilarWork

        This post focuses on using numpy to implement small frameworks in the style of tensorflow and of pytorch. The goal is to practise the path numpy -> mmcv/lightning -> algorithm papers -> mmdet/mmcls..., to strengthen the understanding of the more holistic parts of deep learning, and to decouple and model business problems better.

1. Computational graph

        By the chain rule, partial derivatives are computed node by node: during the backward pass, the gradient of the network's final output with respect to each node is obtained by chain-rule differentiation and then used to optimize the network. A representation of this kind, a graph of nodes and edges, is the basic computational model of both tensorflow and pytorch. In short, a computational graph consists of nodes and edges: a node represents an operator (op), and an edge represents a dependency between computations. A solid edge indicates a data dependency, and the data passed along it is a tensor; a dashed edge usually indicates a control dependency, i.e. execution order. The computational graph is essentially the logical graph of the program that tensorflow builds in memory; it can be split into several subgraphs that run in parallel on different CPUs or GPUs, which enables parallel computation.

        tensorflow has three kinds of computational graphs: static graphs, dynamic graphs, and autograph. tf2 uses dynamic graphs by default: every time an operator is used, it is added to the implicit default graph and executed immediately, and its result is returned. A graph is built for each forward pass and is released from memory after back-propagation, so in the example below the second loss.backward() raises an error; this is also how pytorch computes. Dynamic graphs do not distinguish between defining and executing the graph: a node is executed as soon as it is defined, which is called eager execution.

import torch

a = torch.tensor([3.0, 1.0], requires_grad=True)
b = a * a
loss = b.mean()

loss.backward()  # first call works
loss.backward()  # RuntimeError: the graph has already been freed
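
As an aside (a minimal PyTorch sketch, not part of similarflow): if the same graph really does need to be back-propagated more than once, retain_graph=True keeps it in memory, and the gradients of the two calls accumulate in a.grad.

import torch

a = torch.tensor([3.0, 1.0], requires_grad=True)
loss = (a * a).mean()

loss.backward(retain_graph=True)  # keep the graph alive after the first backward
loss.backward()                   # succeeds now; gradients accumulate
print(a.grad)                     # tensor([6., 2.]): each call adds d(mean(a*a))/da = a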

        The earlier, static-graph approach takes two steps: first define the computational graph, then execute it in a session. Below is the tf1 way of writing it (tf2 reaches the same effect through autograph, described next). The numpy implementation in this post, similarflow, follows this static-graph style. Static graphs have an efficiency advantage over dynamic graphs: a dynamic graph involves many rounds of communication between the Python process and the tf C++ process, whereas a static graph, once built, runs almost entirely on the tf kernel in C++, which is efficient.

import tensorflow as tf

# TensorFlow 1.x
# Step 1: define the computational graph
g = tf.Graph()
with g.as_default():
    # placeholders are filled with concrete values when the session is executed
    x = tf.placeholder(name='x', shape=[], dtype=tf.string)
    y = tf.placeholder(name='y', shape=[], dtype=tf.string)
    z = tf.string_join([x, y], name='join', separator=' ')

# Step 2: execute the graph in a session
with tf.Session(graph=g) as sess:
    print(sess.run(fetches=z, feed_dict={x: "hello", y: "world"}))

        TensorFlow also has autograph. Since dynamic graphs run less efficiently, the @tf.function decorator can convert an ordinary Python function into graph-building code equivalent to the tf1 static-graph style.
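
A minimal tf2 sketch of the same string-join example, rewritten with @tf.function (the function name join_strings is just illustrative):

import tensorflow as tf

@tf.function  # the Python function is traced into a static graph on first call, then reused
def join_strings(x, y):
    return tf.strings.join([x, y], separator=' ')

print(join_strings(tf.constant("hello"), tf.constant("world")))  # tf.Tensor(b'hello world', shape=(), dtype=string)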

2. similarflow

        The overall architecture: a Graph object stores the nodes (operations and variables), and the computation is driven by a Session. The core is back-propagation, implemented via the chain rule: the gradient of the loss is the product of the derivatives along each node on the path, so every operator must provide a derivative (gradient) function. On top of the gradients sits gradient-descent optimization. With these basic pieces we can build a linear model, a softmax classifier and a multi-layer perceptron.

        Graph design: the core of the directed graph is the node. Once the nodes are defined they are placed in a graph for unified management, and the forward pass relies on the session. graph is the computational graph and it is composed of nodes: operations and variables are both node types, and a placeholder is a node for user input:

# Module-level reference to the current default graph (swapped in and out by
# Graph.__enter__/__exit__; the examples below assume a default graph exists).
_default_graph = None


class Graph(object):
    """Computational graph: holds all operations, placeholders, variables and constants."""

    def __init__(self):
        self.operations = []
        self.placeholders = []
        self.variables = []
        self.constants = []

    def __enter__(self):
        global _default_graph
        self.graph = _default_graph
        _default_graph = self
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        global _default_graph
        _default_graph = self.graph

    def as_default(self):
        return self


class Operation(object):
    """A node that takes one or more input nodes and computes an output value from them."""

    def __init__(self, *input_nodes):
        self.input_nodes = input_nodes
        self.output_nodes = []

        # Register the current node as an output of each input node, so the graph can be walked forward
        for node in input_nodes:
            node.output_nodes.append(self)

        # Register the current node in the default graph so its resources can be tracked and released
        _default_graph.operations.append(self)

    def compute(self):
        """Compute this node's output from the values of its input nodes (implemented by subclasses)."""
        pass

    def __add__(self, other):
        from .operations import add
        return add(self, other)

    def __neg__(self):
        from .operations import negative
        return negative(self)

    def __sub__(self, other):
        from .operations import add,negative
        return add(self, negative(other))

    def __mul__(self, other):
        from .operations import matmul
        return matmul(self, other)


class Placeholder(object):
    """A node with no inputs; its value is fed in by the user after the graph has been built."""

    def __init__(self):
        self.output_nodes = []

        _default_graph.placeholders.append(self)

    def __add__(self, other):
        from .operations import add
        return add(self, other)

    def __neg__(self):
        from .operations import negative
        return negative(self)

    def __sub__(self, other):
        from .operations import add, negative
        return add(self, negative(other))

    def __mul__(self, other):
        from .operations import matmul
        return matmul(self, other)


class Variable(object):
    """A node with no inputs; its value can change while the graph runs (a trainable parameter)."""

    def __init__(self, initial_value=None):
        self.value = initial_value
        self.output_nodes = []

        _default_graph.variables.append(self)

    def __add__(self, other):
        from .operations import add
        return add(self, other)

    def __neg__(self):
        from .operations import negative
        return negative(self)

    def __sub__(self, other):
        from .operations import add, negative
        return add(self, negative(other))

    def __mul__(self, other):
        from .operations import matmul
        return matmul(self, other)


class Constant(object):
    """A node with no inputs; its value stays fixed while the graph runs."""

    def __init__(self, value=None):
        self.value = value
        self.output_nodes = []

        _default_graph.constants.append(self)

    def __add__(self, other):
        from .operations import add
        return add(self, other)

    def __neg__(self):
        from .operations import negative
        return negative(self)

    def __sub__(self, other):
        from .operations import add, negative
        return add(self, negative(other))

    def __mul__(self, other):
        from .operations import matmul
        return matmul(self, other)

1. The operators are overloaded so that nodes can be combined directly with +, - and *. Because graph.py and operations.py import from each other, the imports are done lazily inside each method to avoid circular imports. Note that * is mapped to matmul here, not to element-wise multiplication. A small sketch of what this buys us follows the next note.

2. Use numpy operations rather than plain Python ones wherever possible.
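
A small sketch of what the overloading buys us (assuming the classes above live in similarflow/graph.py, as the session code below imports them):

import numpy as np
from similarflow.graph import Graph, Placeholder, Variable, Constant

with Graph().as_default():
    x = Placeholder()
    w = Variable(np.array([[2.0], [1.0]]))   # 2x1 weight
    b = Constant(np.array([[1.0]]))
    y = x * w + b                            # __mul__ builds a matmul node, __add__ an add node
    print(type(y).__name__, [type(n).__name__ for n in y.input_nodes])  # add ['matmul', 'Constant']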

session / forward pass: a session is needed to compute a graph that has already been created. The created graph contains, in effect, empty nodes: there are no computed values in them yet. The session traverses all the nodes that the requested operation depends on in post-order (recursively), calling each node's compute method to obtain its value.

import numpy as np

from . import graph as _graph  # module that holds the process-wide default graph
from .graph import Operation, Placeholder, Variable, Constant


class Session(object):
    """Runs the forward pass over a computational graph."""

    def __init__(self):
        # Bind to whatever graph is the current default when the session is created
        self.graph = _graph._default_graph

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        return self.close()

    def close(self):
        all_nodes = (self.graph.operations + self.graph.variables +
                     self.graph.constants + self.graph.placeholders)
        for node in all_nodes:
            node.output = None

    def run(self, operation, feed_dict=None):
        """    Calculate the output value of the node 
        :param operation:
        :param feed_dict:
        :return:
        """
        nodes_postorder = traverse_postorder(operation)

        for node in nodes_postorder:
            if type(node) == Placeholder:
                node.output = feed_dict[node]
            elif (type(node) == Variable) or (type(node) == Constant):
                node.output = node.value
            else:  # Operation
                # Gather the already-computed values of this node's inputs
                node.inputs = [input_node.output for input_node in node.input_nodes]
                # Unpack them and call the operation's compute() to get the forward value
                node.output = node.compute(*node.inputs)

            if type(node.output) == list:
                node.output = np.array(node.output)
        return operation.output


def traverse_postorder(operation):
    """
     Get the output values of all nodes required by a node through post order traversal , recursive 
    :param operation:
    :return:
    """
    nodes_postorder = []

    def recurse(node):
        if isinstance(node, Operation):
            for input_node in node.input_nodes:
                recurse(input_node)
        nodes_postorder.append(node)

    recurse(operation)
    return nodes_postorder

operations: each operator implements only the forward computation; the gradient calculation for back-propagation is kept in a separate file.

import numpy as np
from .graph import Operation


class matmul(Operation):
    def __init__(self, x, y):
        super(matmul, self).__init__(x, y)

    def compute(self, x_value, y_value):
        """x_value and y_value are the concrete numpy values (the computed outputs of the
        input nodes), not the graph nodes themselves; the nodes are still reachable via
        self.input_nodes if needed.
        :param x_value:
        :param y_value:
        :return:
        """
        return np.dot(x_value, y_value)


class add(Operation):
    def __init__(self, x, y):
        super(add, self).__init__(x, y)

    def compute(self, x_value, y_value):
        return np.add(x_value, y_value)


class negative(Operation):
    def __init__(self, x):
        super(negative, self).__init__(x)

    def compute(self, x_value):
        return -x_value


class multiply(Operation):
    def __init__(self, x, y):
        super(multiply, self).__init__(x, y)

    def compute(self, x_value, y_value):
        return np.multiply(x_value, y_value)


class sigmoid(Operation):
    def __init__(self, x):
        super(sigmoid, self).__init__(x)

    def compute(self, x_value):
        return 1 / (1 + np.exp(-x_value))


class softmax(Operation):
    def __init__(self, x):
        super(softmax, self).__init__(x)

    def compute(self, x_value):
        return np.exp(x_value) / np.sum(np.exp(x_value), axis=1)[:, None]


class log(Operation):
    def __init__(self, x):
        super(log, self).__init__(x)

    def compute(self, x_value):
        return np.log(x_value)


class square(Operation):
    def __init__(self, x):
        super(square, self).__init__(x)

    def compute(self, x_value):
        return np.square(x_value)


class reduce_sum(Operation):
    def __init__(self, A, axis=None):
        super(reduce_sum, self).__init__(A)
        self.axis = axis

    def compute(self, A_value):
        return np.sum(A_value, self.axis)
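
With the graph, the session and the operators in place, a minimal forward-pass sketch looks like this (as in the full examples further down, no Graph is created explicitly; the package is assumed to provide a default graph on import):

import numpy as np
import similarflow as sf

x = sf.Placeholder()
w = sf.Variable(np.array([[2.0], [1.0]]))   # 2x1 weight
out = sf.sigmoid(sf.matmul(x, w))           # forward-only graph: sigmoid(x @ w)

with sf.Session() as sess:
    print(sess.run(out, feed_dict={x: np.array([[1.0, 3.0]])}))  # sigmoid(1*2 + 3*1) ~= [[0.9933]]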

gradients: Gradient calculation

Back-propagation: computing matrix gradients is the key part of the back-propagation algorithm. In a network we conceptually differentiate matrices with respect to matrices, but such a derivative is never computed directly: because the loss is a scalar, its gradient is propagated backwards and shape analysis pins down each gradient. For example, for Z = XW with X of shape n x d and W of shape d x m, the upstream gradient dL/dZ has shape n x m, so dL/dX = (dL/dZ)·Wᵀ (shape n x d) and dL/dW = Xᵀ·(dL/dZ) (shape d x m).

Pay particular attention to the derivative of the softmax operator. A registry decorator is used so that during back-propagation the gradient function can be looked up directly by op_type in a dictionary.

import numpy as np

# The operator classes must be in scope so that eval(op_type) can resolve a name
# string such as "add" to the corresponding Operation subclass.
from .operations import (add, matmul, multiply, negative, sigmoid, softmax,
                         log, square, reduce_sum)

_gradient_registry = {}


class RegisterGradient(object):
    """Decorator that registers a gradient function for the operator class named by op_type."""

    def __init__(self, op_type):
        self._op_type = eval(op_type)  # resolve the class-name string to the class object

    def __call__(self, f):
        _gradient_registry[self._op_type] = f
        return f


@RegisterGradient("add")
def _add_gradient(op, grad):
    """    Sum matrix derivation , Add lines , Column addition 
    :param op:
    :param grad:
    :return:
    """
    x, y = op.inputs[0], op.inputs[1]

    grad_wrt_x = grad
    while np.ndim(grad_wrt_x) > len(np.shape(x)):
        grad_wrt_x = np.sum(grad_wrt_x, axis=0)
    for axis, size in enumerate(np.shape(x)):
        if size == 1:
            grad_wrt_x = np.sum(grad_wrt_x, axis=axis, keepdims=True)

    grad_wrt_y = grad
    while np.ndim(grad_wrt_y) > len(np.shape(y)):
        grad_wrt_y = np.sum(grad_wrt_y, axis=0)
    for axis, size in enumerate(np.shape(y)):
        if size == 1:
            grad_wrt_y = np.sum(grad_wrt_y, axis=axis, keepdims=True)

    return [grad_wrt_x, grad_wrt_y]


@RegisterGradient("matmul")
def _matmul_gradient(op, grad):
    """  seek x Gradient of :y The transpose , seek y Gradient of :x The transpose 
    :param op:
    :param grad:
    :return:
    """
    x, y = op.inputs[0], op.inputs[1]
    return [np.dot(grad, np.transpose(y)), np.dot(np.transpose(x), grad)]


@RegisterGradient("sigmoid")
def _sigmoid_gradient(op, grad):
    sigmoid = op.output
    return grad * sigmoid * (1 - sigmoid)


@RegisterGradient("softmax")
def _softmax_gradient(op, grad):
    """ softmax  Reciprocal 
    https://stackoverflow.com/questions/40575841/numpy-calculate-the-derivative-of-the-softmax-function
    :param op:
    :param grad:
    :return:
    """
    softmax = op.output
    return (grad - np.reshape(np.sum(grad * softmax, 1), [-1, 1])) * softmax


@RegisterGradient("log")
def _log_gradient(op, grad):
    x = op.inputs[0]
    return grad / x


@RegisterGradient("multiply")
def _multiply_gradient(op, grad):
    x, y = op.inputs[0], op.inputs[1]
    return [grad * y, grad * x]


@RegisterGradient("negative")
def _negative_gradient(op, grad):
    return -grad


@RegisterGradient("square")
def _square_gradient(op, grad):
    x = op.inputs[0]
    return grad * np.multiply(2.0, x)


@RegisterGradient("reduce_sum")
def _reduce_sum_gradient(op, grad):
    x = op.inputs[0]

    output_shape = np.array(np.shape(x))
    output_shape[op.axis] = 1
    tile_scaling = np.shape(x) // output_shape
    grad = np.reshape(grad, output_shape)
    return np.tile(grad, tile_scaling)
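
A quick numerical sanity check of one of these gradient functions (a sketch: MockOp is a stand-in object carrying only the .inputs field that the gradient function reads):

import numpy as np

class MockOp:
    """Minimal stand-in for a graph node."""
    def __init__(self, x, y):
        self.inputs = [x, y]

x = np.random.randn(3, 4)
y = np.random.randn(4, 2)
upstream = np.ones((3, 2))                       # dL/dZ for L = sum(x @ y)
gx, gy = _matmul_gradient(MockOp(x, y), upstream)

eps = 1e-6
x_pert = x.copy()
x_pert[0, 0] += eps                              # perturb a single entry of x
numeric = (np.sum(x_pert @ y) - np.sum(x @ y)) / eps
print(np.isclose(gx[0, 0], numeric, atol=1e-4))  # True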

Back-propagation:

import numpy as np
from queue import Queue

from .graph import Operation, Variable, Constant
from .gradients import _gradient_registry


def compute_gradients(loss):
    """Given the local gradient of each node's output with respect to its inputs, walk
    backwards from the loss node and accumulate, by the chain rule, the gradient of the
    loss with respect to every node reachable from it.

    A breadth-first search starting at the loss node is used: a FIFO queue controls the
    traversal order, a visited set prevents repeated visits, and the gradient computed for
    each node is stored in grad_table.
    :param loss: the loss node (a scalar-valued Operation)
    :return: dict mapping node -> gradient of the loss w.r.t. that node
    """
    grad_table = {}  # gradient of the loss w.r.t. each node
    grad_table[loss] = 1  # d(loss)/d(loss) = 1

    visited = set()
    queue = Queue()
    visited.add(loss)
    queue.put(loss)

    while not queue.empty():
        node = queue.get()

        # For every node other than the loss, accumulate the gradient contributed
        # by each of its consumers (output_nodes)
        if node != loss:
            grad_table[node] = 0

            for output_node in node.output_nodes:
                lossgrad_wrt_output_node_output = grad_table[output_node]

                output_node_op_type = output_node.__class__
                bprop = _gradient_registry[output_node_op_type]

                lossgrads_wrt_output_node_inputs = bprop(output_node, lossgrad_wrt_output_node_output)

                if len(output_node.input_nodes) == 1:
                    grad_table[node] += lossgrads_wrt_output_node_inputs
                else:
                    # The gradient function returns one gradient per input of output_node;
                    # pick the one corresponding to this node. Contributions from multiple
                    # consumers are summed by the += above.
                    node_index_in_output_node_inputs = output_node.input_nodes.index(node)
                    lossgrad_wrt_node = lossgrads_wrt_output_node_inputs[node_index_in_output_node_inputs]
                    grad_table[node] += lossgrad_wrt_node

        # Enqueue this node's inputs so they are visited as well
        if hasattr(node, "input_nodes"):
            for input_node in node.input_nodes:
                if input_node not in visited:
                    visited.add(input_node)
                    queue.put(input_node)

    return grad_table
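
A hedged sketch of calling compute_gradients directly (the import path below is an assumption; adjust it to wherever compute_gradients actually lives in the package). The forward pass must be run first so that every node's .output and .inputs are populated:

import numpy as np
import similarflow as sf
from similarflow.train import compute_gradients   # assumed module path

x = sf.Placeholder()
w = sf.Variable(np.array([[2.0], [3.0]]))          # 2x1
loss = sf.reduce_sum(sf.square(sf.matmul(x, w)))   # L = sum((x @ w)^2)

with sf.Session() as sess:
    sess.run(loss, feed_dict={x: np.array([[1.0, 1.0]])})  # forward pass: loss = 25.0
    grads = compute_gradients(loss)
    print(grads[w])   # dL/dw = 2 * (x @ w) * x.T = [[10.], [10.]]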

GradientDescentOptimizer: gradient-descent optimization. The gradients of the loss with respect to the other nodes are computed in order to optimize the parameters, so a gradient-descent optimizer is implemented: at each iteration it takes the negative gradient as the search direction and updates the parameters by the configured step size (learning rate):

class GradientDescentOptimizer(object):
    def __init__(self, learning_rate):
        self.learning_rate = learning_rate

    def minimize(self, loss):
        learning_rate = self.learning_rate

        class MinimizationOperation(Operation):
            def compute(self):
                grad_table = compute_gradients(loss)

                for node in grad_table:
                    if type(node) == Variable or type(node) == Constant:
                        grad = grad_table[node]
                        # step against the gradient: value <- value - learning_rate * grad
                        node.value -= learning_rate * grad

        return MinimizationOperation()

Example: linear regression

import numpy as np
import matplotlib.pylab as plt
import similarflow as sf

input_x = np.linspace(-1, 1, 100)
input_y = input_x * 3 + np.random.randn(input_x.shape[0]) * 0.5

x = sf.Placeholder()
y = sf.Placeholder()
w = sf.Variable([[1.0]])
b = sf.Variable(0.0)

# linear = sf.add(sf.matmul(x, w), b)
linear = x * w + b

loss = sf.reduce_sum(sf.square(sf.add(linear, sf.negative(y))))
# loss = sf.reduce_sum(sf.square(linear - y))

train_op = sf.train.GradientDescentOptimizer(learning_rate=0.005).minimize(loss)

feed_dict = {x: np.reshape(input_x, (-1, 1)), y: np.reshape(input_y, (-1, 1))}
# feed_dict = {x: input_x, y: input_y}

with sf.Session() as sess:
    for step in range(20):
        # forward pass: compute the current loss
        loss_value = sess.run(loss, feed_dict)
        mse = loss_value / len(input_x)
        print(f"step:{step},loss:{loss_value},mse:{mse}")
        # backward pass: compute gradients and apply one gradient-descent update
        sess.run(train_op, feed_dict)
    w_value = sess.run(w, feed_dict=feed_dict)
    b_value = sess.run(b, feed_dict=feed_dict)
    print(f"w:{w_value},b:{b_value}")

w_value = float(w_value)
max_x, min_x = np.max(input_x), np.min(input_x)
max_y, min_y = w_value * max_x + b_value, w_value * min_x + b_value

plt.plot([max_x, min_x], [max_y, min_y], color='r')
plt.scatter(input_x, input_y)
plt.show()

Perceptron:

import numpy as np
import similarflow as sf
import matplotlib.pyplot as plt

# Create red points centered at (-2, -2)
red_points = np.random.randn(50, 2) - 2 * np.ones((50, 2))

# Create blue points centered at (2, 2)
blue_points = np.random.randn(50, 2) + 2 * np.ones((50, 2))

X = sf.Placeholder()
y = sf.Placeholder()
W = sf.Variable(np.random.randn(2, 2))
b = sf.Variable(np.random.randn(2))

p = sf.softmax(sf.add(sf.matmul(X, W), b))

loss = sf.negative(sf.reduce_sum(sf.reduce_sum(sf.multiply(y, sf.log(p)), axis=1)))

train_op = sf.train.GradientDescentOptimizer(learning_rate=0.01).minimize(loss)

feed_dict = {
    X: np.concatenate((blue_points, red_points)),
    y: [[1, 0]] * len(blue_points) + [[0, 1]] * len(red_points)
}

with sf.Session() as sess:
    for step in range(100):
        loss_value = sess.run(loss, feed_dict)
        if step % 10 == 0:
            print(f"step:{step},loss:{loss_value}")
        sess.run(train_op, feed_dict)

    # Print final result
    W_value = sess.run(W)
    print("Weight matrix:\n", W_value)
    b_value = sess.run(b)
    print("Bias:\n", b_value)

# Plot a line y = -x
x_axis = np.linspace(-4, 4, 100)
y_axis = -W_value[0][0] / W_value[1][0] * x_axis - b_value[0] / W_value[1][0]
plt.plot(x_axis, y_axis)

# Add the red and blue points
plt.scatter(red_points[:, 0], red_points[:, 1], color='red')
plt.scatter(blue_points[:, 0], blue_points[:, 1], color='blue')
plt.show()

Multilayer perceptron

import numpy as np
import similarflow as sf
import matplotlib.pyplot as plt

# Create two clusters of red points centered at (0, 0) and (1, 1), respectively.
red_points = np.concatenate((
    0.2 * np.random.randn(25, 2) + np.array([[0, 0]] * 25),
    0.2 * np.random.randn(25, 2) + np.array([[1, 1]] * 25)
))

# Create two clusters of blue points centered at (0, 1) and (1, 0), respectively.
blue_points = np.concatenate((
    0.2 * np.random.randn(25, 2) + np.array([[0, 1]] * 25),
    0.2 * np.random.randn(25, 2) + np.array([[1, 0]] * 25)
))

# Plot them
plt.scatter(red_points[:, 0], red_points[:, 1], color='red')
plt.scatter(blue_points[:, 0], blue_points[:, 1], color='blue')
plt.show()

X = sf.Placeholder()
y = sf.Placeholder()
W_hidden = sf.Variable(np.random.randn(2, 2))
b_hidden = sf.Variable(np.random.randn(2))
p_hidden = sf.sigmoid(sf.add(sf.matmul(X, W_hidden), b_hidden))

W_output = sf.Variable(np.random.randn(2, 2))
b_output = sf.Variable(np.random.rand(2))
p_output = sf.softmax(sf.add(sf.matmul(p_hidden, W_output), b_output))

loss = sf.negative(sf.reduce_sum(sf.reduce_sum(sf.multiply(y, sf.log(p_output)), axis=1)))

train_op = sf.train.GradientDescentOptimizer(learning_rate=0.03).minimize(loss)

feed_dict = {
    X: np.concatenate((blue_points, red_points)),
    y: [[1, 0]] * len(blue_points) + [[0, 1]] * len(red_points)
}

with sf.Session() as sess:
    for step in range(100):
        loss_value = sess.run(loss, feed_dict)
        if step % 10 == 0:
            print(f"step:{step},loss:{loss_value}")
        sess.run(train_op, feed_dict)

    # Print final result
    W_hidden_value = sess.run(W_hidden)
    print("Hidden layer weight matrix:\n", W_hidden_value)
    b_hidden_value = sess.run(b_hidden)
    print("Hidden layer bias:\n", b_hidden_value)
    W_output_value = sess.run(W_output)
    print("Output layer weight matrix:\n", W_output_value)
    b_output_value = sess.run(b_output)
    print("Output layer bias:\n", b_output_value)

# Visualize classification boundary
xs = np.linspace(-2, 2)
ys = np.linspace(-2, 2)
pred_classes = []
for x in xs:
    for y in ys:
        pred_class = sess.run(p_output, feed_dict={X: [[x, y]]})[0]
        pred_classes.append((x, y, pred_class.argmax()))
xs_p, ys_p = [], []
xs_n, ys_n = [], []
for x, y, c in pred_classes:
    if c == 0:
        xs_n.append(x)
        ys_n.append(y)
    else:
        xs_p.append(x)
        ys_p.append(y)
plt.plot(xs_p, ys_p, 'ro', xs_n, ys_n, 'bo')
plt.show()

Copyright notice: this article was written by [Kun Li]. Please include a link to the original when reposting. Original article: https://yzsam.com/2022/03/202203010517337813.html