Using GPU acceleration with TensorFlow
2022-06-27 15:33:00 【Peng Xiang】
After installing tensorflow-gpu, TensorFlow can run on the GPU, which undoubtedly helps speed up training.
(Note: once tensorflow-gpu is installed, TensorFlow uses the GPU by default when training.)
In an earlier post I set up tensorflow-gpu in my Python environment; for details see:
Tensorflow install
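Before timing anything, it is worth confirming that TensorFlow can actually see the GPU. Here is a minimal check, assuming a TensorFlow 1.x install like the one used in the script below (device names can vary by machine):

import tensorflow as tf
from tensorflow.python.client import device_lib

# List every device TensorFlow can see; a usable GPU shows up as '/device:GPU:0'
print([d.name for d in device_lib.list_local_devices()])

# Alternatively, a session created this way logs which device each op is placed on
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

If no GPU shows up in the list, the CUDA/cuDNN setup (covered in the install post above) is usually the culprit.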
After installation , We use BP The project of handwritten numeral recognition based on neural network algorithm is taken as an example
First of all BP A simple understanding of the principle of neural networks
BP Neural network realizes handwritten digit recognition
# -*- coding: utf-8 -*-
"""Handwritten digit recognition with a BP neural network."""
# -------------------------------------------
# Parse the binary MNIST files with Python
import numpy as np
import struct
import tensorflow as tf
from sklearn.model_selection import train_test_split
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # use GPU 0 (set to "-1" to force the CPU)
import time

T1 = time.perf_counter()  # time.clock() was removed in Python 3.8
class LoadData(object):
    def __init__(self, file1, file2):
        self.file1 = file1
        self.file2 = file2

    # Load the image set
    def loadImageSet(self):
        binfile = open(self.file1, 'rb')                        # read the binary file
        buffers = binfile.read()                                # buffer
        head = struct.unpack_from('>IIII', buffers, 0)          # first 4 big-endian integers, returned as a tuple
        offset = struct.calcsize('>IIII')                       # locate the start of the pixel data
        imgNum = head[1]                                        # number of images
        width = head[2]                                         # number of rows, 28
        height = head[3]                                        # number of columns, 28
        bits = imgNum * width * height                          # 60000*28*28 pixel values in total
        bitsString = '>' + str(bits) + 'B'                      # fmt string, e.g. '>47040000B'
        imgs = struct.unpack_from(bitsString, buffers, offset)  # pixel data, returned as a tuple
        binfile.close()
        imgs = np.reshape(imgs, [imgNum, width * height])
        return imgs, head

    # Load the label set
    def loadLabelSet(self):
        binfile = open(self.file2, 'rb')                         # read the binary file
        buffers = binfile.read()                                 # buffer
        head = struct.unpack_from('>II', buffers, 0)             # first 2 big-endian integers, returned as a tuple
        offset = struct.calcsize('>II')                          # locate the start of the label data
        labelNum = head[1]                                       # number of labels
        numString = '>' + str(labelNum) + 'B'
        labels = struct.unpack_from(numString, buffers, offset)  # label data
        binfile.close()
        labels = np.reshape(labels, [labelNum])                  # flatten to a one-dimensional array
        return labels, head

    # Expand each label into a 10-dimensional one-hot vector
    def expand_labels(self):
        labels, head = self.loadLabelSet()
        expand_labels = []
        for label in labels:
            zero_vector = np.zeros((1, 10))
            zero_vector[0, label] = 1
            expand_labels.append(zero_vector)
        return expand_labels

    # Combine samples and labels into one list: [[array(data), array(label)], ...]
    def loadData(self):
        imags, head = self.loadImageSet()
        expand_labels = self.expand_labels()
        data = []
        for i in range(imags.shape[0]):
            imags[i] = imags[i].reshape((1, 784))
            data.append([imags[i], expand_labels[i]])
        return data
file1 = r'train-images.idx3-ubyte'
file2 = r'train-labels.idx1-ubyte'
trainingData = LoadData(file1, file2)
training_data = trainingData.loadData()
file3 = r't10k-images.idx3-ubyte'
file4 = r't10k-labels.idx1-ubyte'
testData = LoadData(file3, file4)
test_data = testData.loadData()
X_train = [i[0] for i in training_data]
y_train = [i[1][0] for i in training_data]
X_test = [i[0] for i in test_data]
y_test = [i[1][0] for i in test_data]
X_train, X_validation, y_train, y_validation = train_test_split(X_train, y_train, test_size=0.1, random_state=7)
# print(np.array(X_test).shape)
# print(np.array(y_test).shape)
# print(np.array(X_train).shape)
# print(np.array(y_train).shape)
INPUT_NODE = 784
OUTPUT_NODE = 10
LAYER1_NODE = 500
BATCH_SIZE = 200
LEARNING_RATE_BASE = 0.005   # base learning rate
LEARNING_RATE_DECAY = 0.99   # decay rate of the learning rate
REGULARIZATION_RATE = 0.01   # coefficient of the regularization term in the loss
TRAINING_STEPS = 30000
MOVING_AVERAGE_DECAY = 0.99  # moving-average decay rate
# Three-layer fully connected network; avg_class is an optional moving-average class
def inference(input_tensor, avg_class, weights1, biases1, weights2, biases2):
    if not avg_class:
        layer1 = tf.nn.relu(tf.matmul(input_tensor, weights1) + biases1)
        # no softmax on the output layer (softmax is folded into the loss)
        return tf.matmul(layer1, weights2) + biases2
    else:
        layer1 = tf.nn.relu(tf.matmul(input_tensor, avg_class.average(weights1)) +
                            avg_class.average(biases1))
        return tf.matmul(layer1, avg_class.average(weights2)) + avg_class.average(biases2)
def train(X_train, X_validation, y_train, y_validation, X_test, y_test):
    x = tf.placeholder(tf.float32, [None, INPUT_NODE], name="x-input")
    y_ = tf.placeholder(tf.float32, [None, OUTPUT_NODE], name="y-input")
    # hidden-layer parameters
    weights1 = tf.Variable(
        tf.truncated_normal([INPUT_NODE, LAYER1_NODE], stddev=0.1))
    biases1 = tf.Variable(tf.constant(0.1, shape=[LAYER1_NODE]))
    # output-layer parameters
    weights2 = tf.Variable(
        tf.truncated_normal([LAYER1_NODE, OUTPUT_NODE], stddev=0.1))
    biases2 = tf.Variable(tf.constant(0.1, shape=[OUTPUT_NODE]))
    y = inference(x, None, weights1, biases1, weights2, biases2)
    global_step = tf.Variable(0, trainable=False)
    variable_averages = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY, global_step)
    variable_averages_op = variable_averages.apply(tf.trainable_variables())
    average_y = inference(x, variable_averages, weights1, biases1, weights2, biases2)
    cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=y, labels=tf.argmax(y_, 1))
    cross_entropy_mean = tf.reduce_mean(cross_entropy)
    # L2 regularization loss
    regularizer = tf.contrib.layers.l2_regularizer(REGULARIZATION_RATE)
    regularization = regularizer(weights1) + regularizer(weights2)
    loss = cross_entropy_mean + regularization
    # exponentially decaying learning rate
    learning_rate = tf.train.exponential_decay(LEARNING_RATE_BASE,
                                               global_step,
                                               len(X_train) / BATCH_SIZE,
                                               LEARNING_RATE_DECAY)
    train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)
    with tf.control_dependencies([train_step, variable_averages_op]):
        train_op = tf.no_op(name='train')
    correct_prediction = tf.equal(tf.argmax(average_y, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    with tf.Session() as sess:
        init_op = tf.global_variables_initializer()
        sess.run(init_op)
        validation_feed = {x: X_validation, y_: y_validation}
        train_feed = {x: X_train, y_: y_train}
        test_feed = {x: X_test, y_: y_test}
        for i in range(TRAINING_STEPS):
            if i % 500 == 0:
                validate_acc = sess.run(accuracy, feed_dict=validation_feed)
                print("after %d training step(s), validation accuracy "
                      "using average model is %g" % (i, validate_acc))
            start = (i * BATCH_SIZE) % len(X_train)
            end = min(start + BATCH_SIZE, len(X_train))
            sess.run(train_op,
                     feed_dict={x: X_train[start:end], y_: y_train[start:end]})
            # print('loss:', sess.run(loss, feed_dict=train_feed))
        test_acc = sess.run(accuracy, feed_dict=test_feed)
        print("after %d training step(s), test accuracy using "
              "average model is %g" % (TRAINING_STEPS, test_acc))

train(X_train, X_validation, y_train, y_validation, X_test, y_test)
T2 = time.perf_counter()
print('Program running time: %s ms' % ((T2 - T1) * 1000))
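For the comparison below, the CPU run can be reproduced by hiding the GPU from TensorFlow and rerunning the same script unchanged. A minimal sketch of one common way to do this (the environment variable must be set before tensorflow is imported, or it has no effect):

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"  # hide all GPUs; TensorFlow falls back to the CPU
import tensorflow as tf                    # import only after setting the variable

Setting the variable to "0", as the script above does, selects the first GPU instead.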
GPU run result:
(screenshot of the GPU run's console output)

CPU run result:
(screenshot of the CPU run's console output)
Judging from these runs, the two running times differ by roughly a factor of two.
My graphics card is admittedly weak; in tests I have seen from others the gap is far larger. Still, the GPU clearly provides some acceleration. Bye!