TensorFlow [Google deep learning framework in practice]: using HDF5 with TFLearn to process large datasets
2022-06-11 22:39:00 【Li Xiang superb】
Table of contents
- 1. HDF5 file
- 2. GitHub code
- 3. Source code address
- 4. tf introduction collection
1. HDF5 file
So far, every dataset we have used fits in memory. For small datasets we can load all the image data into memory, preprocess it, and run forward propagation. For large datasets such as ImageNet, however, we need a data generator that accesses only a small part of the dataset at a time (for example, one mini-batch), then preprocesses that batch and runs forward propagation on it.
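The mini-batch idea can be sketched in a few lines. This is only an illustration: `iterate_minibatches` and `preprocess` are hypothetical helper names, and the small random array stands in for a dataset that would normally live on disk.

```python
import numpy as np


def iterate_minibatches(num_samples, batch_size):
    """Yield (start, end) index pairs that cover the dataset one mini-batch at a time."""
    for start in range(0, num_samples, batch_size):
        yield start, min(start + batch_size, num_samples)


def preprocess(batch):
    """Hypothetical preprocessing step: scale uint8 pixel values to [0, 1]."""
    return batch.astype(np.float32) / 255.0


# Stand-in for a large image dataset (10 images of 32x32 RGB).
images = np.random.randint(0, 256, size=(10, 32, 32, 3), dtype=np.uint8)

batch_shapes = []
for start, end in iterate_minibatches(len(images), batch_size=4):
    # Only this slice is converted to float32, never the whole dataset.
    batch = preprocess(images[start:end])
    batch_shapes.append(batch.shape)

print(batch_shapes)
```

The last batch is smaller than the others whenever the dataset size is not a multiple of the batch size, so downstream code should not assume a fixed batch dimension.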
Keras makes data loading very convenient: you can use the raw file paths on disk as input to the training process. You don't need to hold the entire dataset in memory; just give the Keras data generator the image paths, and it loads the data from those paths and runs forward propagation automatically.
However, this approach is very inefficient. Reading each image from disk takes a separate I/O operation, which adds latency. Training deep learning networks is slow enough already, so we should avoid I/O bottlenecks wherever we can.
A more sensible solution is to build an HDF5 dataset from the original images, except that this time we store the raw images rather than extracted features. HDF5 can store very large datasets and is also efficient for I/O, especially for extracting batches (called "slices") from the file. Saving the raw images on disk into an HDF5 file lets the model traverse the dataset quickly and train a deep learning network on it.
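As a rough sketch of what "slicing" means with h5py: indexing an open dataset with a range reads only those rows from disk and returns them as a NumPy array. The file name `images_demo.h5`, the dataset names, and the random data below are assumptions made for this illustration only.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file location and fake data, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "images_demo.h5")
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(100, 32, 32, 3), dtype=np.uint8)
labels = rng.integers(0, 10, size=100)

# Write the arrays to an HDF5 file.
with h5py.File(path, "w") as f:
    f.create_dataset("images", data=images)
    f.create_dataset("labels", data=labels)

# Read back a single batch without loading the whole file.
with h5py.File(path, "r") as f:
    dset = f["images"]                 # a dataset handle; no pixel data is read yet
    batch = dset[32:64]                # reads only rows 32..63 into a NumPy array
    batch_labels = f["labels"][32:64]
    total = dset.shape[0]              # shape is available without reading the data

print(batch.shape, batch_labels.shape, total)
```

Because the handle `dset` supports NumPy-style indexing, it can often be passed where an array is expected, which is what the TFLearn example in the next section relies on.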
2. GitHub code:
# -*- coding: utf-8 -*-
"""
Example of how to use an HDF5 dataset with TFLearn. HDF5 is a data model,
library, and file format for storing and managing data. It can handle large
datasets that do not fit entirely in RAM. Note that this example only gives
a quick compatibility demonstration. In practice, there is no real need to
use HDF5 for a small dataset such as CIFAR-10.
"""
from __future__ import division, print_function, absolute_import
import tensorflow as tf  # imported explicitly because tf.float32 is used below
import tflearn
from tflearn.layers.core import *
from tflearn.layers.conv import *
from tflearn.data_utils import *
from tflearn.layers.normalization import *
from tflearn.layers.estimator import regression
# CIFAR-10 Dataset
from tflearn.datasets import cifar10
(X, Y), (X_test, Y_test) = cifar10.load_data()
Y = to_categorical(Y)
Y_test = to_categorical(Y_test)
# Create a hdf5 dataset from CIFAR-10 numpy array
import h5py
h5f = h5py.File('data.h5', 'w')
h5f.create_dataset('cifar10_X', data=X)
h5f.create_dataset('cifar10_Y', data=Y)
h5f.create_dataset('cifar10_X_test', data=X_test)
h5f.create_dataset('cifar10_Y_test', data=Y_test)
h5f.close()
# Load hdf5 dataset
h5f = h5py.File('data.h5', 'r')
X = h5f['cifar10_X']
Y = h5f['cifar10_Y']
X_test = h5f['cifar10_X_test']
Y_test = h5f['cifar10_Y_test']
# Build network
network = input_data(shape=[None, 32, 32, 3], dtype=tf.float32)
network = conv_2d(network, 32, 3, activation='relu')
network = max_pool_2d(network, 2)
network = conv_2d(network, 64, 3, activation='relu')
network = conv_2d(network, 64, 3, activation='relu')
network = max_pool_2d(network, 2)
network = fully_connected(network, 512, activation='relu')
network = dropout(network, 0.5)
network = fully_connected(network, 10, activation='softmax')
network = regression(network, optimizer='adam',
                     loss='categorical_crossentropy',
                     learning_rate=0.001)
# Training
model = tflearn.DNN(network, tensorboard_verbose=0)
model.fit(X, Y, n_epoch=50, shuffle=True, validation_set=(X_test, Y_test),
          show_metric=True, batch_size=96, run_id='cifar10_cnn')
h5f.close()
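The example above first loads all of CIFAR-10 into memory and then writes it to HDF5, which works only because the dataset is small. For a dataset that really does not fit in RAM, one possible approach (a sketch, not part of the TFLearn example) is to create a resizable dataset and append to it chunk by chunk. `load_chunk` is a hypothetical stand-in for whatever reads the next group of images from disk.

```python
import os
import tempfile

import h5py
import numpy as np


def load_chunk(index, chunk_size):
    """Hypothetical loader: stands in for reading `chunk_size` images from disk."""
    rng = np.random.default_rng(index)
    return rng.integers(0, 256, size=(chunk_size, 32, 32, 3), dtype=np.uint8)


path = os.path.join(tempfile.mkdtemp(), "large_demo.h5")  # hypothetical file name
chunk_size, num_chunks = 50, 4

with h5py.File(path, "w") as f:
    # maxshape=None on the first axis makes the dataset growable along that axis.
    dset = f.create_dataset(
        "images",
        shape=(0, 32, 32, 3),
        maxshape=(None, 32, 32, 3),
        dtype=np.uint8,
        chunks=(chunk_size, 32, 32, 3),
    )
    for i in range(num_chunks):
        chunk = load_chunk(i, chunk_size)
        start = dset.shape[0]
        dset.resize(start + len(chunk), axis=0)   # grow the dataset on disk
        dset[start:start + len(chunk)] = chunk    # write only the new rows

# Verify the result by reading back the shape and the first chunk.
with h5py.File(path, "r") as f:
    final_shape = f["images"].shape
    first = f["images"][:chunk_size]

print(final_shape)
```

Only one chunk is held in memory at any time, so peak memory use depends on `chunk_size` rather than on the total dataset size.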
3. Source code address:
4. tf introduction collection