Deep learning basics: batch_size
2022-08-02 05:29:00 【hello689】
Background: batch size is a parameter that only exists in batch learning. Statistical learning can be divided into two categories, online learning and batch learning; by the same division, deep learning can also be done online or in batches, and in practice we mostly use batch learning. For details, see "Statistical Learning Methods", 2nd edition, p. 13.
Online learning: receive one sample at a time, make a prediction, and repeatedly update the model;
Batch learning: an offline learning method (all samples must be available before training) that learns from all samples, or a subset of them, at each step. A minimal sketch contrasting the two follows below.
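To make the distinction concrete, here is a minimal sketch in plain NumPy (the toy data, learning rate, and step counts are hypothetical) contrasting an online update, which adjusts the model after every single sample, with a batch update, which averages the gradient over all samples before each step:

```python
import numpy as np

# Hypothetical toy data: 1000 samples, 10 features, linear target plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
w_true = rng.normal(size=10)
y = X @ w_true + 0.1 * rng.normal(size=1000)

lr = 0.01

# Online learning: update the weights after each individual sample.
w_online = np.zeros(10)
for xi, yi in zip(X, y):
    grad = 2 * (xi @ w_online - yi) * xi      # squared-error gradient for one sample
    w_online -= lr * grad

# Batch learning: each update uses the average gradient over all samples.
w_batch = np.zeros(10)
for _ in range(100):                          # 100 full-batch steps
    grad = 2 * X.T @ (X @ w_batch - y) / len(X)
    w_batch -= lr * grad
```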
1. Why do we need a batch size?
One answer is the statement in the figure below; as a beginner, it may be a little confusing at first sight.
A second answer is the following passage, which may still seem a bit convoluted:
In the batch method of supervised learning, the adjustment of the synaptic weights of the multilayer perceptron is performed after all N examples of the training set have been presented, which constitutes one training epoch. The cost function for batch learning is defined by the average error energy. So batch training is required.
My summary:
The core reason we need a batch size is insufficient memory. If memory were sufficient, we could compute the error over all samples and obtain the optimal gradient direction for the parameters being optimized, i.e. the globally optimal gradient direction. In practice, however, the data volume is very large and GPU memory is limited, so training has to proceed batch by batch. If the batch is too small, the gradient descent direction becomes very noisy, the model converges slowly and unstably, and many more iterations are required. The batch size is therefore chosen to strike a balance: large enough to keep memory utilization high and convergence fast, but no larger than memory allows.
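As a sketch of how this looks in practice, the PyTorch loop below (the model, data, and hyperparameters are all hypothetical) streams the dataset in mini-batches, so only one batch at a time has to fit in GPU memory and each gradient step is estimated from that batch alone:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical dataset: 10,000 samples, 20 features, 2 classes.
X = torch.randn(10_000, 20)
y = torch.randint(0, 2, (10_000,))
loader = DataLoader(TensorDataset(X, y), batch_size=256, shuffle=True)

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(5):
    for xb, yb in loader:                  # only one batch is resident at a time
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)      # loss averaged over this mini-batch
        loss.backward()                    # gradient estimated from this batch only
        optimizer.step()
```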
In addition, batch normalization also needs a batch of data to estimate the mean and variance; if the batch size is 1, BN is basically useless.
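A quick NumPy sketch of the batch-norm statistics (simplified: no learnable scale/shift, no running averages) shows why: with a batch of one, the per-feature variance over the batch is zero, and the normalized output collapses to all zeros.

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # x has shape (batch_size, features); normalize each feature over the batch.
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mean) / np.sqrt(var + eps)

batch = np.random.randn(32, 4)
print(batch_norm(batch).std(axis=0))   # roughly 1.0 per feature

single = np.random.randn(1, 4)
print(batch_norm(single))              # all zeros: mean equals x and var is 0
```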
2. The benefits of increasing batch size (three points)
- Memory utilization improves, and the parallelization efficiency of large matrix multiplications improves.
- Fewer iterations are needed to run one epoch, so the same amount of data is processed faster (see the short calculation after this list).
- Within a certain range, a larger batch size generally gives a more accurate gradient descent direction and more stable convergence.
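For the second point, the relationship is simply iterations_per_epoch = ceil(N / batch_size); a quick calculation with a hypothetical dataset of 50,000 samples:

```python
import math

n_samples = 50_000                      # hypothetical dataset size
for batch_size in (32, 256, 1024):
    iters = math.ceil(n_samples / batch_size)
    print(f"batch_size={batch_size:5d} -> {iters:5d} iterations per epoch")
# batch_size=   32 ->  1563 iterations per epoch
# batch_size=  256 ->   196 iterations per epoch
# batch_size= 1024 ->    49 iterations per epoch
```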
3. Disadvantages of increasing batch size (three points)
- Memory utilization improves, but the available memory capacity may no longer be enough (short of buying more memory).
- Although fewer iterations are needed per epoch, reaching the same accuracy requires more epochs, which increases the total training time.
- Once the batch size grows large enough, the descent direction hardly changes anymore and training may get stuck in a local optimum (a big ship is hard to turn), unless the batch covers the entire dataset.
4. How does adjusting batch size affect the training effect?
- If the batch size is too small, model performance suffers (the error shoots up).
- As the batch size increases, the same amount of data is processed faster.
- As the batch size increases, more epochs are needed to reach the same accuracy.
- Because these two effects pull in opposite directions, there is an intermediate batch size at which the total training time is minimal.
- Because the final convergence can end up in different local minima, there is likewise a batch size at which the final convergence accuracy is best.
5. Why should batch_size be a power of 2?
Memory alignment and floating-point efficiency.
One of the main arguments for choosing batch sizes as powers of 2 is that CPU and GPU memory architectures are organized in powers of 2. More precisely, there is the concept of a memory page, which is essentially a contiguous block of memory. On macOS or Linux, you can check the page size by running getconf PAGESIZE in a terminal, which should return a power of 2. For details, please refer to this article.
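As a small aside, the page size can also be read programmatically; here is a sketch using the POSIX-only os.sysconf call (works on Linux and macOS, not on Windows):

```python
import os

# POSIX-only: query the OS memory page size; typically 4096 bytes, a power of 2.
page_size = os.sysconf("SC_PAGE_SIZE")
print(f"Memory page size: {page_size} bytes")
```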
References
- https://www.cnblogs.com/Wanggcong/p/4699932.html
- https://cloud.tencent.com/developer/article/1358478
- "Statistical Learning Methods" Li Hang