Deep learning fundamentals: batch_size
2022-08-02 05:29:00 【hello689】
Background knowledge: batch size is a parameter that only arises in batch learning. Statistical learning can be divided into two categories, online learning and batch learning; by the same division, deep learning can also be done online or in batches, and in practice batch learning is used far more often. For details, see "Statistical Learning Methods", 2nd edition, p. 13.
Online learning: accept one sample at a time, make a prediction, and repeatedly update the model;
Batch learning: an offline learning method (all samples must be available in advance of training); the model learns from all samples, or from a subset of them, at each update.
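The contrast between the two regimes can be sketched with a toy linear-regression example (the data, learning rate, and single weight here are hypothetical, chosen only to illustrate the update patterns):

```python
import numpy as np

# Toy data: y = 2x, 8 samples (hypothetical example)
X = np.arange(1.0, 9.0)
y = 2.0 * X

def grad(w, xb, yb):
    # Gradient of the mean squared error 0.5*(w*x - y)^2 w.r.t. w
    return np.mean((w * xb - yb) * xb)

# Online learning: update the weight after every single sample
w_online = 0.0
for x_i, y_i in zip(X, y):
    w_online -= 0.01 * grad(w_online, np.array([x_i]), np.array([y_i]))

# Batch learning: one update computed over all samples at once
w_batch = 0.0
w_batch -= 0.01 * grad(w_batch, X, y)

print(round(w_online, 3), round(w_batch, 3))
```

Online learning makes eight small, noisy steps; batch learning makes a single step along the gradient averaged over all eight samples.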
1. Why do I need batch size?
One answer is the statement in the screenshot below; as a beginner, it may be confusing on a first reading.
A second answer is quoted next, though it is still a bit convoluted:
In the batch method of supervised learning, adjustment of the synaptic weights of the multilayer perceptron is performed after all N examples of the training set have been presented, which constitutes one training round (epoch). The cost function for batch learning is defined by the average error energy. So batch training is required.
My summary:
Why is a batch size needed? At its core, the answer is insufficient memory. If memory were sufficient, the error over all samples could be used to compute the optimal gradient direction for the parameters being optimized, i.e., the globally optimal gradient direction. In practice, however, datasets are very large and GPU memory is limited, so training must proceed in batches. If the batch is too small, the gradient descent direction is highly random, the model struggles to converge stably (convergence is slow), and many iterations are required. The batch size should therefore be chosen to strike a balance: large enough to keep memory utilization high, while still letting the model converge quickly.
In addition, batch normalization also needs a batch of data to compute the mean and variance. If the batch size is 1, BN is essentially useless.
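The BN degenerate case is easy to demonstrate: with a batch of one, each sample equals the batch mean, so the normalized output collapses to zero. A minimal sketch (this is a bare normalization step without BN's learnable scale/shift parameters):

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # Normalize each feature using statistics computed over the batch axis
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mu) / np.sqrt(var + eps)

batch = np.array([[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]])
print(batch_norm(batch))   # well-defined statistics over 3 samples

single = np.array([[4.0, 8.0]])
print(batch_norm(single))  # batch of 1: x equals the mean, output is all zeros
```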
2. The benefits of increasing batch size (three points)
- Memory utilization improves, and the parallelization efficiency of large matrix multiplications improves with it.
- The number of iterations required to run one epoch decreases, so the same amount of data is processed faster.
- Within a certain range, increasing the batch size makes the gradient descent direction more accurate and the convergence more stable.
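The second benefit is simple arithmetic: one epoch visits every sample once, so the iteration count is the sample count divided by the batch size, rounded up for the final partial batch. A quick sketch (the dataset size of 50,000 is an arbitrary example):

```python
import math

def iters_per_epoch(num_samples, batch_size):
    # One epoch visits every sample once; the last batch may be smaller
    return math.ceil(num_samples / batch_size)

for bs in (32, 128, 512):
    print(bs, iters_per_epoch(50000, bs))
```

Quadrupling the batch size cuts the per-epoch iteration count by roughly a factor of four, which is where the wall-clock speedup comes from when each (larger) iteration parallelizes well.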
3. Disadvantages of increasing batch size (three points)
- Memory utilization improves, but memory capacity may no longer be enough (buy more memory).
- The number of iterations per epoop decreases, but to reach the same accuracy more epochs are needed, which increases total training time.
- When the batch size grows large enough, the descent direction barely changes between steps and the model may fall into a local optimum (a big ship is hard to turn), unless the batch is all of the data.
4. How does adjusting batch size affect the training effect?
- If the batch size is too small, model performance suffers (the error spikes).
- As the batch size increases, the same amount of data is processed faster.
- As the batch size increases, the number of epochs required to reach the same accuracy increases.
- Because these two factors pull in opposite directions, there is some batch size at which total training time is minimized.
- Because final convergence can settle into different local extrema, there is likewise some batch size at which final convergence accuracy is best.
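The claim that a larger batch gives a more accurate gradient direction can be checked empirically: the spread of the mini-batch gradient estimate shrinks roughly like 1/sqrt(batch_size). A sketch on synthetic data (the regression problem, noise level, and batch sizes are all hypothetical choices):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic regression data: y ≈ 3x plus noise (toy example)
X = rng.normal(size=10000)
y = 3.0 * X + rng.normal(scale=0.5, size=X.shape)

def minibatch_grad(w, batch_size):
    # Gradient of the mean squared error on one random mini-batch
    idx = rng.choice(len(X), size=batch_size, replace=False)
    xb, yb = X[idx], y[idx]
    return float(np.mean((w * xb - yb) * xb))

# Measure the spread of the gradient estimate at each batch size
spread = {}
for bs in (4, 64, 1024):
    grads = [minibatch_grad(0.0, bs) for _ in range(200)]
    spread[bs] = float(np.std(grads))
    print(bs, round(spread[bs], 3))
```

The standard deviation drops by roughly 4x each time the batch size grows 16x, matching the 1/sqrt(batch_size) scaling; past some point, further accuracy in the gradient direction stops paying for the extra compute per step.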
5. Why is batch_size usually a power of 2?
Memory alignment and floating-point efficiency.
One of the main arguments for choosing batch sizes that are powers of 2 is that CPU and GPU memory architectures are organized in powers of 2. More precisely, there is the concept of a memory page, which is essentially a contiguous block of memory. On macOS or Linux, you can check the page size by executing getconf PAGESIZE in a terminal, which should return a power of 2. For details, please refer to this article.
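The same page-size check can be done from Python, which also makes it easy to verify the power-of-two property with a bit trick (a number is a power of two exactly when it has a single bit set):

```python
import mmap
import os

# The OS allocates memory in fixed-size pages; mmap.PAGESIZE reports
# the page size in bytes (typically 4096 = 2**12).
page = mmap.PAGESIZE
print(page)

# On POSIX systems the same value is available via sysconf,
# mirroring the `getconf PAGESIZE` shell command mentioned above.
if hasattr(os, "sysconf"):
    print(os.sysconf("SC_PAGESIZE"))

# A power of two has exactly one bit set, so page & (page - 1) == 0
print(page & (page - 1) == 0)
```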
References
- https://www.cnblogs.com/Wanggcong/p/4699932.html
- https://cloud.tencent.com/developer/article/1358478
- "Statistical Learning Methods" Li Hang