Loss is NaN and the model outputs NaN
2022-06-13 08:50:00 【Human high quality Algorithm Engineer】
Recently I have been working on pruning, which I had not touched before. I used the gate-decorator-pruning project. It dates from 2019, so it has been around for a while, and it claims very good pruning results on ResNet.
In its reported comparison with other pruning methods, GBN achieves the largest reduction in FLOPs while keeping the highest accuracy.
Here is a record of a pitfall I hit in my own experiments. I spent more than a day hunting this bug, sadly.
Problem: during the train stage, the network's output logits are normal on the first iteration and nan on the second. After loss, accuracy = pack.criterion(logits, label), loss is nan, and from then on the model's output is always nan.
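One way to narrow down a symptom like this is to check tensors for nan or inf at each step and stop at the first one that fails. The sketch below is only an illustration: first_non_finite is a hypothetical helper, and the logits and weight tensors are made-up stand-ins, not the values from the actual run.

```python
import torch


def first_non_finite(named_tensors):
    """Return the name of the first tensor containing nan or inf, or None."""
    for name, tensor in named_tensors:
        if not torch.isfinite(tensor).all():
            return name
    return None


# Hypothetical stand-ins for tensors inspected during one training step.
weight = torch.ones(2, 2)
logits = torch.tensor([[0.5, float("nan")], [1.0, 2.0]])

checked = first_non_finite([("weight", weight), ("logits", logits)])
print(checked)  # logits
```

Calling such a check on the parameters before the first forward pass would flag a badly initialized weight before any loss is computed. PyTorch also provides torch.autograd.set_detect_anomaly(True), which raises an error when a backward computation produces nan.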
Locating the cause: I took an already-trained model to prune. The backbone's final output is a 128-dimensional feature vector, and there is no final fc (linear) layer, so the code adds a weight itself, as follows:
if cfg.base.head_model != "":
    model_dict = torch.load(cfg.base.head_model, map_location='cpu' if not cfg.base.cuda else 'cuda')
    data = model_dict['ARC_HEAD']['weight'].cpu().data.numpy()
    weight = nn.Parameter(torch.Tensor(data), requires_grad=False)
else:
    weight = nn.Parameter(torch.Tensor(cfg.model.class_number, 512))
if embedding.is_cuda:
    weight = weight.cuda()
My model did not include an fc layer, so I left head_model empty and the else branch ran:
weight = nn.Parameter(torch.Tensor(cfg.model.class_number, 512))
That is where the problem is. You can create a tensor this way yourself and look at the values inside:
>>> torch.Tensor(5, 8)
tensor([[-1.1774e-37, 6.0536e-43, -1.1774e-37, 6.0536e-43, 4.2039e-45,
0.0000e+00, 1.7937e-43, 0.0000e+00],
[ 1.4360e+04, 7.1846e+22, 1.4601e-19, 1.7750e+28, 6.8608e+22,
2.8183e+20, -1.1774e-37, 6.0536e-43],
[-1.1774e-37, 6.0536e-43, -1.1774e-37, 6.0536e-43, 1.4013e-45,
0.0000e+00, 0.0000e+00, 0.0000e+00],
[ 7.3970e+20, 7.1833e+22, 1.8153e+31, 2.7372e+20, 5.5123e-11,
4.6149e+24, 0.0000e+00, 0.0000e+00],
[ 4.2039e-45, 0.0000e+00, 1.4026e-27, 4.5909e-41, 3.9821e-41,
2.4428e-38, 1.0842e-19, 1.6930e+22]])
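Why values of this size are dangerous can be reproduced deterministically. The sketch below is a simplified illustration, not the project's code: the shapes, the labels, and the constant 1e38 are assumptions chosen so that the overflow happens every time, whereas real uninitialized memory is unpredictable.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical stand-ins: a batch of 2 embeddings (8-dim) and 5 classes.
embedding = torch.ones(2, 8)
label = torch.tensor([0, 1])

# Deterministic stand-in for uninitialized memory: every entry is huge.
bad_weight = torch.full((5, 8), 1e38)
bad_logits = F.linear(embedding, bad_weight)   # 8 * 1e38 overflows float32 to inf
bad_loss = F.cross_entropy(bad_logits, label)  # inf - inf inside log-softmax gives nan

# Explicitly initialized weights of moderate size keep everything finite.
good_weight = 0.01 * torch.randn(5, 8)
good_loss = F.cross_entropy(F.linear(embedding, good_weight), label)

print(torch.isinf(bad_logits).all().item(), torch.isnan(bad_loss).item())
print(torch.isfinite(good_loss).item())
```

Once a single logit overflows to inf, the loss is nan, the gradients are nan, and the next optimizer step writes nan into every trainable parameter it touches.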
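torch.Tensor(rows, cols) allocates the tensor without initializing it, so the values are whatever happened to be in memory: some extremely small, some extremely large. With weights like these, training and backpropagation run into vanishing or exploding gradients, and the network output becomes nan. Initializing the weight with random values instead (and using 128, which matches the backbone's output dimension) solved the problem: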
weight = nn.Parameter(torch.randn([cfg.model.class_number, 128]))
>>> torch.randn([5,8])
tensor([[-0.4336, 0.3006, 1.4319, -0.4194, -0.7426, -0.2283, -0.2755, 0.3634],
[ 0.9681, 0.5644, 0.3170, -0.9134, -1.7536, -0.0589, 0.4907, 1.3428],
[ 1.0248, 1.2903, 0.3210, 1.9144, 0.0591, -0.5614, 1.7932, -1.0874],
[ 0.7404, -1.1362, -1.1224, -1.1677, -0.2877, 1.5038, -0.0281, -0.9513],
[ 0.3340, -0.1252, 1.2106, -1.4836, -1.3784, 0.8065, -0.0257, 1.9197]])
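torch.randn draws from a standard normal distribution, which is enough to fix the nan. As an alternative that is not part of the original fix, the weight can be allocated with torch.empty and then filled by one of the initializers in torch.nn.init, which scale the values to the layer size. The sizes below are hypothetical, and xavier_uniform_ is just one reasonable choice among several.

```python
import torch
import torch.nn as nn

# Hypothetical sizes; 128 matches the backbone's output dimension.
class_number, embedding_dim = 10, 128

# torch.empty makes the "not yet initialized" state explicit;
# the init call then fills the parameter in place.
weight = nn.Parameter(torch.empty(class_number, embedding_dim))
nn.init.xavier_uniform_(weight)

# Xavier uniform samples from U(-bound, bound), bound = sqrt(6 / (fan_in + fan_out)).
bound = (6.0 / (class_number + embedding_dim)) ** 0.5
print(torch.isfinite(weight).all().item())
print(weight.abs().max().item() <= bound + 1e-6)
```

Whichever initializer is used, the point is the same: a tensor created by size alone must be filled explicitly before it takes part in a forward pass.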