30 common MCU problems and solutions!
2022-06-27 05:43:00 【SunMicro soc】
I. Reproducing the problem
Only a problem that can be reproduced reliably can be correctly located, fixed, and verified. Generally speaking, the easier a problem is to reproduce, the easier it is to solve.
1.1 Simulate the triggering conditions
Some problems occur only under specific conditions, so simulating those conditions is enough to reproduce them. When the conditions depend on external input and are too complex to simulate, consider having the program enter the corresponding state directly by default.
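One way to enter the state under test directly is a build-time switch. The following is a minimal sketch, assuming a hypothetical state machine; the state names and the `DEBUG_FORCE_STATE` macro are illustrative and do not come from any particular project.

```c
#include <stdint.h>

/* Hypothetical application states; the names are illustrative only. */
typedef enum {
    STATE_INIT = 0,
    STATE_RUNNING,
    STATE_FAULT
} app_state_t;

/* Build with e.g. -DDEBUG_FORCE_STATE=STATE_FAULT to start directly in the
 * state under investigation instead of waiting for the external inputs
 * that normally lead there. */
#ifndef DEBUG_FORCE_STATE
#define DEBUG_FORCE_STATE STATE_INIT
#endif

/* Returns the state the application starts in after reset. */
app_state_t app_initial_state(void)
{
    return (app_state_t)DEBUG_FORCE_STATE;
}
```

Keeping the override behind a macro makes it easy to confirm that release builds still start in the normal state.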
1.2 Increase the execution frequency of the relevant task
For example, if an exception appears only after a task has been running for a long time, increase how often the task executes.
1.3 Increase the test sample size
If an exception appears only after the program has run for a long time and is hard to reproduce, build a test environment and test several devices at the same time.
II. Locating the problem
Narrow the scope of the investigation and identify the task, function, and statement that introduces the problem.
2.1 Print logs
Based on the symptoms, add log output to the suspect code to trace the execution flow and the values of key variables, and check whether they match expectations.
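A minimal sketch of such log output, assuming `printf` has been retargeted to a debug UART or similar channel on the target. The `LOG` macro, the `LOG_ENABLE` switch, and the `scale_adc` function are hypothetical names used only for illustration.

```c
#include <stdio.h>

/* Set to 0 to compile all log output out of release builds. */
#ifndef LOG_ENABLE
#define LOG_ENABLE 1
#endif

#if LOG_ENABLE
/* Prefix each message with file and line so it can be traced to the source.
 * This simple form needs at least one argument after the format string. */
#define LOG(fmt, ...) \
    printf("[%s:%d] " fmt "\n", __FILE__, __LINE__, __VA_ARGS__)
#else
#define LOG(fmt, ...) ((void)0)
#endif

/* Hypothetical function under investigation: converts a 12-bit ADC reading
 * to millivolts (3300 mV full scale) and logs both input and result. */
long scale_adc(long raw)
{
    long mv = raw * 3300L / 4095L;
    LOG("raw=%ld mv=%ld", raw, mv);
    return mv;
}
```

Logging both the input and the computed value at the same point makes it possible to tell whether a wrong result comes from bad input or from the calculation itself.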
2.2 Online debugging
Online debugging serves a similar purpose to printing logs. It is also particularly well suited to crash-type bugs: when the program falls into an exception handler (HardFault, watchdog interrupt, etc.), you can halt it and inspect the call stack and the core registers to locate the fault quickly.
2.3 Version rollback
If you use a version control tool, you can find the version that first introduced the problem by repeatedly rolling back and testing. Then review the code added or modified in that version.
2.4 Bisecting by commenting out code
Bisecting by commenting out code means commenting out parts of the code in a binary-search fashion to determine whether the problem is caused by the commented-out part.
Concretely, comment out half of the suspect code and check whether the problem disappears. If it does not, comment out the other half instead; if it does, keep halving the commented-out range, and so on, gradually narrowing the scope of the problem.
2.5 Save a core register snapshot
When a Cortex-M core enters an exception handler, it pushes several core registers onto the stack (R0-R3, R12, LR, PC, and xPSR).
On entering the exception handler, we can copy the stacked register values into a RAM region whose contents are retained across a reset, then perform the reset and read the information back from RAM for analysis. PC and LR identify the function that was executing at the time, R0-R3 show whether the variables being processed were abnormal, and SP indicates whether a stack overflow may have occurred.
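The snapshot idea can be sketched as follows. This is a minimal illustration under stated assumptions, not a complete fault handler: the type and function names are hypothetical, the magic marker value is arbitrary, and on a real target `g_fault_snapshot` would have to be placed in a RAM section that the startup code does not initialize (how to do that is toolchain-specific).

```c
#include <stdint.h>
#include <string.h>

/* Registers stacked by a Cortex-M core on exception entry, lowest address first. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} stacked_frame_t;

#define SNAPSHOT_MAGIC 0xDEADBEEFu  /* arbitrary "snapshot valid" marker */

typedef struct {
    uint32_t magic;         /* SNAPSHOT_MAGIC when the snapshot is valid */
    uint32_t frame_addr;    /* address of the stacked frame (SP on entry) */
    stacked_frame_t frame;  /* copy of the stacked registers */
} fault_snapshot_t;

/* Assumption: on target this lives in a no-init RAM section. */
fault_snapshot_t g_fault_snapshot;

/* Call from the fault handler with the stack pointer that was active
 * when the exception was taken. */
void fault_snapshot_save(const uint32_t *sp)
{
    g_fault_snapshot.frame_addr = (uint32_t)(uintptr_t)sp;
    memcpy(&g_fault_snapshot.frame, sp, sizeof(stacked_frame_t));
    g_fault_snapshot.magic = SNAPSHOT_MAGIC;
}

/* Call after reset: returns 1 and clears the marker if a snapshot exists. */
int fault_snapshot_take(fault_snapshot_t *out)
{
    if (g_fault_snapshot.magic != SNAPSHOT_MAGIC) {
        return 0;
    }
    *out = g_fault_snapshot;
    g_fault_snapshot.magic = 0u;
    return 1;
}
```

After the reset, the saved `pc` and `lr` values can be looked up in the map file to find the function that was running when the fault occurred.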
III. Problem analysis and handling
Analyze the cause by combining the symptoms with the location of the faulty code.
3.1 The program keeps running
3.1.1 Abnormal values
3.1.1.1 Software problems
1. Array out-of-bounds write
When writing to an array, the index exceeds the array length, so the content at the corresponding address, which belongs to another variable, is modified.
Such problems usually need to be analyzed together with the map file: use the map file to find the arrays located near the address of the corrupted variable, check whether any code that writes to those arrays is unsafe, and change it to safe code.
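A minimal sketch of a bounds-checked write. The buffer name, its length, and the function name are hypothetical and only illustrate the kind of check meant by "safe code" here.

```c
#include <stddef.h>
#include <stdint.h>

#define BUF_LEN 8u

uint8_t g_buf[BUF_LEN];

/* Writes value at index only if the index is inside the array.
 * Returns 0 on success, -1 if the index is out of range. */
int buf_write(size_t index, uint8_t value)
{
    if (index >= BUF_LEN) {
        return -1;  /* reject instead of corrupting a neighbouring variable */
    }
    g_buf[index] = value;
    return 0;
}
```

Returning an error code rather than silently ignoring the write lets the caller log the event, which helps locate the code that produced the bad index.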
2. Stack overflow
| Address | Contents |
|---|---|
| 0x20001ff8 | g_val |
| 0x20002000 | Bottom of the stack region (lowest address) |
| ………… | Stack space |
| 0x20002200 | Top of the stack region (highest address) |
As the table shows, this kind of problem also needs to be analyzed together with the map file. Assuming the stack grows from high addresses to low addresses, a stack overflow overwrites g_val with values from the stack.
When a stack overflow occurs, analyze the maximum stack usage. Too many levels of function calls, function calls made inside interrupt service routines, and large local variables declared inside functions can all lead to stack overflow.
Such problems can be solved in the following ways:
- Allocate memory reasonably at the design stage and give the stack an appropriate size;
- Add the "static" keyword to large local variables so that they become static variables, or allocate them dynamically with malloc() so that they live on the heap;
- Change how functions are called to reduce the call depth.
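One common way to measure maximum stack usage is to fill the stack region with a known pattern at startup and later count how much of the pattern is still intact. The sketch below assumes a stack that grows from high to low addresses; the function names and the fill pattern are hypothetical, and on a real target the region bounds would come from the linker script or map file.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xA5A5A5A5u  /* arbitrary fill pattern */

/* Fill the stack region with a known pattern. Do this at startup,
 * before the region is in use. */
void stack_paint(uint32_t *low, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        low[i] = STACK_FILL;
    }
}

/* For a stack growing from high to low addresses, count how many words
 * at the low end still hold the pattern, i.e. were never used. */
size_t stack_unused_words(const uint32_t *low, size_t words)
{
    size_t n = 0;
    while (n < words && low[n] == STACK_FILL) {
        n++;
    }
    return n;
}
```

If the unused count approaches zero during testing, the stack is too small for the worst case observed and should be enlarged or its usage reduced.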
3. Mistyped condition in a conditional statement
In a condition it is easy to mistype the equality operator "==" as the assignment operator "=", which changes the value of the variable being tested. This is not a compile error (some compilers only issue a warning), and the condition then evaluates to the assigned value, so it is always true whenever that value is non-zero.
It is recommended to write the variable being tested on the right-hand side of the operator and the constant on the left, so that mistyping the operator as an assignment causes a compile error. Static code analysis tools can also find such problems.
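A minimal sketch of the constant-on-the-left style. The `MODE_IDLE` constant and `is_idle` function are hypothetical names chosen for illustration.

```c
#include <stdint.h>

#define MODE_IDLE 0u

/* Constant on the left: mistyping this as "MODE_IDLE = mode" does not
 * compile, because a constant cannot be assigned to. The reverse form
 * "mode = MODE_IDLE" would compile, overwrite mode, and evaluate to the
 * assigned value (here 0, so the branch would never be taken). */
int is_idle(uint8_t mode)
{
    if (MODE_IDLE == mode) {
        return 1;
    }
    return 0;
}
```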
4. Synchronization problems
For example, while a dequeue operation on a queue is in progress, an interrupt (or task switch) occurs, and the interrupt handler (or the task that was switched in) performs an enqueue operation. The queue structure may then be corrupted. In this situation, disable interrupts for the duration of the operation (or use a mutex for synchronization).
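The interrupt-masking approach can be sketched as below. `irq_save()` and `irq_restore()` are hypothetical port functions, stubbed here so the example builds on a host; on a real MCU they would save the interrupt mask state, disable interrupts, and later restore the saved state. The queue type and function names are likewise illustrative.

```c
#include <stdint.h>

#define Q_SIZE 8u

/* Hypothetical port layer, stubbed for host builds. */
static uint32_t irq_masked;
static uint32_t irq_save(void)
{
    uint32_t previous = irq_masked;
    irq_masked = 1u;            /* "disable interrupts" */
    return previous;
}
static void irq_restore(uint32_t previous)
{
    irq_masked = previous;      /* restore the earlier mask state */
}

typedef struct {
    uint8_t  data[Q_SIZE];
    uint32_t head;   /* next slot to write */
    uint32_t tail;   /* next slot to read */
    uint32_t count;  /* items currently stored */
} queue_t;

/* Enqueue one byte; returns 1 on success, 0 if the queue is full. */
int queue_push(queue_t *q, uint8_t v)
{
    int ok = 0;
    uint32_t state = irq_save();        /* enter critical section */
    if (q->count < Q_SIZE) {
        q->data[q->head] = v;
        q->head = (q->head + 1u) % Q_SIZE;
        q->count++;
        ok = 1;
    }
    irq_restore(state);                 /* leave critical section */
    return ok;
}

/* Dequeue one byte; returns 1 on success, 0 if the queue is empty. */
int queue_pop(queue_t *q, uint8_t *out)
{
    int ok = 0;
    uint32_t state = irq_save();
    if (q->count > 0u) {
        *out = q->data[q->tail];
        q->tail = (q->tail + 1u) % Q_SIZE;
        q->count--;
        ok = 1;
    }
    irq_restore(state);
    return ok;
}
```

Saving and restoring the previous mask state, rather than unconditionally re-enabling interrupts, keeps the functions safe to call from code that already runs with interrupts disabled.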
5. Compiler optimization problems
Consider a loop that keeps calling foo() until the irq interrupt handler changes a flag variable flg; the intent is to stop executing foo() once the irq interrupt has occurred. After compiler optimization, however, flg may be loaded into a register once, and only the register value is checked on each iteration, without re-reading flg from RAM. As a result, foo() keeps running even after the irq interrupt has occurred. The fix is to add the "volatile" keyword to the declaration of flg, which forces its value to be read from RAM every time.
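A minimal sketch of the volatile fix. The handler name and loop function are hypothetical, and because a host build has no real interrupt, `foo()` here pretends that the interrupt fires during its third call.

```c
#include <stdint.h>

/* volatile forces every read of flg to come from memory, so a write made
 * by the interrupt handler is seen by the polling loop. */
static volatile uint8_t flg;

/* Hypothetical interrupt handler: tells the main loop to stop. */
void irq_handler(void)
{
    flg = 1u;
}

static uint32_t foo_calls;

static void foo(void)
{
    foo_calls++;
    if (foo_calls == 3u) {
        irq_handler();  /* host-only stand-in for the real interrupt */
    }
}

/* Calls foo() until the flag is set; returns how many calls were made. */
uint32_t run_until_irq(void)
{
    while (flg == 0u) {
        foo();
    }
    return foo_calls;
}
```

Note that volatile only guarantees that each access reaches memory; it does not make multi-step updates atomic, so shared data that is modified in more than one place still needs the synchronization described in the previous item.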
3.1.1.2 Hardware problems
1. Chip bugs
The chip itself has a bug and, in certain specific cases, returns an incorrect value to the MCU. The program needs to check the values it reads back and filter out outliers.
2. Communication timing errors
Take the ISL78600 power management chip as an example, with two chips cascaded. When the voltage sampling data of both chips is read at once, the upper chip sends its data to the lower chip over the daisy chain at a fixed period, and the lower chip has only one buffer.
If the MCU does not read the data from the lower chip within the specified time, newly arriving data overwrites the current data, and data is lost. Such problems require careful study of the chip's datasheet and strict compliance with its communication timing requirements.
3.1.2 Abnormal behavior
3.1.2.1 Software problems
1. Design problems
The design contains errors or omissions; the design documents need to be reviewed again.
2. Implementation inconsistent with the design
The code does not match the design document. Add unit tests that cover all conditional branches, and carry out cross code review.
3. Corrupted state variables
For example, the variable that records the current state of a state machine is corrupted. Such problems are analyzed in the same way as the abnormal values described above.
3.1.2.2 Hardware problems
1. Hardware failure
The target IC has failed and does not act after receiving a control command; the hardware needs to be checked.
2. Abnormal communication
Communication with the target IC is faulty, so control commands are not executed correctly. Use an oscilloscope or logic analyzer to observe the communication timing, and determine whether the transmitted signal is wrong or is subject to external interference.
3.2 Program crash
3.2.1 The program stops running
3.2.1.1 Software problems
1. HardFault
The following conditions can cause a HardFault:
- Accessing a peripheral's registers while the peripheral's clock gate is not enabled;
- Jumping to an out-of-range function address, which usually happens when a function pointer has been corrupted; the troubleshooting method is the same as for abnormal values;
- Alignment problems when dereferencing a pointer.
Taking little-endian byte order as an example, suppose we declare a structure a with forced byte alignment (packed), laid out in memory as follows:

| Address | 0x00000000 | 0x00000001 | 0x00000002 | 0x00000003 |
|---|---|---|---|---|
| Variable name | val0 | val1_low | val1_high | val2 |
| Value | 0x12 | 0x56 | 0x34 | 0x78 |
Here the address of a.val1 is 0x00000001. Dereferencing this address through a uint16_t pointer is an unaligned access, which causes a HardFault on cores that do not support unaligned accesses (for example Cortex-M0/M0+) or when unaligned-access trapping is enabled. If the variable must be accessed through a pointer, use memcpy().
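A minimal sketch of the memcpy() approach, using a plain byte array with the same contents as the table above instead of a compiler-specific packed structure. The function name is hypothetical, and the resulting value assumes a little-endian target, as in the example.

```c
#include <stdint.h>
#include <string.h>

/* Byte image of the packed structure from the table above:
 * val0 at offset 0, val1 at offsets 1..2 (low byte first), val2 at offset 3. */
const uint8_t packed_a[4] = { 0x12, 0x56, 0x34, 0x78 };

/* Reads a uint16_t from an address that may be unaligned. memcpy lets the
 * compiler use byte accesses where the target requires them, instead of
 * a half-word load that could fault. */
uint16_t read_u16_unaligned(const void *p)
{
    uint16_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

On a little-endian target, `read_u16_unaligned(&packed_a[1])` yields 0x3456, matching val1 in the table.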
2. The interrupt flag is not cleared in the interrupt service routine
If the interrupt flag is not correctly cleared before the interrupt service routine exits, the program re-enters the routine immediately after leaving it, so the program appears to hang.
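A minimal sketch of an interrupt service routine that clears its flag. The peripheral is hypothetical and modelled as a plain variable so the example builds on a host; on real hardware the status register is memory-mapped, and the way a flag is cleared (writing 0, writing 1, or reading a register) is defined by the MCU reference manual.

```c
#include <stdint.h>

/* Hypothetical timer status register and flag bit. */
#define TIM_FLAG_UPDATE (1u << 0)
volatile uint32_t TIM_STATUS;

uint32_t tick_count;

/* Interrupt service routine: clear the pending flag, then do the work.
 * If the flag were left set, the routine would be re-entered as soon as
 * it returned. */
void timer_isr(void)
{
    if ((TIM_STATUS & TIM_FLAG_UPDATE) != 0u) {
        TIM_STATUS &= ~TIM_FLAG_UPDATE;  /* clear the pending flag */
        tick_count++;                    /* handle the event */
    }
}
```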
3. NMI interrupt
A case encountered during debugging: the SPI MISO pin was multiplexed with the NMI function. When the peripheral connected over SPI was damaged, MISO was pulled high. After the MCU reset, and before the NMI pin had been configured for the SPI function, the MCU went straight into the NMI interrupt and the program hung there. In this situation, disable the NMI function inside the NMI interrupt service routine so that the routine can exit.
3.2.1.2 Hardware problems
1. The crystal oscillator does not start
2. Insufficient supply voltage
3. The reset pin is pulled low
3.2.2 Reset
3.2.2.1 Software problems
1. Watchdog reset
In addition to resets caused by a watchdog feeding timeout, pay attention to any special requirements for configuring the watchdog. Taking the Freescale KEA MCU as an example, configuring its watchdog requires an unlock sequence (two different values written consecutively to one of its registers), and the unlock sequence must complete within 16 bus clocks; a timeout causes a watchdog reset. The only way to avoid such problems is to be familiar with the MCU's reference manual and to pay attention to details like this.
3.2.2.2 Hardware problems
1. The supply voltage is unstable
2. Insufficient load capacity of the power supply
IV. Regression testing
After the problem is solved, regression testing is needed: on the one hand to confirm that the problem no longer recurs, and on the other hand to make sure that the change has not introduced other problems.
V. Lessons learned
Summarize the causes of the problem and the methods used to solve it. Think about how to prevent similar problems in the future and whether products on the same platform can benefit from the findings, so that one case leads to broader conclusions and failures become lessons.