Five "potential errors" in embedded programming
2022-07-04 13:51:00 【Li Xiaoyao】
Source: online material
Finding and eliminating latent bugs in embedded software is a difficult task.
It often takes heroic effort and expensive tools to trace an observed crash, hang, or other unplanned run-time behavior back to its root cause.
In the worst case, the root cause corrupts code or data in a way that leaves the system appearing to work properly, at least for a while.
Engineers too often give up trying to find the cause of infrequent anomalies that cannot easily be reproduced in the lab, dismissing them as user error or "glitches".
However, these ghosts in the machine live on. This is a guide to the most common root causes of hard-to-reproduce bugs. Whenever you read firmware source code, look for the following five bugs, and follow the recommended best practices to keep them from happening to you again.
Error 1: Race conditions
A race condition is any situation in which the combined outcome of two or more threads of execution (which can be RTOS tasks, or main() and an interrupt handler) varies depending on the precise order in which their interleaved instructions are executed on the processor.
For example, suppose you have two threads of execution, one of which regularly increments a global variable (g_counter += 1;) while the other occasionally resets it to zero (g_counter = 0;). If the increment cannot always be executed atomically (that is, in a single instruction cycle), there is a race condition.
As shown in Figure 1, think of the tasks as cars approaching the same intersection. A collision between the two updates of the counter variable may never occur, or may occur only rarely. But when it does, the counter is not actually cleared in memory, and its value is corrupt at least until the next reset. The effect may have serious consequences for the system, although perhaps not until long after the actual collision.
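The collision can be made concrete by writing out the load, add, and store steps of the increment by hand. The sketch below is a single-threaded simulation, not real preemption; the function name simulate_lost_clear() and the fixed interleaving are assumptions chosen for illustration.

```c
#include <assert.h>

static unsigned g_counter;

/* Replays one unlucky interleaving of "g_counter += 1" (task A) and
 * "g_counter = 0" (task B) on a CPU where the increment takes separate
 * load, add, and store steps. Returns the final counter value. */
unsigned simulate_lost_clear(unsigned start)
{
    unsigned reg_a;

    g_counter = start;
    reg_a = g_counter;      /* task A: load the counter into a register  */
    g_counter = 0;          /* task B preempts A and clears the counter  */
    reg_a += 1;             /* task A resumes: add one to its stale copy */
    g_counter = reg_a;      /* task A: store, overwriting the clear      */

    assert(g_counter == start + 1);   /* the clear has been lost */
    return g_counter;
}
```

After the call, the counter holds start + 1 rather than 0 or 1, which is the corrupted value described above.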
Best practice: Race conditions can be avoided by surrounding each critical section of code, which must execute atomically, with an appropriate pair of preemption-limiting operations. To prevent a race condition involving an ISR, at least one interrupt signal must be disabled for the duration of the other code's critical section.
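A minimal sketch of such a critical section follows. The functions irq_save_and_disable() and irq_restore() are hypothetical placeholders for whatever interrupt-masking primitives your CPU or compiler provides; here they are simulated with a flag so that the example runs on a host machine.

```c
#include <assert.h>
#include <stdbool.h>

static bool g_irq_enabled = true;      /* simulated interrupt-enable flag  */
static volatile unsigned g_counter;    /* shared between main() and an ISR */

/* Hypothetical port layer: returns the previous state, then masks interrupts. */
static bool irq_save_and_disable(void)
{
    bool was_enabled = g_irq_enabled;
    g_irq_enabled = false;
    return was_enabled;
}

/* Hypothetical port layer: restores the state saved earlier. */
static void irq_restore(bool was_enabled)
{
    g_irq_enabled = was_enabled;
}

/* Called from main(): the increment runs with interrupts masked. */
void counter_increment(void)
{
    bool saved = irq_save_and_disable();
    assert(!g_irq_enabled);            /* we must be inside the critical section */
    g_counter += 1;
    irq_restore(saved);
}

/* Body of the ISR that resets the counter. */
void counter_clear_isr(void)
{
    g_counter = 0;
}

unsigned counter_read(void) { return g_counter; }
bool irq_is_enabled(void) { return g_irq_enabled; }
```

Saving and restoring the previous state, rather than unconditionally re-enabling interrupts, lets critical sections nest safely.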
For race conditions between RTOS tasks, the best practice is to create a mutex specific to the shared object, which each task must acquire before entering its critical section. Note that it is not a good idea to rely on the capabilities of a specific CPU to ensure atomicity, because that only prevents the race condition until you change the compiler or the CPU.
Shared data and the random timing of preemption are the culprits behind race conditions. But the error might not always occur, which makes tracking a race condition from observed symptoms back to its root cause extremely difficult. It is therefore important to be ever vigilant about protecting all shared objects. Every shared object is an accident waiting to happen.
Best practice: Name all potentially shared objects (including global variables, heap objects, and peripheral registers and pointers to them) in a way that makes the risk obvious to every future reader of the code; the Netrino Embedded C Coding Standard advocates the "g_" prefix for this purpose. Locating all potentially shared objects is the first step in a code audit for race conditions.
Error 2: Non-reentrant functions
Technically speaking, the problem of non-reentrant functions is a special case of the race condition problem. And, for related reasons, the run-time errors caused by non-reentrant functions generally do not occur in a reproducible way, which makes them just as hard to debug.
Unfortunately, non-reentrancy is also harder to spot in a code review than other types of race conditions.
Figure 2 shows a typical scenario. Here, the software entities subject to preemption are again RTOS tasks. But rather than manipulating a shared object directly, they do so indirectly, through function calls.
For example, suppose Task A calls a socket-layer protocol function, which calls a TCP-layer protocol function, which calls an IP-layer protocol function, which calls the Ethernet driver. For the system to behave reliably, all of these functions must be reentrant.
However, the functions of the Ethernet driver all manipulate the same global object, in the form of the registers of the Ethernet controller chip. If preemption is permitted during these register manipulations, Task B may preempt Task A after packet A has been queued but before the transmit has begun.
Task B then calls the socket-layer function, which calls the TCP-layer function, which calls the IP-layer function, which calls the Ethernet driver, which queues and transmits packet B. When control of the CPU returns to Task A, it requests its transmission. Depending on the design of the Ethernet controller chip, this may either retransmit packet B or generate an error. Packet A is lost and never goes out onto the network.
For the functions of this Ethernet driver to be callable from multiple RTOS tasks at nearly the same time, they must be made reentrant. If they each use only stack variables, there is nothing to do.
Thus the most common style of C function is inherently reentrant. But drivers and some other functions are not reentrant unless carefully designed to be.
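The difference can be seen in a pair of small helper functions. This is an illustrative sketch; the names hex_nonreentrant() and hex_reentrant() are assumptions, not part of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* NOT reentrant: every caller shares the same static buffer, so a second
 * call (from another task or an ISR) overwrites the first caller's result. */
const char *hex_nonreentrant(unsigned value)
{
    static char buf[9];                        /* persists between calls */
    snprintf(buf, sizeof buf, "%08X", value);
    return buf;
}

/* Reentrant: all state lives in the caller's buffer and on the stack. */
char *hex_reentrant(unsigned value, char *out, size_t out_len)
{
    assert(out != NULL && out_len >= 9);
    snprintf(out, out_len, "%08X", value);
    return out;
}
```

Two calls to hex_nonreentrant() return the same pointer, so only the most recent result survives; hex_reentrant() keeps each caller's result separate.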
The key to making functions reentrant is to suspend preemption around all accesses to peripheral registers, global variables (including static local variables), persistent heap objects, and shared memory areas. This can be done by disabling one or more interrupts or by acquiring and releasing a mutex. The specifics of the problem dictate the best solution.
Best practice: Create and hide a mutex within each library or driver module that is not intrinsically reentrant. Make acquisition of this mutex a precondition for manipulating any persistent data or shared registers used within the module as a whole.
For example, the same mutex may be used to prevent race conditions involving both the Ethernet controller registers and a global or static local packet counter. All functions in the module that access this data must follow the protocol of acquiring the mutex before doing so.
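A sketch of a driver module with a hidden mutex might look like the following. It assumes a POSIX pthread_mutex_t as a stand-in for an RTOS mutex, and the "registers" are ordinary variables so that the example runs on a host; the names eth_send(), eth_packets_sent(), and the s_ variables are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Private to the driver module: callers never see the mutex. */
static pthread_mutex_t s_eth_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Simulated controller registers and a persistent packet counter. */
static volatile uint32_t s_tx_queue_reg;   /* stands in for a TX queue register */
static volatile uint32_t s_tx_start_reg;   /* stands in for a TX start register */
static uint32_t s_packets_sent;

/* Queue and transmit one packet. The whole sequence is one critical section,
 * so another task cannot slip in between "queue" and "start". */
uint32_t eth_send(uint32_t packet_id)
{
    uint32_t sent_id;
    int rc = pthread_mutex_lock(&s_eth_mutex);

    assert(rc == 0);
    (void)rc;

    s_tx_queue_reg = packet_id;        /* queue the packet                  */
    s_tx_start_reg = 1u;               /* request transmission              */
    sent_id = s_tx_queue_reg;          /* what the "hardware" would send    */
    s_tx_start_reg = 0u;
    s_packets_sent += 1u;              /* same mutex guards the counter too */

    pthread_mutex_unlock(&s_eth_mutex);
    return sent_id;
}

/* Read the packet counter under the same mutex. */
uint32_t eth_packets_sent(void)
{
    uint32_t n;

    pthread_mutex_lock(&s_eth_mutex);
    n = s_packets_sent;
    pthread_mutex_unlock(&s_eth_mutex);
    return n;
}
```

Because the mutex is static to the module, no caller can forget to take it; the protocol is enforced by the driver's own entry points.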
Beware that non-reentrant functions may enter your code base as part of third-party middleware, legacy code, or device drivers.
Disturbingly, non-reentrant functions may even be part of the standard C or C++ library provided with your compiler. If you are using the GNU compiler to build RTOS-based applications, take note that you should be using the reentrant "newlib" standard C library rather than the default library.
Error 3: Missing volatile keyword
Failure to tag certain types of variables with C's volatile keyword can cause a number of unexpected behaviors in a system that works properly only when the compiler's optimizer is set to a low level or disabled. The volatile qualifier is used in variable declarations, where its purpose is to prevent optimization of the reads from and writes to that variable.
For example, if you write code like that shown in Listing 1, the optimizer may try to make your program both faster and smaller by eliminating the first line, to the detriment of the patient's health. If g_alarm is declared as volatile, however, this optimization is not permitted.
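The listing itself does not appear here, so the following is only a sketch of the kind of code the paragraph describes. The names ALARM_ON, ALARM_OFF, on_patient_event(), and alarm_get() are assumptions for illustration; only g_alarm comes from the text.

```c
#include <assert.h>

enum alarm_state { ALARM_OFF = 0, ALARM_ON = 1 };

/* volatile: every write below must actually be performed. */
static volatile enum alarm_state g_alarm = ALARM_OFF;

void on_patient_event(void)
{
    g_alarm = ALARM_ON;     /* without volatile, this "dead" store may be removed */

    /* ... other code that never reads g_alarm ... */

    g_alarm = ALARM_OFF;
}

enum alarm_state alarm_get(void)
{
    enum alarm_state s = g_alarm;

    assert(s == ALARM_OFF || s == ALARM_ON);   /* range check on the shared value */
    return s;
}
```

From the compiler's point of view the first store is overwritten before anyone reads it, which is exactly why an optimizer feels free to drop it when the variable is not volatile.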
Best practice: The volatile keyword should be used to declare every:
global variable accessed by an ISR and any other part of the code,
global variable accessed by two or more RTOS tasks (even when race conditions in those accesses have been prevented),
pointer to a memory-mapped peripheral register (or set of registers), and
delay loop counter.
Note that, in addition to ensuring that all reads and writes actually take place for a given variable, the use of volatile also constrains the compiler by adding additional "sequence points". Any other volatile accesses that appear before the read or write of a volatile variable must be executed prior to that access.
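Three of the cases in the list can be sketched as follows. All names here (g_rx_ready, uart_rx_isr(), reg_set_bits(), delay_loop()) are hypothetical, and the register is passed in as a pointer so that the example can run on a host; on real hardware the address would come from the chip's data sheet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Global shared between an ISR and the main loop. */
static volatile bool g_rx_ready;

/* Body of a receive interrupt handler: publishes an event to main(). */
void uart_rx_isr(void) { g_rx_ready = true; }

/* Polled from main(): volatile forces a fresh read on every call. */
bool rx_is_ready(void) { return g_rx_ready; }

/* Pointer to a memory-mapped register: the pointee is volatile, so the
 * read-modify-write below is always performed on the device. */
void reg_set_bits(volatile uint32_t *reg, uint32_t mask)
{
    assert(reg != NULL);
    *reg |= mask;
}

/* Delay loop counter: volatile keeps the empty loop from being deleted.
 * Returns the final counter value. */
uint32_t delay_loop(uint32_t iterations)
{
    volatile uint32_t i;

    for (i = 0; i < iterations; i++) {
        /* busy wait */
    }
    return i;
}
```

Note where the qualifier sits in reg_set_bits(): it is the pointed-to register that is volatile, not the pointer variable itself.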
Error 4: Stack overflow
Every programmer knows that a stack overflow is a very bad thing. The effect of each stack overflow varies, though. The nature of the damage and the timing of the misbehavior depend entirely on which data or instructions are clobbered and how they are used. Importantly, the length of time between a stack overflow and its negative effects on the system depends on how long it is before the clobbered bits are used.
Unfortunately, stack overflow afflicts embedded systems far more often than it does desktop computers. There are several reasons for this, including:
(1) embedded systems usually have to get by with less RAM;
(2) there is typically no virtual memory to fall back on (because there is no disk);
(3) firmware designs based on RTOS tasks use multiple stacks (one per task), each of which must be large enough to guard against overflow at its own worst-case stack depth;
(4) interrupt handlers may try to use those same stacks.
Further complicating the issue, no amount of testing can ensure that a particular stack is large enough. You can test your system under all sorts of loading conditions, but you can only test it for so long. A stack overflow that occurs only "once in a blue moon" may not be witnessed by tests that run for only "half a blue moon". Under algorithmic limitations (such as no recursion), it can be proven by a top-down analysis of the control flow of the code that a stack overflow will not occur. But the top-down analysis must be redone every time the code is changed.
Best practice: At startup, paint an unlikely memory pattern throughout the stack(s). (I like to use hex 23 3D 3D 23, which looks like a fence '#==#' in an ASCII memory dump.) At run time, have a supervisor task periodically check that none of the paint above some pre-established high-water mark has been changed.
If something is found to be amiss with a stack, log the specifics (for example, which stack and the height of the flood) in nonvolatile memory and do something safe for the users of the product (for example, a controlled shutdown or reset) before a true overflow can occur. This is a nice additional safety feature to add to the watchdog task.
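The painting and checking steps can be sketched as below. This is a host-side simulation under stated assumptions: the stack is modeled as a plain array of 32-bit words, it is assumed to grow downward from the highest index, and the function names are hypothetical. On a real target, only the not-yet-used part of a live stack can be painted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_PAINT 0x233D3D23u   /* bytes 23 3D 3D 23: "#==#" in ASCII */

/* Fill an entire (not yet used) stack area with the paint pattern. */
void stack_paint(uint32_t *stack, size_t words)
{
    assert(stack != NULL);
    for (size_t i = 0; i < words; i++) {
        stack[i] = STACK_PAINT;
    }
}

/* Count untouched words, assuming the stack grows downward from
 * stack[words - 1] toward stack[0]. */
size_t stack_unused_words(const uint32_t *stack, size_t words)
{
    size_t n = 0;

    while (n < words && stack[n] == STACK_PAINT) {
        n++;
    }
    return n;
}

/* Supervisor check: true while at least min_free_words remain painted. */
bool stack_is_healthy(const uint32_t *stack, size_t words, size_t min_free_words)
{
    return stack_unused_words(stack, words) >= min_free_words;
}
```

A supervisor task would call stack_is_healthy() for each task stack and, on failure, log which stack and how many words remained before taking the safe action.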
Error 5: Heap fragmentation
Dynamic memory allocation is not widely used by embedded developers, and for good reasons. One of those reasons is the problem of heap fragmentation.
All data structures created through C's malloc() standard library routine or C++'s new keyword live on the heap. The heap is a specific area of RAM with a predetermined maximum size. Initially, each allocation from the heap reduces the amount of remaining "free" space by the same number of bytes.
For example, the heap in a particular system might span 10 KB starting from address 0x20200000. Allocating a pair of 4 KB data structures would leave 2 KB of free space.
The storage for data structures that are no longer needed can be returned to the heap by calling free() or using the delete keyword. In theory, this makes that storage space available for reuse during subsequent allocations. But the order of allocations and deletions is generally at least pseudo-random, which causes the heap to become a mess of smaller fragments.
To see how fragmentation can be a problem, consider what happens if the first of the above 4 KB data structures is freed. Now the heap consists of one 4 KB free chunk and another 2 KB free chunk. They are not adjacent and cannot be combined, so our heap is already fragmented. Although 6 KB of the heap is free in total, any allocation of more than 4 KB will fail.
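The arithmetic in this example can be checked with a small model of the heap. This is only a sketch: the struct chunk layout and both function names are assumptions, and a real allocator keeps this bookkeeping inside the heap itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One region of the heap, listed in address order. */
struct chunk {
    size_t size;   /* bytes */
    bool   free;
};

/* Total free bytes across all chunks. */
size_t heap_total_free(const struct chunk *chunks, size_t count)
{
    size_t total = 0;

    assert(chunks != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (chunks[i].free) {
            total += chunks[i].size;
        }
    }
    return total;
}

/* Largest run of contiguous free bytes; adjacent free chunks are merged.
 * This is the biggest single allocation that can still succeed. */
size_t heap_largest_free(const struct chunk *chunks, size_t count)
{
    size_t best = 0;
    size_t run = 0;

    for (size_t i = 0; i < count; i++) {
        run = chunks[i].free ? run + chunks[i].size : 0;
        if (run > best) {
            best = run;
        }
    }
    return best;
}
```

For the heap described above (4 KB free, 4 KB in use, 2 KB free), the total is 6144 bytes but the largest contiguous run is only 4096 bytes.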
Fragmentation is similar to entropy: both increase over time. In a long-running system (in other words, most embedded systems ever created), fragmentation may eventually cause some allocation requests to fail. And what then? How should your firmware handle a failed heap allocation request?
Best practice: Avoiding all use of the heap is a sure way of preventing this bug. But if dynamic memory allocation is either necessary or convenient in your system, there is an alternative way of structuring the heap that prevents fragmentation.
The key observation is that the problem is caused by variable-sized requests. If all requests were the same size, then any free block would be as good as any other, even if it happened not to be adjacent to any other free block. Figure 3 shows how the use of multiple "heaps", each serving allocation requests of one specific size, can be implemented as a "memory pool" data structure.
Many real-time operating systems feature a fixed-size memory pool API. If you have access to one of those, use it instead of malloc() and free(). Or write your own fixed-size memory pool API. You need just three functions: one to create a new pool (of M blocks of N bytes each), another to allocate one block (from a specified pool), and a third to replace free().
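A minimal sketch of such a three-function API is shown below. The names pool_create(), pool_alloc(), and pool_free() and the struct pool layout are assumptions, not any particular RTOS's API. The sketch is not thread-safe as written; in a multitasking system each call would need a mutex or critical section around it.

```c
#include <assert.h>
#include <stddef.h>

/* A pool of fixed-size blocks carved out of caller-supplied storage.
 * Free blocks are chained together through their first bytes. */
struct pool {
    void  *free_list;    /* head of the singly linked free list */
    size_t block_size;   /* size of every block, in bytes       */
};

/* Create a pool of num_blocks blocks of block_size bytes inside storage.
 * block_size must be a multiple of sizeof(void *), and storage must be
 * suitably aligned for pointers. */
void pool_create(struct pool *p, void *storage, size_t num_blocks, size_t block_size)
{
    unsigned char *bytes = storage;

    assert(p != NULL && storage != NULL);
    assert(block_size >= sizeof(void *) && block_size % sizeof(void *) == 0);

    p->free_list = NULL;
    p->block_size = block_size;
    for (size_t i = num_blocks; i > 0; i--) {
        void *block = bytes + (i - 1) * block_size;
        *(void **)block = p->free_list;    /* push block onto the free list */
        p->free_list = block;
    }
}

/* Take one block from the pool, or return NULL if the pool is empty. */
void *pool_alloc(struct pool *p)
{
    void *block = p->free_list;

    if (block != NULL) {
        p->free_list = *(void **)block;    /* pop the head of the list */
    }
    return block;
}

/* Return a block previously obtained from pool_alloc() on the same pool. */
void pool_free(struct pool *p, void *block)
{
    assert(p != NULL && block != NULL);
    *(void **)block = p->free_list;
    p->free_list = block;
}
```

Because every block in a pool is the same size, both allocation and release take constant time, and a freed block can always satisfy the next request from that pool.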
Code review remains a best practice, and you can save yourself a lot of debugging headaches by first making sure these bugs are not present in your system. The best way to do that is to have someone inside or outside the company perform a thorough code review. Coding standard rules that mandate the best practices described here should also help. If you suspect one of these nasty bugs in existing code, performing a code review may be faster than trying to trace an observed failure back to its root cause.