Fault, error, failure of functional safety
2022-07-06 06:03:00 【Zhan Miao】
Some concepts in functional safety are easy to confuse, for example fault, error, and failure. This article discusses these three concepts.
1. Fault
In functional safety, a fault is an abnormal condition that can cause an element or an item to fail.
Faults can be divided into permanent faults and non-permanent faults; the classification is shown in the figure below.
A permanent fault occurs and persists until it is removed or repaired. In other words, once a permanent fault has occurred, corrective measures must be taken to restore normal operation. Systematic faults are generally permanent faults.
Non-permanent faults can be divided into intermittent faults and transient faults. An intermittent fault occurs repeatedly and disappears again each time. Intermittent faults may occur when a component is on the verge of breaking down or, for example, because of a switching surge (a violent transient change in voltage). Some systematic faults (for example, timing problems) can also cause intermittent faults.
A transient fault occurs once and then disappears. Transient faults can be caused by electromagnetic interference, which can cause bit flips. For example, soft errors caused by a single event upset (SEU) or a single event transient (SET) are transient faults. (A single event upset is the phenomenon in which a single high-energy particle from space enters the sensitive region of a semiconductor device and inverts the logic state of the device.)
2. Error
In ISO 26262, an error is a discrepancy between a computed, observed, or measured value or condition and the true, specified, or theoretically correct value or condition. An error can arise from unforeseen operating conditions or from a fault within the system, subsystem, or component under consideration. A fault can manifest itself as an error within the element under consideration, and the error can ultimately cause a failure.
For example, when a single high-energy particle from space enters the sensitive region of a semiconductor device, the resulting single event upset (SEU) flips the logic state of a memory cell, changing one bit used by the software from 0 to 1 or from 1 to 0. This is a soft error (the hardware is not damaged).
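A single bit flip of this kind can be modeled as an XOR with a one-bit mask. The following is a minimal Python sketch for illustration only; the function name `flip_bit` and the example values are assumptions and do not come from ISO 26262.

```python
def flip_bit(value: int, bit: int) -> int:
    """Return value with the given bit position inverted (0 -> 1 or 1 -> 0)."""
    return value ^ (1 << bit)


stored = 0b0000_0100                 # value held in memory: 4
corrupted = flip_bit(stored, 7)      # hypothetical upset hits bit 7
print(stored, corrupted)             # 4 132

# Writing the correct value again removes the error: the cell is not damaged.
restored = flip_bit(corrupted, 7)
print(restored == stored)            # True
```

Because the memory cell itself still works, rewriting it clears the error, which is why such an upset is classed as a transient fault.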
From the above, the general relationship between fault, error, and failure is that a fault can cause an error, and an error can lead to a failure. Section 5 below gives more detail.
3. Failure
ISO 26262 defines a failure as the "termination of the ability of an element to perform a function as required".
Note: An incorrect specification is one source of failure.
Failure here means the loss or termination of a function. For example, one of the main functions of a motor controller is to control the torque and speed of the motor according to the torque request from the vehicle control unit (VCU). An output torque that is unexpectedly too large or too small is therefore a failure.
3.1. Systematic failure and random hardware failure
In functional safety, failures are divided into two kinds according to their cause: systematic failures and random hardware failures. The main purpose of ISO 26262 is to avoid these two kinds of failure as far as possible.
(1) Systematic failure
A systematic failure is a failure related in a deterministic way to a certain cause. It can only be eliminated by changing the design, the manufacturing process, operational procedures, documentation, or other relevant factors.
A systematic failure has three characteristics:
A- Correct maintenance alone, without a modification, cannot eliminate the failure.
B- The failure can be reproduced by recreating its cause.
C- It results from human error. Example causes: errors in the safety requirements specification; errors in hardware design, manufacture, installation, or operation; errors in software design and implementation.
Software failures and some hardware failures are systematic failures. For example, suppose the wrong data type is chosen during coding: a variable (with, say, a resolution of 1 and an offset of 0) should have been declared as U16 but was declared as U8, so the calculated value can never exceed 255. This software bug is a systematic failure.
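The effect of the U8/U16 mix-up can be sketched in a few lines of Python. This is only an illustration: it assumes the unsigned variable wraps around on overflow, as unsigned integers do in C, and the helper name `store_unsigned` and the raw value 300 are hypothetical. Real embedded code might saturate at 255 instead of wrapping; in both cases a value above 255 cannot be represented.

```python
U8_MAX = 0xFF      # largest value an unsigned 8-bit variable can hold (255)
U16_MAX = 0xFFFF   # largest value an unsigned 16-bit variable can hold (65535)


def store_unsigned(value: int, max_value: int) -> int:
    """Mimic storing a value in an unsigned integer that wraps on overflow."""
    return value & max_value


raw = 300  # hypothetical signal value, resolution 1, offset 0

as_u8 = store_unsigned(raw, U8_MAX)    # wrong type: the value is silently corrupted
as_u16 = store_unsigned(raw, U16_MAX)  # intended type: the value is preserved

print(as_u8, as_u16)  # 44 300
```

The bug is deterministic: every time the input exceeds 255 the same wrong result appears, and only a change to the code removes it. Both properties match the characteristics of a systematic failure listed above.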
(2) Random hardware failure
According to ISO 26262, a random hardware failure is a failure that can occur unpredictably during the lifetime of a hardware element and that follows a probability distribution. Random hardware failure rates can be predicted with reasonable accuracy.
"Occurring unpredictably" means that even if the hardware design is correct (for example, the component selection, resistance values, capacitance values, and circuit design are all correct and the devices meet their quality standards), it is impossible to predict where or in what form a failure will occur.
"Following a probability distribution" means that the failure can be predicted with reasonable accuracy, for example by obtaining a failure rate from reliability data or analysis.
Random hardware failures are caused by physical processes such as fatigue, physical degradation, or environmental stress. Examples include the bit flip mentioned above, an open or short circuit of a resistor, and resistance drift.
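To make "follows a probability distribution" concrete, the sketch below assumes a constant failure rate (the exponential model commonly used in reliability engineering) and a rate expressed in FIT, where 1 FIT is one failure per 10^9 device-hours. The model choice, the function name, and the numbers are assumptions for illustration; the article itself does not prescribe a particular distribution.

```python
import math

FIT_TO_PER_HOUR = 1e-9  # 1 FIT = one failure per 10^9 device-hours


def failure_probability(fit: float, hours: float) -> float:
    """Probability of at least one failure within `hours`, assuming a constant failure rate."""
    rate_per_hour = fit * FIT_TO_PER_HOUR
    return 1.0 - math.exp(-rate_per_hour * hours)


# Hypothetical component: 100 FIT, 8000 operating hours
p = failure_probability(100.0, 8000.0)
print(f"{p:.3e}")
```

Nobody can say which individual part will fail or when, but under these assumptions the fraction of parts expected to fail over the operating time can be estimated.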
3.2. Dependent failure and independent failure
Functional safety also defines dependent failures and independent failures.
Dependent failures are failures whose probability of simultaneous or successive occurrence cannot be expressed as the simple product of the unconditional probabilities of each failure. For example, if the probability that failure A and failure B occur together is not equal to the product of the two individual failure probabilities, that is, Pab ≠ Pa × Pb, then A and B are dependent failures. Conversely, for independent failures the joint probability equals the simple product of the unconditional probabilities, Pab = Pa × Pb.
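The product rule can be checked numerically. The following Python sketch compares a joint probability with the product of the individual probabilities; the function name `are_dependent`, the tolerance, and the probabilities are hypothetical values chosen for illustration.

```python
import math


def are_dependent(p_a: float, p_b: float, p_ab: float, rel_tol: float = 1e-9) -> bool:
    """Return True if the joint probability differs from the product p_a * p_b."""
    return not math.isclose(p_ab, p_a * p_b, rel_tol=rel_tol)


# Each failure on its own has probability 1e-3 (hypothetical numbers).
print(are_dependent(1e-3, 1e-3, 1e-6))  # False: joint probability equals the product
print(are_dependent(1e-3, 1e-3, 1e-4))  # True: joint failure is far more likely than the product predicts
```

In the second case the two failures occur together a hundred times more often than independence would allow, which suggests a shared cause or a coupling between the elements.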
Dependent failures can be divided into common cause failures and cascading failures.
A common cause failure (CCF) is the failure of two or more elements of an item resulting from a single specific event or root cause, as shown in the figure below.
Common cause failures can be avoided through diversity in software and hardware design.
A cascading failure is the failure of one element that causes one or more other elements to fail.
Software partitioning, for example, can avoid cascading failures. In practice, storing the level1 and level2 variables in different RAM or NVRAM areas is one way of partitioning.
4. Hardware fault types
Hardware faults can be classified into the following types, as shown in the figure below:
(1) Safe fault:
A safe fault is a fault whose occurrence does not significantly increase the probability of violating a safety goal (ISO 26262). Safe faults fall into two categories: a) faults that are unrelated to the violation of a safety goal; b) all n-point faults with n > 2 (unless the safety concept shows that they are relevant to the violation of a safety goal).
Example 1: Flash memory protected by EDC (error detection and correction) and a cyclic redundancy check (CRC). A single-bit fault corrected by the EDC is not signaled. The EDC prevents the fault from violating the safety goal, but the fault is not signaled. If the EDC logic fails, the fault is detected by the CRC and the system is shut down. Only when there is a single-bit fault in the flash memory, the EDC logic fails, and the CRC checksum monitoring fails can the safety goal be violated (n = 3).
(2) Single-point fault:
A single-point fault is a fault that is not covered by a safety mechanism and that leads directly to the violation of a safety goal (ISO 26262).
An example is a single-point fault in the insulation resistance of an electric vehicle's REESS (rechargeable energy storage system). Insulation resistance is the resistance between the live parts of a voltage class B circuit (generally high voltage above 60 V) and the electrical chassis. Aging or damaged insulation material, water entering the battery system in rain or during car washing, a vehicle collision, and similar events reduce the insulation resistance and can cause electric shock. The normal value is Ri > 100 Ω/V. A reduction in insulation resistance can lead directly to a risk of electric shock, so this is a single-point fault.
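The Ri > 100 Ω/V criterion scales with the system voltage. A minimal Python sketch of the check follows; the function name, the 400 V system voltage, and the measured resistances are assumptions for illustration, and only the 100 Ω/V threshold comes from the text above.

```python
MIN_OHMS_PER_VOLT = 100.0  # threshold quoted in the text (Ri > 100 Ω/V)


def insulation_ok(resistance_ohm: float, system_voltage_v: float) -> bool:
    """Check a measured insulation resistance against the per-volt threshold."""
    return resistance_ohm / system_voltage_v > MIN_OHMS_PER_VOLT


# Hypothetical 400 V battery system: the threshold works out to 40 kΩ.
print(insulation_ok(500_000.0, 400.0))  # True  (1250 Ω/V)
print(insulation_ok(30_000.0, 400.0))   # False (75 Ω/V)
```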
(3) Residual fault:
A residual fault is the portion of a fault in a hardware element that is not covered by a safety mechanism (ISO 26262). A residual fault leads directly to the violation of a safety goal. For example, if the diagnostic coverage claimed for a failure mode is low, at 60%, the remaining 40% is the residual fault.
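The 60% / 40% split can be written as a small calculation. The sketch below assumes the residual portion is the failure rate multiplied by one minus the diagnostic coverage; the function name and the 10 FIT failure rate are hypothetical.

```python
def residual_failure_rate(failure_rate_fit: float, diagnostic_coverage: float) -> float:
    """Portion of a failure rate that is not covered by the safety mechanism."""
    if not 0.0 <= diagnostic_coverage <= 1.0:
        raise ValueError("diagnostic_coverage must be between 0 and 1")
    return failure_rate_fit * (1.0 - diagnostic_coverage)


# Hypothetical failure mode with 10 FIT and the 60% coverage from the example above:
# about 4 FIT remains uncovered.
residual = residual_failure_rate(10.0, 0.60)
print(round(residual, 6))
```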
ISO 26262 Part 10 mentions an example: if only a checkerboard RAM test is used as the safety mechanism for checking a random access memory (RAM) module, certain kinds of bridging faults are not detected. The safety mechanism cannot prevent the safety goal violation caused by these faults. These faults are residual faults.
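For readers unfamiliar with the checkerboard test, here is a simplified Python sketch. It models RAM as a byte array and alternates the patterns 0xAA and 0x55 by address; a real test works on the physical cell layout of the memory, so this is an approximation. The `StuckBitRam` class is a hypothetical fault model with one stuck-at-1 bit. It does not model bridging faults, which are the kind this test may miss.

```python
def checkerboard_test(ram: bytearray) -> bool:
    """Write alternating 0xAA/0x55 patterns, read them back, then repeat inverted."""
    for even_pattern, odd_pattern in ((0xAA, 0x55), (0x55, 0xAA)):
        for addr in range(len(ram)):
            ram[addr] = even_pattern if addr % 2 == 0 else odd_pattern
        for addr in range(len(ram)):
            expected = even_pattern if addr % 2 == 0 else odd_pattern
            if ram[addr] != expected:
                return False  # a cell did not hold the written pattern
    return True


class StuckBitRam(bytearray):
    """Simulated RAM in which bit 0 of address 3 is stuck at 1 (hypothetical fault)."""

    def __setitem__(self, addr, value):
        if addr == 3:
            value |= 0x01
        super().__setitem__(addr, value)


print(checkerboard_test(bytearray(8)))    # True: healthy memory passes
print(checkerboard_test(StuckBitRam(8)))  # False: the stuck bit is caught in the inverted pass
```

The stuck bit is caught only in the second, inverted pass, which is why the test writes both patterns.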
(4) Multiple-point fault:
A multiple-point fault is an individual fault that, in combination with other independent faults, leads to a multiple-point failure (ISO 26262).
Note: A multiple-point fault can only be recognized after the multiple-point failure has been identified, for example through fault tree analysis (FTA) (ISO 26262). A dual-point failure is a failure caused by the combination of two independent faults.
(5) Latent fault:
A latent fault is a multiple-point fault that is neither detected by a safety mechanism nor perceived by the driver within the multiple-point fault detection interval (ISO 26262).
In other words, a multiple-point fault that is neither detected nor perceived by the driver within a certain period of time is a latent fault. Examples:
A- A fault in a monitoring chip.
B- A fault in the safety mechanism itself while the function it protects still operates correctly.
A latent fault is a multiple-point fault; in combination with other independent faults it leads to the violation of a safety goal.
(6) Detected fault
A detected fault is a fault that is detected within a prescribed time by a safety mechanism that prevents the fault from becoming latent.
Example: A fault detected by a dedicated safety mechanism defined in the functional safety concept (for example, one that detects the error and informs the driver through a warning device on the instrument panel).
(7) Perceived fault
A perceived fault is a fault whose presence is deduced by the driver within a prescribed time interval. Example: The fault is directly perceivable through obvious limitations in system behavior or performance.
"Perceived" means perceived by the driver, whether or not a safety mechanism detects the fault; its occurrence visibly affects the function.
5. The relationship between faults, errors, and failures
The relationship between faults, errors, and failures is shown in the figure below. The figure describes the progression from fault to error and from error to failure for three different types of cause: systematic software issues, random hardware issues, and systematic hardware issues.
Systematic faults are caused by design or specification issues; software faults and a subset of hardware faults are systematic.
Random hardware faults are caused by physical processes such as fatigue, physical degradation, or environmental stress.
At the component level, each type of fault can lead to different failures. However, a failure at the component level is a fault at the item level.
Note that in this example, faults from different causes can lead to the same failure at the vehicle level. A subset of failures at the item level will be hazards if additional environmental factors allow the failure to contribute to an accident scenario.