当前位置：网站首页>[Nanjing University] - [software analysis] course learning notes (I) -introduction

[Nanjing University] - [software analysis] course learning notes (I) -introduction

2022-07-07 08:42:00 【Xihang】

List of articles

- Nanjing University Static program analysis
- - introduction

Nanjing University Static program analysis

introduction

1. program language and static analysis

programing language ：

theory
- Language design
- Type system
- Semantics and logic
- …
Environmental Science
- compiler
- Runtime system
- …
application
- Program analysis
- Program validation
- Program synthesis
- …

background ： In the past decade , The core of language is basically unchanged , But the program has become significantly larger and more complex ;

Challenge ： How to ensure the reliability of large-scale and complex programs 、 Security and other commitments

2. why we learn static analysis

The reliability of the program reliability：

null pointer dereference Null pointer reference
- There is no reference define, Will bring a lot bug
memory leak Memory leak
- malloc No, free, Too much memory will lead to crash

The security of the program security：

private information leak Disclosure of private information
injection attack Injection attack

Compiler optimization

dead code elimination Dead code elimination
code motion Code movement

Program understanding

IDE call hierarchy Call level
type indication Type indication

3. what is static analysis

Static analysis analyzes a program P to reason about its behaviors and determines whether it satisfies some properties before running P.

Static analysis through the analysis program P To infer its behavior , And running P Before determining whether it meets certain properties

Does P contain any private information leaks?
Does P dereference any null potinters?
Are all the cast operations in P safe? Type conversion operators
Can v1 and v2 in P point to the same memory location?
Will certain assert statements in P fail?
Is this piece of code in P dead(so that it could be eliminated)?

Rice theory Rice theorem : There is no way to determine whether a program satisfies some nontrivial properties .

Any non-trivial property of the behavior of programs in a r.e. language is undecidable

r.e. (recursively enumerable) = recognizable by a Turing-machine
Recursively enumerable languages are languages recognized by Turing machines

non-trivial Not simple , important

The above expression is a perfect static analysis It doesn't exist

From an academic point of view , One perfect Of static analysis There are two conditions that need to be met ：

sound：over approximate contain true
complete: under approximate true contain

So we don't pursue perfect static analysis , It's about pursuing useful static analysis：

Compromise soundness（false negatives） Compromise under reporting control
Compromise completeness (false positives) Compromise false alarm control

Mostly compromising completeness: Sound but not fully-precise static analysis

Most static analysis pursues Soundness, It will cause false positives

Soundness is critical to a collection of important(static-analysis) applications such as compiler optimization and program verification.

Static analysis: ensure (or get close to) soundness, while making good trade-offs between analysis precision and analysis speed.

Make sure soundness, And make a trade-off between analysis accuracy and analysis speed

4. static analysis features and examples

Abstraction + Over-approximation

Abstract and over approximate

abstraction
over-approximation
- transfer function
- control flows

Example:

Determine the symbols of all variables in a given program ： just 、 Negative or 0.

T：unknow Positive but negative （over-approximation Over approximate ）
bottom：undefined, Divide by zero operation , Throw exception operation

transfer function It defines how to evaluate different program states on abstract values , And it should be defined according to the semantics of analysis problems and different program states

1 The problem is divide by zero 、 Throw an exception , It should be undefined;
2 The problem is that the subscript is negative , Throw an exception , It should also be for undefined
3 The problem is over-approximated, That is, false positives , In the actual procedure a=x+y=9, So it won't report an error , This boundary will be blurred in the process of abstracting into symbols , Cause false positives

Control flow analysis ,y The two values of are two branch, Taking different branches will make z Change the value of . And it is impossible to enumerate all paths in practice , The space is huge , therefore flow merging Is used by many static analysis .

原网站

版权声明
本文为[Xihang]所创，转载请带上原文链接，感谢
https://yzsam.com/2022/188/202207070550573844.html