当前位置:网站首页>[Nanjing University] - [software analysis] course learning notes (I) -introduction

[Nanjing University] - [software analysis] course learning notes (I) -introduction

2022-07-07 08:42:00 Xihang

Nanjing University Static program analysis

introduction

1. program language and static analysis

programing language :

  • theory
    • Language design
    • Type system
    • Semantics and logic
  • Environmental Science
    • compiler
    • Runtime system
  • application
    • Program analysis
    • Program validation
    • Program synthesis

background : In the past decade , The core of language is basically unchanged , But the program has become significantly larger and more complex ;

Challenge : How to ensure the reliability of large-scale and complex programs 、 Security and other commitments

2. why we learn static analysis

The reliability of the program reliability:

  • null pointer dereference Null pointer reference
    • There is no reference define, Will bring a lot bug
  • memory leak Memory leak
    • malloc No, free, Too much memory will lead to crash

The security of the program security:

  • private information leak Disclosure of private information
  • injection attack Injection attack

Compiler optimization

  • dead code elimination Dead code elimination
  • code motion Code movement

Program understanding

  • IDE call hierarchy Call level
  • type indication Type indication

3. what is static analysis

Static analysis analyzes a program P to reason about its behaviors and determines whether it satisfies some properties before running P.

Static analysis through the analysis program P To infer its behavior , And running P Before determining whether it meets certain properties

  • Does P contain any private information leaks?
  • Does P dereference any null potinters?
  • Are all the cast operations in P safe? Type conversion operators
  • Can v1 and v2 in P point to the same memory location?
  • Will certain assert statements in P fail?
  • Is this piece of code in P dead(so that it could be eliminated)?

Rice theory Rice theorem : There is no way to determine whether a program satisfies some nontrivial properties .

Any non-trivial property of the behavior of programs in a r.e. language is undecidable

r.e. (recursively enumerable) = recognizable by a Turing-machine
Recursively enumerable languages are languages recognized by Turing machines

non-trivial Not simple , important

The above expression is a perfect static analysis It doesn't exist

From an academic point of view , One perfect Of static analysis There are two conditions that need to be met :

  • sound:over approximate contain true
  • complete: under approximate true contain

So we don't pursue perfect static analysis , It's about pursuing useful static analysis:

  • Compromise soundness(false negatives) Compromise under reporting control
  • Compromise completeness (false positives) Compromise false alarm control

Mostly compromising completeness: Sound but not fully-precise static analysis

Most static analysis pursues Soundness, It will cause false positives

Soundness is critical to a collection of important(static-analysis) applications such as compiler optimization and program verification.

Static analysis: ensure (or get close to) soundness, while making good trade-offs between analysis precision and analysis speed.

Make sure soundness, And make a trade-off between analysis accuracy and analysis speed

4. static analysis features and examples

Abstraction + Over-approximation

Abstract and over approximate

  • abstraction
  • over-approximation
    • transfer function
    • control flows

Example:

Determine the symbols of all variables in a given program : just 、 Negative or 0.

  • T:unknow Positive but negative (over-approximation Over approximate )
  • bottom:undefined, Divide by zero operation , Throw exception operation

transfer function It defines how to evaluate different program states on abstract values , And it should be defined according to the semantics of analysis problems and different program states

  • 1 The problem is divide by zero 、 Throw an exception , It should be undefined;
  • 2 The problem is that the subscript is negative , Throw an exception , It should also be for undefined
  • 3 The problem is over-approximated, That is, false positives , In the actual procedure a=x+y=9, So it won't report an error , This boundary will be blurred in the process of abstracting into symbols , Cause false positives

Control flow analysis ,y The two values of are two branch, Taking different branches will make z Change the value of . And it is impossible to enumerate all paths in practice , The space is huge , therefore flow merging Is used by many static analysis .

原网站

版权声明
本文为[Xihang]所创,转载请带上原文链接,感谢
https://yzsam.com/2022/188/202207070550573844.html