当前位置：网站首页>Nanjing University static program analyses -- Introduction learning notes

Nanjing University static program analyses -- Introduction learning notes

2022-06-21 12:07:00 【Zhenghan andrew Hann】

One 、Programming Languages system

Static program analysis yes programing language in application A subdivision under the level , It is a very important core content .

In the theoretical part , Consider how to design the syntax and semantics of a language , How to design the type system of language and so on . In the past decade , The core of the language has hardly changed
With the grammar of language 、 After the semantic and type systems , We need to support the language . therefore , In the environment section , You need to consider how to provide a runtime environment for running programs —— How to design a compiler , What support is needed at runtime （ Such as memory allocation management ） wait . The language hosting environment is in a stage of slow improvement , It mainly focuses on hardware equipment and high-performance programming optimization
The biggest change is program analysis , Because with IT、 Cloud computing 、 Software SaaS Rapid development of , The scale of software has become larger 、 The structure is more complicated 、 More . How to ensure the reliability of the system 、 Security and other commitments , How to automatically synthesize a degree , It has become an increasingly popular research and engineering field

Two 、Static Analysis Definition

Static analysis analyzes a program P to reason about its behaviors and determines whether it satisfies some properties before running P.

Does P contain any private information leaks?
Does P dereference any null pointers?
Are all the cast operations in P safe?
Can v1 and v2 in P point to the same memory location?
Will certain assert statements in P fail?
Is this piece of code in P dead (so that it could be eliminated)?
…

The two answers in the above figure are both correct in the static analysis semantics , They represent two solutions respectively ：

Branch exhaustion ： Time consuming , But precise
Symbol execution ： Fast , But not so precise

Static Analysis: ensure (or get close to) soundness, while making good trade-offs between analysis precision and analysis speed.

Two Words to Conclude Static Analysis：

Static Analysis = Abstraction + Over-approximation

Take a specific example ： Through static analysis , Judge a paragraph PHP Whether the code can have the risk of executing any external parameter , That is, whether it is Webshell. To complete this static analysis process , The following processing is required ：

Abstraction
Over-approximation
- Transfer functions
- Control flows

The original code is as follows ：

<?php
    v1 = 1;
    v2 = 2;
    v3 = $_POST[1];
    v4 = $_POST[2];
    v5 = v3 == 1 ? v3 : 5;
    $$_POST[3] = $_POST[4];  // $_POST[4] = v6

    if(v3 == 1){
        eval(v5);
    }
    echo "hello world";
?>

Let's see first Abstraction abstract ,

To understand in a popular way Abstraction abstract , Is to change the program from the original 、 High dimensional source code space , Mapping to an abstract 、 Low dimensional symbolic space . After symbolization , Subsequent optimizations 、 analysis 、 It will be more convenient to handle .

Let's look at Over-approximation: Transfer Functions Transformation function ,

In static analysis, transfer functions define how to evaluate different program statements on abstract values.
Transfer functions are defined according to “analysis problem” and the “semantics” of different program statements.

The transformation function defines the operation result of abstract symbols , It should be noted that , The transformation function is related to the specific semantics and the problem to be analyzed .

It should be noted that , Because in Abstraction In the process of abstraction, the dimension of the range space is reduced , So in the transformation function mapping , Between the results of static symbolic execution and dynamic actual implementation , There are differences , It's inevitable .

Let's look at Over-approximation: Control Flows control flow ,

As it’s impossible to enumerate all paths in practice, flow merging (as a way of over-approximation) is taken for granted in most static analyses.

In static analysis , Branch flow merging is a common branch inference technique , Promoted Soundness At the same time , Also led to Completeness The fall of , This leads to inevitable false positives .

3、 ... and 、Why we need Static Analysis

In recent years , The complexity of the program is getting higher and higher , Reliability and security are increasingly difficult to guarantee
- Null pointer dereference, memory leak, etc.
- Null pointer references and memory leaks ： Almost every programmer has been troubled by these two problems
For program reliability 、 Security analysis
- Private information leak, injection attack, etc.
- Disclosure of privacy information ： This problem is common in mobile applications
- Injection attack ： This is a very common topic in network security
Provide basic technology for compilation optimization
- Dead code elimination, code motion, etc.
- Dead code elimination ： In the machine independent optimization of the compiler , Code that will not affect the results of program execution （ Dead code ） Delete .
- Loop invariant code moves ： In the machine independent optimization of the compiler , Under the condition that the program execution results are not affected , Move a specific statement in a loop out of the loop , Reduce the number of statements executed when the program is running . For a more detailed explanation, please refer to StackOverFlow The answer on .
Program understanding （ The call stack 、 Automatic completion, etc ）
- IDE call hierarchy, type indication, etc.
- Help with the functionality of the integrated development environment ： When you use VS/Idea/Clion/Eclipse/Android Studio wait IDE when , Hover over the code ,IDE It can dynamically analyze and prompt the relevant information of the hovering object , The technology behind this is static program analysis .
A deeper understanding of the syntax and semantics of programming languages
Writing naturally is more reliable 、 Security 、 Efficient procedures

Four 、Rice’s Theorem -- Limitations of static analysis

Any non-trivial property of the behavior of programs in a r.e.(recursively enumerable) language is undecidable.”

non-trivial properties：

~= interesting properties
~= the properties related with run-time behaviors of programs

According to Rice's law ,「 Perfect static analysis 」 There are two core features ：

Sound（ Full coverage ）
Complete（ To infer exactly ）

If a program is “non-trivial” Of , There is no perfect static analysis program , Can satisfy at the same time Sound and Complete features . In industrial terms, it is , False positives and false negatives cannot be achieved at the same time 100%.

In practical use , We are not pursuing 「 Perfect static analysis 」, It's about pursuing 「 Useful static analysis 」, That is, it meets the following two core characteristics ：

Compromise soundness (false negatives, Compromise under reporting control )
Compromise completeness (false positives, Compromise false alarm control )

In a real industrial scenario ,Soundness It is often a priority , We use Webshell Static code analysis as an example .

If you pursue Sound The goal of , During static code analysis , integrity / coverage / High detection is often a priority . per contra , Relative false positives are inevitable .

5、 ... and 、 Comparison between static program analysis and similar techniques

Static program analysis

advantage ：
- Under the selected precision, no bug
shortcoming ：
- The academic threshold is relatively high . At present, only Peking University is known to have public course materials in domestic universities , Nanjing University , National Defense University , Jilin University , There are few easy to understand textbooks . As a senior elective course of computer major , It is difficult to get started and improve .

Dynamic software testing

advantage ：
- It is widely used in engineering , And it works . Implement a simple , Easy to automate .
shortcoming ：
- There is no guarantee that there is no bug. This is the inevitable result of not being able to traverse all possible program inputs .
- In today's concurrent environment brought about by multi-core and network applications, its role is limited . Some bug May only occur under certain circumstances , Therefore, it is difficult to reproduce stably .

Formal semantic validation

advantage ：
- Because the program is abstracted by mathematical method , Can guarantee that there is no bug.
shortcoming ：
- The academic threshold is high , Learners must have a good foundation in mathematics to get started .
- The cost of verification is high , Generally speaking, very important projects will use this method to ensure program quality . Even in software as important as the operating system , Not necessarily .

原网站

版权声明
本文为[Zhenghan andrew Hann]所创，转载请带上原文链接，感谢
https://yzsam.com/2022/172/202206211133298680.html