当前位置:网站首页>Tla+ introductory tutorial (1): introduction to formal methods

Tla+ introductory tutorial (1): introduction to formal methods

2022-07-04 22:31:00 InfoQ

This time  TLA+  The introductory tutorial series will be divided into several parts , Help you master from scratch  TLA+  Basic knowledge of language , Welcome to official account and Zhihu “ Polysaccharide ”.

null
1960  Late middle age , The software industry began to explode “ Software Crisis (Software Crisis)”[1]. So-called “ Software Crisis ”, It refers to a series of serious problems encountered in the process of software development and maintenance , These problems may lead to the shortening of software life 、 Even died young . With the rapid growth of computing power and software more and more complex , Software development has become an unmanageable 、 High risk 、 Activities with high failure rates . essentially , Software crisis also means
Write correctly 、 Understandable 、 Verifiable procedures are very difficult
.

The main reason for the software crisis , Very impolitely : When there is no machine , Programming is not a problem at all ; When we have computers , Programming begins to become a problem ; Now we have huge computers , Programming has become an equally huge problem .—— Edsger Dijkstra [2]

in the light of “ Software Crisis ”, Various solutions have been proposed , To sum up, there are basically two kinds :
  • 1、 Adopt engineering methods to organize 、 Manage the software development process ;
  • 2、 Deeply explore the rules of program and program development process , Establish a rigorous theory , Deduce and verify the correctness of the theory , Used to guide software development practice .

The former is “ Software Engineering ”, The latter is what this article will discuss “ Formal method ”.

null
For any software , Correctness always comes first . however , Generally speaking , The correctness of the software is difficult to prove , Software usually goes through layers of testing , But still can't avoid mistakes .

According to our previous experience in software development , First, I will get a system design document ( Sometimes there's not even ), Then according to the demand 、 Materials such as flow charts or message structures begin to write code , Finally, a lot of tests are carried out to verify whether the function is correct . In this case, experience is often tested , Therefore, the more experienced people can often think of more details and possibilities , The designed system is usually more stable . Of course, this has also achieved some results , Especially business software development , Most of them maintain such a process .

But for basic software such as concurrent and distributed systems , Often with high complexity , Its behavior 、 There are too many possibilities of state , It's hard to reproduce , Correctness is also more difficult to verify and test .

Including Turing Prize Winners  Leslie Lamport  Many computer scientists, including , Adopt formal method (Formal Methods) Formal verification and analysis of the system , Is built correctly 、 Reliable and safe software is an important way .

Formal method , It is to use mathematical and logical methods to describe and verify the system , Including the description and verification of the behavior and nature of the system , It can be described in one or more languages , For example, first-order logic 、 Higher order logic 、 Temporal logic 、 State machine 、 Algebra. 、Pi  calculus 、 Special programming languages, etc . In the field of concurrent and distributed systems , The most commonly used is by  Leslie Lamport  Proposed behavioral temporal logic  TLA+  Language .

 The boss wants you to learn TLA+, Also dressed as a clown
Leslie Lamport  Make groundbreaking contributions in the field of concurrent and distributed systems , And won the Turing prize , I think you are no stranger to this old man .Lamport  Claim to be a theorist (theoretician), But he understands the need for engineers to build real systems , Therefore, his own research is also aimed at practicality , among  Paxos  Most famous .

What is?  TLA+

use  Lamport  In his own words ,
TLA+  It is used in digital system (digital systems) senior (high-level) Modeling language
, And there are corresponding tools to test the model .

The key words here are digital system and  high-level, Digital systems include algorithms 、 Programs and various software and hardware systems ,high-level  It means above the code level , Think at the design level .

TLA+  According to official documents [3], You should write  

, namely  +  The number is  TLA  The top right of , But this has caused some difficulties for web page layout , Therefore, it is often written as  TLA+  that will do .

null
TLA+  Abstract a system into a series of discrete events , The first digital system we are familiar with is the clock , It simulates a clock with discrete ticks ; Computers are also digital systems , It consists of a series of discrete program steps .TLA+  Describe discrete events as state changes , Variables in the system will change from one state to another due to events —— This is the state machine model .

Why abstract through the method of state machine ? Because most of the time , Our program and algorithm are running for a long time , For this kind of program that has been running , Abstracting through state machines is an effective way . Plus , State machines are widely used in distributed systems ( for example ,RSM),TLA+  It is just the best language to describe the global state transition of concurrent and distributed systems .

If we describe a state in the system as a variable , Then the state machine can be described as :
  • It's all about the system
    Variable
  • Of these variables
    Initial value
    What is it?
  • Which?
    action
    Change the state of the variable , The relationship between the values of these variables in the current state and their possible values in the next state

Raft The node state transition graph of is a typical state machine model
namely ,TLA+  It mainly describes the transformation relationship of system state . Remember this description , We will follow this method to write and read all  TLA+  Program .

TLA+  Another key point that is so useful is ,TLA+  There is one called  TLC  Model checking tool , You can think of it as  TLA+  Linguistic  IDE, Its function is simple , It's traversal  TLA+  Every possible path of the abstracted system , And some conditions on every possible path ( be called “ Invariant ”) To test . In order to make  TLC  Traverse all possible paths correctly and carry out effective verification , We need to tell it what conditions and attributes to check .

therefore , To write  TLA+  The key to language is , Abstract system variables , Describe the initial value of the variable , The transformation relationship of variables , And verify each critical path . We can follow the state machine described above  0、1、2  Three steps to write  TLA+  describe , And then check , Finally, it is verified by the model checking tool .

This section is the beginning of this series , Don't start with the specific syntax and toolbox usage , This part will be studied in the next section .

In order to promote  TLA+  Language ,Lamport  It also invented something closer to programming language  PlusCal  Language ,PlusCal  Can't run directly , It translates into  TLA+  Language before running . In order not to add more mental burden to readers , This series does not discuss  PlusCal  Language .

Written in the back ,TLA+  Is it really useful ?

Last ,Lamport  I have emphasized many times ,TLA+  It's never production code that can actually run , It mainly aims at modeling and verifying the key parts of the system , So that you can find problems in advance from the design level .

so ,TLA+  It's not a silver bullet , It requires not only the basic mathematics of the user 、 Abstract thinking and thinking ability , And there is still a certain distance from the actual code .

Many engineers will question this , Intuitively, I think this thing has low input-output efficiency , Writing code usually may not even have time to write documents and unit tests , And write an extra  TLA+  To simply verify the system , Is it really worth it ?

I think such a query is reasonable , After all, performance appraisal depends on the final output , In big domestic factories, you don't even have time to do this . But according to the author's observation , Now there are more and more basic software start-ups in China ,TLA+  The use of is also increasing , The author found :TiDB[4]、TDengine[5]、KingbaseES[6]  Other enterprises .

In the face of these questions ,Lamport  I also found him in  Intel、Amazon  An acquaintance of , Know what they are about  TLA+  Usage situation [7]. actually  TLA+  There are awesome figures and manufacturers endorsing in academia and industry , According to statistics , Use abroad  TLA+  The classic cases are :
  • Intel  take  TLA+  For industrial hardware modeling , Help engineers think before actually building ;
  • AWS  from  2011  Put into use  TLA+,TLA+  The model check is in  DynamoDB,S3,EBS  And the internal distributed lock manager have detected potential errors that are difficult to find ,AWS  Several papers on formal methods have been published ;
  • Microsoft  stay  Xbox 360,Azure  It's used in all of them  TLA+, Also used  TLA+  Designed  Cosmos DB[8];
  • A network centric  RTOS  The application case ;
  • For various distributed consensus algorithms (Paxos、Raft  and  EPaxos  etc. ) All provide their  TLA+  verification ;
null
so , The formal method of software has been studied for decades 、 Promotion and Application , A large number of important results have been achieved , Formal methods are not limited to academia , It has been gradually integrated into all stages of the software development process , From demand analysis 、 Function description ( Statute )、 Architecture 、 Algorithm design 、 Programming 、 Testing and even operation and maintenance stage .

therefore , Even if you don't plan to use  TLA+  Come and write something , Study  TLA+  Still valuable , Not only can you quickly understand others' algorithms , It can also provide a new way of thinking , At least in the distributed field , It can make you a better engineer .

Besides, ,TLA+  It's not hard to learn ,Amazon  The shared case shows [9], Whether old bird or novice , Can start in a few weeks  TLA+, It's a technology worth investing in .

I believe , Follow this  TLA+  series , You don't even need a few weeks , You can get started  TLA+, So as to help you think clearly about the behavior of the system ( Scientific perspective ), understand  Paxos  and  Raft  Algorithm  TLA+  describe , Verify the correctness of the distributed protocol .

Next section , Let's start with the simplest example .

Reference

[1]  Software Crisis (Software Crisis):https://en.wikipedia.org/wiki/Software_crisis
[2] Dijkstra, E. W. The Humble Programmer. Communications of the ACM. Aug 1972, 15 (10): 859–866. doi:10.1145/355604.361591.
[3] TLA+  Official homepage :https://lamport.azurewebsites.net/tla/tla.html
[4] TiDB TLA+  Statute :https://github.com/pingcap/tla-plus
[5] TDengine  In use  Raft  Of  TLA+  Statute :https://www.taosdata.com/techtalk/8976.html
[6] KingbaseES  Use TLA+ and PlusCal Enhance the reliability of products :https://bbs.kingbase.com.cn/thread-753-1-1.html
[7] Industrial Use of TLA+: https://lamport.azurewebsites.net/tla/industrial-use.html
[8] High-level TLA+ specifications for the five consistency levels offered by Azure Cosmos DB:https://github.com/Azure/azure-cosmos-tla
[9] How Amazon Web Services Uses Formal Methods, Chris Newcombe, Tim Rath, Fan Zhang, Bogdan

Welcome to my official account. :

原网站

版权声明
本文为[InfoQ]所创,转载请带上原文链接,感谢
https://yzsam.com/2022/185/202207042139169170.html