This time TLA+ The introductory tutorial series will be divided into several parts , Help you master from scratch TLA+ Basic knowledge of language , Welcome to official account and Zhihu “ Polysaccharide ”.
1960 Late middle age , The software industry began to explode “ Software Crisis (Software Crisis)”[1]. So-called “ Software Crisis ”, It refers to a series of serious problems encountered in the process of software development and maintenance , These problems may lead to the shortening of software life 、 Even died young . With the rapid growth of computing power and software more and more complex , Software development has become an unmanageable 、 High risk 、 Activities with high failure rates . essentially , Software crisis also means
Write correctly 、 Understandable 、 Verifiable procedures are very difficult
.
The main reason for the software crisis , Very impolitely : When there is no machine , Programming is not a problem at all ; When we have computers , Programming begins to become a problem ; Now we have huge computers , Programming has become an equally huge problem .—— Edsger Dijkstra [2]
in the light of “ Software Crisis ”, Various solutions have been proposed , To sum up, there are basically two kinds :
2、 Deeply explore the rules of program and program development process , Establish a rigorous theory , Deduce and verify the correctness of the theory , Used to guide software development practice .
The former is “ Software Engineering ”, The latter is what this article will discuss “ Formal method ”.
For any software , Correctness always comes first . however , Generally speaking , The correctness of the software is difficult to prove , Software usually goes through layers of testing , But still can't avoid mistakes .
According to our previous experience in software development , First, I will get a system design document ( Sometimes there's not even ), Then according to the demand 、 Materials such as flow charts or message structures begin to write code , Finally, a lot of tests are carried out to verify whether the function is correct . In this case, experience is often tested , Therefore, the more experienced people can often think of more details and possibilities , The designed system is usually more stable . Of course, this has also achieved some results , Especially business software development , Most of them maintain such a process .
But for basic software such as concurrent and distributed systems , Often with high complexity , Its behavior 、 There are too many possibilities of state , It's hard to reproduce , Correctness is also more difficult to verify and test .
Including Turing Prize Winners Leslie Lamport Many computer scientists, including , Adopt formal method (Formal Methods) Formal verification and analysis of the system , Is built correctly 、 Reliable and safe software is an important way .
Formal method , It is to use mathematical and logical methods to describe and verify the system , Including the description and verification of the behavior and nature of the system , It can be described in one or more languages , For example, first-order logic 、 Higher order logic 、 Temporal logic 、 State machine 、 Algebra. 、Pi calculus 、 Special programming languages, etc . In the field of concurrent and distributed systems , The most commonly used is by Leslie Lamport Proposed behavioral temporal logic TLA+ Language .
Leslie Lamport Make groundbreaking contributions in the field of concurrent and distributed systems , And won the Turing prize , I think you are no stranger to this old man .Lamport Claim to be a theorist (theoretician), But he understands the need for engineers to build real systems , Therefore, his own research is also aimed at practicality , among Paxos Most famous .
What is? TLA+
use Lamport In his own words ,
TLA+ It is used in digital system (digital systems) senior (high-level) Modeling language
, And there are corresponding tools to test the model .
The key words here are digital system and high-level, Digital systems include algorithms 、 Programs and various software and hardware systems ,high-level It means above the code level , Think at the design level .
TLA+ According to official documents [3], You should write
, namely + The number is TLA The top right of , But this has caused some difficulties for web page layout , Therefore, it is often written as TLA+ that will do .
TLA+ Abstract a system into a series of discrete events , The first digital system we are familiar with is the clock , It simulates a clock with discrete ticks ; Computers are also digital systems , It consists of a series of discrete program steps .TLA+ Describe discrete events as state changes , Variables in the system will change from one state to another due to events —— This is the state machine model .
Why abstract through the method of state machine ? Because most of the time , Our program and algorithm are running for a long time , For this kind of program that has been running , Abstracting through state machines is an effective way . Plus , State machines are widely used in distributed systems ( for example ,RSM),TLA+ It is just the best language to describe the global state transition of concurrent and distributed systems .
If we describe a state in the system as a variable , Then the state machine can be described as :
Of these variables
Initial value
What is it?
namely ,TLA+ It mainly describes the transformation relationship of system state . Remember this description , We will follow this method to write and read all TLA+ Program .
TLA+ Another key point that is so useful is ,TLA+ There is one called TLC Model checking tool , You can think of it as TLA+ Linguistic IDE, Its function is simple , It's traversal TLA+ Every possible path of the abstracted system , And some conditions on every possible path ( be called “ Invariant ”) To test . In order to make TLC Traverse all possible paths correctly and carry out effective verification , We need to tell it what conditions and attributes to check .
therefore , To write TLA+ The key to language is , Abstract system variables , Describe the initial value of the variable , The transformation relationship of variables , And verify each critical path . We can follow the state machine described above 0、1、2 Three steps to write TLA+ describe , And then check , Finally, it is verified by the model checking tool .
This section is the beginning of this series , Don't start with the specific syntax and toolbox usage , This part will be studied in the next section .
In order to promote TLA+ Language ,Lamport It also invented something closer to programming language PlusCal Language ,PlusCal Can't run directly , It translates into TLA+ Language before running . In order not to add more mental burden to readers , This series does not discuss PlusCal Language .
Written in the back ,TLA+ Is it really useful ?
Last ,Lamport I have emphasized many times ,TLA+ It's never production code that can actually run , It mainly aims at modeling and verifying the key parts of the system , So that you can find problems in advance from the design level .
so ,TLA+ It's not a silver bullet , It requires not only the basic mathematics of the user 、 Abstract thinking and thinking ability , And there is still a certain distance from the actual code .
Many engineers will question this , Intuitively, I think this thing has low input-output efficiency , Writing code usually may not even have time to write documents and unit tests , And write an extra TLA+ To simply verify the system , Is it really worth it ?
I think such a query is reasonable , After all, performance appraisal depends on the final output , In big domestic factories, you don't even have time to do this . But according to the author's observation , Now there are more and more basic software start-ups in China ,TLA+ The use of is also increasing , The author found :TiDB[4]、TDengine[5]、KingbaseES[6] Other enterprises .
In the face of these questions ,Lamport I also found him in Intel、Amazon An acquaintance of , Know what they are about TLA+ Usage situation [7]. actually TLA+ There are awesome figures and manufacturers endorsing in academia and industry , According to statistics , Use abroad TLA+ The classic cases are :
AWS from 2011 Put into use TLA+,TLA+ The model check is in DynamoDB,S3,EBS And the internal distributed lock manager have detected potential errors that are difficult to find ,AWS Several papers on formal methods have been published ;
so , The formal method of software has been studied for decades 、 Promotion and Application , A large number of important results have been achieved , Formal methods are not limited to academia , It has been gradually integrated into all stages of the software development process , From demand analysis 、 Function description ( Statute )、 Architecture 、 Algorithm design 、 Programming 、 Testing and even operation and maintenance stage .
therefore , Even if you don't plan to use TLA+ Come and write something , Study TLA+ Still valuable , Not only can you quickly understand others' algorithms , It can also provide a new way of thinking , At least in the distributed field , It can make you a better engineer .
Besides, ,TLA+ It's not hard to learn ,Amazon The shared case shows [9], Whether old bird or novice , Can start in a few weeks TLA+, It's a technology worth investing in .
I believe , Follow this TLA+ series , You don't even need a few weeks , You can get started TLA+, So as to help you think clearly about the behavior of the system ( Scientific perspective ), understand Paxos and Raft Algorithm TLA+ describe , Verify the correctness of the distributed protocol .
Next section , Let's start with the simplest example .
Reference
[1] Software Crisis (Software Crisis):https://en.wikipedia.org/wiki/Software_crisis
[2] Dijkstra, E. W. The Humble Programmer. Communications of the ACM. Aug 1972, 15 (10): 859–866. doi:10.1145/355604.361591.
[3] TLA+ Official homepage :https://lamport.azurewebsites.net/tla/tla.html
[4] TiDB TLA+ Statute :https://github.com/pingcap/tla-plus
[5] TDengine In use Raft Of TLA+ Statute :https://www.taosdata.com/techtalk/8976.html
[6] KingbaseES Use TLA+ and PlusCal Enhance the reliability of products :https://bbs.kingbase.com.cn/thread-753-1-1.html
[7] Industrial Use of TLA+: https://lamport.azurewebsites.net/tla/industrial-use.html
[8] High-level TLA+ specifications for the five consistency levels offered by Azure Cosmos DB:https://github.com/Azure/azure-cosmos-tla
[9] How Amazon Web Services Uses Formal Methods, Chris Newcombe, Tim Rath, Fan Zhang, Bogdan
Welcome to my official account. :
原网站版权声明
本文为[InfoQ]所创,转载请带上原文链接,感谢
https://yzsam.com/2022/185/202207042139169170.html