
Testing and debugging of multithreaded applications

2022-07-06 06:03:00 Zhan Miao

Testing and debugging are like two sides of a coin. You test your code to find bugs, and you debug it to remove them. With luck, you find and fix the bugs yourself rather than leaving them for the users of your application to discover. Before we look at testing or debugging, it's important to understand what problems can arise, so let's look at those first.

1. Types of concurrency-related bugs

1.1. Unnecessary blocking

What does unnecessary blocking mean? A thread is blocked when it can't proceed because it's waiting for something, typically a mutex, a condition variable, or an event. Multithreaded code waits on such conditions all the time, and when a condition isn't met, the wait can become unnecessary blocking. Why is this blocking unnecessary? Usually, other threads are waiting for the blocked thread to perform some action, so if that thread is blocked, they are blocked too. Unnecessary blocking falls into the following categories.

1.1.1. Deadlock

In a deadlock, the first thread is waiting for the second thread, which is in turn waiting for the first. The threads form a cycle of waiting. If your threads deadlock, the tasks they are supposed to be doing won't get done. In the most visible cases, one of the threads involved is responsible for the user interface, and the UI stops responding. In other cases the UI stays responsive, but some required task never completes, such as a search that never returns results or a document that never prints.
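The sketch below illustrates the classic two-mutex cycle and one standard remedy. The `account` type and the function names are hypothetical. `transfer_unsafe` shows the deadlock-prone pattern and is deliberately never called. `transfer` uses C++17 `std::scoped_lock`, which acquires several mutexes with a deadlock-avoidance algorithm. This is one common fix, not necessarily the only one for your design.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical account type, used only for this illustration.
struct account {
    std::mutex m;
    int balance = 0;
};

// Deadlock-prone version (deliberately not called below): if one thread runs
// transfer_unsafe(a, b, ...) while another runs transfer_unsafe(b, a, ...),
// each can lock its `from` mutex and then wait forever for the other's.
void transfer_unsafe(account& from, account& to, int amount) {
    std::lock_guard<std::mutex> lk_from(from.m);
    std::lock_guard<std::mutex> lk_to(to.m);  // may wait on the other thread: a cycle
    from.balance -= amount;
    to.balance += amount;
}

// std::scoped_lock locks both mutexes using a deadlock-avoidance algorithm,
// so opposite argument orders in two threads cannot form a wait cycle.
// Precondition: &from != &to.
void transfer(account& from, account& to, int amount) {
    std::scoped_lock lk(from.m, to.m);
    from.balance -= amount;
    to.balance += amount;
}

// Runs transfers in opposite directions concurrently; returns the final total.
int run_transfers(int iterations) {
    account a, b;
    a.balance = 1000;
    b.balance = 1000;
    std::thread t1([&] { for (int i = 0; i < iterations; ++i) transfer(a, b, 1); });
    std::thread t2([&] { for (int i = 0; i < iterations; ++i) transfer(b, a, 1); });
    t1.join();
    t2.join();
    int total = a.balance + b.balance;  // safe to read: both threads have joined
    assert(total == 2000);              // money is conserved
    return total;
}
```

If both threads called `transfer_unsafe` instead, a run could hang forever, and a test could only "detect" that with a timeout.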

1.1.2. Livelock

Livelock is similar to deadlock: the first thread is waiting for the second, which is waiting for the first. The key difference is that the wait is not a blocking wait but an active checking loop, such as a spin lock. In serious cases the symptoms look like deadlock (the application makes no progress), except that CPU usage is high, because the threads are still running and checking while blocking each other. In less serious cases, some random event eventually breaks the livelock. Even then, the affected task can be delayed for a long time, and CPU usage stays high throughout.
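To make the "active checking loop" concrete, here is a minimal spinlock sketch built on `std::atomic_flag`. The class and function names are hypothetical. A waiting thread keeps executing the `while` loop instead of sleeping. That is why livelocked threads show high CPU usage even though they make no progress. The code below is used correctly and terminates. It only shows what the spinning wait looks like.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical minimal spinlock: waiting means actively looping, not sleeping.
class spinlock {
    std::atomic_flag flag;
public:
    spinlock() : flag(ATOMIC_FLAG_INIT) {}
    void lock() {
        // Busy-wait: keep testing the flag until this thread is the one that set it.
        // A livelocked thread looks like this loop never exiting while using CPU.
        while (flag.test_and_set(std::memory_order_acquire)) {
        }
    }
    void unlock() { flag.clear(std::memory_order_release); }
};

// Several threads increment a plain int protected by the spinlock.
int count_with_spinlock(int threads, int per_thread) {
    spinlock sl;
    int counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&] {
            for (int i = 0; i < per_thread; ++i) {
                sl.lock();
                ++counter;
                sl.unlock();
            }
        });
    for (auto& th : pool) th.join();
    assert(counter == threads * per_thread);
    return counter;
}
```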

1.1.3. Blocking on I/O or other external input

If your thread is blocked waiting for external input, it can't continue, and that input may never come. This is called blocking on I/O or other external input. You therefore don't want a thread to block waiting for external input if other threads are waiting for that same thread to make progress, because they will be blocked as well.
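One common mitigation, which the text above does not prescribe, is to wait with a timeout so the thread can go on to other work or report the problem. This sketch assumes the external input arrives through a `std::future<std::string>`. The helper `wait_for_input` is hypothetical; `std::future::wait_for` is the standard call.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <string>

// Hypothetical helper: wait for input delivered through a future, but give up
// after `timeout` instead of blocking forever if the input never arrives.
// Returns true and stores the value in `out` if the input arrived in time.
bool wait_for_input(std::future<std::string>& input,
                    std::chrono::milliseconds timeout,
                    std::string& out) {
    if (input.wait_for(timeout) != std::future_status::ready)
        return false;  // caller can retry, report an error, or do other work
    out = input.get();
    return true;
}

// Usage: input that never arrives no longer blocks the caller indefinitely.
bool demo_timeout() {
    std::promise<std::string> never_set;  // stands in for input that never comes
    std::future<std::string> f = never_set.get_future();
    std::string value;
    bool got = wait_for_input(f, std::chrono::milliseconds(20), value);
    assert(!got);
    return got;
}
```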

1.2. Race conditions

Race conditions are the most common cause of problems in multithreaded code. Many deadlocks and livelocks only show up because of a race condition. Not all race conditions are problematic: a race condition exists whenever the outcome depends on the relative ordering of operations in separate threads. Many race conditions are entirely benign. For example, it usually doesn't matter which worker thread takes the next task from a task queue. However, many concurrency bugs are caused by race conditions. In particular, race conditions often cause the following types of problems.

1.2.1. Data races

A data race is a specific type of race condition. It arises from unsynchronized concurrent access to a shared memory location, and it results in undefined behavior. Data races usually occur through incorrect use of atomic operations to synchronize threads, or through access to shared data without locking the appropriate mutex.
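As a minimal illustration (the function name is hypothetical): incrementing a plain `int` from several threads with no lock is a data race and therefore undefined behavior. Making the counter a `std::atomic<int>` makes each increment indivisible. Relaxed ordering is enough here because the threads only count, and `join()` makes the final value visible to the caller.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// With `int counter; ... ++counter;` this would be a data race.
// std::atomic<int> makes each increment a single indivisible operation.
int count_atomically(int threads, int per_thread) {
    std::atomic<int> counter{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&counter, per_thread] {
            for (int i = 0; i < per_thread; ++i)
                counter.fetch_add(1, std::memory_order_relaxed);
        });
    for (auto& th : pool) th.join();  // join() synchronizes with each thread's completion
    int result = counter.load();
    assert(result == threads * per_thread);
    return result;
}
```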

1.2.2. Broken invariants

Broken invariants can show up in several ways:

  • dangling pointers, because another thread deleted the data being accessed;
  • random memory corruption, because a thread read inconsistent values from a partial update;
  • double-free, for example when two threads pop the same value from a queue and both delete some associated data.

The invariants being broken can be temporal as well as value-based. If operations on separate threads must execute in a particular order, incorrect synchronization can lead to a race condition in which that order is sometimes violated.
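The "partial update" case can be sketched with a hypothetical range type whose invariant is `lo <= hi`. The writer moves the range upward by two each step. If `lo` and `hi` were written without a shared lock, a reader could see the new `lo` together with the old `hi` and observe `lo > hi`. Updating and reading both fields under one mutex keeps every snapshot consistent. All names here are illustrative.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical range with the invariant lo <= hi.
class guarded_range {
    mutable std::mutex m;
    int lo = 0, hi = 0;
public:
    void set(int new_lo, int new_hi) {
        assert(new_lo <= new_hi);           // callers must supply a valid range
        std::lock_guard<std::mutex> lk(m);  // both fields change under one lock
        lo = new_lo;
        hi = new_hi;
    }
    std::pair<int, int> get() const {
        std::lock_guard<std::mutex> lk(m);  // both fields read as one snapshot
        return {lo, hi};
    }
};

// Writer shifts the range upward; reader counts snapshots with lo > hi.
int count_violations(int iterations) {
    guarded_range r;
    int violations = 0;  // touched only by the reader thread until join()
    std::thread writer([&] {
        for (int i = 0; i < iterations; ++i) r.set(2 * i, 2 * i + 1);
    });
    std::thread reader([&] {
        for (int i = 0; i < iterations; ++i) {
            auto [lo, hi] = r.get();
            if (lo > hi) ++violations;
        }
    });
    writer.join();
    reader.join();
    return violations;
}
```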

1.2.3. Lifetime issues

Lifetime issues are often lumped in with broken invariants, but they are really a separate category of problems caused by race conditions. The basic problem with bugs in this category is that a thread outlives the data it accesses. The data may already have been deleted or otherwise destroyed, and its storage may even have been reused for another object. A typical case is a thread that holds a reference to a local variable that has gone out of scope before the thread has finished. More generally, whenever the lifetime of a thread and the lifetime of the data it operates on aren't tied together by some rule, the data may be destroyed before the thread has finished, and the thread then accesses invalid memory. If you call join() to make sure the thread completes before the data is destroyed, you must ensure that the call to join() can't be skipped if an exception is thrown. This is basic exception safety applied to threads.
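A minimal sketch of that exception-safety point uses an RAII guard in the style of the `thread_guard` class from *C++ Concurrency in Action*. The function names are hypothetical. The guard joins in its destructor, so stack unwinding after an exception still waits for the worker before `local_data` goes out of scope. Without the guard, destroying a still-joinable `std::thread` would call `std::terminate`. In C++20, `std::jthread` provides the same automatic join.

```cpp
#include <cassert>
#include <stdexcept>
#include <thread>

// RAII guard: joins in its destructor, so join() can't be skipped on an exception.
class thread_guard {
    std::thread& t;
public:
    explicit thread_guard(std::thread& t_) : t(t_) {}
    ~thread_guard() {
        if (t.joinable()) t.join();
    }
    thread_guard(const thread_guard&) = delete;
    thread_guard& operator=(const thread_guard&) = delete;
};

// The worker reads a local variable of this function's try block.
int run_with_guard(bool throw_midway) {
    int observed = 0;
    try {
        int local_data = 42;  // data the worker refers to
        std::thread worker([&observed, &local_data] { observed = local_data; });
        thread_guard g(worker);  // declared last, so it is destroyed (and joins) first
        if (throw_midway) throw std::runtime_error("failure after starting the thread");
    } catch (const std::runtime_error&) {
        // By now ~thread_guard has joined the worker, before local_data was destroyed.
    }
    return observed;  // safe: the worker has been joined on every path
}

void check_thread_guard() {
    assert(run_with_guard(false) == 42);
    assert(run_with_guard(true) == 42);
}
```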

Race conditions are the real killers. With deadlock and livelock, the application appears to hang, or a task doesn't complete for a long time. Usually you can attach a debugger to the running process to identify which threads are deadlocked or livelocked and which synchronization objects they are contending for.

With data races, broken invariants, and lifetime issues, the visible symptoms (such as random crashes or incorrect output) can appear anywhere in the code. The buggy code may overwrite memory that another part of the system doesn't use until much later. The fault then shows up in code completely unrelated to the location of the bug, possibly long after the bug executed. This is the real curse of shared-memory systems. However carefully you limit which thread can access which data, and however carefully you ensure correct synchronization, any thread can still overwrite data being used by any other thread in the application.

Now that we've briefly identified the types of bugs to look for, let's look at how to locate instances of them in your code and fix them.

2. Techniques for locating concurrency-related bugs

Even after reviewing your own code, you may still miss some bugs. So always test that your code actually runs correctly, and stay calm when a test shows that it doesn't. The following sections introduce techniques for reviewing, testing, and debugging multithreaded code.

2.1. Reviewing code to locate potential bugs

When reviewing multithreaded code for concurrency-related bugs, it's important to read it thoroughly, with a fine-tooth comb. If possible, get someone else to review it. Because they didn't write the code, they have to work out how it operates, and that process helps uncover bugs you missed. The reviewer needs enough time to do a careful, considered review rather than a quick skim. Most concurrency bugs can't be spotted with a quick glance, because they depend on subtle timing to show up.

A colleague who reviews your code comes to it fresh, sees it from a different point of view, and may spot problems you can't. If no colleague is available, ask a friend, or even post the code online for help. If you can't find anyone to review your code, or they don't find anything, don't worry; there's still more you can do. To start with, put the code aside for a while and do something else, such as working on another part of the application, reading a book, or going for a walk. While you concentrate on something else, your subconscious keeps working on the problem. And when you come back, the code is less familiar, so you may manage to look at it from a different perspective yourself.

The alternative to having someone else review your code is to review it yourself. A useful technique is to explain in detail how it works to someone else. That "someone" doesn't even need to be a person; a teddy bear or a rubber chicken will do. Personally, I find writing detailed notes very helpful. Explain what each line of code does, what could happen, which data it accesses, and so on. Keep asking yourself questions about the code and explaining the answers.

Questions to consider when reviewing multithreaded code

Here is a selection of questions I like to ask. The list isn't exhaustive, and you may find other questions that help you focus better, but it's a good starting point.

  • Which data needs to be protected from concurrent access?
  • How do you ensure that the data is protected?
  • Where in the code could other threads be at this time?
  • Which mutexes does this thread hold?
  • Which mutexes might other threads hold?
  • Are there any ordering requirements between the operations done in this thread and those done in another thread? How are those requirements enforced?
  • Is the data loaded by this thread still valid? Could it have been modified by other threads?
  • If you assume that another thread could be modifying the data, what would that mean, and how could you ensure that it never happens?

The last question is my favorite, because it really makes me think about the relationships between the threads. By assuming there's a bug related to a particular line of code, you can act as a detective and track down the cause. To convince yourself that there's no bug, you have to consider every corner case and every possible ordering. This is particularly useful when data is protected by more than one mutex over its lifetime.

The penultimate question is also important, because it addresses an easy mistake to make: if you release a mutex and then reacquire it, you must assume that other threads may have modified the shared data in between. This seems obvious, but if the mutex locks aren't immediately visible, for example because they're internal to an object, you may be doing exactly that without realizing it.
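A classic instance of that mistake is a container whose member functions each lock an internal mutex. Calling `empty()` and then popping releases and reacquires the mutex between the two calls, so another thread may empty the container in the gap. The hypothetical `threadsafe_stack` below avoids this by offering `try_pop()`, which checks and removes under a single lock. The test drains the stack from two threads and verifies that each item is taken exactly once.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <optional>
#include <stack>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stack whose locking is hidden inside its member functions.
// Racy caller pattern this design avoids:
//   if (!s.empty()) { v = s.top(); s.pop(); }  // mutex released between calls
template <typename T>
class threadsafe_stack {
    mutable std::mutex m;
    std::stack<T> data;
public:
    void push(T value) {
        std::lock_guard<std::mutex> lk(m);
        data.push(std::move(value));
    }
    // Check and remove under one lock, so no other thread can interleave.
    std::optional<T> try_pop() {
        std::lock_guard<std::mutex> lk(m);
        if (data.empty()) return std::nullopt;
        T value = std::move(data.top());
        data.pop();
        return value;
    }
};

// Two consumers drain n items; every item must be popped exactly once.
bool each_item_popped_once(int n) {
    threadsafe_stack<int> s;
    for (int i = 0; i < n; ++i) s.push(i);
    std::vector<int> got1, got2;  // each consumer fills only its own vector
    auto consume = [&s](std::vector<int>& out) {
        while (auto v = s.try_pop()) out.push_back(*v);
    };
    std::thread c1(consume, std::ref(got1));
    std::thread c2(consume, std::ref(got2));
    c1.join();
    c2.join();
    assert(got1.size() + got2.size() == static_cast<std::size_t>(n));
    std::vector<int> count(n, 0);
    for (int v : got1) ++count[v];
    for (int v : got2) ++count[v];
    for (int c : count)
        if (c != 1) return false;
    return true;
}
```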

2.2. Locating concurrency-related bugs by testing

When you develop a single-threaded application, testing it is relatively straightforward, if time-consuming. You identify all the possible sets of input data (or at least the typical and interesting ones) and run them through the application. If the application behaves correctly and produces the correct output, you know it works for that set of input. Testing error conditions is more complicated than testing the normal path, but the basic idea is the same: set up the initial conditions and let the application run.

Testing multithreaded code is much harder, because the precise scheduling of the threads is nondeterministic and may vary from run to run. So even with the same input data, an application with a hidden race condition may work correctly on some runs and fail on others. A potential race condition doesn't mean the code always fails, only that it can fail sometimes.

Given how hard it is to reproduce concurrency-related bugs, it pays to design your tests carefully. Each test should run the smallest amount of code that could show a problem, so that when a test fails you can isolate the faulty code more easily. For example, it's better to test a concurrent queue directly, checking that concurrent pushes and pops work, than to test it indirectly through a whole block of code that uses the queue.
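Here is a sketch of such a focused test. It pairs a deliberately minimal queue (hypothetical, built from a mutex and a condition variable) with a test that exercises only concurrent `push` and `wait_and_pop`. With one producer and one consumer, FIFO order must be preserved, so any failure points straight at the queue itself.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Minimal concurrent queue, a sketch rather than a complete implementation.
template <typename T>
class concurrent_queue {
    std::mutex m;
    std::condition_variable cv;
    std::queue<T> q;
public:
    void push(T value) {
        {
            std::lock_guard<std::mutex> lk(m);
            q.push(std::move(value));
        }
        cv.notify_one();  // wake a waiting consumer, if any
    }
    T wait_and_pop() {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [this] { return !q.empty(); });  // predicate handles spurious wakeups
        T value = std::move(q.front());
        q.pop();
        return value;
    }
};

// Focused test: only concurrent push/pop, nothing else from the application.
std::vector<int> push_pop_once(int n) {
    concurrent_queue<int> q;
    std::vector<int> received;  // written only by the consumer thread
    std::thread producer([&] { for (int i = 0; i < n; ++i) q.push(i); });
    std::thread consumer([&] {
        for (int i = 0; i < n; ++i) received.push_back(q.wait_and_pop());
    });
    producer.join();
    consumer.join();
    for (int i = 0; i < n; ++i) assert(received[i] == i);  // FIFO preserved
    return received;
}
```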

It can be worth testing with concurrency removed, to prove that a problem really is concurrency-related. If the bug still occurs when everything runs in a single thread, it's an ordinary bug rather than a concurrency bug. This matters most when you're tracking down a bug from its original source rather than from where your test harness happened to detect it: just because a bug occurs in the multithreaded part of your application doesn't mean it's concurrency-related. If you use a thread pool to manage the level of concurrency, there's usually a configuration parameter that sets the number of worker threads. If you manage threads manually, you'll have to modify the code to run the test on a single thread. Reducing the application to one thread lets you rule concurrency out as the cause. Conversely, if the problem goes away on a single-core system (even with multiple threads running) but appears on multicore or multiprocessor systems, you have a race condition, and possibly a synchronization or memory-ordering problem.
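A sketch of "make the worker count a parameter" follows. `parallel_sum` is hypothetical. Because each worker writes only its own slot of `partial`, running with `num_threads == 1` and with several threads should give the same answer. A difference between the two would point toward a concurrency bug rather than an ordinary one.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical parallel sum with a configurable worker count, so the same
// code can be run single-threaded to rule concurrency in or out.
long long parallel_sum(const std::vector<int>& data, unsigned num_threads) {
    assert(num_threads >= 1);
    std::vector<long long> partial(num_threads, 0);  // one slot per worker: no sharing
    std::vector<std::thread> workers;
    const std::size_t chunk = data.size() / num_threads;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = t * chunk;
        const std::size_t end = (t + 1 == num_threads) ? data.size() : begin + chunk;
        workers.emplace_back([&data, &partial, t, begin, end] {
            partial[t] = std::accumulate(data.begin() + static_cast<std::ptrdiff_t>(begin),
                                         data.begin() + static_cast<std::ptrdiff_t>(end),
                                         0LL);
        });
    }
    for (auto& w : workers) w.join();
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

A test can then compare `parallel_sum(v, 1)` against `parallel_sum(v, 4)` for the same input.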

2.3. Designing for testability

Testing multithreaded code is difficult, so what can you do to make it easier? One of the most important things is to design the code to be testable. Most existing techniques for designing testable code were developed for single-threaded code, but many of them also apply to multithreaded code. In general, code is easier to test when the following are true:

  • The responsibilities of each function and class are clear.
  • Functions are short and to the point.
  • Your tests can take complete control of the environment around the code under test.
  • The code that performs the operation being tested is kept together rather than scattered throughout the system.
  • You thought about how to test the code before you wrote it.

All of these points still apply to multithreaded code. In fact, I'd argue that testability matters even more for multithreaded code than for single-threaded code, because multithreaded code is inherently harder to test. The last point is especially important: even if you don't write the tests before the application code, think about how to test the code before you write it. Decide what inputs to use, which conditions are likely to cause problems, and how to drive the code into potentially problematic states.

One of the best ways to design concurrent code for testing is to eliminate the concurrency. If you can divide the code into parts that handle the communication paths between threads and parts that operate on the communicated data within a single thread, you've greatly reduced the problem. The parts that operate on data accessed by only one thread can be tested with normal single-threaded techniques. The hard-to-test concurrent code, which handles communication between threads and ensures that only one thread at a time accesses a given block of data, becomes much smaller. When a test fails, it's also easier to trace the source of the error.

For example, if your application is designed as a multithreaded state machine, you can split it into several parts. You can test the state logic of each thread independently with single-threaded techniques. This is the logic that ensures transitions and operations are correct for every possible set of input events, and the test harness supplies the input events that other threads would normally send. Then you can test the core state machine and the message-routing code separately. That code ensures events are delivered to the right thread in the right order, and you test it with multiple concurrent threads and simple state logic designed specifically for the tests.
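The state-logic half of that split can be as simple as a pure transition function, as in this sketch. The states, events, and `next_state` are hypothetical. The function has no threads and no locks, so an ordinary single-threaded unit test can feed it the events that other threads would normally deliver. The routing and threading code would be tested separately.

```cpp
#include <cassert>

// Hypothetical states and events of a multithreaded state machine.
enum class state { idle, running, done };
enum class event { start, finish, reset };

// Pure transition function: no threads, no locks, no shared data.
// Events that are not valid in the current state leave it unchanged.
state next_state(state s, event e) {
    switch (s) {
    case state::idle:    return e == event::start  ? state::running : s;
    case state::running: return e == event::finish ? state::done    : s;
    case state::done:    return e == event::reset  ? state::idle    : s;
    }
    return s;  // unreachable for valid enumerators
}

// Single-threaded test of the state logic alone.
void test_state_logic() {
    assert(next_state(state::idle, event::start) == state::running);
    assert(next_state(state::idle, event::finish) == state::idle);  // ignored
    assert(next_state(state::running, event::finish) == state::done);
    assert(next_state(state::done, event::reset) == state::idle);
}
```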

Alternatively, if you can divide the code into blocks that read shared data, transform the data, and update the shared data, you can test the transform blocks with all the usual single-threaded techniques, because that part is now single-threaded code. The hard problem of testing a multithreaded transformation is reduced to testing the reading and updating of the shared data, which is much simpler.
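A minimal sketch of that read / transform / update split follows; `normalize` and `shared_store` are hypothetical names. The transform is a pure function with its own single-threaded test. Only the short locked read and write touch shared data. One caveat: the mutex is released between the read and the update, so this sketch assumes no other thread modifies `data` in that gap. Real code would need a version check or a single writer.

```cpp
#include <algorithm>
#include <cassert>
#include <mutex>
#include <utility>
#include <vector>

// Transform step: pure single-threaded code (sort and remove duplicates).
std::vector<int> normalize(std::vector<int> values) {
    std::sort(values.begin(), values.end());
    values.erase(std::unique(values.begin(), values.end()), values.end());
    return values;
}

// Hypothetical shared store using read / transform / update.
// Assumes a single updater: the lock is released between read and update.
class shared_store {
    std::mutex m;
    std::vector<int> data;
public:
    explicit shared_store(std::vector<int> initial) : data(std::move(initial)) {}
    void normalize_in_place() {
        std::vector<int> copy;
        {
            std::lock_guard<std::mutex> lk(m);  // read shared data
            copy = data;
        }
        std::vector<int> result = normalize(std::move(copy));  // transform, no lock held
        std::lock_guard<std::mutex> lk(m);      // update shared data
        data = std::move(result);
    }
    std::vector<int> snapshot() {
        std::lock_guard<std::mutex> lk(m);
        return data;
    }
};

void test_normalize() {
    assert(normalize({3, 1, 3, 2}) == (std::vector<int>{1, 2, 3}));
    shared_store s({5, 5, 4});
    s.normalize_in_place();
    assert(s.snapshot() == (std::vector<int>{4, 5}));
}
```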

Watch out for library functions that use internal variables to store state. That state becomes shared if multiple threads call the same set of library functions. This is a problem because it isn't immediately obvious that the code accesses shared data. Once you learn which library functions behave this way, though, they are easy to spot. You can then either add appropriate protection and synchronization, or use an alternative function that is safe for concurrent access from multiple threads.
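A familiar example of hidden library state is `std::rand()`. It keeps internal state, and whether it is thread-safe is implementation-defined. This sketch shows one alternative: each thread owns its own `std::mt19937` engine, so no state is shared. The helper names are hypothetical. Seeding each engine explicitly also makes every thread's sequence reproducible in tests.

```cpp
#include <cassert>
#include <random>
#include <thread>
#include <vector>

using value_t = std::mt19937::result_type;

// Each call owns its engine, so no state is shared between threads.
std::vector<value_t> draw(value_t seed, int count) {
    std::mt19937 engine(seed);
    std::vector<value_t> out;
    for (int i = 0; i < count; ++i) out.push_back(engine());
    return out;
}

// Two threads draw concurrently; each result matches a single-threaded run.
bool per_thread_engines_reproducible() {
    std::vector<value_t> a, b;
    std::thread t1([&a] { a = draw(1u, 5); });
    std::thread t2([&b] { b = draw(2u, 5); });
    t1.join();
    t2.join();
    bool ok = (a == draw(1u, 5)) && (b == draw(2u, 5));
    assert(ok);
    return ok;
}
```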

2.4. Multithreaded testing techniques

Once you've thought through the scenario you want to test and written a small amount of code that exercises the functions involved, how do you make sure that potentially problematic thread schedules actually occur, so that the bugs are flushed out? There are several ways to approach this, starting with brute-force testing, also called stress testing.

2.4.1. Brute-force testing

The idea behind brute-force testing is to stress the code to see whether it breaks. This typically means running the code many times, possibly with many threads running at once. If a bug shows up only when threads are scheduled in a particular order, the more often the code runs, the more likely the bug is to appear. If you run the test once and it passes, you may feel some confidence that the code works. If you run it ten times in a row and it passes every time, you'll feel more confident. If you run it a billion times and it passes every time, you'll feel very confident indeed.
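A brute-force harness can be very small. The sketch below is hypothetical throughout. It runs a scenario many times and counts failing runs. The sample scenario has two threads pushing into a mutex-protected vector and checks the final size. Zero failures is evidence, not proof, for the reasons given in the following paragraphs.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical brute-force harness: repeat a concurrent scenario, count failures.
int stress(int runs, const std::function<bool()>& scenario) {
    assert(runs > 0);
    int failures = 0;
    for (int r = 0; r < runs; ++r)
        if (!scenario()) ++failures;
    return failures;
}

// Example scenario: two threads push 50 values each into a locked vector.
bool push_scenario() {
    std::mutex m;
    std::vector<int> shared;
    auto pusher = [&](int base) {
        for (int i = 0; i < 50; ++i) {
            std::lock_guard<std::mutex> lk(m);
            shared.push_back(base + i);
        }
    };
    std::thread a(pusher, 0), b(pusher, 1000);
    a.join();
    b.join();
    return shared.size() == 100;
}
```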

How much confidence the results deserve depends on the amount of code being tested. If your tests are fine-grained, such as tests that exercise only a thread-safe queue, brute-force testing can give you a high degree of confidence in your code. On the other hand, if the code under test is large, the number of possible scheduling permutations is so vast that even a billion successful runs might give you only a little confidence.

The downside of brute-force testing is that it can give you false confidence. If the way you wrote the test means the problematic circumstances can never occur, you can run it as often as you like and it won't fail, even though the code would fail every time under slightly different circumstances. The worst case is when the problematic circumstances can't occur on your test system because of the particular way that system runs. Unless your code will only ever run on systems identical to the test system, the particular combination of hardware and operating system may simply never produce the conditions that would trigger the bug.

The classic example is testing a multithreaded application on a single-processor system. Because every thread has to run on the same processor, everything is automatically serialized, and many race conditions and cache ping-pong problems that can occur on a true multiprocessor system simply disappear. The number of processors isn't the only variable, either: different processor architectures provide different synchronization and memory-ordering behavior, so they expose different timing problems.

2.4.2. Combination simulation testing

Combination simulation testing means running your code under special software that simulates the code's real runtime environment. You may be familiar with software that lets you run multiple virtual machines on a single physical computer, where supervisor software emulates the characteristics of each virtual machine and its hardware. The idea here is similar, except that instead of emulating the system, the simulation software records the sequence of data accesses, locks, and atomic operations performed by each thread. It then uses the rules of the C++ memory model to repeat the run with every permitted combination of operations, and so identifies race conditions and deadlocks.

However, such a comprehensive test portfolio can ensure that all errors in the system can be found , however , Many small mistakes , It often takes a lot of time to find it , Because the number of permutations of combined operations will increase exponentially with the number of threads and the number of operands per thread . Therefore, the combination testing technology is best reserved for the fine testing of code fragments , Instead of testing the entire application . An obvious disadvantage of combinatorial testing is that it depends on the ability of simulation software to handle the operations in your code .

Running ordinary tests many times can miss problems, and exhaustive combination simulation is expensive and depends on suitable simulation software. What you also want is a technique that detects problems as they occur during ordinary test runs, and that lets you test the code under specific conditions of your choosing. Is there such a technique?

Using a special library that detects problems during test runs is one such technique.

2.4.3. Detecting problems exposed by tests with a special library

This technique doesn't provide the exhaustive checking of combination simulation testing, but you can find many problems by using special implementations of the library's synchronization primitives, such as mutexes, locks, and condition variables. For example, a common requirement is that all access to a piece of shared data happens with a particular mutex locked. If the library can check which mutexes are locked when the data is accessed, it can verify that the calling thread really did lock the appropriate mutex, and report a failure if it didn't. By marking your shared data in some way, you let the library perform this check for you.
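Here is a minimal sketch of that kind of check. It does not describe any particular real library: `checked_mutex` and `guarded` are hypothetical. The mutex records which thread holds it, and the wrapper that marks the shared data throws if a thread accesses the data without holding that mutex. A real debugging library would more likely log or abort, and would hook the standard primitives instead of wrapping them.

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>
#include <thread>
#include <utility>

// Hypothetical debugging mutex that remembers which thread holds it.
class checked_mutex {
    std::mutex m;
    mutable std::mutex owner_m;  // protects `owner`
    std::thread::id owner;       // default-constructed id means "not held"
public:
    void lock() {
        m.lock();
        std::lock_guard<std::mutex> lk(owner_m);
        owner = std::this_thread::get_id();
    }
    void unlock() {
        {
            std::lock_guard<std::mutex> lk(owner_m);
            owner = std::thread::id();
        }
        m.unlock();
    }
    bool held_by_this_thread() const {
        std::lock_guard<std::mutex> lk(owner_m);
        return owner == std::this_thread::get_id();
    }
};

// Hypothetical marker: this data must only be accessed with `mtx` locked.
template <typename T>
class guarded {
    checked_mutex& mtx;
    T value;
public:
    guarded(checked_mutex& m, T v) : mtx(m), value(std::move(v)) {}
    T& get() {
        if (!mtx.held_by_this_thread())
            throw std::logic_error("guarded data accessed without its mutex");
        return value;
    }
};

// Returns true if the unlocked access below is reported.
bool detects_unlocked_access() {
    checked_mutex m;
    guarded<int> counter(m, 0);
    {
        std::lock_guard<checked_mutex> lk(m);
        ++counter.get();               // correct: mutex held
        assert(counter.get() == 1);
    }
    try {
        ++counter.get();               // bug: mutex not held
    } catch (const std::logic_error&) {
        return true;
    }
    return false;
}
```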

If a thread holds more than one mutex at a time, the library can also record the order in which the locks were acquired. If another thread locks the same mutexes in a different order, the library can flag this as a potential deadlock, even if the test run itself completed without deadlocking.
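A lock-order recorder can be sketched as follows. The `ordered_mutex` type and the `lock_order` namespace are hypothetical, and the sketch only detects two-mutex cycles (A before B in one place, B before A in another). Each thread keeps a thread-local list of the mutexes it holds. Every acquisition records "held, then acquired" edges in a global set and checks whether the reverse edge was seen earlier. In the demo, the two threads run one after the other, so nothing actually deadlocks, but the inconsistent order is still reported.

```cpp
#include <algorithm>
#include <cassert>
#include <mutex>
#include <set>
#include <thread>
#include <utility>
#include <vector>

namespace lock_order {
std::mutex graph_m;                                   // protects edges and the flag
std::set<std::pair<const void*, const void*>> edges;  // (held, then acquired)
bool potential_deadlock = false;
thread_local std::vector<const void*> held;           // mutexes this thread holds
}  // namespace lock_order

// Hypothetical mutex that records acquisition order; detects 2-cycles only.
class ordered_mutex {
    std::mutex m;
public:
    void lock() {
        const void* self = this;
        {
            std::lock_guard<std::mutex> lk(lock_order::graph_m);
            for (const void* h : lock_order::held) {
                if (lock_order::edges.count(std::make_pair(self, h)))
                    lock_order::potential_deadlock = true;  // reverse order seen earlier
                lock_order::edges.insert(std::make_pair(h, self));
            }
        }
        m.lock();
        lock_order::held.push_back(self);
    }
    void unlock() {
        auto& h = lock_order::held;
        auto it = std::find(h.begin(), h.end(), static_cast<const void*>(this));
        assert(it != h.end());  // unlocking a mutex this thread doesn't hold is a bug
        h.erase(it);
        m.unlock();
    }
};

bool detects_inconsistent_order() {
    ordered_mutex a, b;
    std::thread t1([&] {
        std::lock_guard<ordered_mutex> la(a);
        std::lock_guard<ordered_mutex> lb(b);  // records a -> b
    });
    t1.join();  // t1 finishes before t2 starts: no real deadlock occurs
    std::thread t2([&] {
        std::lock_guard<ordered_mutex> lb(b);
        std::lock_guard<ordered_mutex> la(a);  // b -> a: reverse of t1's order
    });
    t2.join();
    std::lock_guard<std::mutex> lk(lock_order::graph_m);
    return lock_order::potential_deadlock;
}
```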

Another kind of special library for testing multithreaded code provides implementations of threading primitives, such as mutexes and condition variables, that give the test writer control over scheduling decisions. The test writer can choose which thread gets a lock when several threads are waiting for it, or which thread is woken by a notify_one() call on a condition variable. This lets you set up specific test scenarios and verify that your code behaves correctly in them.
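Even without such a library, you can force one particular interleaving with standard tools. This sketch uses plain `std::promise`/`std::future` pairs as gates rather than a special testing library, and the scenario is hypothetical. The reader is made to run after the writer's first step and before its second, so the recorded order is always the same.

```cpp
#include <cassert>
#include <future>
#include <mutex>
#include <string>
#include <thread>

// Force the interleaving: writer step 1, then reader, then writer step 2.
std::string forced_order() {
    std::mutex log_m;
    std::string log;
    auto record = [&](char c) {
        std::lock_guard<std::mutex> lk(log_m);
        log += c;
    };
    std::promise<void> step1_done, reader_done;  // one-shot gates
    std::future<void> step1 = step1_done.get_future();
    std::future<void> rdone = reader_done.get_future();

    std::thread writer([&] {
        record('1');             // writer, step 1
        step1_done.set_value();  // open the reader's gate
        rdone.wait();            // hold step 2 until the reader has run
        record('2');             // writer, step 2
    });
    std::thread reader([&] {
        step1.wait();            // wait for writer step 1
        record('R');
        reader_done.set_value();
    });
    writer.join();
    reader.join();
    assert(log == "1R2");
    return log;
}
```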

Some of these testing facilities would need to be part of the C++ Standard Library implementation itself, and your test harness could then call them. Having looked at different ways of executing test code, let's now look at how to structure the test code to achieve the scheduling you want.

Reference

C++ Concurrency in Action


Copyright notice
This article was written by [Zhan Miao]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/187/202207060557471252.html