
How to write high-performance code (IV): Optimize data access

2022-06-11 13:09:00 xindoo


  The same logic, implemented by different people, can differ in performance by orders of magnitude. The same code, with the order of a few characters or a single line adjusted, can run several times faster. The same code can also run several times faster or slower on different processors. "10x" doesn't just exist in legends; it may be all around us. A 10x difference shows up in a programmer's methods, and code performance is one of its most visible expressions.

  This is the fourth article in the series 《How to write high-performance code》. It shows how data access affects program performance, and how changing the way you access data can improve it.

Why does data access speed affect program performance?

  Every running program can be simplified into a three-step model: first, read data (some of it may also arrive from elsewhere); second, process the data; third, write the result back to memory. I'll abbreviate these three steps as read-compute-write. The real CPU instruction execution process is slightly more complex, but it essentially boils down to these three steps. A complex program consists of countless CPU instructions, so if reading or writing data is too slow, the program's performance inevitably suffers.

[Figure: the read-compute-write model]
  To make this more intuitive, let me compare program execution to a chef cooking. The chef's workflow is: get the raw ingredients, process them (stir-fry, roast, deep-fry, boil), and finally plate the dish. Besides the cooking itself, the time it takes to obtain the ingredients also affects how fast the chef can cook. Some ingredients are right at hand and can be grabbed instantly, but others may be in cold storage or even still at the vegetable market, and are far less convenient to get.

  The CPU is like the chef, and data is the CPU's ingredients: data in registers is the ingredients at hand; data in memory is the food in cold storage; data on a solid-state drive (SSD) is ingredients still at the vegetable market; data on a mechanical hard drive (HDD) is like vegetables still growing in the field... If the CPU can't get the data it needs while running a program, it can only sit there waiting, wasting time.

How much does data access speed affect program performance?

  The latency of reading and writing data varies enormously across different kinds of storage. Since in most scenarios we are reading data, let's take reads as the example: between the fastest registers and the slowest mechanical disks, random-access latency differs by a factor of millions. If that number doesn't feel intuitive, let's go back to the chef analogy.

  Suppose the chef wants to make scrambled eggs with tomatoes. If the ingredients are already prepared, stir-frying them takes only about ten seconds; let's liken that to the CPU fetching data from a register. But if the CPU has to fetch the data from a mechanical disk, the time it spends is more like the chef growing the tomatoes or raising chickens to lay the eggs (3-4 months). Clearly, fetching data from the wrong storage device can drastically slow a program down.

  Here is a real incident from our production environment. During a service containerization migration, one of our services ended up deployed in a different machine room from its upstream service. Crossing machine rooms adds only about 1 ms of latency, but the service's code had a problem: one interface called another service serially, in a batch, and the accumulated serial calls pushed that interface's latency up by hundreds of milliseconds. A service that had never had performance problems suddenly developed them simply because it moved to another machine room...

Performance differences between storage devices

  In everyday coding we encounter a variety of storage devices: registers, memory, disks, network storage... Each has its own characteristics. Only by understanding the differences between them can we use the right storage in the right scenario. The following figure gives reference random-read latencies for common storage devices.

[Figure: random-read latency of common storage devices]

Note: the numbers above vary across hardware. They are only meant to show the relative differences and are not exact values; consult the hardware manuals for accurate figures.

  We usually think of memory as very fast; in day-to-day coding, whenever fetching some data is slow, adding an in-memory cache makes the speed take off. But memory access is still far too slow relative to the CPU: the time of a single memory read is enough for the CPU to execute hundreds of instructions. That is why modern CPUs all put caches between themselves and memory.

How to reduce the impact of data access latency on performance?

  Reducing the impact of data-access latency on performance is conceptually simple: put the data on the fastest storage medium you can. However, there is an irreconcilable tension among access speed, capacity, and price. In a nutshell, the faster the device, the smaller its capacity and the higher its price; conversely, the larger the capacity, the slower and cheaper it is.

[Figure: the trade-off between speed, capacity, and price]
  The world happens to be conveniently arranged, as if everything had been planned: we don't actually need to put all the data on the fastest storage medium. Remember the data locality mentioned in the second article of this series, [Skillfully using data characteristics](https://blog.csdn.net/xindoo/article/details/123941141)? There are two kinds of locality: temporal locality and spatial locality.

  • Temporal locality: if a piece of data has been accessed recently, it is likely to be accessed again soon.
  • Spatial locality: if a storage location has been accessed, nearby storage locations are likely to be accessed shortly afterwards.

  To sum up these two points: most of the time, a program accesses only a small portion of its data, which means a small amount of fast storage can cover most of the accesses. Put bluntly, we can add a cache. In fact, caches are everywhere: in computer hardware, in databases, in business systems. Even every line of code you write uses a cache when it runs on a machine. Have you ever noticed the cache-size parameter of a CPU? Take the Intel Core i7-12650HX as an example: it has a 24 MB L3 cache, which sits between the CPU and main memory. Modern computers hide these low-level details, so we rarely have to touch them directly.
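  As a rough, self-contained illustration of spatial locality and the CPU cache at work (my own sketch, not code from the original article), the Java snippet below sums a large 2D array twice: once row by row and once column by column. On most machines the row-major loop is noticeably faster, because consecutive elements of a row sit next to each other in memory and each cache line fetched from RAM gets fully used.

```java
public class LocalityDemo {
    private static final int N = 4096;

    public static void main(String[] args) {
        int[][] matrix = new int[N][N];

        // Row-major traversal: walks memory sequentially within each row,
        // so almost every access hits the CPU cache.
        long t0 = System.nanoTime();
        long sumRow = 0;
        for (int i = 0; i < N; i++) {
            for (int j = 0; j < N; j++) {
                sumRow += matrix[i][j];
            }
        }
        long rowMs = (System.nanoTime() - t0) / 1_000_000;

        // Column-major traversal: each access jumps to a different row array,
        // so many accesses miss the cache and have to wait on main memory.
        long t1 = System.nanoTime();
        long sumCol = 0;
        for (int j = 0; j < N; j++) {
            for (int i = 0; i < N; i++) {
                sumCol += matrix[i][j];
            }
        }
        long colMs = (System.nanoTime() - t1) / 1_000_000;

        System.out.println("row-major:    " + rowMs + " ms (sum = " + sumRow + ")");
        System.out.println("column-major: " + colMs + " ms (sum = " + sumCol + ")");
    }
}
```

  The exact ratio depends on the CPU and its cache sizes, but the point stands: the same data and the same amount of work can differ several-fold in speed purely because of the access pattern.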

  **When writing our own code, we can also add caches to improve performance.** Take a recent example from our own system. We are currently working on data-permission features: different employees have different permissions in our system, so the data they see should differ. Our implementation is that on every request, the system first fetches the user's permission list and then displays only the data covered by that list.

  Because each user's permission list is large, the permission interface performs poorly and every request takes a long time. So we cache the interface's result: we read from the cache first and call the interface again only when the cache misses. This greatly improves performance. Permission data does not change often, so we don't have to worry much about the consequences of slightly stale data. In addition, we cache the data for only a few minutes, since a single user typically uses our system for only a few minutes at a time; after that the entries expire and the cache space is released, saving memory.
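  Here is a minimal sketch of that idea in Java (the names PermissionService and fetchPermissions are hypothetical stand-ins for our actual interface, and the five-minute TTL is just an example): read from an in-memory cache first, and only call the slow permission interface on a miss or after the entry has expired.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class PermissionCache {
    // Cached permission list plus the time it was loaded.
    private record Entry(List<String> permissions, long loadedAtMillis) {}

    private static final long TTL_MILLIS = 5 * 60 * 1000; // expire after a few minutes

    private final Map<String, Entry> cache = new ConcurrentHashMap<>();
    private final PermissionService service; // hypothetical slow downstream interface

    public PermissionCache(PermissionService service) {
        this.service = service;
    }

    public List<String> getPermissions(String userId) {
        Entry e = cache.get(userId);
        if (e != null && System.currentTimeMillis() - e.loadedAtMillis() < TTL_MILLIS) {
            return e.permissions();                              // cache hit: skip the slow call
        }
        List<String> fresh = service.fetchPermissions(userId);   // miss or expired: call the interface
        cache.put(userId, new Entry(fresh, System.currentTimeMillis()));
        return fresh;
    }

    public interface PermissionService {
        List<String> fetchPermissions(String userId);
    }
}
```

  This sketch never evicts expired entries, it only ignores them; a real system would typically use a caching library with expire-after-write and size limits so that stale entries actually release their memory.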

  Back when I was in college, laptops still shipped with mechanical hard disks as standard, and computers were always terribly laggy. I later learned that swapping in an SSD would improve performance, but SSDs were quite expensive at the time and not standard on ordinary laptops. I saved half a month's living expenses to put a 120 GB SSD in my laptop, and the machine became noticeably faster; fundamentally that's because an SSD's random-access latency is hundreds of times lower than a mechanical disk's. A big tech company once claimed to have improved MySQL performance by hundreds of times; that, too, was based on doing a lot of query optimization around SSDs.

Cache is not a silver bullet

Silver bullet (English: Silver Bullet): a bullet made of pure silver or plated with silver. In European folklore, and under the influence of Gothic fiction since the 19th century, the silver bullet is often depicted as a weapon with the power to ward off evil — a special weapon against werewolves, vampires, and other supernatural monsters. It has since become a metaphor for an extremely effective solution: a trump card, the ultimate move.

  A special reminder here: caching is not a cure-all; it has a real side effect, namely that data freshness is hard to guarantee. What sits in the cache is old data — is the data still the same right now? Not necessarily; it may have changed. So when using a cache we must pay attention to the validity of the cached data. The longer the cache lifetime, the more likely the data is stale and the greater the risk caused by inconsistency; the shorter the lifetime, the more often we have to fetch the original data and the less the cache is worth. Using a cache is therefore a trade-off between data inconsistency and performance: evaluate how fresh the data needs to be and set a reasonable expiration policy for the cache.

  As mentioned above, every line of code we write uses a cache when it runs — we now know that this cache is the CPU's cache. The CPU cache also has an obvious side effect that we must watch out for when writing multithreaded code: data consistency across cores. Because of the CPU cache, we have to think about data synchronization in multithreaded code, which makes such code hard to write and its bugs hard to track down.

  There is a classic interview question that illustrates the problem well: the multithreaded counter. Multiple threads update the same counter and we want an accurate total — how do we guarantee that? If you implement it simply with cnt++, updates will be lost: the increment is not atomic (it is a read-modify-write, and each core may be working on its own cached copy of the value), so the final count ends up smaller than the true count. The correct approach is to add a synchronization mechanism to the increment, so that only one thread operates on the counter at a time and the result is written back to memory before the next thread proceeds; in Java that means using a lock or an atomic class. This is a common stumbling block for novice programmers.
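  A small runnable sketch of that interview question (my own example): eight threads each increment a plain long and an AtomicLong a million times. The plain counter usually ends up well short of 8,000,000 because the increments interleave, while the atomic counter is always exact.

```java
import java.util.concurrent.atomic.AtomicLong;

public class CounterDemo {
    static long plainCnt = 0;                              // racy: lost updates likely
    static final AtomicLong atomicCnt = new AtomicLong();  // safe: atomic read-modify-write

    public static void main(String[] args) throws InterruptedException {
        Thread[] threads = new Thread[8];
        for (int t = 0; t < threads.length; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < 1_000_000; i++) {
                    plainCnt++;                    // read, add, write: three steps that can interleave
                    atomicCnt.incrementAndGet();   // one atomic hardware-level read-modify-write
                }
            });
            threads[t].start();
        }
        for (Thread t : threads) t.join();

        // plainCnt is usually well below 8,000,000; atomicCnt is always exactly 8,000,000.
        System.out.println("plainCnt  = " + plainCnt);
        System.out.println("atomicCnt = " + atomicCnt.get());
    }
}
```

  A synchronized block or a lock around the increment would also give the correct result; AtomicLong is simply the lighter-weight option for a single counter.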


Summary

  Data access is an indispensable part of any program — for most programs, in fact, most of the time is spent on data access — so optimizing this part is bound to improve the program's performance.

  That's all for this article. Next, we'll continue looking at how to push performance to the extreme — stay tuned! If you're interested, you can also check out the earlier articles in the series.

How to write high-performance code series

Copyright notice: this article was written by [xindoo]; please include a link to the original when reposting.
https://yzsam.com/2022/162/202206111304390910.html