当前位置:网站首页>Distributed global ID generation scheme

Distributed global ID generation scheme

2022-07-07 05:40:00 Qin Tian

Catalog

One 、 Distributed ID Characteristics of

Two 、 Distributed ID The generation scheme of

1、 Database autoincrement ID

2、 Batch build ID

3、UUID

4、 Take the current milliseconds

5、Redis Generate ID

6、 class snowflake Algorithm


One 、 Distributed ID Characteristics of

  • Uniqueness : Make sure that the generated ID It is the only one in the whole network .
  • Order and increment : Make sure that the generated ID For a certain user or business, it increases in order according to a certain number .
  • High availability : Make sure that you can correctly generate... At any time ID.
  • Take time :ID It contains time , A glance at the past will tell which day of the transaction .

Two 、 Distributed ID The generation scheme of

1、 Database autoincrement ID

Using the database id Self increasing strategy , Such as MySQL Of auto_increment. And you can use two databases to set different steps , Generation does not repeat ID To achieve high availability .

advantage :

(1) Simple , Use the existing functions of the database

(2) Can guarantee uniqueness

(3) Database generated id Can guarantee the incremental

(4) Fixed step size

shortcoming :

(1) Availability is hard to guarantee : The common database architecture is one master and many slaves + Read / write separation , Generate auto increment ID It's a request , You can't play when the main library hangs up

(2) Expandability of , There's an upper limit to performance : Because writing is a single point , The write performance of the main database determines ID The upper bound of the generation performance , And it's hard to expand

How to improve :

(1) Add the main library , Avoid writing single dots

(2) Data horizontal segmentation , Ensure that each main library generates ID No repetition

As shown above , from 1 A write library becomes 3 Write Libraries , Each write library has a different auto_increment Initial value , And the same growth step , To ensure that each database generates ID Is different ( The library in the above figure 0 Generate 0,3,6,9…, library 1 Generate 1,4,7,10, library 2 Generate 2,5,8,11…)

The improved architecture ensures usability , But the disadvantage is :

(1) Lost ID Generated “ Absolute increasement ”: First visit the Library 0 Generate 0,3, Visit the library again 1 Generate 1, It can lead to... In a very short time ,ID Generation is not absolutely incremental ( It's not a big problem , Our goal is to increase the trend , It's not an absolute increase )

(2) The database is still under a lot of pressure , Each generation ID Access to the database

To solve these two problems , The following common solutions are introduced

 

2、 Batch build ID

Generate multiple on-demand batches at a time ID, Every student needs to access the database , Modify the database to the largest ID value , And record the current value and maximum value in memory .

Distributed systems are difficult , One of the important reasons is that “ There is no global clock , It's hard to guarantee absolute timing ”, To ensure absolute timing , Or can we only use single point service , Use the local clock to guarantee “ Absolute timing ”. Database writing pressure is high , Because every time you generate ID All accessed the database , You can use batch method to reduce the pressure of database writing .

         As shown above , The database uses dual master Guaranteed availability , Only the current ID The maximum of , for example 0.ID The generation Service assumes that each batch pull 6 individual ID, Services access databases , Will the current ID The maximum value of is changed to 5, In this way, application access ID Generate service request ID,ID The build service does not need to access the database every time , You can distribute them in turn 0,1,2,3,4,5 these ID 了 , When ID After sending , then ID The maximum value of is changed to 11, And then it can be distributed again 6,7,8,9,10,11 these ID 了 , So the pressure on the database is reduced to the original 1/6 了 .

advantage :

(1) To ensure the ID Absolute increasing order of generation

(2) Greatly reduce the pressure of the database ,ID It can generate tens of thousands of them per second

shortcoming :

(1) The service is still single point

(2) If the service hangs up , After the service is restarted , Continue to build ID It could be discontinuous , There's a hole in the middle ( The service memory keeps 0,1,2,3,4,5, In the database max-id yes 5, Assigned to 3 when , The service is restarted , Next time I will 6 Assigned starting ,4 and 5 It's empty , But it's not a big problem )

(3) Although it can generate hundreds of thousands per second ID, But after all, there is a performance limit , No horizontal expansion

How to improve :

The common high availability optimization scheme for single point service is “ Standby service ”, Also called “ Shadow service ”, So we can optimize these disadvantages in the following ways (1):

 

 

Pictured above , The external service is the main service , There is a shadow service in standby at all times , When the main service is down, the shadow service is on top . This switching process is transparent to the caller , Can be done automatically , The common technique is vip+keepalived, The details are not here .

3、UUID

The core idea of the algorithm is to combine the network card of the machine 、 Local time 、 A random number to generate UUID.

UUID Generate... For the local ID Methods , High performance , Low delay .uuid Is a common solution :string ID =GenUUID();

advantage :

(1) Locally generated ID, There is no need to make a remote call , Low latency , There is no high availability risk

(2) Good scalability , Basically, there is no upper performance limit

shortcoming :

(1) Increasing trend cannot be guaranteed

(2)uuid Too long , It's often represented as a string , It is inefficient to build index as primary key , The common optimization scheme is “ Into two uint64 Integer storage ” perhaps “ Half storage ”( Half can't guarantee uniqueness )

4、 Take the current milliseconds

uuid It's a local algorithm , High generation performance , But there is no guarantee that the trend will increase , And as a string ID Low retrieval efficiency , Is there a local algorithm that can guarantee increment ?

Taking the current number of milliseconds is a common solution :uint64 ID = GenTimeMS();

advantage :

(1) Locally generated ID, There is no need to make a remote call , Low latency

(2) Generated ID The trend is increasing

(3) Generated ID Is an integer , After the index is established, the query efficiency is high

shortcoming :

(1) If the concurrency exceeds 1000, Duplicate will be generated ID

I went to , This shortcoming is fatal , There is no guarantee ID Uniqueness . Of course , Using microseconds can reduce the probability of conflict , But it can only generate at most per second 1000000 individual ID, If there are more, there will be conflicts , So using microseconds doesn't fundamentally solve the problem .

5、Redis Generate ID

Redis All command operations of are single threaded , It provides the image incr and increby Such a self increasing atomic order , So it's guaranteed that ID It must be the only orderly .

  • advantage : Database independent , Flexible and convenient , And its performance is better than that of database ; Numbers ID Natural sequencing , It's very helpful for paging or sorting results .
  • shortcoming : If there is no Redis, New components need to be introduced , Increase system complexity ; The amount of coding and configuration required is large .

Considering the performance bottleneck of single node , have access to Redis Cluster to get higher throughput . If a cluster has 5 platform Redis. Each can be initialized Redis The values of 1, 2, 3, 4, 5, And then the steps are 5. each Redis Generated ID by :

A:1, 6, 11, 16, 21
B:2, 7, 12, 17, 22
C:3, 8, 13, 18, 23
D:4, 9, 14, 19, 24
E:5, 10, 15, 20, 25

Decide which machine you want to load , It's hard to make changes in the future . The step size and initial value must be determined in advance . Use Redis Clusters can also be used for single point of failure problems .

in addition , It's more suitable for Redis To generate every day from 0 The serial number of the beginning . Like the order number = date + Since the growth of the day . Can be in every day Redis Generate a Key , Use INCR Add up .

6、 class snowflake Algorithm

snowflake yes twitter Open source distributed ID generating algorithm , The central idea is this : One long Type ID, Use among them 41bit As milliseconds ,10bit As machine number ,12bit As a sequence number in milliseconds . Theoretically, this algorithm can generate at most 1000*(2^12), That is to say 400W Of ID, Fully meet the needs of the business .

reference snowflake Thought , Combine the business logic and concurrency of each company , You can implement your own distributed ID generating algorithm .

give an example , Suppose a company ID The requirements of the generator service are as follows :

(1) The peak concurrency of a single machine is less than 1W, Is expected in the future 5 The annual peak concurrency of single machine is less than 10W

(2) Yes 2 A computer room , Is expected in the future 5 The annual number of machine rooms is less than 4 individual

(3) The number of machines in each machine room is less than 100 platform

(4) There are 5 Business lines have ID Generating requirements , It is estimated that the number of business lines in the future will be less than 10 individual

(5)…

The analysis process is as follows :

(1) High order fetching 2016 year 1 month 1 The number of milliseconds from today ( Suppose the system ID The generator service goes online after this time ), Assume that the system is running at least 10 year , At least 10 year *365 God *24 Hours *3600 second *1000 millisecond =320*10^9, Almost reserved 39bit Give milliseconds

(2) The peak concurrency of a single machine per second is less than 10W, That is, the average peak concurrency of a single machine per millisecond is less than 100, Almost reserved 7bit Give the serial number every millisecond

(3)5 The number of machine rooms in the year is less than 4 individual , reserve 2bit Mark the machine room

(4) Each machine room is less than 100 Taiwan machine , reserve 7bit Identify the server in each machine room

(5) The line of business is less than 10 individual , reserve 4bit Identify the business line

Designed like this 64bit identification , Can guarantee :

(1) Each line of business 、 Every computer room 、 Generated by each machine ID It's all different

(2) The same machine , Generated in every millisecond ID It's all different

(3) The same machine , In the same millisecond , Distinguish guaranteed generated by serial number ID Is different

(4) Put the number of milliseconds in the highest order , Guaranteed to be generated ID It's an increasing trend

shortcoming :

(1) because “ There is no global clock ”, Each server is assigned ID It's absolutely increasing , But on the whole , Generated ID It's just an increasing trend ( Some servers are early , Some servers are late )

The last one that's easy to ignore :

Generated ID, for example message-id/ order-id/ tiezi-id, When the amount of data is large, it is often necessary to divide the database and table , these ID It is often used as the basis for taking module, dividing library and dividing table , In order to divide the database and table, the data is even ,ID Generation often has “ Randomness of moduli ” The needs of , So we usually put the sequence number per second in ID At the bottom of the list , Guaranteed to be generated ID Is random .

And if , We're crossing milliseconds , The serial number always goes to 0, It makes the serial number 0 Of ID More , Resulting in ID Uneven after mold removal . The solution is , The serial number doesn't always go to 0, It's one 0 To 9 The random number , This place .

原网站

版权声明
本文为[Qin Tian]所创,转载请带上原文链接,感谢
https://yzsam.com/2022/188/202207062341084590.html