
Common solutions for distributed IDs

2022-07-28 08:05:00 【On knowledge】

Author: On knowledge, CSDN contracted lecturer, CSDN Force author, high-quality creator in the back-end field, who loves sharing and creating
Official account: On knowledge
Areas of expertise: back-end full-stack engineering, web crawlers, ACM algorithms
Contact (vx): zsqtcc

A thorough walkthrough of the common distributed ID solutions.

Covering them all in one go
Why are distributed IDs used so often? Mainly because, with large data volumes and high concurrency, a single database is no longer enough.

Now for the main course

SQL database-based solutions

Database primary key auto-increment

This approach is simple and direct: the auto-increment primary key of a relational database serves as the unique ID.
Taking MySQL as an example, it can be done as follows.
1. Create a database table.

CREATE TABLE `sequence_id` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `stub` char(10) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  UNIQUE KEY `stub` (`stub`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

The stub field carries no meaning. It is only a placeholder that makes it easy to insert or replace a row. A unique index is also created on stub to guarantee that its value is unique.

2. Insert data with REPLACE INTO.

BEGIN;
REPLACE INTO sequence_id (stub) VALUES ('stub');
SELECT LAST_INSERT_ID();
COMMIT;

Here we insert with REPLACE INTO rather than INSERT INTO. The ID comes from the insert itself: because the primary key is auto-incremented, the insert produces an automatically generated id, which LAST_INSERT_ID() then returns. REPLACE INTO works in the following steps:

Step 1: Try to insert the row into the table.
Step 2: If the insert fails because of a duplicate value in the primary key or a unique index, delete the conflicting row from the table, then try the insert again.
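
To make the two steps concrete, here is a minimal in-memory sketch in Java. It is not MySQL and does not use JDBC; `ReplaceIntoSketch` and its methods are hypothetical names that only model an auto-increment counter plus a unique `stub` column, under the assumption that each replace deletes the conflicting row and inserts a new one with a fresh id.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the sequence_id table: auto-increment id plus a unique stub column. */
final class ReplaceIntoSketch {
    private long nextAutoIncrement = 1;                          // models the AUTO_INCREMENT counter
    private final Map<String, Long> idByStub = new HashMap<>();  // models the unique index on stub
    private long lastInsertId;                                   // models LAST_INSERT_ID()

    /** Models REPLACE INTO sequence_id (stub) VALUES (?). */
    synchronized void replaceInto(String stub) {
        // Step 2 of the description: a row with the same unique key is deleted first.
        idByStub.remove(stub);
        // Then the new row is inserted and receives a fresh auto-increment value.
        lastInsertId = nextAutoIncrement++;
        idByStub.put(stub, lastInsertId);
    }

    synchronized long lastInsertId() {
        return lastInsertId;
    }

    synchronized int rowCount() {
        return idByStub.size();
    }

    public static void main(String[] args) {
        ReplaceIntoSketch table = new ReplaceIntoSketch();
        for (int i = 0; i < 3; i++) {
            table.replaceInto("stub");
            System.out.println("id=" + table.lastInsertId() + ", rows=" + table.rowCount());
        }
        // Prints id=1, id=2, id=3, each with rows=1: the table never grows.
    }
}
```

The point of the model is that the table stays at one row while the id keeps increasing, which is why the stub column needs no meaningful content.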

The advantages and disadvantages of this method are clear:

Advantages: simple to implement, IDs increase in order, low storage cost
Disadvantages: limited concurrency; the database is a single point of failure (a database cluster solves this but adds complexity); IDs carry no business meaning; security concerns (for example, the daily order volume can be inferred from how order IDs increase, which leaks business secrets!); every ID requires a round trip to the database (more load on the database and slower ID acquisition)

Database segment mode

With primary key auto-increment, every ID requires a round trip to the database. When the demand for IDs is high, that will not hold up.

What if we could fetch IDs in batches, keep them in memory, and hand them out directly from memory when needed? That is exactly the idea behind generating distributed IDs with the database segment mode.

The database segment mode is currently one of the mainstream ways to generate distributed IDs. Didi's open-source Tinyid is implemented this way. Tinyid goes further and adds optimizations such as double-segment caching and support for multiple DBs.

Taking MySQL as an example, it can be done as follows.
Create a database table.

CREATE TABLE `sequence_id_generator` (
  `id` int(10) NOT NULL,
  `current_max_id` bigint(20) NOT NULL COMMENT ' At present, the biggest id',
  `step` int(10) NOT NULL COMMENT ' Length of segment No ',
  `version` int(20) NOT NULL COMMENT ' Version number ',
  `biz_type`    int(20) NOT NULL COMMENT ' Business types ',
   PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

The current_max_id and step fields are used to fetch IDs in batches. The fetched segment is: current_max_id ~ current_max_id + step.
The version field handles concurrency (optimistic locking), and biz_type identifies the business type.

Insert a row of data first .

INSERT INTO `sequence_id_generator` (`id`, `current_max_id`, `step`, `version`, `biz_type`)
VALUES
	(1, 0, 100, 0, 101);

Use SELECT to fetch the current segment for the given business type.

SELECT `current_max_id`, `step`,`version` FROM `sequence_id_generator` where `biz_type` = 101

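Result (the full row is shown for reference; the query above returns only current_max_id, step, and version):

id   current_max_id   step   version   biz_type
1    0                100    0         101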

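When the segment is used up, run an UPDATE and then SELECT again. The WHERE version = 0 condition is the optimistic lock: the update only succeeds if no one else has changed the row since it was read.

UPDATE sequence_id_generator SET current_max_id = 0+100, version=version+1 WHERE version = 0  AND `biz_type` = 101;
SELECT `current_max_id`, `step`,`version` FROM `sequence_id_generator` where `biz_type` = 101;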

Result (full row shown for reference):

id   current_max_id   step   version   biz_type
1    100              100    1         101
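
The SELECT/UPDATE cycle above can be sketched in Java as follows. This is a minimal illustration, not Tinyid's implementation: the database row is replaced by an in-memory `AtomicReference` (an assumption made so the example runs without a database), `SegmentIdSketch` and `Row` are hypothetical names, and the segment is interpreted as the IDs from current_max_id + 1 up to current_max_id + step.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Segment-mode allocation with an optimistic-lock update, modelled in memory. */
final class SegmentIdSketch {

    /** Immutable snapshot of one sequence_id_generator row. */
    static final class Row {
        final long currentMaxId;
        final int step;
        final int version;

        Row(long currentMaxId, int step, int version) {
            this.currentMaxId = currentMaxId;
            this.step = step;
            this.version = version;
        }
    }

    // Stand-in for the database row; several allocators may share it.
    private final AtomicReference<Row> db;
    private long next = 1; // next ID to hand out from the local segment
    private long end = 0;  // last ID of the local segment (inclusive); empty at start

    SegmentIdSketch(AtomicReference<Row> db) {
        this.db = db;
    }

    /** Models UPDATE ... SET current_max_id = ?, version = version + 1 WHERE version = ?. */
    private boolean updateIfVersionUnchanged(Row seen, long newMaxId) {
        Row updated = new Row(newMaxId, seen.step, seen.version + 1);
        return db.compareAndSet(seen, updated); // fails if another allocator updated first
    }

    private void loadNextSegment() {
        while (true) {
            Row seen = db.get(); // models the SELECT
            long newMaxId = seen.currentMaxId + seen.step;
            if (updateIfVersionUnchanged(seen, newMaxId)) {
                next = seen.currentMaxId + 1;
                end = newMaxId;
                return;
            }
            // Lost the race: re-read the row and try again.
        }
    }

    synchronized long nextId() {
        if (next > end) {
            loadNextSegment();
        }
        return next++;
    }

    public static void main(String[] args) {
        AtomicReference<Row> row = new AtomicReference<>(new Row(0, 100, 0));
        SegmentIdSketch a = new SegmentIdSketch(row);
        SegmentIdSketch b = new SegmentIdSketch(row);
        System.out.println(a.nextId()); // 1   (a owns 1..100)
        System.out.println(b.nextId()); // 101 (b owns 101..200)
        System.out.println(a.nextId()); // 2   (served from memory, no "database" access)
    }
}
```

Only one in every `step` calls touches the shared row, which is where the reduction in database load comes from. The trade-off is that IDs still held in memory are lost if a process restarts, leaving gaps in the sequence.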

Advantages: IDs increase in order, low storage cost
Disadvantages: the database is a single point of failure (a database cluster solves this but adds complexity); IDs carry no business meaning; security concerns (for example, the daily order volume can be inferred from how order IDs increase, which leaks business secrets!)

NoSQL-based solutions

Friendly reminder: this part is a bit longer, so read on carefully.
Among NoSQL options, Redis is generally the most common choice. Redis's incr command atomically increments a counter, which gives us IDs that increase in order.

127.0.0.1:6379> set sequence_id_biz_type 1
OK
127.0.0.1:6379> incr sequence_id_biz_type
(integer) 2
127.0.0.1:6379> get sequence_id_biz_type
"2"

To improve availability and concurrency, we can use Redis Cluster. Redis Cluster is the official Redis clustering solution (version 3.0+).

Besides Redis Cluster, you can also use the open-source Redis clustering solution Codis (recommended for large clusters, for example with hundreds of nodes).

Beyond availability and concurrency, remember that Redis is memory-based, so the data must be persisted to avoid losing it after a machine restart or failure. Redis supports two persistence mechanisms: snapshots (snapshotting, RDB) and append-only files (append-only file, AOF). In addition, Redis 4.0 introduced hybrid RDB and AOF persistence (off by default in that version; it can be turned on with the configuration item aof-use-rdb-preamble).

Advantages and disadvantages of the Redis scheme:

Advantages: good performance, and the generated IDs increase in order
Disadvantages: similar to those of the database primary key auto-increment scheme

UUID-based solutions

UUID is short for Universally Unique Identifier. A UUID consists of 32 hexadecimal digits, displayed in five groups (8-4-4-4-12).

The JDK provides a ready-made method for generating a UUID, and it takes just one line of code.

// Output example ：cb4a9ede-fa5e-4585-b9bb-d60bce986eaa
UUID.randomUUID()

In the canonical text form xxxxxxxx-xxxx-Mxxx-Nxxx-xxxxxxxxxxxx, the digit M is the Version and the leading bits of N encode the Variant.
Let's focus on the Version: different versions generate UUIDs by different rules.
The 5 versions mean the following:

Version 1: the UUID is generated from the time and a node ID (usually the MAC address);
Version 2: the UUID is generated from an identifier (usually a group or user ID), the time, and a node ID;
Versions 3 and 5: deterministic UUIDs generated by hashing a namespace identifier and a name (version 3 uses MD5, version 5 uses SHA-1);
Version 4: the UUID is generated from random or pseudorandom numbers.

A UUID generated by the JDK's UUID.randomUUID() method is version 4.

UUID uuid = UUID.randomUUID();
int version = uuid.version();// 4

In addition, the Variant also has 4 different values, each with a different meaning. I won't cover them here, since they rarely need attention in everyday work.

When you do need them, the Wikipedia article on UUID covers the Variant field well enough.
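
As a small illustration of the version and variant fields, the sketch below inspects two UUIDs produced by the JDK. `UuidVersionSketch`, `versionDigit`, and `hexDigitCount` are hypothetical helper names; the input string "example.com" is arbitrary. `UUID.nameUUIDFromBytes` is the JDK's name-based (version 3) factory.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

final class UuidVersionSketch {

    /** Returns the version digit M from xxxxxxxx-xxxx-Mxxx-Nxxx-xxxxxxxxxxxx. */
    static char versionDigit(UUID uuid) {
        return uuid.toString().charAt(14);
    }

    /** Counts the hexadecimal digits, i.e. the text form without the four hyphens. */
    static int hexDigitCount(UUID uuid) {
        return uuid.toString().replace("-", "").length();
    }

    public static void main(String[] args) {
        UUID random = UUID.randomUUID(); // version 4: random
        // Version 3: deterministic, derived from the given bytes.
        UUID named = UUID.nameUUIDFromBytes("example.com".getBytes(StandardCharsets.UTF_8));

        System.out.println(random + " version=" + random.version() + " variant=" + random.variant());
        System.out.println(named + " version=" + named.version() + " variant=" + named.variant());
        System.out.println("hex digits=" + hexDigitCount(random)
                + ", text length=" + random.toString().length());
    }
}
```

Running it twice shows the difference between the versions: the random UUID changes on every run, while the name-based one stays the same for the same input.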

As the introduction above shows, a UUID can be treated as unique because its generation rules draw on elements such as the MAC address, timestamp, namespace, random or pseudorandom numbers, and a clock sequence. UUIDs generated by these rules are, for practical purposes, not going to repeat.

Although a UUID is globally unique, we rarely use it in practice.

For example, using a UUID as a MySQL primary key is a poor fit:

The database primary key should be as short as possible, and a UUID takes a relatively large amount of storage (a string of 32 hex characters, 128 bits).
UUIDs are unordered. Under the InnoDB engine, unordered primary keys seriously hurt database performance.

Finally, a brief summary of the advantages and disadvantages of UUID (you may be asked about this in interviews!):

Advantages: fast to generate, simple and easy to use.
Disadvantages: large storage footprint (a string of 32 hex characters, 128 bits); unsafe (the MAC-address-based generation algorithm leaks the MAC address); unordered (not auto-incrementing); no business meaning; duplicate IDs must be handled for the time-based versions (a wrong machine clock can lead to duplicate IDs).

Snowflake-based solutions

Snowflake is Twitter's open-source distributed ID generation algorithm. A Snowflake ID is a 64-bit binary number, divided into several parts that each carry a specific meaning:

Bit 0: the sign bit (marks positive or negative). It is always 0 and carries no information, so it can be ignored.
Bits 1~41: 41 bits in total, holding a timestamp in milliseconds. This supports 2^41 milliseconds (about 69 years).
Bits 42~51: 10 bits in total. Typically the first 5 bits are the data center ID and the last 5 bits are the machine ID (this can be adjusted to fit the actual project). This distinguishes nodes in different clusters or data centers.
Bits 52~63: 12 bits in total, holding a sequence number. The sequence number auto-increments and bounds the number of IDs a single machine can produce per millisecond (2^12 = 4096). In other words, a single machine can generate at most 4096 unique IDs per millisecond.
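
The bit layout above can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not Twitter's original code or any library's API: `SnowflakeSketch` is a hypothetical class, the custom epoch (2020-01-01 UTC) is an arbitrary choice, and a backwards clock is simply rejected with an exception.

```java
/** Minimal Snowflake-style generator: 41-bit timestamp, 5+5-bit node ID, 12-bit sequence. */
final class SnowflakeSketch {
    // Hypothetical custom epoch (2020-01-01T00:00:00Z); timestamps are stored relative to it.
    private static final long EPOCH_MS = 1577836800000L;

    private static final int SEQUENCE_BITS = 12;
    private static final int WORKER_BITS = 5;
    private static final int DATACENTER_BITS = 5;

    private static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1; // 4095
    private static final long MAX_NODE_PART = (1L << WORKER_BITS) - 1;  // 31

    private static final int WORKER_SHIFT = SEQUENCE_BITS;                         // 12
    private static final int DATACENTER_SHIFT = WORKER_SHIFT + WORKER_BITS;        // 17
    private static final int TIMESTAMP_SHIFT = DATACENTER_SHIFT + DATACENTER_BITS; // 22

    private final long datacenterId;
    private final long workerId;
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    SnowflakeSketch(long datacenterId, long workerId) {
        if (datacenterId < 0 || datacenterId > MAX_NODE_PART
                || workerId < 0 || workerId > MAX_NODE_PART) {
            throw new IllegalArgumentException("datacenterId and workerId must fit in 5 bits");
        }
        this.datacenterId = datacenterId;
        this.workerId = workerId;
    }

    synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now < lastTimestamp) {
            // Simplest possible policy: refuse to generate IDs while the clock is behind.
            throw new IllegalStateException(
                    "clock moved backwards by " + (lastTimestamp - now) + " ms");
        }
        if (now == lastTimestamp) {
            sequence = (sequence + 1) & MAX_SEQUENCE;
            if (sequence == 0) {
                // All 4096 sequence numbers of this millisecond are used: wait for the next one.
                while ((now = System.currentTimeMillis()) <= lastTimestamp) {
                    Thread.onSpinWait();
                }
            }
        } else {
            sequence = 0;
        }
        lastTimestamp = now;
        return ((now - EPOCH_MS) << TIMESTAMP_SHIFT)
                | (datacenterId << DATACENTER_SHIFT)
                | (workerId << WORKER_SHIFT)
                | sequence;
    }

    // Decoders that pull the individual fields back out of an ID.
    static long timestampOf(long id)  { return (id >>> TIMESTAMP_SHIFT) + EPOCH_MS; }
    static long datacenterOf(long id) { return (id >>> DATACENTER_SHIFT) & MAX_NODE_PART; }
    static long workerOf(long id)     { return (id >>> WORKER_SHIFT) & MAX_NODE_PART; }
    static long sequenceOf(long id)   { return id & MAX_SEQUENCE; }

    public static void main(String[] args) {
        SnowflakeSketch generator = new SnowflakeSketch(3, 7);
        long id = generator.nextId();
        System.out.println(id + " -> datacenter=" + datacenterOf(id)
                + ", worker=" + workerOf(id) + ", sequence=" + sequenceOf(id));
    }
}
```

Because the timestamp occupies the highest bits, IDs from one generator sort by creation time, and the node bits keep generators on different machines from colliding as long as each gets a distinct (datacenter, worker) pair.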

If you want to use the Snowflake algorithm, you generally don't need to reinvent the wheel. There are many open-source implementations based on it, such as Meituan's Leaf and Baidu's UidGenerator, and these implementations also optimize the original Snowflake algorithm.

In addition, in real projects we usually adapt the Snowflake algorithm as well. The most common change is to add business type information to the generated ID.

Let's look at the advantages and disadvantages of the Snowflake algorithm:

Advantages: fast to generate; generated IDs increase in order; flexible (the algorithm can be adapted easily, for example by adding a business ID)
Disadvantages: duplicate IDs must be handled (the algorithm depends on time, so a wrong machine clock can lead to duplicate IDs; this is the clock rollback problem)

Solutions to clock rollback

One approach is not to drive the timestamp from the machine clock at all, so clock rollback cannot occur. Define an initial timestamp and increment it yourself instead of following the machine clock. When does the timestamp increment? When the sequence number reaches its maximum, the timestamp is incremented by 1. No sequence numbers are wasted, which suits high-traffic scenarios. Under low traffic, the timestamp may lag behind real time.
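
A minimal Java sketch of this clock-independent approach might look like the following. `LogicalClockIdSketch` is a hypothetical class, the 41/10/12 bit split mirrors the Snowflake layout, and the sketch does not persist the logical timestamp across restarts, which a real deployment would have to consider.

```java
/** Snowflake-style IDs whose timestamp is a self-incrementing counter, not the OS clock. */
final class LogicalClockIdSketch {
    private static final int SEQUENCE_BITS = 12;
    private static final int WORKER_BITS = 10;
    private static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1; // 4095
    private static final long MAX_WORKER = (1L << WORKER_BITS) - 1;     // 1023

    private final long workerId;
    private long logicalTimestamp; // advanced only by this class, never read from the machine
    private long sequence = -1;    // -1 means no ID has been issued yet

    LogicalClockIdSketch(long workerId, long initialTimestamp) {
        if (workerId < 0 || workerId > MAX_WORKER) {
            throw new IllegalArgumentException("workerId must fit in 10 bits");
        }
        this.workerId = workerId;
        this.logicalTimestamp = initialTimestamp;
    }

    synchronized long nextId() {
        if (sequence == MAX_SEQUENCE) {
            // Sequence exhausted: this is the only moment the timestamp moves forward.
            logicalTimestamp++;
            sequence = 0;
        } else {
            sequence++;
        }
        return (logicalTimestamp << (WORKER_BITS + SEQUENCE_BITS))
                | (workerId << SEQUENCE_BITS)
                | sequence;
    }

    synchronized long logicalTimestamp() {
        return logicalTimestamp;
    }

    public static void main(String[] args) {
        LogicalClockIdSketch generator = new LogicalClockIdSketch(1, 0);
        for (int i = 0; i < 4096; i++) {
            generator.nextId();
        }
        System.out.println(generator.logicalTimestamp()); // 0: still in the first "millisecond"
        generator.nextId();
        System.out.println(generator.logicalTimestamp()); // 1: advanced after 4096 IDs
    }
}
```

The timestamp field here no longer says when an ID was created, only its relative order, which is the lag the paragraph above warns about for low-traffic systems.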

Another approach still relies on the machine clock. If the rollback is small, such as a few tens of milliseconds, you can wait until the clock catches up. If traffic is low, the sequence numbers of the previous few hundred milliseconds or seconds will certainly have spare capacity, so you can cache the sequence numbers used in that window. When a rollback occurs, take the cached sequence number for that timestamp and continue incrementing from it.
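
The "wait out a small rollback" half of this approach can be sketched as below; the sequence-number cache is not shown. `RollbackTolerantClock` is a hypothetical helper, the 50 ms threshold is an arbitrary assumption, and the clock is injected as a `LongSupplier` so the behaviour can be exercised without changing the system time.

```java
import java.util.function.LongSupplier;

/** Hands out timestamps that never go backwards, waiting out small clock rollbacks. */
final class RollbackTolerantClock {
    private static final long MAX_WAIT_MS = 50; // hypothetical limit for a "small" rollback

    private final LongSupplier clock;
    private long lastTimestamp = -1;

    RollbackTolerantClock(LongSupplier clock) {
        this.clock = clock;
    }

    /** Returns a timestamp that is never smaller than the one returned previously. */
    synchronized long currentTimestamp() {
        long now = clock.getAsLong();
        if (now < lastTimestamp) {
            long offset = lastTimestamp - now;
            if (offset > MAX_WAIT_MS) {
                // Too far behind to wait: fail loudly instead of risking duplicate IDs.
                throw new IllegalStateException("clock moved backwards by " + offset + " ms");
            }
            try {
                Thread.sleep(offset); // give the clock time to catch up
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for the clock", e);
            }
            now = clock.getAsLong();
            if (now < lastTimestamp) {
                throw new IllegalStateException("clock is still behind after waiting");
            }
        }
        lastTimestamp = now;
        return now;
    }

    public static void main(String[] args) {
        RollbackTolerantClock clock = new RollbackTolerantClock(System::currentTimeMillis);
        long first = clock.currentTimestamp();
        long second = clock.currentTimestamp();
        System.out.println(second >= first); // true
    }
}
```

A Snowflake-style generator could read its timestamp through such a helper instead of calling the system clock directly, so that small rollbacks cost a short pause rather than an error.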

Summary

Besides the approaches described above, middleware such as ZooKeeper can also help generate unique IDs. Be sure to choose the scheme that best fits your actual project.

