Common solutions for distributed ID - take one
2022-07-28 08:05:00 【On knowledge】
Author: On knowledge, CSDN contracted lecturer, CSDN Force author, high-quality creator in the back-end field, who loves to share and create
Official account: On knowledge
Areas of expertise: back-end full-stack engineering, web crawlers, ACM algorithms
Contact (vx): zsqtcc
This article goes through the common solutions for distributed IDs, all in one place.
Why are distributed IDs used so often? Mainly because, with large data volumes and high concurrency, a single database is no longer enough.
Here comes the main course.
SQL database-based schemes
Database primary key auto-increment
This approach is simple and direct: the auto-increment primary key of a relational database serves as the unique ID.
Taking MySQL as an example, it can be done as follows.
1. Create a database table.
CREATE TABLE `sequence_id` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`stub` char(10) NOT NULL DEFAULT '',
PRIMARY KEY (`id`),
UNIQUE KEY `stub` (`stub`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
The stub field carries no meaning; it is only a placeholder that makes it easy to insert or modify a row. A unique index is created on stub so that its value stays unique.
2. Insert data with REPLACE INTO.
BEGIN;
REPLACE INTO sequence_id (stub) VALUES ('stub');
SELECT LAST_INSERT_ID();
COMMIT;
Here the row is inserted with REPLACE INTO rather than INSERT INTO. Because the primary key is auto-increment, each insert generates a new id, which LAST_INSERT_ID() returns. REPLACE INTO works in the following steps:
Step 1: Try to insert the row into the table.
Step 2: If the insert fails with a duplicate-key error on the primary key or a unique index, delete the conflicting row from the table, then try the insert again.
The advantages and disadvantages of this approach are both obvious:
Advantages: simple to implement; IDs increase in order; low storage consumption.
Disadvantages: limited concurrency; the database is a single point of failure (a database cluster can solve this, at the cost of added complexity); IDs carry no business meaning; security concerns (for example, daily order volume, a business secret, can be inferred from how order IDs increase); every ID requires a database round trip (which adds load on the database and makes ID acquisition slow).
Database number segment mode
With primary key auto-increment, every ID requires a database round trip, which does not hold up when the demand for IDs is high.
It is far better to fetch IDs in batches, keep them in memory, and hand them out from memory when needed. This is what generating distributed IDs with the database number segment mode means.
The database number segment mode is currently one of the mainstream ways to generate distributed IDs; Didi's open-source Tinyid works this way. Tinyid additionally uses techniques such as double segment caching and multi-database support for further optimization.
Taking MySQL as an example, it can be done as follows.
Create a database table.
CREATE TABLE `sequence_id_generator` (
`id` int(10) NOT NULL,
`current_max_id` bigint(20) NOT NULL COMMENT ' At present, the biggest id',
`step` int(10) NOT NULL COMMENT ' Length of segment No ',
`version` int(20) NOT NULL COMMENT ' Version number ',
`biz_type` int(20) NOT NULL COMMENT ' Business types ',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
The current_max_id and step fields are used to obtain IDs in batches; the batch obtained is the segment current_max_id ~ current_max_id+step.
The version field is used to handle concurrency (optimistic locking), and biz_type indicates the business type.
First insert a row of data.
INSERT INTO `sequence_id_generator` (`id`, `current_max_id`, `step`, `version`, `biz_type`)
VALUES
(1, 0, 100, 0, 101);
Use SELECT to get the current number segment for the specified business type.
SELECT `current_max_id`, `step`,`version` FROM `sequence_id_generator` where `biz_type` = 101
Result:
id  current_max_id  step  version  biz_type
1   0               100   0        101
If the segment is used up, update the row and then SELECT again.
UPDATE sequence_id_generator SET current_max_id = 0+100, version=version+1 WHERE version = 0 AND `biz_type` = 101
SELECT `current_max_id`, `step`,`version` FROM `sequence_id_generator` where `biz_type` = 101
Result:
id  current_max_id  step  version  biz_type
1   100             100   1        101
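The SELECT-then-UPDATE flow above can be sketched in Java as follows. This is only a minimal illustration, not Tinyid's implementation: InMemorySegmentStore is a hypothetical in-memory stand-in for the sequence_id_generator row of one biz_type, and the sketch assumes a claimed segment covers the IDs current_max_id+1 through current_max_id+step.

```java
/** Hypothetical in-memory stand-in for one row of sequence_id_generator. */
class InMemorySegmentStore {
    private long currentMaxId;
    private final int step;
    private int version;

    InMemorySegmentStore(long currentMaxId, int step) {
        this.currentMaxId = currentMaxId;
        this.step = step;
    }

    /** Mirrors: SELECT current_max_id, step, version FROM ... */
    synchronized long[] select() {
        return new long[] {currentMaxId, step, version};
    }

    /** Mirrors: UPDATE ... SET current_max_id = ?, version = version + 1 WHERE version = ? */
    synchronized boolean compareAndUpdate(long newMaxId, int expectedVersion) {
        if (version != expectedVersion) {
            return false; // another node claimed a segment first
        }
        currentMaxId = newMaxId;
        version++;
        return true;
    }
}

class SegmentIdGenerator {
    private final InMemorySegmentStore store;
    private long next = 1;       // next ID to hand out
    private long segmentEnd = 0; // last ID of the cached segment; 0 forces a fetch

    SegmentIdGenerator(InMemorySegmentStore store) {
        this.store = store;
    }

    /** Serves IDs from memory and touches the store only when the segment runs out. */
    synchronized long nextId() {
        if (next > segmentEnd) {
            fetchSegment();
        }
        return next++;
    }

    private void fetchSegment() {
        while (true) {
            long[] row = store.select();
            long max = row[0];
            long step = row[1];
            int version = (int) row[2];
            // Optimistic lock: succeeds only if nobody changed the row since the read.
            if (store.compareAndUpdate(max + step, version)) {
                next = max + 1;
                segmentEnd = max + step;
                return;
            }
            // Lost the race: read the row again and retry.
        }
    }
}
```

With step = 100, a generator goes to the store once per 100 IDs; two generators sharing the same store receive disjoint segments (for example 1~100 and 101~200).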
Advantages: IDs increase in order; low storage consumption.
Disadvantages: the database is a single point of failure (a database cluster can solve this, at the cost of added complexity); IDs carry no business meaning; security concerns (for example, daily order volume, a business secret, can be inferred from how order IDs increase).
NoSQL-based schemes
Friendly reminder: there is a bit more to cover here, so read on carefully.
In general, Redis is the most common choice among NoSQL schemes. The Redis incr command increments an id atomically and in order.
127.0.0.1:6379> set sequence_id_biz_type 1
OK
127.0.0.1:6379> incr sequence_id_biz_type
(integer) 2
127.0.0.1:6379> get sequence_id_biz_type
"2"
To improve availability and concurrency, we can use Redis Cluster, the official Redis clustering solution (version 3.0+).
Besides Redis Cluster, you can also use the open-source Redis clustering solution Codis (recommended for large clusters, for example hundreds of nodes).
Besides high availability and concurrency, remember that Redis is memory-based, so the data must be persisted to avoid losing it after a restart or machine failure. Redis supports two kinds of persistence: snapshotting (RDB) and the append-only file (AOF). In addition, Redis 4.0 and later support mixed RDB and AOF persistence (off by default in 4.0; enable it with the configuration item aof-use-rdb-preamble).
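As a rough illustration of the persistence options just mentioned, a redis.conf fragment might look like the following. The specific values (the 900-second snapshot rule, fsync once per second) are assumptions chosen for illustration, not recommendations from the original article.

```
# RDB: take a snapshot if at least 1 key changed within 900 seconds
save 900 1

# AOF: log every write command to the append-only file
appendonly yes

# flush the AOF to disk once per second
appendfsync everysec

# Redis 4.0+: mixed RDB + AOF persistence
aof-use-rdb-preamble yes
```

With AOF flushed once per second, up to about a second of increments can be lost in a crash, so an ID counter restored from disk may fall behind IDs already handed out.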
Advantages and disadvantages of the Redis scheme:
Advantages: good performance, and the generated IDs increase in order.
Disadvantages: similar to those of the database primary key auto-increment scheme.
UUID-based scheme
UUID is short for Universally Unique Identifier. A UUID consists of 32 hexadecimal digits, written in five groups (8-4-4-4-12).
The JDK provides a ready-made method for generating UUIDs; one line of code is enough.
// Output example :cb4a9ede-fa5e-4585-b9bb-d60bce986eaa
UUID.randomUUID()

Let's focus on the Version field: different versions correspond to different UUID generation rules.
The meaning of each of the 5 Version values:
- Version 1: the UUID is generated from the time and a node ID (usually the MAC address);
- Version 2: the UUID is generated from an identifier (usually a group or user ID), the time, and a node ID;
- Versions 3 and 5: deterministic UUIDs generated by hashing a namespace identifier and a name (version 3 uses MD5, version 5 uses SHA-1);
- Version 4: the UUID is generated from random or pseudo-random numbers.
In the JDK, a UUID generated by UUID.randomUUID() is version 4.
UUID uuid = UUID.randomUUID();
int version = uuid.version();// 4
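To see the difference between a random and a deterministic version, the JDK also offers UUID.nameUUIDFromBytes, which produces a version 3 (name-based, MD5) UUID. The sketch below is only an illustration; the class and method names are made up for this example, and note that the JDK method takes raw bytes rather than a separate namespace argument.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

class UuidVersionDemo {
    /** Version 4: random, so two calls return different values. */
    static UUID random() {
        return UUID.randomUUID();
    }

    /** Version 3: name-based, so the same input always yields the same UUID. */
    static UUID fromName(String name) {
        return UUID.nameUUIDFromBytes(name.getBytes(StandardCharsets.UTF_8));
    }
}
```

Calling UuidVersionDemo.fromName("order-101") twice returns equal UUIDs whose version() is 3, while UuidVersionDemo.random() reports version() 4.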
In addition, Variant also has 4 possible values, each with its own meaning. They are not covered here, since they rarely need attention in everyday work.
When needed, the description of UUID variants on Wikipedia is enough.
As the introduction above shows, a UUID is unique for practical purposes, because its generation rules draw on elements such as the MAC address, timestamp, namespace, random or pseudo-random numbers, and clock sequence; UUIDs generated by these rules are extremely unlikely to repeat.
Although UUIDs can be globally unique, we rarely use them in practice.
For example, using a UUID as a MySQL primary key is a poor fit:
The primary key should be as short as possible, while a UUID takes up considerable storage (a 32-character string, 128 bits).
UUIDs are unordered; under the InnoDB engine, an unordered primary key seriously hurts database performance.
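To make the storage point concrete: the text form of a UUID is 32 hexadecimal digits (36 characters with hyphens), while the underlying value is 128 bits, that is 16 bytes, compared with 8 bytes for a BIGINT key. The following sketch converts between the two forms; the helper names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

class UuidBytes {
    /** Packs the 128-bit value into 16 bytes (most significant half first). */
    static byte[] toBytes(UUID uuid) {
        return ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
    }

    /** Restores the UUID from its 16-byte form. */
    static UUID fromBytes(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        long most = buf.getLong();
        long least = buf.getLong();
        return new UUID(most, least);
    }
}
```

Storing the 16-byte form roughly halves the space taken by the string, but it does not address the ordering problem.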
Finally, a brief look at the advantages and disadvantages of UUID (you may be asked about this in an interview!):
Advantages: fast to generate; simple and easy to use.
Disadvantages: high storage consumption (a 32-character string, 128 bits); insecure (the algorithm that generates UUIDs from the MAC address leaks the MAC address); unordered (not auto-incrementing); no business meaning; duplicate IDs must be guarded against (for time-based versions, a wrong machine clock can lead to duplicate IDs).
Snowflake algorithm-based scheme
Snowflake is Twitter's open-source distributed ID generation algorithm. A Snowflake ID is a 64-bit binary number divided into several parts, each with a specific meaning:
Bit 0: the sign bit (marks positive or negative). It is always 0 and otherwise unused.
Bits 1~41: 41 bits in total, representing a timestamp in milliseconds. This supports 2^41 milliseconds (about 69 years).
Bits 42~51: 10 bits in total. Typically the first 5 bits represent the data center ID and the last 5 bits the machine ID (adjust as the project requires). This distinguishes nodes in different clusters or data centers.
Bits 52~63: 12 bits in total, representing a sequence number. The sequence number auto-increments and determines the maximum number of IDs a single machine can produce per millisecond (2^12 = 4096); in other words, a single machine can generate at most 4096 unique IDs per millisecond.
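The bit layout above can be sketched in Java as follows. This is a minimal illustration, not Twitter's original code nor the Leaf or UidGenerator implementation; the custom epoch (2022-01-01 UTC) and the class name are assumptions, and on clock rollback this sketch simply throws.

```java
class SnowflakeIdGenerator {
    // Assumed custom epoch: 2022-01-01T00:00:00Z. Any fixed past instant works.
    static final long EPOCH = 1640995200000L;
    static final int DATACENTER_BITS = 5;
    static final int WORKER_BITS = 5;
    static final int SEQUENCE_BITS = 12;
    static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1; // 4095
    static final long MAX_NODE = (1L << WORKER_BITS) - 1;       // 31

    static final int WORKER_SHIFT = SEQUENCE_BITS;                         // 12
    static final int DATACENTER_SHIFT = WORKER_SHIFT + WORKER_BITS;        // 17
    static final int TIMESTAMP_SHIFT = DATACENTER_SHIFT + DATACENTER_BITS; // 22

    private final long datacenterId;
    private final long workerId;
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    SnowflakeIdGenerator(long datacenterId, long workerId) {
        if (datacenterId < 0 || datacenterId > MAX_NODE || workerId < 0 || workerId > MAX_NODE) {
            throw new IllegalArgumentException("datacenterId and workerId must be in 0..31");
        }
        this.datacenterId = datacenterId;
        this.workerId = workerId;
    }

    synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now < lastTimestamp) {
            throw new IllegalStateException("clock moved backwards");
        }
        if (now == lastTimestamp) {
            sequence = (sequence + 1) & MAX_SEQUENCE;
            if (sequence == 0) {
                // All 4096 sequence numbers of this millisecond are used: wait for the next one.
                while (now <= lastTimestamp) {
                    now = System.currentTimeMillis();
                }
            }
        } else {
            sequence = 0;
        }
        lastTimestamp = now;
        return ((now - EPOCH) << TIMESTAMP_SHIFT)
                | (datacenterId << DATACENTER_SHIFT)
                | (workerId << WORKER_SHIFT)
                | sequence;
    }

    // Helpers that split an ID back into its parts.
    static long timestampOf(long id) { return (id >>> TIMESTAMP_SHIFT) + EPOCH; }
    static long datacenterOf(long id) { return (id >>> DATACENTER_SHIFT) & MAX_NODE; }
    static long workerOf(long id) { return (id >>> WORKER_SHIFT) & MAX_NODE; }
    static long sequenceOf(long id) { return id & MAX_SEQUENCE; }
}
```

For example, new SnowflakeIdGenerator(3, 7).nextId() yields a positive long whose datacenterOf is 3 and workerOf is 7; IDs from one generator increase strictly as long as the machine clock does not move backwards.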
If you want to use the Snowflake algorithm, there is no need to reinvent the wheel. There are many open-source implementations based on it, such as Meituan's Leaf and Baidu's UidGenerator, and these implementations also optimize the original algorithm.
In real projects, the Snowflake algorithm is also commonly modified; the most common change is adding business type information to the generated ID.
The advantages and disadvantages of the Snowflake algorithm:
Advantages: fast to generate; IDs increase in order; flexible (the algorithm can be adapted easily, for example by adding a business ID).
Disadvantages: duplicate IDs must be guarded against (the algorithm depends on time, so a wrong machine clock can lead to duplicate IDs; see the clock rollback problem).
Solutions to clock rollback
Do not depend on the machine clock, so clock rollback cannot occur. Define an initial timestamp and increment it yourself instead of following the machine clock. When does the timestamp increment? When the sequence number reaches its maximum, the timestamp is increased by 1. No sequence numbers are wasted this way, which suits high-traffic scenarios; under low traffic, the timestamp may fall behind real time.
Keep depending on the machine clock. If the rollback is small, such as tens of milliseconds, wait until the clock catches up. If traffic is low, the sequence numbers of the preceding few hundred milliseconds or seconds will not have been used up, so cache the last sequence number used in each of those milliseconds; if a rollback occurs, take the sequence number from the cache and keep incrementing from it.
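The first approach (a timestamp that does not follow the machine clock) can be sketched as below. The class name, the 10-bit worker field, and the constructor parameters are assumptions for illustration. The sketch also leaves out something a real deployment would need: persisting the logical timestamp so that a restart does not reuse old values.

```java
class LogicalClockIdGenerator {
    static final int WORKER_BITS = 10;
    static final int SEQUENCE_BITS = 12;
    static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1; // 4095
    static final long MAX_WORKER = (1L << WORKER_BITS) - 1;     // 1023

    private final long workerId;
    private long logicalTimestamp; // advanced only by this class, never read from the machine clock
    private long sequence = -1L;   // so that the first ID uses sequence 0

    LogicalClockIdGenerator(long workerId, long initialTimestamp) {
        if (workerId < 0 || workerId > MAX_WORKER) {
            throw new IllegalArgumentException("workerId must be in 0..1023");
        }
        this.workerId = workerId;
        this.logicalTimestamp = initialTimestamp;
    }

    synchronized long nextId() {
        if (sequence == MAX_SEQUENCE) {
            // Sequence exhausted: advance the logical timestamp by 1 and start over.
            logicalTimestamp++;
            sequence = 0;
        } else {
            sequence++;
        }
        return (logicalTimestamp << (WORKER_BITS + SEQUENCE_BITS))
                | (workerId << SEQUENCE_BITS)
                | sequence;
    }
}
```

Because the timestamp part moves only after all 4096 sequence numbers are used, no sequence number is skipped; the trade-off is that the timestamp part no longer reflects wall-clock time.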
Summary
Besides the approaches described above, middleware such as ZooKeeper can also generate unique IDs. Be sure to choose the scheme that best fits the actual project.