当前位置:网站首页>Distributed global ID generation scheme
Distributed global ID generation scheme
2022-07-07 05:40:00 【Qin Tian】
Catalog
One 、 Distributed ID Characteristics of
Two 、 Distributed ID The generation scheme of
4、 Take the current milliseconds
One 、 Distributed ID Characteristics of
- Uniqueness : Make sure that the generated ID It is the only one in the whole network .
- Order and increment : Make sure that the generated ID For a certain user or business, it increases in order according to a certain number .
- High availability : Make sure that you can correctly generate... At any time ID.
- Take time :ID It contains time , A glance at the past will tell which day of the transaction .
Two 、 Distributed ID The generation scheme of
1、 Database autoincrement ID
Using the database id Self increasing strategy , Such as MySQL Of auto_increment. And you can use two databases to set different steps , Generation does not repeat ID To achieve high availability .
advantage :
(1) Simple , Use the existing functions of the database
(2) Can guarantee uniqueness
(3) Database generated id Can guarantee the incremental
(4) Fixed step size
shortcoming :
(1) Availability is hard to guarantee : The common database architecture is one master and many slaves + Read / write separation , Generate auto increment ID It's a request , You can't play when the main library hangs up
(2) Expandability of , There's an upper limit to performance : Because writing is a single point , The write performance of the main database determines ID The upper bound of the generation performance , And it's hard to expand
How to improve :
(1) Add the main library , Avoid writing single dots
(2) Data horizontal segmentation , Ensure that each main library generates ID No repetition
As shown above , from 1 A write library becomes 3 Write Libraries , Each write library has a different auto_increment Initial value , And the same growth step , To ensure that each database generates ID Is different ( The library in the above figure 0 Generate 0,3,6,9…, library 1 Generate 1,4,7,10, library 2 Generate 2,5,8,11…)
The improved architecture ensures usability , But the disadvantage is :
(1) Lost ID Generated “ Absolute increasement ”: First visit the Library 0 Generate 0,3, Visit the library again 1 Generate 1, It can lead to... In a very short time ,ID Generation is not absolutely incremental ( It's not a big problem , Our goal is to increase the trend , It's not an absolute increase )
(2) The database is still under a lot of pressure , Each generation ID Access to the database
To solve these two problems , The following common solutions are introduced
2、 Batch build ID
Generate multiple on-demand batches at a time ID, Every student needs to access the database , Modify the database to the largest ID value , And record the current value and maximum value in memory .
Distributed systems are difficult , One of the important reasons is that “ There is no global clock , It's hard to guarantee absolute timing ”, To ensure absolute timing , Or can we only use single point service , Use the local clock to guarantee “ Absolute timing ”. Database writing pressure is high , Because every time you generate ID All accessed the database , You can use batch method to reduce the pressure of database writing .
As shown above , The database uses dual master Guaranteed availability , Only the current ID The maximum of , for example 0.ID The generation Service assumes that each batch pull 6 individual ID, Services access databases , Will the current ID The maximum value of is changed to 5, In this way, application access ID Generate service request ID,ID The build service does not need to access the database every time , You can distribute them in turn 0,1,2,3,4,5 these ID 了 , When ID After sending , then ID The maximum value of is changed to 11, And then it can be distributed again 6,7,8,9,10,11 these ID 了 , So the pressure on the database is reduced to the original 1/6 了 .
advantage :
(1) To ensure the ID Absolute increasing order of generation
(2) Greatly reduce the pressure of the database ,ID It can generate tens of thousands of them per second
shortcoming :
(1) The service is still single point
(2) If the service hangs up , After the service is restarted , Continue to build ID It could be discontinuous , There's a hole in the middle ( The service memory keeps 0,1,2,3,4,5, In the database max-id yes 5, Assigned to 3 when , The service is restarted , Next time I will 6 Assigned starting ,4 and 5 It's empty , But it's not a big problem )
(3) Although it can generate hundreds of thousands per second ID, But after all, there is a performance limit , No horizontal expansion
How to improve :
The common high availability optimization scheme for single point service is “ Standby service ”, Also called “ Shadow service ”, So we can optimize these disadvantages in the following ways (1):
Pictured above , The external service is the main service , There is a shadow service in standby at all times , When the main service is down, the shadow service is on top . This switching process is transparent to the caller , Can be done automatically , The common technique is vip+keepalived, The details are not here .
3、UUID
The core idea of the algorithm is to combine the network card of the machine 、 Local time 、 A random number to generate UUID.
UUID Generate... For the local ID Methods , High performance , Low delay .uuid Is a common solution :string ID =GenUUID();
advantage :
(1) Locally generated ID, There is no need to make a remote call , Low latency , There is no high availability risk
(2) Good scalability , Basically, there is no upper performance limit
shortcoming :
(1) Increasing trend cannot be guaranteed
(2)uuid Too long , It's often represented as a string , It is inefficient to build index as primary key , The common optimization scheme is “ Into two uint64 Integer storage ” perhaps “ Half storage ”( Half can't guarantee uniqueness )
4、 Take the current milliseconds
uuid It's a local algorithm , High generation performance , But there is no guarantee that the trend will increase , And as a string ID Low retrieval efficiency , Is there a local algorithm that can guarantee increment ?
Taking the current number of milliseconds is a common solution :uint64 ID = GenTimeMS();
advantage :
(1) Locally generated ID, There is no need to make a remote call , Low latency
(2) Generated ID The trend is increasing
(3) Generated ID Is an integer , After the index is established, the query efficiency is high
shortcoming :
(1) If the concurrency exceeds 1000, Duplicate will be generated ID
I went to , This shortcoming is fatal , There is no guarantee ID Uniqueness . Of course , Using microseconds can reduce the probability of conflict , But it can only generate at most per second 1000000 individual ID, If there are more, there will be conflicts , So using microseconds doesn't fundamentally solve the problem .
5、Redis Generate ID
Redis All command operations of are single threaded , It provides the image incr and increby Such a self increasing atomic order , So it's guaranteed that ID It must be the only orderly .
- advantage : Database independent , Flexible and convenient , And its performance is better than that of database ; Numbers ID Natural sequencing , It's very helpful for paging or sorting results .
- shortcoming : If there is no Redis, New components need to be introduced , Increase system complexity ; The amount of coding and configuration required is large .
Considering the performance bottleneck of single node , have access to Redis Cluster to get higher throughput . If a cluster has 5 platform Redis. Each can be initialized Redis The values of 1, 2, 3, 4, 5, And then the steps are 5. each Redis Generated ID by :
A:1, 6, 11, 16, 21
B:2, 7, 12, 17, 22
C:3, 8, 13, 18, 23
D:4, 9, 14, 19, 24
E:5, 10, 15, 20, 25
Decide which machine you want to load , It's hard to make changes in the future . The step size and initial value must be determined in advance . Use Redis Clusters can also be used for single point of failure problems .
in addition , It's more suitable for Redis To generate every day from 0 The serial number of the beginning . Like the order number = date + Since the growth of the day . Can be in every day Redis Generate a Key , Use INCR Add up .
6、 class snowflake Algorithm
snowflake yes twitter Open source distributed ID generating algorithm , The central idea is this : One long Type ID, Use among them 41bit As milliseconds ,10bit As machine number ,12bit As a sequence number in milliseconds . Theoretically, this algorithm can generate at most 1000*(2^12), That is to say 400W Of ID, Fully meet the needs of the business .
reference snowflake Thought , Combine the business logic and concurrency of each company , You can implement your own distributed ID generating algorithm .
give an example , Suppose a company ID The requirements of the generator service are as follows :
(1) The peak concurrency of a single machine is less than 1W, Is expected in the future 5 The annual peak concurrency of single machine is less than 10W
(2) Yes 2 A computer room , Is expected in the future 5 The annual number of machine rooms is less than 4 individual
(3) The number of machines in each machine room is less than 100 platform
(4) There are 5 Business lines have ID Generating requirements , It is estimated that the number of business lines in the future will be less than 10 individual
(5)…
The analysis process is as follows :
(1) High order fetching 2016 year 1 month 1 The number of milliseconds from today ( Suppose the system ID The generator service goes online after this time ), Assume that the system is running at least 10 year , At least 10 year *365 God *24 Hours *3600 second *1000 millisecond =320*10^9, Almost reserved 39bit Give milliseconds
(2) The peak concurrency of a single machine per second is less than 10W, That is, the average peak concurrency of a single machine per millisecond is less than 100, Almost reserved 7bit Give the serial number every millisecond
(3)5 The number of machine rooms in the year is less than 4 individual , reserve 2bit Mark the machine room
(4) Each machine room is less than 100 Taiwan machine , reserve 7bit Identify the server in each machine room
(5) The line of business is less than 10 individual , reserve 4bit Identify the business line
Designed like this 64bit identification , Can guarantee :
(1) Each line of business 、 Every computer room 、 Generated by each machine ID It's all different
(2) The same machine , Generated in every millisecond ID It's all different
(3) The same machine , In the same millisecond , Distinguish guaranteed generated by serial number ID Is different
(4) Put the number of milliseconds in the highest order , Guaranteed to be generated ID It's an increasing trend
shortcoming :
(1) because “ There is no global clock ”, Each server is assigned ID It's absolutely increasing , But on the whole , Generated ID It's just an increasing trend ( Some servers are early , Some servers are late )
The last one that's easy to ignore :
Generated ID, for example message-id/ order-id/ tiezi-id, When the amount of data is large, it is often necessary to divide the database and table , these ID It is often used as the basis for taking module, dividing library and dividing table , In order to divide the database and table, the data is even ,ID Generation often has “ Randomness of moduli ” The needs of , So we usually put the sequence number per second in ID At the bottom of the list , Guaranteed to be generated ID Is random .
And if , We're crossing milliseconds , The serial number always goes to 0, It makes the serial number 0 Of ID More , Resulting in ID Uneven after mold removal . The solution is , The serial number doesn't always go to 0, It's one 0 To 9 The random number , This place .
边栏推荐
- English grammar_ Noun possessive
- SAP ABAP BDC(批量数据通信)-018
- Leetcode 1189 maximum number of "balloons" [map] the leetcode road of heroding
- Codeforces Round #416 (Div. 2) D. Vladik and Favorite Game
- Leakage relay jd1-100
- 实现网页内容可编辑
- Intelligent annotation scheme of entity recognition based on hugging Face Pre training model: generate doccano request JSON format
- Is the human body sensor easy to use? How to use it? Which do you buy between aqara green rice and Xiaomi
- sql查询:将下一行减去上一行,并做相应的计算
- English语法_名词 - 所有格
猜你喜欢
SAP webservice 测试出现404 Not found Service cannot be reached
DOM node object + time node comprehensive case
Leakage relay llj-100fs
How digitalization affects workflow automation
WEB架构设计过程
[PM products] what is cognitive load? How to adjust cognitive load reasonably?
JVM (XX) -- performance monitoring and tuning (I) -- Overview
Use, configuration and points for attention of network layer protocol (taking QoS as an example) when using OPNET for network simulation
什么是消息队列?
随机生成session_id
随机推荐
Senior programmers must know and master. This article explains in detail the principle of MySQL master-slave synchronization, and recommends collecting
说一说MVCC多版本并发控制器?
基于 hugging face 预训练模型的实体识别智能标注方案:生成doccano要求json格式
5. Data access - entityframework integration
Flink SQL realizes reading and writing redis and dynamically generates hset key
CVE-2021-3156 漏洞复现笔记
WEB架构设计过程
JVM(十九) -- 字节码与类的加载(四) -- 再谈类的加载器
论文阅读【Semantic Tag Augmented XlanV Model for Video Captioning】
1.AVL树:左右旋-bite
实现网页内容可编辑
How can professional people find background music materials when doing we media video clips?
Let f (x) = Σ x^n/n^2, prove that f (x) + F (1-x) + lnxln (1-x) = Σ 1/n^2
Simple case of SSM framework
利用OPNET进行网络任意源组播(ASM)仿真的设计、配置及注意点
nodejs获取客户端ip
Make web content editable
分布式事务解决方案之TCC
分布式事务介绍
[Oracle] simple date and time formatting and sorting problem