In distributed scenarios, do you know how to generate unique IDs?
2022-06-30 18:18:00 【Java notes shrimp】
Preface
In a complex distributed system that handles a huge amount of data, it is usually necessary to identify large numbers of records uniquely.
For example:
After a database is split into multiple databases and tables, each record needs a unique ID to identify it.
Data stored in NoSQL needs a unique ID so that it can be associated with data from other data sources.
This article compares and summarizes several common approaches for your reference.
In real projects I usually use the ksuid algorithm. It is simple and reliable, and its IDs can be sorted by time.
Requirements for the ID generation rules
Globally unique
Trending upward
MySQL's InnoDB engine stores indexes in a B-tree structure, so the primary key should be ordered where possible to keep write performance high.
Information security
If IDs are consecutive, a malicious user can infer our daily data volume directly from the IDs, and a crawler can easily fetch all the data by walking the IDs in order.
Preferably contains a timestamp
The ID itself then shows when it was generated.
Requirements for the ID generation system
High availability
The service should successfully create a unique ID for 99.999% of requests.
Low latency
When a request for a unique ID arrives, the server must respond very quickly.
High QPS
For example, if 100,000 requests for unique IDs arrive concurrently, the server should create all 100,000 unique IDs within a short time.
Five solutions for generating unique IDs
1. UUID
V1: based on a timestamp + MAC address
V2: based on a timestamp + MAC address + POSIX UID or GID
V3: namespace-based, using MD5
V4: based on random numbers
V5: the SHA-1 version of V3
V4 is the usual choice. V1 risks exposing the MAC address, V2 is only for specific scenarios, and V3 and V5 produce the same UUID for the same input.
Advantages and disadvantages
Advantage: simple and reliable
Disadvantage: not sortable, which makes it poorly suited to indexing and retrieval
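To make the V4 layout concrete, here is a minimal sketch that builds a random UUID using only the Go standard library. The helper name newUUIDv4 is made up for this article; in a real project an established UUID library would be the more usual choice.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// newUUIDv4 returns a random (version 4) UUID in its canonical text form.
// Hypothetical helper for illustration, not a library API.
func newUUIDv4() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // version 4 goes in the high nibble of byte 6
	b[8] = (b[8] & 0x3f) | 0x80 // RFC 4122 variant goes in the top two bits of byte 8
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

func main() {
	id, err := newUUIDv4()
	if err != nil {
		panic(err)
	}
	fmt.Println(id) // a different value on every run
}
```

Apart from the fixed version and variant bits, every bit is random, which is why two V4 UUIDs created one after another have no ordering relationship.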
2. MySQL auto-increment ID
MySQL's auto-increment ID mechanism provides IDs that are increasing, monotonic, and unique.
On a single machine, high concurrency puts MySQL under heavy pressure.
In a distributed deployment, each machine usually needs its own initial ID value to avoid duplicates. This approach has limits, and scaling out horizontally is complex and error-prone.
Advantages and disadvantages
Advantage: simple and reliable; suitable for a single machine with low concurrency and a small amount of data
Disadvantage: not suitable for sharded databases and tables or for high-concurrency scenarios
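The "different initial value per machine" idea can be simulated in a few lines. The sketch below assumes three nodes, each starting at its own offset and advancing by the node count, which is how MySQL's auto_increment_offset and auto_increment_increment settings are commonly combined. The node type is hypothetical and is not safe for concurrent use.

```go
package main

import "fmt"

// node mimics one MySQL instance whose auto-increment column starts at
// offset and advances by step (the total number of nodes).
// Hypothetical type for illustration only.
type node struct {
	next int64
	step int64
}

func newNode(offset, step int64) *node {
	return &node{next: offset, step: step}
}

// nextID returns the current value and moves on by one step.
func (n *node) nextID() int64 {
	id := n.next
	n.next += n.step
	return id
}

func main() {
	const nodes = 3
	a, b, c := newNode(1, nodes), newNode(2, nodes), newNode(3, nodes)
	for i := 0; i < 3; i++ {
		fmt.Println(a.nextID(), b.nextID(), c.nextID())
	}
	// prints:
	// 1 2 3
	// 4 5 6
	// 7 8 9
}
```

The sequences never overlap, but adding a fourth node means changing the step on every existing node, which is the horizontal expansion difficulty mentioned above.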
3. Redis
Redis executes commands on a single thread, so each command is naturally atomic; the atomic operations INCR and INCRBY can be used to generate IDs.
Its advantages and disadvantages, on a single machine and in a distributed deployment, are similar to those of MySQL.
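One way INCRBY is often used is to reserve a whole block of IDs per round trip, so that most IDs are handed out from memory. The sketch below is only an illustration of that idea: the counter interface, memCounter, and blockAllocator are all hypothetical names, and memCounter is an in-memory stand-in so the example runs without a Redis server. A real implementation would send INCRBY through a Redis client.

```go
package main

import (
	"fmt"
	"sync"
)

// counter abstracts a Redis key; IncrBy returns the value after the
// increment, like the INCRBY command does. Hypothetical interface.
type counter interface {
	IncrBy(n int64) (int64, error)
}

// memCounter is an in-memory stand-in for the Redis key.
type memCounter struct {
	mu sync.Mutex
	v  int64
}

func (c *memCounter) IncrBy(n int64) (int64, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.v += n
	return c.v, nil
}

// blockAllocator reserves size IDs at a time from the shared counter.
type blockAllocator struct {
	mu        sync.Mutex
	src       counter
	size      int64
	next, end int64 // next ID to hand out and last ID of the current block
}

func newBlockAllocator(src counter, size int64) *blockAllocator {
	// next > end forces a reservation on the first call.
	return &blockAllocator{src: src, size: size, next: 1, end: 0}
}

func (a *blockAllocator) Next() (int64, error) {
	a.mu.Lock()
	defer a.mu.Unlock()
	if a.next > a.end {
		hi, err := a.src.IncrBy(a.size) // one round trip reserves a whole block
		if err != nil {
			return 0, err
		}
		a.next, a.end = hi-a.size+1, hi
	}
	id := a.next
	a.next++
	return id, nil
}

func main() {
	key := &memCounter{}
	a := newBlockAllocator(key, 3)
	b := newBlockAllocator(key, 3)
	for i := 0; i < 4; i++ {
		x, _ := a.Next()
		y, _ := b.Next()
		fmt.Println(x, y)
	}
	// prints:
	// 1 4
	// 2 5
	// 3 6
	// 7 10
}
```

IDs stay unique across allocators, but they are no longer strictly ordered across machines, and IDs left in a block are lost if a process restarts.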
4. snowflake
snowflake is a distributed unique ID algorithm open-sourced by Twitter.
Algorithm structure

A snowflake ID consists of 4 parts:
Part 1:
The highest binary bit is the sign bit: 1 means negative, 0 means positive. Generated IDs are normally positive integers, so the highest bit is fixed at 0.
Part 2:
A 41-bit timestamp in milliseconds. 41 bits can represent 2^41 - 1 milliseconds ≈ 69 years, so the scheme can be used for at most 69 years from its start time.
Part 3:
10 bits record the worker machine ID, so at most 2^10 = 1024 nodes can be deployed.
Part 4:
12 bits record the sequence number, giving at most 2^12 = 4096 sequence numbers per millisecond.
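The four-part layout can be checked with plain bit arithmetic. This is a small sketch, with the helper names compose and decompose invented for illustration, that packs the three fields into an int64 and takes them apart again.

```go
package main

import "fmt"

const (
	workerBits = 10
	seqBits    = 12
	timeShift  = workerBits + seqBits // the timestamp sits above worker ID and sequence
)

// compose packs the fields into one int64; with a 41-bit timestamp the
// sign bit stays 0. Hypothetical helper.
func compose(ms, worker, seq int64) int64 {
	return ms<<timeShift | worker<<seqBits | seq
}

// decompose reverses compose by masking and shifting.
func decompose(id int64) (ms, worker, seq int64) {
	seq = id & (1<<seqBits - 1)
	worker = (id >> seqBits) & (1<<workerBits - 1)
	ms = id >> timeShift
	return
}

func main() {
	id := compose(1000, 5, 7)
	fmt.Println(id)            // 4194324487
	fmt.Println(decompose(id)) // 1000 5 7

	// Whole years covered by 41 bits of milliseconds.
	fmt.Println((1 << 41) / (1000 * 60 * 60 * 24 * 365)) // 69
}
```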
My demo of the core logic
The demo implements only the core of the algorithm, so that the logic can be understood directly from the code.
package snowflake

import (
	"fmt"
	"sync"
	"time"
)

const (
	epoch       = 1640966400000           // start time 2022-01-01 00:00:00 (UTC+8) in milliseconds; usable for about 69 years
	timeBits    = uint8(41)               // timestamp bits
	workerBits  = uint8(10)               // worker ID bits
	seqBits     = uint8(12)               // sequence bits
	workerIdMax = -1 ^ (-1 << workerBits) // largest worker ID (1023)
	seqMax      = -1 ^ (-1 << seqBits)    // largest sequence value (4095)
	timeShift   = workerBits + seqBits    // left shift applied to the timestamp
	workerShift = seqBits                 // left shift applied to the worker ID
)

type Snowflake struct {
	sync.Mutex
	timestamp int64 // millisecond of the most recently generated ID
	workerId  int64
	seq       int64
}

func NewSnowflake(workId int64) (*Snowflake, error) {
	if workId < 0 || workId > workerIdMax {
		return nil, fmt.Errorf("workId must be in the range 0 - %d", workerIdMax)
	}
	s := &Snowflake{workerId: workId}
	return s, nil
}

// Generate returns the next ID. A clock that moves backwards is not
// handled here; that problem is discussed after the demo.
func (s *Snowflake) Generate() int64 {
	s.Lock()
	defer s.Unlock()
	now := time.Now().UnixMilli()
	if now == s.timestamp {
		// Same millisecond: advance the sequence.
		s.seq = (s.seq + 1) & seqMax
		if s.seq == 0 {
			// The sequence wrapped to 0, so it is used up: wait for the next millisecond.
			for now <= s.timestamp {
				now = time.Now().UnixMilli()
			}
		}
	} else {
		s.seq = 0
	}
	s.timestamp = now
	t := s.timestamp - epoch
	return t<<timeShift | s.workerId<<workerShift | s.seq
}

Advantages:
A single machine can generate 4096 unique IDs per millisecond.
Because the high-order bits hold the timestamp, IDs generated by snowflake increase over time.
Because the workerId distinguishes machines, no duplicate IDs occur across the whole distributed system.
The biggest problem: clock rollback
snowflake depends heavily on consistent system time. If the system clock is set back or otherwise changed, duplicate IDs may be generated.
Here are some solutions I have summarized:
Simple and crude: return an error directly and let the business layer handle it.
Turn off time synchronization on the server.
Record the sequence numbers used in each millisecond of the past hour. If the clock goes back to a particular millisecond, continue from that millisecond's last sequence number when building IDs.
Do not follow the server time in real time when generating IDs. Only move to the next millisecond once all sequence numbers in the current millisecond are used up. If ID generation concurrency is low, plenty of time values are left unused, so even if the clock goes back, it lands on a time value that has not been used yet.
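The last solution in the list can be sketched as a generator that keeps its own logical millisecond. This is one possible reading of that idea, not a reference implementation: lazySnowflake and its injectable now function are hypothetical, and the clock is injected only so that a rollback can be simulated.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

const (
	epochMs    = 1640966400000 // same custom epoch as the demo
	workerBits = 10
	seqBits    = 12
	seqMax     = 1<<seqBits - 1
)

// lazySnowflake keeps a logical millisecond that never decreases instead
// of trusting the wall clock on every call. Hypothetical type.
type lazySnowflake struct {
	mu       sync.Mutex
	workerID int64
	lastMs   int64        // logical time used in the most recent ID
	seq      int64        // sequence within lastMs
	now      func() int64 // wall clock in Unix milliseconds (injectable)
}

func newLazySnowflake(workerID int64, now func() int64) *lazySnowflake {
	return &lazySnowflake{workerID: workerID, now: now}
}

func (s *lazySnowflake) Generate() int64 {
	s.mu.Lock()
	defer s.mu.Unlock()
	if ms := s.now(); ms > s.lastMs {
		// The wall clock is ahead of the logical time: catch up.
		s.lastMs, s.seq = ms, 0
	} else if s.seq < seqMax {
		// Same millisecond, or the wall clock went back: stay on the logical time.
		s.seq++
	} else {
		// Sequence used up: move to the next logical millisecond without waiting.
		s.lastMs++
		s.seq = 0
	}
	return (s.lastMs-epochMs)<<(workerBits+seqBits) | s.workerID<<seqBits | s.seq
}

func main() {
	clock := int64(epochMs + 1000)
	g := newLazySnowflake(1, func() int64 { return clock })
	a := g.Generate()
	clock -= 500 // simulate the wall clock jumping back 500 ms
	b := g.Generate()
	fmt.Println(b > a) // true

	// With the real clock:
	real := newLazySnowflake(1, func() int64 { return time.Now().UnixMilli() })
	fmt.Println(real.Generate())
}
```

The logical time lives only in memory here, so a process that restarts after a rollback would need lastMs persisted somewhere to keep the same protection.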
Advantages and disadvantages
Advantage: generated IDs trend upward, generation is efficient, and uniqueness is guaranteed as long as the clock does not go backwards
Disadvantage: handling clock rollback is complicated and error-prone
5. ksuid
Algorithm structure

A ksuid consists of two parts.
Part 1:
A 32-bit timestamp in seconds
Part 2:
A 128-bit randomly generated payload
Advantages:
Because the high-order bits hold the timestamp, IDs generated by ksuid increase over time.
With a 128-bit random space, the probability of a random collision within one second is extremely low: 1/2^128 for any given pair of IDs, far less likely than a meteorite hitting the earth tomorrow.
Having no sequence number, it avoids snowflake's clock rollback problem.
Advantages and disadvantages
Advantage: generated IDs trend upward, generation is efficient, and there is no clock rollback problem
Disadvantage: part of the ID is random, so a random collision is theoretically possible
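The two-part structure is easy to reproduce with the standard library. The sketch below is a simplified stand-in rather than the ksuid library itself: newTimeRandomID is a made-up name, it uses plain Unix seconds, and it encodes the 20 bytes as hex, whereas the real ksuid uses its own epoch offset and a base62 text form.

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"encoding/hex"
	"fmt"
	"time"
)

// newTimeRandomID builds a 20-byte ID from a 4-byte big-endian timestamp
// in seconds followed by 16 random bytes, and returns it as hex.
// Hypothetical helper; simplified compared with the real ksuid format.
func newTimeRandomID(unixSec int64) (string, error) {
	var b [20]byte
	binary.BigEndian.PutUint32(b[:4], uint32(unixSec))
	if _, err := rand.Read(b[4:]); err != nil {
		return "", err
	}
	return hex.EncodeToString(b[:]), nil
}

func main() {
	early, _ := newTimeRandomID(1700000000)
	late, _ := newTimeRandomID(1700000001)
	fmt.Println(len(early), early < late) // 40 true

	now, _ := newTimeRandomID(time.Now().Unix())
	fmt.Println(now)
}
```

Because the timestamp is stored big-endian in the leading bytes, comparing the strings compares the timestamps first, so IDs from different seconds sort by creation time. IDs created within the same second are ordered only by their random part.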
Conclusion
Having compared the 5 solutions, I choose the simple and reliable ksuid algorithm to generate unique IDs in my business scenario.
Source: blog.csdn.net/h1993726/article/details/124020328