In distributed scenarios, do you know how to generate unique IDs?

2022-06-30 18:18:00 【Java notes shrimp】

Preface

In complex distributed systems that handle huge volumes of data, it is usually necessary to give each piece of data a unique identifier.

For example:

After a database has been split into multiple databases and tables (sharding), a unique ID is needed to identify each record.
Data in a NoSQL store needs a unique ID so that it can be associated with data from other data sources.

This article compares and summarizes several common methods for reference.

In real projects I often use the ksuid algorithm. It is simple and reliable, and its IDs can be sorted by time.

Requirements for unique ID generation rules

Globally unique
Trending upward

MySQL's InnoDB engine stores indexes in a B+tree, so the primary key should be ordered wherever possible to keep write performance high.

Information security

If IDs are consecutive, a malicious user can infer our daily data volume directly from the IDs, and a crawler can easily scrape all the data by walking the IDs in order.

Preferably contains a timestamp

The time at which a distributed ID was generated can then be read from the ID itself.

Requirements for a unique ID generation system

High availability

The server should be able to create unique IDs successfully 99.999% of the time.

Low latency

When the server receives a request for a unique ID, it must respond very quickly.

High QPS

For example, if 100,000 requests to create a unique ID arrive concurrently, the server should successfully create 100,000 unique IDs in a short time.

Five solutions for generating unique IDs

1.UUID

V1 : Based on a timestamp + MAC address
V2 : Based on a timestamp + MAC address + POSIX UID or GID
V3 : Namespace based, hashed with MD5
V4 : Based on random numbers
V5 : The same as V3, but hashed with SHA-1

V4 is the usual choice. V1 risks exposing the MAC address, V2 is only for specific scenarios, and V3 and V5 produce the same UUID whenever the input parameters are the same.
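To make the V4 layout concrete, here is a minimal sketch that builds a random UUID with only the Go standard library. The function name newUUIDv4 is my own for this illustration; a real project would more likely use an existing UUID library. The sketch fills 16 bytes with random data and then overwrites the version and variant bits as RFC 4122 describes.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// newUUIDv4 returns a random (version 4) UUID in canonical text form.
// The name is hypothetical; it is not taken from any library.
func newUUIDv4() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // version 4 in the high nibble of byte 6
	b[8] = (b[8] & 0x3f) | 0x80 // RFC 4122 variant in the top two bits of byte 8
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

func main() {
	id, err := newUUIDv4()
	if err != nil {
		panic(err)
	}
	fmt.Println(id)
}
```

Apart from the six fixed bits, the value is purely random, which is why two V4 UUIDs carry no ordering information.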

Advantages and disadvantages

Advantage: simple and reliable
Disadvantage: not sortable, which makes retrieval harder

2.MySQL auto-increment ID

This uses MySQL's auto-increment ID mechanism, which provides increasing, monotonic, and unique IDs.

On a single machine, high concurrency puts MySQL under heavy pressure.
In a distributed setup, each machine generally has to be given a different initial ID value to avoid duplicate IDs. This approach is limiting, and scaling out horizontally is complex and error-prone.
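The distributed case can be illustrated with a small simulation. This is only a sketch under an assumption the text does not spell out: besides a different initial value, every machine also uses the same step, equal to the number of machines (MySQL exposes the auto_increment_offset and auto_increment_increment settings for this purpose). The node type and its method names are hypothetical.

```go
package main

import "fmt"

// node mimics one database instance that hands out auto-increment IDs.
// It is a hypothetical stand-in, not a MySQL client.
type node struct {
	next int64 // next ID this instance will return
	step int64 // distance between two consecutive IDs on this instance
}

// newNode creates an instance whose first ID is offset.
func newNode(offset, step int64) *node {
	return &node{next: offset, step: step}
}

// nextID returns the current ID and advances by the step.
func (n *node) nextID() int64 {
	id := n.next
	n.next += n.step
	return id
}

func main() {
	const nodes = 3 // assumed cluster size; the step must match it
	a, b, c := newNode(1, nodes), newNode(2, nodes), newNode(3, nodes)
	for i := 0; i < 3; i++ {
		fmt.Println(a.nextID(), b.nextID(), c.nextID())
	}
	// Prints:
	// 1 2 3
	// 4 5 6
	// 7 8 9
}
```

The sketch also shows where the scaling difficulty comes from: adding a fourth machine means changing the step on every existing machine without colliding with IDs that were already issued.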

Advantages and disadvantages

Advantage: simple and reliable; suitable for a single machine, low concurrency, and small data volumes
Disadvantage: not suitable for sharded databases and tables or for high-concurrency scenarios

3.Redis

Because Redis executes commands on a single thread, each command is naturally atomic, so the atomic INCR and INCRBY commands can be used to generate IDs.

Its advantages and disadvantages, both on a single machine and in a distributed setup, are similar to those of MySQL.
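One possible way to use INCRBY is to reserve a whole batch of IDs per call and hand them out locally, which reduces round trips to Redis. The sketch below is my own illustration, not something the commands require. The counter interface, memCounter, allocator, and the key name are all hypothetical; memCounter is an in-memory fake that is only assumed to behave like INCRBY (atomically add n and return the new value), so the example runs without a Redis server.

```go
package main

import (
	"fmt"
	"sync"
)

// counter is a hypothetical stand-in for a Redis client.
// IncrBy is assumed to behave like the Redis INCRBY command.
type counter interface {
	IncrBy(key string, n int64) (int64, error)
}

// memCounter is an in-memory fake so the sketch runs without Redis.
type memCounter struct {
	mu   sync.Mutex
	vals map[string]int64
}

func (m *memCounter) IncrBy(key string, n int64) (int64, error) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.vals[key] += n
	return m.vals[key], nil
}

// allocator reserves a batch of IDs per IncrBy call and serves them locally.
type allocator struct {
	mu    sync.Mutex
	c     counter
	key   string
	batch int64
	next  int64 // next ID to hand out
	last  int64 // last ID of the currently reserved batch
}

func newAllocator(c counter, key string, batch int64) *allocator {
	// next > last forces a reservation on the first call.
	return &allocator{c: c, key: key, batch: batch, next: 1, last: 0}
}

// NextID returns the next ID, reserving a new batch when the current one is used up.
func (a *allocator) NextID() (int64, error) {
	a.mu.Lock()
	defer a.mu.Unlock()
	if a.next > a.last {
		end, err := a.c.IncrBy(a.key, a.batch)
		if err != nil {
			return 0, err
		}
		a.next, a.last = end-a.batch+1, end
	}
	id := a.next
	a.next++
	return id, nil
}

func main() {
	shared := &memCounter{vals: map[string]int64{}}
	a := newAllocator(shared, "order:id", 100)
	b := newAllocator(shared, "order:id", 100)
	x, _ := a.NextID()
	y, _ := b.NextID()
	z, _ := a.NextID()
	fmt.Println(x, y, z) // 1 101 2
}
```

With batching, IDs stay unique across processes but are no longer ordered by creation time between processes, and IDs left in a batch are lost if a process restarts.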

4.snowflake

snowflake is a distributed unique ID algorithm open-sourced by Twitter.

Algorithm structure

A snowflake ID consists of 4 parts:

The first part:

The highest bit of the binary value is the sign bit: 1 means negative, 0 means positive. Generated IDs are normally positive integers, so the highest bit is fixed at 0.

The second part:

A 41-bit millisecond timestamp. 41 bits can represent 2^41 - 1 milliseconds ≈ 69 years, so the IDs can be used for at most about 69 years from the chosen starting time.

The third part:

10 bits record the worker machine ID, so at most 2^10 = 1024 nodes can be deployed.

The fourth part:

12 bits record the sequence number, so there are at most 2^12 = 4096 sequence numbers per millisecond.
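The four parts above can be sketched as plain bit arithmetic. The function names compose and decompose are hypothetical and only serve this illustration; ms stands for the number of milliseconds since the chosen starting time.

```go
package main

import "fmt"

const (
	workerBits  = 10
	seqBits     = 12
	timeShift   = workerBits + seqBits // the timestamp sits above 22 low bits
	workerShift = seqBits              // the worker ID sits above 12 low bits
	workerMask  = 1<<workerBits - 1    // 1023
	seqMask     = 1<<seqBits - 1       // 4095
)

// compose packs the three fields into one int64; the sign bit stays 0
// as long as ms fits in 41 bits.
func compose(ms, worker, seq int64) int64 {
	return ms<<timeShift | worker<<workerShift | seq
}

// decompose splits an ID back into its three fields.
func decompose(id int64) (ms, worker, seq int64) {
	return id >> timeShift, (id >> workerShift) & workerMask, id & seqMask
}

func main() {
	id := compose(1000, 5, 7)
	fmt.Println(id) // 4194324487
	ms, worker, seq := decompose(id)
	fmt.Println(ms, worker, seq) // 1000 5 7
}
```

Because the timestamp occupies the high-order bits, comparing two IDs as integers compares their timestamps first, which is what makes the IDs sortable by time.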

A demo of the core logic, as I understand it

The demo implements only the core of the algorithm, so the logic can be followed directly from the code.

package snowflake

import (
   "fmt"
   "sync"
   "time"
)

const (
   epoch       = 1640966400000           // Start time in ms: 2022-01-01 00:00:00 (UTC+8); usable for about 69 years
   timeBits    = uint8(41)               // Number of timestamp bits
   workerBits  = uint8(10)               // Number of machine ID bits
   seqBits     = uint8(12)               // Number of sequence bits
   workerIdMax = -1 ^ (-1 << workerBits) // Largest machine ID (1023)
   seqMax      = -1 ^ (-1 << seqBits)    // Largest sequence value (4095)
   timeShift   = workerBits + seqBits    // Left shift for the timestamp
   workerShift = seqBits                 // Left shift for the machine ID
)

type Snowflake struct {
   sync.Mutex
   timestamp int64 // Millisecond timestamp of the last generated ID
   workerId  int64
   seq       int64 // Sequence number within the current millisecond
}

func NewSnowflake(workId int64) (*Snowflake, error) {
   if workId < 0 || workId > workerIdMax {
      return nil, fmt.Errorf("workId must be in the range 0 - %d", workerIdMax)
   }

   s := &Snowflake{workerId: workId}
   return s, nil
}

// Generate returns the next ID. This demo does not handle clock rollback:
// if the system time moves backwards, duplicate IDs are possible.
func (s *Snowflake) Generate() int64 {
   s.Lock()
   defer s.Unlock()

   now := time.Now().UnixMilli()
   if now == s.timestamp {
      // Same millisecond: increase the sequence
      s.seq = (s.seq + 1) & seqMax
      if s.seq == 0 {
         // The sequence wrapped to 0, so it is used up; wait for the next millisecond
         for now <= s.timestamp {
            now = time.Now().UnixMilli()
         }
      }
   } else {
      s.seq = 0
   }

   s.timestamp = now
   t := s.timestamp - epoch

   return t<<timeShift | s.workerId<<workerShift | s.seq
}

Advantages:

A single machine can generate 4096 unique IDs within one millisecond
Because the high-order bits are a timestamp, IDs generated by snowflake increase over time
Because workerId distinguishes the machines, no duplicate IDs occur across the whole distributed system

The biggest problem: clock rollback

snowflake depends heavily on the system time being consistent. If the system clock is set back or otherwise changed, duplicate IDs may be generated.

Here are some solutions I have summarized:

The simple, crude option: return an error directly and let the business layer deal with it
Turn off time synchronization on the server
Record the sequence number usage of every millisecond for the past hour. If the clock goes back to a particular millisecond, continue generating IDs from the sequence number that millisecond had reached
Do not follow the server time in real time when generating IDs. Only move on to the next millisecond once all sequence numbers in the current millisecond are used up. If ID generation concurrency is low, a lot of time is left unused, so even if the clock goes back, it lands on time that has not been used yet
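The first solution in the list (return an error and let the caller decide) might look roughly like the sketch below. The names generator, errClockMovedBack, and the injected now function are my own assumptions for illustration; the clock is injected only so that a rollback can be simulated, and in real use it would be something like func() int64 { return time.Now().UnixMilli() }.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

const (
	epochMs     = 1640966400000 // assumed start time in ms, same as the demo above
	workerShift = 12
	timeShift   = 22
	seqMax      = 1<<12 - 1
)

// errClockMovedBack is a hypothetical error for this sketch.
var errClockMovedBack = errors.New("clock moved backwards")

type generator struct {
	mu       sync.Mutex
	now      func() int64 // returns Unix time in ms; injected for testing
	lastMs   int64        // timestamp of the last generated ID
	workerID int64
	seq      int64
}

// Generate refuses to issue an ID when the clock is behind the last timestamp.
func (g *generator) Generate() (int64, error) {
	g.mu.Lock()
	defer g.mu.Unlock()

	now := g.now()
	switch {
	case now < g.lastMs:
		return 0, errClockMovedBack
	case now == g.lastMs:
		g.seq = (g.seq + 1) & seqMax
		if g.seq == 0 {
			// Sequence used up: wait for the next millisecond.
			for now <= g.lastMs {
				now = g.now()
			}
		}
	default:
		g.seq = 0
	}
	g.lastMs = now
	return (now-epochMs)<<timeShift | g.workerID<<workerShift | g.seq, nil
}

func main() {
	// The third clock reading is earlier than the second one.
	ticks := []int64{epochMs + 5, epochMs + 6, epochMs + 4}
	i := 0
	g := &generator{workerID: 1, now: func() int64 {
		t := ticks[i]
		i++
		return t
	}}
	for range ticks {
		id, err := g.Generate()
		fmt.Println(id, err)
	}
}
```

The trade-off is availability: while the clock is behind, every call fails, so the business layer has to retry, wait, or fall back to another node.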

Advantages and disadvantages

Advantage: generated IDs trend upward, generation is efficient, and IDs are guaranteed not to repeat
Disadvantage: handling clock rollback is complicated and error-prone

5.ksuid

Algorithm structure

A ksuid consists of two parts

The first part

A 32-bit timestamp in seconds

The second part

A 128-bit randomly generated payload
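The two parts can be sketched as a 20-byte value: 4 bytes of big-endian seconds followed by 16 random bytes. This is a simplified illustration, not the ksuid library itself. The function name newKSUID is hypothetical, the custom epoch of 1400000000 is what I understand the reference implementation to use, and the output is printed as hex here, whereas real ksuid strings use a base62 text encoding.

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"encoding/hex"
	"fmt"
	"time"
)

// ksuidEpoch is the assumed custom epoch in Unix seconds.
const ksuidEpoch = 1400000000

// newKSUID builds the 20-byte layout for the given Unix time in seconds:
// a 4-byte big-endian timestamp followed by 16 random bytes.
func newKSUID(unixSec int64) ([20]byte, error) {
	var id [20]byte
	binary.BigEndian.PutUint32(id[:4], uint32(unixSec-ksuidEpoch))
	if _, err := rand.Read(id[4:]); err != nil {
		return id, err
	}
	return id, nil
}

func main() {
	id, err := newKSUID(time.Now().Unix())
	if err != nil {
		panic(err)
	}
	fmt.Println(hex.EncodeToString(id[:]))
}
```

Because the timestamp is stored big-endian in the leading bytes, comparing two values byte by byte orders them by their second-level timestamp; values created within the same second are ordered only by their random payload.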

Advantages:

Because the high-order bits are a timestamp, IDs generated by ksuid increase over time
With a 128-bit number space, the probability of a random collision within one second is very low: two payloads match with probability 1/2^128 ≈ 2.9 × 10^-39, which is negligible in practice
There is no sequence number, which avoids snowflake's clock rollback problem

Advantages and disadvantages

Advantage: generated IDs trend upward, generation is efficient, and there is no clock rollback problem
Disadvantage: part of the ID is random, so a random collision is theoretically possible

Conclusion

Having compared the 5 solutions, in my business scenario I chose the simple and reliable ksuid algorithm to generate unique IDs.

Source: blog.csdn.net/h1993726/article/details/124020328

Copyright notice
This article was written by [Java notes shrimp]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/181/202206301644108324.html