当前位置:网站首页>System design: partition or data partition
System design: partition or data partition
2022-06-24 06:59:00 【Xiaochengxin post station】
Definition
Data partition ( Also known as fragmentation ) Is a kind of large database (DB) Technology that breaks down into many smaller parts . It is to split one across multiple computers DB/ Table process , To improve application manageability 、 performance 、 Availability and load balancing .
reason
The reason for data fragmentation is , After reaching a certain scale , It is cheaper to scale horizontally by adding more machines than vertically by adding more powerful servers 、 More feasible .
One 、 Division method
You can use many different scenarios to decide how to decompose an application database into smaller databases . Here are the three most popular scenarios used by various large-scale applications .
A. Horizontal zoning
In this scheme , We put different rows in different tables . for example , If we store different locations in a table , We can confirm that the area code is less than 1000 The location of is stored in a table , The region code is greater than 1000 The location of is stored in a separate table . This is also called Range based sharding , Because we store different ranges of data in different tables .
The key problem with this approach is , If you do not carefully select the range value for slicing , The partition scheme will result in server imbalance . For example, Beijing may have more data than other regions .
B Vertical zones
In this scheme , We divide the data into tables related to specific functions and store them in their own servers . for example , If we are building an application similar to an e-commerce website — We can decide to put the user information on one computer DB Server , The merchant list is placed on another server , The product is placed on the third server .
Vertical partitioning is easy to implement , Less impact on the application . The main problem with this method is , If our application experiences additional growth , Then it may be necessary to further divide the function specific databases on different servers ( for example , A single server cannot handle 1.4 Million users to 100 All metadata queries of 100 million photos )
C Directory based partition
The loosely coupled solution to the problems mentioned in the above solution is to create a lookup service , The service understands the current partition scheme , And take it from DB Abstracted from the access code . therefore , To find the location of a particular data entity , We query to save each tuple key to its DB Mapping directory servers between servers . This loosely coupled approach means that we can perform tasks such as transferring data to the DB Tasks such as adding servers to the pool or changing the partition scheme .
Two 、 Division criteria
A. Key or hash based partitioning ( Hash partition )
Under this scheme , We apply hash functions to some of the key attributes of our stored entities ; This produces the partition number . for example , If we had 100 individual DB The server , And our ID It's a number , Each time a new record is inserted , It will increase by one . In this case , The hash function can be 'ID%100', This will provide us with the ability to store / Read the server number of the record . This approach should ensure uniform distribution of data between servers . The fundamental problem with this approach is , It effectively fixes DB Total number of servers , Because adding a new server means changing the hash function , This will require reallocation of data and service downtime . One way to solve this problem is to use consistent hashes .
B List partition
In this scheme , Each partition is assigned a list of values , So whenever we want to insert a new record , We all see which partition contains our keys , Then store it there . for example , We can decide to live in Iceland 、 The Norwegian 、 The Swedish 、 All users in Finland or Denmark will be stored in partitions in the Nordic countries .
C Cycle partition ( Hash modulus )
This is a very simple strategy , It can ensure the consistency of data distribution . about 'n' Partition ,'i' Tuples are assigned to partitions (i mod n).
D Combined zones
Under this scheme , We combine any of the above partition schemes to design a new scheme . for example , Apply the list partition scheme first , Then apply hash based partitions . Consistent hashing can be thought of as a combination of hashing and list partitioning , Where hashing reduces the key space to a size that can be listed
3、 ... and 、 Segmentation FAQs
On the shard database , There are some additional restrictions on the different operations that can be performed . Most of these limitations are due to the fact that operations that span multiple tables or multiple rows in the same table will no longer run on the same server . Here are some of the limitations and additional complexity of sharding :
A. League table query join And the use of inverse paradigms
Performing a join on a database running on a server is simple , But once a database is partitioned and distributed across multiple computers , It is often not feasible to perform joins across database fragments . Because you have to compile data from multiple servers , Such a connection will not improve performance . A common way to solve this problem is to denormalize the database , So that you can execute previously required joined queries from a single table . Of course , The service must now deal with all the dangers of denormalization , For example, the data is inconsistent .
B Citation integrity
As we can see , It is not feasible to perform cross sharding queries on a partitioned database , Similarly , Enforce data integrity constraints in a fragmented database ( Such as foreign keys ) It can be very difficult .
majority RDBMS Foreign key constraints between databases on different database servers are not supported . This means that applications that require referential integrity on a fragmented database must usually be enforced in the application code . Usually in this case , The application must run regular SQL Job to clear dangling references .
C Repartition
There may be many reasons why we have to change the fragmentation scheme :
1. Uneven data distribution , For example, there are many places in a particular postal code that cannot be put into a database partition .
2. One shard A lot of load , such as DB shard Too many user photo requests processed .
under these circumstances , Or we have to create more DB shard, Or you have to rebalance the existing shard, This means that the partitioning scheme has changed , All existing data is moved to a new location . It is very difficult to do this without causing downtime . Using a scheme similar to directory based partitioning does make the rebalancing experience more enjoyable , But the cost is to increase the complexity of the system and create a new single point of failure ( Find service / database ).
So, based on the theory of Google system design, how to operate the specific practice ? The author has experienced the above process in JD before , The data volume of each table directly affects the data storage volume and performance index according to the description complexity of the table , According to the author's average data volume of a single table at that time 400-700 There is a data skew between million , As well as a single big boost due to business growth 15 More than 100 million data leads to the need to repartition , Specific practice case reference 2018 Articles in MySQL Practice of sub database and sub table .
Reference material
grok_system_design_interview.pdf
边栏推荐
- oracle sql综合运用 习题
- Nine unique skills of Huawei cloud low latency Technology
- Spirit information development log (1)
- Jumping game ii[greedy practice]
- leetcode:85. 最大矩形
- Record -- about the problem of garbled code when JSP foreground passes parameters to the background
- 云监控系统 HertzBeat v1.1.0 发布,一条命令开启监控之旅!
- How to register the cloud service platform and what are the advantages of cloud server
- Station B collapsed. Let's talk to the injured programmers
- On BOM and DOM (2): DOM node hierarchy / attributes / Selectors / node relationships / detailed operation
猜你喜欢
![[binary tree] - middle order traversal of binary tree](/img/93/442bdbecb123991dbfbd1e5ecc9d64.png)
[binary tree] - middle order traversal of binary tree

创客教育给教师发展带来的挑战

【愚公系列】2022年6月 ASP.NET Core下CellReport报表工具基本介绍和使用

35 year old crisis? It has become a synonym for programmers

Interpreting top-level design of AI robot industry development

puzzle(019.1)Hook、Gear

Record -- about the problem of garbled code when JSP foreground passes parameters to the background

leetcode:84. The largest rectangle in the histogram

【JUC系列】Executor框架之CompletionFuture

Open source and innovation
随机推荐
Five minute run through 3D map demo
leetcode:84. The largest rectangle in the histogram
[binary number learning] - Introduction to trees
Online font converter what is the meaning of font conversion
数据库 存储过程 begin end
Application configuration management, basic principle analysis
How do I check the IP address? What is an IP address
Domain name purchase method good domain name selection principle
Jumping game ii[greedy practice]
十年
Go breakpoint continuation
If you want to learn programming well, don't recite the code!
In the half year, there were 2.14 million paying users, a year-on-year increase of 62.5%, and New Oriental online launched its private domain
Le système de surveillance du nuage hertzbeat v1.1.0 a été publié, une commande pour démarrer le voyage de surveillance!
The cloud monitoring system hertzbeat V1.1.0 is released, and a command starts the monitoring journey!
What is the role of domain name websites? How to query domain name websites
You have a chance, here is a stage
潞晨科技获邀加入NVIDIA初创加速计划
puzzle(019.1)Hook、Gear
多传感器融合track fusion