System design: sharding or data partitioning
2022-06-24 06:59:00 【Xiaochengxin post station】
Definition
Data partitioning (also known as sharding) is a technique for breaking a large database (DB) into many smaller parts. It is the process of splitting a DB or table across multiple machines to improve the manageability, performance, availability, and load balancing of an application.
Reason
The justification for data sharding is that, beyond a certain scale, it is cheaper and more feasible to scale horizontally by adding more machines than to scale vertically by adding more powerful servers.
I. Partitioning methods
Many different schemes can be used to decide how to break an application database into smaller databases. Below are three of the most popular schemes used by various large-scale applications.
A. Horizontal partitioning
In this scheme, we put different rows into different tables. For example, if we store locations in a table, we can decide that locations with an area code less than 1000 are stored in one table and locations with an area code greater than 1000 are stored in a separate table. This is also called range-based sharding, because we store different ranges of data in different tables.
The key problem with this approach is that if the range boundaries are not chosen carefully, the partitioning scheme will lead to unbalanced servers. For example, Beijing may have far more data than other regions.
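A minimal sketch of range-based routing, assuming a single boundary at area code 1000 as in the example above. The names `BOUNDARIES`, `range_shard`, and `shard_load` are hypothetical, and sending the boundary value itself to the upper shard is an assumption, since the text leaves that case open.

```python
from bisect import bisect_right
from collections import Counter

# Assumed boundary list: codes below 1000 go to shard 0, everything else to shard 1.
# Adding more boundaries (kept sorted) creates more shards.
BOUNDARIES = [1000]


def range_shard(area_code, boundaries=BOUNDARIES):
    """Return the index of the shard whose range contains area_code."""
    # bisect_right counts how many boundaries are <= area_code,
    # so the boundary value itself lands in the upper shard (an assumption).
    return bisect_right(boundaries, area_code)


def shard_load(area_codes, boundaries=BOUNDARIES):
    """Count how many records land on each shard, to spot imbalance."""
    return Counter(range_shard(code, boundaries) for code in area_codes)
```

Calling `shard_load` on a sample of real keys before fixing the boundaries is one way to notice the kind of skew described above, where one range holds most of the data.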
B. Vertical partitioning
In this scheme, we divide the data so that the tables related to a specific feature are stored on their own server. For example, if we are building an application similar to an e-commerce website, we can decide to place user information on one DB server, the merchant list on another, and products on a third.
Vertical partitioning is straightforward to implement and has a low impact on the application. The main problem with this approach is that if the application experiences additional growth, it may be necessary to further partition a feature-specific database across several servers (for example, a single server could not handle all the metadata queries for 10 billion photos by 140 million users).
C. Directory-based partitioning
A loosely coupled way to work around the problems mentioned in the schemes above is to create a lookup service that knows the current partitioning scheme and abstracts it away from the DB access code. To find out where a particular data entity resides, we query the directory server that holds the mapping between each tuple key and its DB server. This loose coupling means we can perform tasks such as adding servers to the DB pool or changing the partitioning scheme without affecting the application.
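The lookup service can be pictured as a small in-memory directory. This is only a sketch: the class `ShardDirectory` and its methods are hypothetical names, and a real directory would be a replicated service or database rather than a Python dict.

```python
class ShardDirectory:
    """Hypothetical lookup service: maps each tuple key to the DB server holding it."""

    def __init__(self):
        self._locations = {}  # tuple key -> server name

    def assign(self, key, server):
        """Record that `key` lives on `server`."""
        self._locations[key] = server

    def lookup(self, key):
        """Return the server for `key`; raises KeyError for unknown keys."""
        return self._locations[key]

    def move(self, key, new_server):
        """Repoint `key` to another server (the data copy itself is out of scope here)."""
        if key not in self._locations:
            raise KeyError(key)
        self._locations[key] = new_server
```

Because the application only ever calls `lookup`, changing where a key lives is a directory update and does not require touching the DB access code.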
II. Partitioning criteria
A. Key- or hash-based partitioning (hash partitioning)
Under this scheme, we apply a hash function to some key attribute of the entity being stored; the result is the partition number. For example, suppose we have 100 DB servers and our ID is a numeric value that is incremented by one each time a new record is inserted. In this case the hash function could be 'ID % 100', which gives us the number of the server on which to store or read that record. This approach should ensure a uniform distribution of data among servers. The fundamental problem with this approach is that it effectively fixes the total number of DB servers, because adding a new server means changing the hash function, which requires redistributing the data and causes service downtime. One way around this problem is to use consistent hashing.
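The 'ID % 100' rule, and the cost of changing the server count, can be sketched as follows. The function names `hash_shard` and `moved_fraction` are hypothetical and exist only for illustration.

```python
def hash_shard(record_id, num_servers=100):
    """Return the server number for a numeric ID using the 'ID % num_servers' rule."""
    return record_id % num_servers


def moved_fraction(ids, old_servers, new_servers):
    """Fraction of IDs whose server changes when the server count changes."""
    ids = list(ids)
    moved = sum(
        1 for i in ids
        if hash_shard(i, old_servers) != hash_shard(i, new_servers)
    )
    return moved / len(ids)
```

For 10,000 sequential IDs, going from 100 to 101 servers leaves only IDs 0 through 99 on their original server, so `moved_fraction(range(10000), 100, 101)` comes out at 99%. That near-total reshuffle is why the modulo rule effectively fixes the number of servers.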
B. List partitioning
In this scheme, each partition is assigned a list of values, so whenever we want to insert a new record, we check which partition contains our key and store the record there. For example, we can decide that all users living in Iceland, Norway, Sweden, Finland, or Denmark will be stored in a partition for the Nordic countries.
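A minimal sketch of list partitioning using the Nordic example. The partition name `nordic`, the fallback partition `other`, and the function `list_partition` are assumptions made for illustration; the text does not say what happens to keys that appear in no list.

```python
# Assumed partition lists; only the Nordic list comes from the example in the text.
PARTITION_LISTS = {
    "nordic": {"Iceland", "Norway", "Sweden", "Finland", "Denmark"},
}


def list_partition(country, partition_lists=PARTITION_LISTS, default="other"):
    """Return the name of the partition whose value list contains `country`."""
    for partition, values in partition_lists.items():
        if country in values:
            return partition
    # Hypothetical catch-all for keys not covered by any list.
    return default
```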
C. Round-robin partitioning (hash modulo)
This is a very simple strategy that ensures a uniform data distribution. With 'n' partitions, the 'i'-th tuple is assigned to partition (i mod n).
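The (i mod n) rule fits in a few lines. The function name `round_robin_assign` is hypothetical, and tuples are numbered from 0 here, which is an assumption about the indexing.

```python
def round_robin_assign(tuples, n):
    """Assign the i-th tuple (counting from 0) to partition i mod n."""
    partitions = [[] for _ in range(n)]
    for i, item in enumerate(tuples):
        partitions[i % n].append(item)
    return partitions
```

With seven tuples and three partitions the sizes come out as 3, 2, and 2, so partition sizes never differ by more than one.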
D. Composite partitioning
Under this scheme, we combine any of the partitioning schemes above to devise a new scheme. For example, we can first apply list partitioning and then apply hash-based partitioning. Consistent hashing can be considered a composite of hash and list partitioning, where the hash reduces the key space to a size that can be listed.
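A rough sketch of a consistent hash ring, to make the "hash plus list" view concrete: the hash maps keys and servers onto a ring of positions, and the sorted list of positions decides which server owns which slice. The class name `ConsistentHashRing`, the use of MD5, and the 100 virtual nodes per server are all illustrative assumptions, not something the text prescribes.

```python
import bisect
import hashlib


def _hash(value):
    """Map a string to a large integer position on the ring (MD5 is an arbitrary choice)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    """Hypothetical ring: each server owns several positions (virtual nodes)."""

    def __init__(self, servers, vnodes=100):
        self._vnodes = vnodes
        self._positions = []  # sorted ring positions
        self._owner = {}      # ring position -> server name
        for server in servers:
            self.add_server(server)

    def add_server(self, server):
        """Place `vnodes` positions for the server on the ring."""
        for v in range(self._vnodes):
            pos = _hash(f"{server}#{v}")
            self._owner[pos] = server
            bisect.insort(self._positions, pos)

    def server_for(self, key):
        """Return the server owning the first position clockwise from the key's hash."""
        idx = bisect.bisect(self._positions, _hash(str(key))) % len(self._positions)
        return self._owner[self._positions[idx]]
```

When a server is added, the only keys that change owner are those that now fall just before one of the new server's positions, so every moved key moves to the new server and the rest stay put. This is the property that avoids the full reshuffle of plain modulo hashing.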
III. Common problems with sharding
On a sharded database, there are certain extra constraints on the operations that can be performed. Most of these constraints arise because operations across multiple tables, or across multiple rows in the same table, will no longer run on the same server. Below are some of the constraints and additional complexities introduced by sharding:
A. Joins and denormalization
Performing a join on a database that runs on one server is straightforward, but once a database is partitioned and spread across multiple machines, it is often not feasible to perform joins that span database shards. Because the data has to be compiled from multiple servers, such joins are not performance-efficient. A common workaround is to denormalize the database so that queries that previously required joins can be answered from a single table. Of course, the service now has to deal with all the perils of denormalization, such as data inconsistency.
B. Referential integrity
Just as performing cross-shard queries on a partitioned database is not feasible, trying to enforce data integrity constraints such as foreign keys in a sharded database can be extremely difficult.
Most RDBMSs do not support foreign key constraints across databases on different database servers. This means that applications that require referential integrity on a sharded database usually have to enforce it in application code. In such cases, applications often have to run regular SQL jobs to clean up dangling references.
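A cleanup job of this kind might look like the sketch below, which uses two separate SQLite connections to stand in for two shards. The table names `users` and `photos`, the column `user_id`, and the function `purge_dangling_photos` are hypothetical; a production job would also batch the work and guard against rows created while it runs.

```python
import sqlite3


def purge_dangling_photos(users_db, photos_db):
    """Delete photos whose user_id no longer exists on the users shard.

    No foreign key can span the two connections, so the check is done
    in application code. Returns the number of photos removed.
    """
    # Collect the IDs that still exist on the users shard.
    live_ids = {row[0] for row in users_db.execute("SELECT id FROM users")}
    # Find photos on the other shard that point at a missing user.
    dangling = [
        (photo_id,)
        for photo_id, user_id in photos_db.execute("SELECT id, user_id FROM photos")
        if user_id not in live_ids
    ]
    photos_db.executemany("DELETE FROM photos WHERE id = ?", dangling)
    photos_db.commit()
    return len(dangling)
```

The job reads all live IDs into memory, which is acceptable for a sketch but would need paging on a large shard.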
C. Rebalancing
There can be many reasons why we have to change the sharding scheme:
1. The data distribution is not uniform, for example there are many places in a particular postal code that cannot fit into one database partition.
2. There is a lot of load on one shard, for example one DB shard handles too many requests for user photos.
In such cases, we either have to create more DB shards or rebalance the existing shards, which means the partitioning scheme changes and all existing data is moved to new locations. Doing this without incurring downtime is extremely difficult. Using a scheme such as directory-based partitioning does make rebalancing a more palatable experience, but at the cost of increased system complexity and a new single point of failure (the lookup service/database).
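With a directory in place, adding a shard can be reduced to repointing some keys. The sketch below represents the directory as a plain dict from key to shard name and moves keys off overloaded shards until the new shard holds its fair share; the function `rebalance_to_new_shard` and the fair-share rule are assumptions for illustration, and copying the underlying rows is left out.

```python
from collections import Counter


def rebalance_to_new_shard(directory, new_shard):
    """Repoint keys from overloaded shards to `new_shard`; return the moved keys.

    `directory` maps key -> shard name and is updated in place.
    """
    shards = set(directory.values()) | {new_shard}
    target = len(directory) // len(shards)  # fair share per shard
    load = Counter(directory.values())
    moved = []
    for key in sorted(directory):
        if load[new_shard] >= target:
            break  # the new shard has reached its fair share
        owner = directory[key]
        if owner != new_shard and load[owner] > target:
            directory[key] = new_shard
            load[owner] -= 1
            load[new_shard] += 1
            moved.append(key)
    return moved
```

Only the returned keys need their data copied, which is what makes this less painful than rehashing everything, though the directory itself remains the single point of failure noted above.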
So how does the Google system design theory above work in practice? The author went through this process while working at JD. The data volume of each table directly affects storage volume and performance metrics, depending on the complexity of what the table describes. At the time, the average data volume of a single table was between 400 and 700 million records, with data skew, and a single large surge driven by business growth added more than 1.5 billion records, which made repartitioning necessary. For a concrete case study, see the author's 2018 article on the practice of MySQL database and table sharding.
Reference material
grok_system_design_interview.pdf