Introduction to data fragmentation
2022-07-07 08:43:00 【Blue sky ⊙ white clouds】
Background
Traditionally, a data set is stored centrally on a single node. In terms of performance, availability, and operation and maintenance cost, this approach struggles to handle massive-data scenarios.
In terms of performance, most relational databases use B+ tree indexes. Once the amount of data exceeds a threshold, the index becomes deeper and each lookup requires more disk I/O, which degrades query performance. At the same time, highly concurrent access makes the centralized database the biggest bottleneck of the system.
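To make the link between data volume and index depth concrete, here is a rough sketch that estimates how many levels a B+ tree needs for a given number of rows. It is a simplified model, not a description of any particular database engine: the function name `btree_height` and the fanout of 1000 keys per page are assumptions chosen for illustration, and real fanout depends on key size and page size.

```python
def btree_height(rows: int, fanout: int) -> int:
    """Smallest number of levels whose capacity (fanout ** levels) covers `rows`.

    Simplified model: every level multiplies the addressable rows by `fanout`.
    """
    levels, capacity = 1, fanout
    while capacity < rows:
        capacity *= fanout
        levels += 1
    return levels


# Hypothetical fanout of 1000 keys per page.
for rows in (10**6, 10**9, 10**12):
    print(f"{rows:>16,d} rows -> {btree_height(rows, fanout=1000)} levels")
```

Under this model each additional level is roughly one more page read for an uncached lookup, which is why keeping each table below a size threshold keeps lookups cheap.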
In terms of availability, stateless services can be scaled out at will at low cost, which inevitably pushes the pressure of the whole system onto the database. A single data node, or a simple master-slave architecture, finds it harder and harder to bear that load. Database availability has become the key to the whole system.
In terms of operation and maintenance cost, once the data in a database instance grows beyond a threshold, the pressure on DBAs increases. The time needed for backup and recovery grows with the amount of data and becomes increasingly hard to control. In general, keeping the data of a single database instance within 1 TB is considered reasonable.
Because traditional relational databases cannot meet the needs of Internet scenarios, there have been more and more attempts to store data in NoSQL systems that support distribution natively. However, NoSQL's incompatibility with SQL and its immature ecosystem have kept it from delivering a decisive blow in its contest with relational databases, whose position remains unshaken.
Data fragmentation (often called sharding) means distributing the data held in a single database across multiple databases or tables along some dimension, in order to relieve performance bottlenecks and improve availability. An effective way to fragment data is to split a relational database into multiple databases and multiple tables. Both database splitting and table splitting avoid the query bottleneck that appears when the data volume exceeds a tolerable threshold. In addition, database splitting spreads out the access load that would otherwise hit a single database. Table splitting cannot relieve pressure on the database, but it keeps the data in one database, so operations that would otherwise become distributed transactions can remain local transactions as far as possible; once updates cross databases, distributed transactions tend to complicate matters. Splitting with multiple masters and multiple slaves effectively avoids a single point of failure for data and thereby improves the availability of the data architecture.
Splitting data into multiple databases and tables keeps the amount of data in each table below the threshold and channels traffic to cope with heavy access, which makes it an effective way to handle high concurrency and massive data. Data fragmentation is divided into vertical fragmentation and horizontal fragmentation.
Vertical fragmentation
Splitting by business is called vertical fragmentation, also known as vertical splitting. Its core idea is a dedicated database for each dedicated purpose. Before the split, a database consists of multiple tables, each corresponding to a different business. After the split, the tables are grouped by business and distributed to different databases, which spreads the load across those databases. The figure below shows a scheme that, according to business needs, vertically fragments the user table and the order table into different databases.

Vertical fragmentation often requires changes to architecture and design. Generally speaking, such changes cannot keep up with the rapidly changing business requirements of the Internet, and they do not truly solve the single-node bottleneck. Vertical splitting can ease the problems caused by data volume and access volume, but it cannot cure them. If the amount of data in a table still exceeds what a single node can carry after the vertical split, horizontal fragmentation is needed.
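The vertical split of the user table and the order table can be pictured as a lookup from table name to owning database. The sketch below is only an illustration: `t_order`, `user_db`, `order_db`, and `route_vertical` are hypothetical names, not part of any real configuration.

```python
# Hypothetical mapping: each business area gets its own database.
TABLE_TO_DATABASE = {
    "t_user": "user_db",    # user business
    "t_order": "order_db",  # order business (table name assumed)
}


def route_vertical(table: str) -> str:
    """Return the database that owns `table` after the vertical split."""
    try:
        return TABLE_TO_DATABASE[table]
    except KeyError:
        raise ValueError(f"no database configured for table {table!r}") from None


print(route_vertical("t_user"))   # user_db
print(route_vertical("t_order"))  # order_db
```

Note that every row of a given table still lives in one database, so a single oversized table remains a bottleneck after this kind of split.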
Horizontal fragmentation
Horizontal fragmentation is also called horizontal splitting. Unlike vertical fragmentation, it does not classify data by business logic. Instead, it uses one field (or several fields) to spread data across multiple databases or tables according to some rule, so that each fragment contains only part of the data. For example, when fragmenting by primary key, records with an even primary key go to database (or table) 0 and records with an odd primary key go to database (or table) 1, as shown below.
select * from t_user where id=1
select * from t_user where id=2
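The even/odd rule amounts to taking the primary key modulo the number of fragments. A minimal sketch, assuming two fragments and hypothetical database names `ds_0` and `ds_1`:

```python
SHARD_COUNT = 2  # two fragments, matching the even/odd example


def shard_for(user_id: int, shard_count: int = SHARD_COUNT) -> int:
    """Even ids map to fragment 0, odd ids to fragment 1 when shard_count is 2."""
    return user_id % shard_count


def route(user_id: int) -> str:
    """Name of the (hypothetical) database holding the row with this id."""
    return f"ds_{shard_for(user_id)}"


print(route(1))  # ds_1
print(route(2))  # ds_0
```

With this rule the query for `id=1` is sent to fragment 1 and the query for `id=2` to fragment 0. A plain modulo is the simplest rule; changing the fragment count later moves most rows, which is one reason other rules (ranges, hashing schemes) are also used in practice.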
In theory, horizontal fragmentation breaks through the limit of what a single machine can process and can be scaled out relatively freely. It is the standard solution for data fragmentation.
Challenges
Although data fragmentation solves problems of performance, availability, and single-node backup and recovery, the distributed architecture introduces new problems along with its benefits.
With data so scattered after fragmentation, operating on the database becomes one of the main challenges for application developers and database administrators. They need to know which specific database and table each piece of data must be read from.
Another challenge is that SQL which runs correctly on a single-node database does not necessarily run correctly on a fragmented one. For example, table splitting changes table names, and operations such as pagination, sorting, and aggregate grouping can be handled incorrectly.
Cross-database transactions are also a thorny issue for a distributed database cluster. Sensible use of table splitting reduces the amount of data per table while still allowing local transactions wherever possible; making good use of different tables within the same database avoids the trouble that distributed transactions bring. Where cross-database transactions cannot be avoided, some businesses still need to keep transactions consistent. XA-based distributed transactions perform too poorly under high concurrency to meet these needs, so the large Internet companies have not adopted them at scale. Most use flexible transactions with eventual consistency instead of strongly consistent transactions.
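Pagination is a good example of why single-node SQL cannot simply be forwarded to each fragment. The sketch below emulates `ORDER BY ... LIMIT offset, limit` over in-memory lists that stand in for fragments; the function name and the sample data are assumptions for illustration. Each fragment has to return its first `offset + limit` rows, because any of them may belong to the requested global page, and only then can the results be merged and cut.

```python
import heapq
from itertools import islice


def paginate_across_shards(shards, offset, limit, key=lambda row: row):
    """Emulate ORDER BY key LIMIT offset, limit over several fragments."""
    # Each fragment returns its own sorted prefix of offset + limit rows.
    per_shard = [sorted(rows, key=key)[: offset + limit] for rows in shards]
    # Merge the sorted prefixes, then apply the global offset and limit.
    merged = heapq.merge(*per_shard, key=key)
    return list(islice(merged, offset, offset + limit))


shard_0 = [2, 4, 6, 8]  # even ids
shard_1 = [1, 3, 5, 7]  # odd ids
page = paginate_across_shards([shard_0, shard_1], offset=2, limit=3)
print(page)  # [3, 4, 5]
```

Applying the original offset and limit on each fragment separately would instead return rows 6, 8 and 5, 7 here, none of which form the correct page. The cost of the correct approach grows with the offset, since every fragment must return `offset + limit` rows.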
Goal
Make the impact of database and table splitting as transparent as possible, so that users can work with the horizontally fragmented database cluster as if it were a single database.