Introduction to Data Sharding
2022-07-07 08:43:00 【Blue sky ⊙ white clouds】
Background
Traditionally, an entire data set is stored on a single node. In massive-data scenarios, this centralized approach struggles to meet requirements in three respects: performance, availability, and operation and maintenance cost.
In terms of performance, most relational databases use B+ tree indexes. Once the data volume exceeds a threshold, the index grows deeper, so each lookup needs more disk I/O and query performance drops. At the same time, highly concurrent access requests turn the centralized database into the biggest bottleneck of the system.
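To make the link between data volume and index depth concrete, the sketch below estimates how many levels a B+ tree needs for a given row count. It assumes completely full pages, a hypothetical fanout of 1200 child pointers per internal page and 16 rows per leaf page (figures often quoted for a 16 KB page, but treat them as illustrative assumptions, not measurements of any particular database). Each extra level can mean one more page read when the page is not cached.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division without floating-point rounding."""
    return -(-a // b)


def bplus_tree_height(rows: int, fanout: int, rows_per_leaf: int) -> int:
    """Estimate the number of levels (root to leaf) of a B+ tree.

    Illustrative assumptions: every leaf holds `rows_per_leaf` rows,
    every internal page holds `fanout` child pointers, pages are full.
    """
    if rows <= 0:
        return 0
    nodes = ceil_div(rows, rows_per_leaf)  # number of leaf pages
    height = 1  # the leaf level itself
    while nodes > 1:
        nodes = ceil_div(nodes, fanout)  # pages needed one level up
        height += 1
    return height


if __name__ == "__main__":
    for rows in (10_000, 20_000_000, 5_000_000_000):
        print(rows, bplus_tree_height(rows, fanout=1200, rows_per_leaf=16))
```

Under these assumed parameters the estimated height grows from 2 levels (10 thousand rows) to 3 (20 million rows) and 4 (5 billion rows).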
In terms of availability, stateless services can be scaled out freely at low cost, which inevitably pushes the ultimate pressure of the system onto the database. A single data node, or a simple master-slave architecture, finds it increasingly hard to bear this load. Database availability has therefore become critical to the whole system.
In terms of operation and maintenance cost, once the data in a database instance grows beyond a threshold, the operational burden on the DBA increases. The time needed for data backup and recovery becomes harder to control as the data volume grows. Generally speaking, keeping a single database instance within 1 TB of data is a reasonable range.
Because traditional relational databases cannot meet the needs of Internet-scale scenarios, more and more attempts have been made to store data in NoSQL databases, which support distribution natively. However, NoSQL's incompatibility with SQL and its immature ecosystem have kept it from dealing a decisive blow to relational databases, whose position remains unshaken.
Data sharding means splitting data that would otherwise be stored in a single database across multiple databases or tables according to some dimension, in order to relieve performance bottlenecks and improve availability. For relational databases, the effective way to shard is to split the data into multiple databases and tables. Splitting databases and tables avoids the query bottleneck caused by data volume exceeding a tolerable threshold. In addition, splitting into multiple databases effectively disperses access that would otherwise hit a single database. Splitting tables within one database does not relieve the load on that database, but it keeps open the possibility of handling updates as local transactions rather than distributed ones; once an update spans databases, distributed transactions tend to complicate matters. A multi-master, multi-slave topology effectively avoids a single point of failure for the data and thus improves the availability of the data architecture.
Splitting data across databases and tables keeps the data volume of each table below the threshold, and distributing traffic across them absorbs high traffic; together they are an effective way to handle systems with high concurrency and massive data. Data sharding comes in two forms: vertical sharding and horizontal sharding.
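As a back-of-the-envelope illustration of "keeping each table below the threshold", the hypothetical helper below computes how many tables are needed for a given total row count and per-table limit, and how to spread them evenly over a fixed number of databases. The row limit used in the example (10 million rows per table) is an assumed figure for illustration only, not a rule from the text.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division."""
    return -(-a // b)


def plan_shards(total_rows: int, max_rows_per_table: int, databases: int) -> tuple[int, int]:
    """Return (total_tables, tables_per_database) keeping every table under the limit.

    The total is rounded up so that each database holds the same number of tables.
    """
    tables = ceil_div(total_rows, max_rows_per_table)
    per_db = ceil_div(tables, databases)
    return per_db * databases, per_db


# 300 million rows, an assumed limit of 10 million rows per table, 4 databases.
print(plan_shards(300_000_000, 10_000_000, 4))
```

Here 30 tables would suffice, but rounding up to 8 tables in each of the 4 databases gives 32 tables of roughly 9.4 million rows each, which stays under the assumed limit and keeps the databases symmetric.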
Vertical sharding
Splitting by business is called vertical sharding, also known as vertical partitioning; its core idea is to dedicate each database to a specific purpose. Before the split, one database consists of multiple tables, each belonging to a different business. After the split, the tables are grouped by business and distributed to different databases, spreading the load across them. The figure below shows a scheme that, based on business needs, vertically shards the user table and the order table into different databases.
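A vertical split boils down to a table-to-database mapping. The minimal sketch below uses hypothetical database names (`user_db`, `order_db`) and a hypothetical `t_order_item` table alongside the user and order tables from the text. It also shows one practical consequence: a query touching tables owned by different databases can no longer be answered by a single database.

```python
# Hypothetical business-to-database mapping after a vertical split.
VERTICAL_SHARDS = {
    "t_user": "user_db",
    "t_order": "order_db",
    "t_order_item": "order_db",
}


def route_table(table: str) -> str:
    """Return the database that owns `table` after the vertical split."""
    try:
        return VERTICAL_SHARDS[table]
    except KeyError:
        raise ValueError(f"no database configured for table {table!r}") from None


def databases_for_query(tables: list[str]) -> set[str]:
    """Databases a query touches; more than one means a cross-database join."""
    return {route_table(t) for t in tables}


print(databases_for_query(["t_order", "t_order_item"]))  # stays in one database
print(databases_for_query(["t_user", "t_order"]))        # spans two databases
```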

Vertical sharding often requires changes to the architecture and design. Generally speaking, it cannot keep pace with rapidly changing Internet business requirements, and it does not truly remove the single-node bottleneck. Vertical splitting can alleviate the problems caused by data volume and access volume, but it cannot cure them. If, after a vertical split, the data in a table still exceeds what a single node can carry, horizontal sharding is needed.
Horizontal sharding
Horizontal sharding is also called horizontal partitioning. Unlike vertical sharding, it no longer classifies data by business logic; instead, it spreads data across multiple databases or tables by applying a rule to one or more fields, so that each shard holds only part of the data. For example, when sharding by primary key, records with an even primary key go to database (or table) 0 and records with an odd primary key go to database (or table) 1, as shown below.
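SELECT * FROM t_user WHERE id = 1;  -- odd primary key: routed to database (or table) 1
SELECT * FROM t_user WHERE id = 2;  -- even primary key: routed to database (or table) 0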
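The even/odd rule is simply the primary key modulo the number of shards. The sketch below routes the two queries above; the physical table naming scheme `<logic table>_<index>` (for example `t_user_0`, `t_user_1`) is a hypothetical convention chosen for illustration.

```python
def shard_index(user_id: int, shard_count: int = 2) -> int:
    """Even ids map to shard 0 and odd ids to shard 1 (id modulo shard count)."""
    return user_id % shard_count


def physical_table(logic_table: str, user_id: int, shard_count: int = 2) -> str:
    """Physical table name under a hypothetical '<logic>_<index>' naming scheme."""
    return f"{logic_table}_{shard_index(user_id, shard_count)}"


for uid in (1, 2):
    print(f"SELECT * FROM t_user WHERE id = {uid} -> {physical_table('t_user', uid)}")
```

Modulo routing spreads sequential keys evenly, but changing `shard_count` later remaps most existing rows, which is why the shard count is usually planned generously up front.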
In theory, horizontal sharding breaks through the data-processing bottleneck of a single machine and scales out relatively freely, which makes it the standard solution for data sharding.
Challenges
Although data sharding addresses performance, availability, and single-node backup and recovery, the distributed architecture that brings these benefits also introduces new problems.
Operating on data that is scattered across shards is one of the biggest challenges for application developers and database administrators: they need to know exactly which databases and tables the data must be fetched from.
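The sketch below shows, under the even/odd rule from the horizontal sharding example, how the set of tables to query depends on the WHERE clause. It assumes two hypothetical physical tables `t_user_0` and `t_user_1` and takes the sharding-key values as already extracted from the query (real middleware would parse the SQL to get them). A condition on the key narrows the query to one or a few shards; without one, every shard must be queried.

```python
from typing import Optional, Sequence

SHARD_COUNT = 2  # hypothetical layout: t_user_0 and t_user_1


def route(ids: Optional[Sequence[int]]) -> list[str]:
    """Physical tables a query on t_user must visit.

    `ids` holds the sharding-key values from the WHERE clause, e.g. `id = 3`
    gives [3] and `id IN (1, 2)` gives [1, 2]. None means the query has no
    condition on the sharding key.
    """
    if ids is None:
        # No sharding key: the query has to be sent to every shard.
        return [f"t_user_{i}" for i in range(SHARD_COUNT)]
    return [f"t_user_{i}" for i in sorted({v % SHARD_COUNT for v in ids})]


print(route([3]))      # single shard
print(route([2, 4]))   # both keys land on the same shard
print(route(None))     # full scan of all shards
```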
Another challenge is that SQL that runs correctly on a single-node database does not always run correctly on a sharded database. For example, table sharding changes table names, and operations such as pagination, sorting, and aggregation with grouping may be handled incorrectly.
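Pagination is a good example of SQL that breaks silently. The sketch below simulates `ORDER BY value LIMIT 3 OFFSET 2` over two shards, with plain Python lists standing in for each shard's rows. Pushing the original OFFSET down to every shard skips rows that belong on the requested page; one correct approach is to rewrite each shard's query to `LIMIT offset + limit` with no offset, then merge the sorted streams and apply the offset globally.

```python
import heapq
import itertools


def shard_query(rows: list[int], offset: int, limit: int) -> list[int]:
    """What one shard returns for ORDER BY value LIMIT limit OFFSET offset."""
    return sorted(rows)[offset:offset + limit]


def naive_page(shards: list[list[int]], offset: int, limit: int) -> list[int]:
    # Wrong: the original OFFSET is applied inside every shard.
    partial = [shard_query(s, offset, limit) for s in shards]
    return list(heapq.merge(*partial))[:limit]


def correct_page(shards: list[list[int]], offset: int, limit: int) -> list[int]:
    # Each shard returns its first offset + limit rows; merge, then skip globally.
    partial = [shard_query(s, 0, offset + limit) for s in shards]
    return list(itertools.islice(heapq.merge(*partial), offset, offset + limit))


shards = [[1, 4, 5, 8], [2, 3, 6, 7]]
print(naive_page(shards, offset=2, limit=3))    # [5, 6, 7]
print(correct_page(shards, offset=2, limit=3))  # [3, 4, 5]
```

The correct version has a cost: for deep pages each shard must return `offset + limit` rows, so large offsets become expensive as the number of shards grows.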
Cross-database transactions are another thorny problem for distributed database clusters. Using table sharding judiciously reduces the data volume of each table while keeping transactions local wherever possible; making good use of different tables within the same database effectively avoids the trouble caused by distributed transactions. In scenarios where cross-database transactions cannot be avoided, some businesses still need the transactions to remain consistent. Because XA-based distributed transactions cannot meet performance needs in high-concurrency scenarios, the Internet giants have not adopted them on a large scale; most use eventually consistent flexible transactions instead of strongly consistent ones.
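One common flexible-transaction pattern is the Saga: a cross-database operation is broken into local transactions, each paired with a compensating action that undoes it if a later step fails. The minimal in-memory sketch below is only an illustration of that idea; the stock and order steps are hypothetical, and real frameworks also persist a log so compensations can be retried after a crash.

```python
from typing import Callable

Step = tuple[Callable[[], None], Callable[[], None]]  # (action, compensation)


def run_saga(steps: list[Step]) -> bool:
    """Run local transactions in order; on failure, undo completed ones in reverse."""
    completed: list[Callable[[], None]] = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensate)
    return True


# Hypothetical example: stock lives in one database, orders in another.
stock = {"sku-1": 5}
orders: list[str] = []


def reserve_stock() -> None:
    stock["sku-1"] -= 1


def release_stock() -> None:
    stock["sku-1"] += 1


def create_order() -> None:
    raise RuntimeError("order database unavailable")  # simulated failure


def cancel_order() -> None:
    orders.pop()


ok = run_saga([(reserve_stock, release_stock), (create_order, cancel_order)])
print(ok, stock)  # False {'sku-1': 5}
```

Between the two steps another reader could briefly see the reduced stock; tolerating such intermediate states in exchange for not holding locks across databases is what "eventually consistent" means here.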
Goal
Make the impact of database and table sharding as transparent as possible, so that users can work with a horizontally sharded database cluster as if it were a single database.