当前位置:网站首页>Introduction to data fragmentation
Introduction to data fragmentation
2022-07-07 08:43:00 【Blue sky ⊙ white clouds】
background
Traditional general Data sets Storage to a single node , In performance 、 The three aspects of availability and operation and maintenance cost have been difficult to meet the scenario of massive data .
In terms of performance , because Relational database Most use B+ Index of tree type , When the amount of data exceeds the threshold , The increase of index depth will also make disk access IO More times , This leads to a decline in query performance ; meanwhile , High concurrent access requests also make the centralized database the biggest bottleneck of the system .
In terms of usability , The statelessness of service , It can achieve random expansion with less cost , This will inevitably result in the final pressure of the system falling on the database . And a single data node , Or a simple master-slave architecture , It's getting harder and harder to bear . Database availability , Has become the key to the whole system .
In terms of operation and maintenance costs , When the data in a database instance reaches threshold above , about DBA The operation and maintenance pressure will increase . The time cost of data backup and recovery will become more and more uncontrollable with the amount of data . In general , The threshold of data for a single database instance is 1TB within , It's a reasonable range .
In the case that the traditional relational database can not meet the needs of Internet scenarios , Store data to native, distributed support NoSQL There are more and more attempts . but NoSQL Yes SQL The incompatibility of and the imperfection of the ecosystem , Make them in the game with the relational database has always been unable to complete a fatal blow , However, the status of relational database is still unshakable .
Data fragmentation refers to storing the data stored in a single database in multiple databases or tables according to a certain dimension, so as to improve the performance bottleneck and availability . The effective method of data fragmentation is to divide databases and tables into relational databases . Sub database and sub table can effectively avoid the query bottleneck caused by the amount of data exceeding the tolerable threshold . besides , Sub database can also be used to effectively disperse the single point of access to the database ; Although sub table can't relieve the pressure of database , But it can provide the possibility of transforming distributed transaction into local transaction as much as possible , When it comes to cross database update operations , Distributed transactions tend to complicate problems . Use the multi master and multi slave split mode , Can effectively avoid data single point , So as to improve the availability of data architecture .
The amount of data in each table is kept below the threshold by splitting the data into sub database and sub table , And traffic grooming to deal with high traffic , It's about responding to High concurrency And massive data system . Data fragmentation is divided into vertical fragmentation and horizontal fragmentation .
Vertical slice
The way of business splitting is called vertical segmentation , Also known as vertical split , Its core idea is dedicated to special storage . Before splitting , A database consists of multiple data tables , Each table corresponds to a different business . And after the split , It is to classify the table according to the business , Distributed to different databases , And then spread the pressure to different databases . The figure below shows the business needs , Scheme of vertically slicing user table and order table into different databases .
Vertical segmentation often needs to adjust the architecture and design . Generally speaking , It's too late to cope with the rapid change of Internet business demand ; and , It doesn't really solve the single bottleneck . Vertical splitting can alleviate the problems caused by data volume and access volume , But it can't cure . If after vertical split , The amount of data in the table still exceeds the threshold that a single node can carry , It needs to be further processed by horizontal sectioning .
Horizontal slice
Horizontal segmentation is also called horizontal splitting . Relative to the vertical slice , It no longer classifies data according to business logic , But through a certain field ( Or some fields ), Spread data across multiple libraries or tables according to certain rules , Each slice contains only a part of the data . for example : Slice according to the primary key , Even primary key records are put into 0 library ( Or table ), The record of odd primary key is put into 1 library ( Or table ), As shown below .
select * from t_user where id=1
select * from t_user where id=2
In theory, horizontal slicing breaks through the bottleneck of single machine data processing , And expand relative freedom , Is a standard solution for data fragmentation .
Challenge
Although data fragmentation solves the problem of performance 、 Availability and single point backup and recovery , But distributed architecture gains benefits at the same time , It also introduces new questions .
In the face of such scattered data after fragmentation , It is one of the most important challenges for application development engineers and database administrators to operate on the database . They need to know from which specific database sub tables the data needs to be obtained .
Another challenge is , Can correctly run in a single node database SQL, It doesn't always work correctly in the partitioned database . for example , Sub table results in the modification of table name , Or pagination 、 Sort 、 Incorrect handling of operations such as aggregation grouping .
Cross database transaction is also a thorny issue for distributed database cluster . Reasonable use of sub table , It can reduce the amount of data in a single table , Try to use local transactions , Good at using different tables in the same database can effectively avoid the trouble caused by distributed transactions . In a scenario where cross database transactions cannot be avoided , Some businesses still need to keep transactions consistent . And based on XA Because of the high concurrency of the distributed transaction in the scene of neutral can not meet the needs , Not used on a large scale by internet giants , Most of them use the final consistent flexible transaction instead of the strong consistent transaction .
The goal is
Try to be transparent about the impact of sub database and sub table , Let users try to use the database cluster after horizontal fragmentation just like a database .
边栏推荐
- idea里使用module项目的一个bug
- Arm GIC (IV) GIC V3 register class analysis notes.
- [Nanjing University] - [software analysis] course learning notes (I) -introduction
- Greenplum6.x搭建_环境配置
- [hard core science popularization] working principle of dynamic loop monitoring system
- [machine learning] watermelon book data set_ data sharing
- Count sort (diagram)
- 快速集成认证服务-HarmonyOS平台
- 如何在HarmonyOS应用中集成App Linking服务
- Pvtv2--pyramid vision transformer V2 learning notes
猜你喜欢
快速集成认证服务-HarmonyOS平台
归并排序和非比较排序
数据分析方法论与前人经验总结2【笔记干货】
Laravel8 uses passport login and JWT (generate token)
The single value view in Splunk uses to replace numeric values with text
Data type - floating point (C language)
Through the "last mile" of legal services for the masses, fangzheng Puhua labor and personnel law self-service consulting service platform has been frequently "praised"
打通法律服务群众“最后一公里”,方正璞华劳动人事法律自助咨询服务平台频获“点赞”
Are you holding back on the publicity of the salary system for it posts such as testing, development, operation and maintenance?
let const
随机推荐
Golang 编译约束/条件编译 ( // +build <tags> )
2-3查找树
Implementation of navigation bar at the bottom of applet
Leetcode 1984. Minimum difference in student scores
About using CDN based on Kangle and EP panel
如何在快应用中实现滑动操作组件
Iptables' state module (FTP service exercise)
National standard gb28181 protocol video platform easygbs adds streaming timeout configuration
Tips for using jeditabletable
Redis summary
[Chongqing Guangdong education] organic electronics (Bilingual) reference materials of Nanjing University of Posts and Telecommunications
注解@ConfigurationProperties的三种使用场景
对API接口或H5接口做签名认证
Xcit learning notes
归并排序和非比较排序
如何理解分布式架构和微服务架构呢
数据分析方法论与前人经验总结2【笔记干货】
Shell script for changing the current folder and the file date under the folder
求有符号数的原码、反码和补码【C语言】
iptables 之 state模块(ftp服务练习)