Introduction to data sharding
2022-07-07 08:43:00 【Blue sky ⊙ white clouds】
Background
Traditionally, an entire data set is stored on a single node. This approach struggles to meet the demands of massive-data scenarios in three respects: performance, availability, and operation and maintenance cost.
In terms of performance, most relational databases use B+ tree indexes. Once the data volume exceeds a threshold, the index grows deeper and each lookup needs more disk I/O, so query performance declines. At the same time, highly concurrent access requests make the centralized database the biggest bottleneck of the system.
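To make the link between data volume and index depth concrete, here is a rough sketch in Python. It assumes a simplified model in which every index node holds the same number of entries; the fanout of 500 is a hypothetical value chosen for illustration, and real fanout depends on page size and key size.

```python
def index_levels(row_count: int, fanout: int) -> int:
    """Smallest number of levels such that fanout**levels >= row_count.

    Simplified model: every node is full and holds `fanout` entries.
    """
    levels = 0
    capacity = 1
    while capacity < row_count:
        capacity *= fanout
        levels += 1
    return levels


if __name__ == "__main__":
    # Hypothetical fanout of 500 entries per node.
    for rows in (1_000_000, 1_000_000_000):
        print(rows, "rows ->", index_levels(rows, fanout=500), "levels")
```

Under this model each extra level is roughly one more page read per lookup, which is why keeping each table below a size threshold keeps the index shallow.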
In terms of availability, stateless services can be scaled out at will at low cost, which inevitably pushes the final pressure of the system onto the database. A single data node, or a simple master-slave architecture, finds it harder and harder to bear that load. Database availability has become the key to the whole system.
In terms of operation and maintenance cost, once the data in a database instance grows beyond a threshold, the pressure on DBAs increases. The time needed for data backup and recovery becomes harder and harder to control as the data grows. In general, keeping the data of a single database instance within 1 TB is considered reasonable.
Because traditional relational databases cannot meet the needs of Internet scenarios, there have been more and more attempts to store data in NoSQL systems that support distribution natively. However, NoSQL's incompatibility with SQL and its immature ecosystem have kept it from delivering a decisive blow in its contest with relational databases, whose position remains unshaken.
Data sharding (also called data fragmentation) means distributing the data held in a single database across multiple databases or tables along some dimension, in order to relieve performance bottlenecks and improve availability. The effective way to shard data is to split a relational database into multiple databases and multiple tables. Both database sharding and table sharding effectively avoid the query bottleneck caused by data volume exceeding the tolerable threshold. In addition, database sharding spreads out the access load that would otherwise hit a single database. Table sharding cannot relieve the pressure on the database, but it makes it possible to turn distributed transactions into local transactions wherever possible, since update operations that cross databases are complicated by distributed transactions. Sharding with multiple masters and multiple slaves avoids a single point of data and so improves the availability of the data architecture.
Splitting data into sharded databases and tables keeps the amount of data in each table below the threshold, and channeling traffic copes with high access volume. Together these are an effective way to deal with high concurrency and massive data. Data sharding is divided into vertical sharding and horizontal sharding.
Vertical sharding
Splitting by business is called vertical sharding, also known as vertical splitting. Its core idea is dedicated databases for dedicated use. Before the split, a database consists of multiple tables, each corresponding to a different business. After the split, the tables are grouped by business and distributed to different databases, which spreads the pressure across them. The figure below shows a scheme that, according to business needs, vertically shards the user table and the order table into different databases.
Vertical sharding often requires adjusting the architecture and design. Generally speaking, such adjustments cannot keep pace with rapidly changing Internet business requirements, and vertical sharding does not really solve the single-node bottleneck. It can alleviate the problems caused by data volume and access volume, but it cannot cure them. If the amount of data in a table still exceeds what a single node can carry after vertical sharding, horizontal sharding is needed as a further step.
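The user/order split described in this section can be sketched as a simple lookup from table name to database. This is only an illustration in Python: the table name t_order and the database names user_db and order_db are hypothetical, and a real deployment would route through its own middleware or configuration.

```python
# Hypothetical mapping for illustration: each business table lives in
# exactly one database after the vertical split.
TABLE_TO_DATABASE = {
    "t_user": "user_db",
    "t_order": "order_db",
}


def route_table(table: str) -> str:
    """Return the database that holds the given table."""
    try:
        return TABLE_TO_DATABASE[table]
    except KeyError:
        raise ValueError(f"no database configured for table {table!r}") from None


if __name__ == "__main__":
    print(route_table("t_user"))
    print(route_table("t_order"))
```

Note that the routing key is the table, not the row: every row of a table still lands on one node, which is why vertical sharding alone cannot fix an oversized table.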
Horizontal sharding
Horizontal sharding is also called horizontal splitting. Unlike vertical sharding, it does not classify data by business logic. Instead, it distributes data across multiple databases or tables according to some rule applied to one or more fields, so that each shard contains only part of the data. For example, when sharding by primary key, records with an even primary key go into database (or table) 0 and records with an odd primary key go into database (or table) 1, as shown below.
select * from t_user where id=1
select * from t_user where id=2
In theory, horizontal sharding breaks through the limit of what a single machine can process and scales relatively freely, which makes it the standard solution for data sharding.
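The even/odd rule from this section amounts to taking the primary key modulo the number of shards. A minimal sketch in Python follows; the suffix naming scheme (t_user_0, t_user_1) is an assumption made for illustration and is not prescribed by the text.

```python
SHARD_COUNT = 2  # two shards: 0 for even keys, 1 for odd keys


def shard_for(primary_key: int) -> int:
    """Return the shard index for a primary key (key modulo shard count)."""
    return primary_key % SHARD_COUNT


def physical_table(logical_table: str, primary_key: int) -> str:
    """Map a logical table name to a shard-specific name.

    The '<table>_<shard>' naming is a hypothetical convention.
    """
    return f"{logical_table}_{shard_for(primary_key)}"


if __name__ == "__main__":
    for key in (1, 2):
        print(f"id={key} -> shard {shard_for(key)}, table {physical_table('t_user', key)}")
```

With this rule the query for id=1 goes to shard 1 and the query for id=2 goes to shard 0.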
Challenges
Although data sharding solves problems of performance, availability, and single-node backup and recovery, a distributed architecture introduces new problems along with its benefits.
Faced with data scattered across shards, one of the major challenges for application developers and database administrators is simply operating the database: they need to know which specific table in which database holds the data they need.
Another challenge is that SQL which runs correctly on a single-node database does not necessarily run correctly on a sharded one. For example, table sharding changes table names, and operations such as pagination, sorting, and aggregation with grouping can be handled incorrectly.
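Pagination shows why a query cannot simply be forwarded to every shard unchanged. The sketch below, in Python, uses in-memory lists as stand-ins for shards; all function names are hypothetical. It compares a naive approach, which applies the original offset and limit on each shard, with one common correction, which asks each shard for its first offset + limit rows and then merges and re-applies the offset.

```python
import heapq
from itertools import islice


def naive_page(shards, offset, limit):
    """Incorrect: apply the original OFFSET/LIMIT on every shard, then merge."""
    parts = [sorted(rows)[offset:offset + limit] for rows in shards]
    return sorted(value for part in parts for value in part)[:limit]


def merged_page(shards, offset, limit):
    """Fetch the first offset+limit rows per shard, merge, then paginate."""
    # Stand-in for 'ORDER BY id LIMIT offset+limit' executed on each shard.
    parts = [sorted(rows)[:offset + limit] for rows in shards]
    merged = heapq.merge(*parts)  # lazily merges already-sorted inputs
    return list(islice(merged, offset, offset + limit))


if __name__ == "__main__":
    shards = [[2, 4, 6, 8], [1, 3, 5, 7]]  # even ids on shard 0, odd on shard 1
    print("naive :", naive_page(shards, offset=2, limit=2))
    print("merged:", merged_page(shards, offset=2, limit=2))
```

Across the ids 1 to 8, the page at offset 2 with limit 2 should be [3, 4]; the naive version skips rows on every shard and returns later ids instead. The corrected version pays for its accuracy by reading more rows per shard as the offset grows.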
Cross-database transactions are also a thorny problem for a distributed database cluster. Using table sharding sensibly reduces the amount of data per table while keeping transactions local wherever possible, and making good use of different tables in the same database effectively avoids the trouble that distributed transactions bring. Where cross-database transactions cannot be avoided, some businesses still need transactional consistency. XA-based distributed transactions, however, do not perform well enough under high concurrency, so they are not used on a large scale by Internet giants, most of which use eventually consistent flexible transactions instead of strongly consistent ones.
Goal
Make the impact of database and table sharding as transparent as possible, so that users can use the horizontally sharded database cluster as if it were a single database.