Introduction to data sharding
2022-07-07 08:43:00 【Blue sky ⊙ white clouds】
Background
The traditional approach of storing an entire data set on a single node can no longer cope with massive data volumes in three respects: performance, availability, and operation and maintenance cost.
In terms of performance, most relational databases use B+ tree indexes. Once the amount of data passes a threshold, the index becomes deeper, each lookup needs more disk I/O, and query performance drops. At the same time, highly concurrent access makes the centralized database the biggest bottleneck in the system.
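The link between data volume and index depth can be sketched with a simplified model. The code assumes every index node holds the same number of entries (`fanout`), and the value 500 is an illustrative assumption, not a figure from any particular database. Real B+ trees vary with key size, page size, and fill factor.

```python
def btree_height(rows: int, fanout: int) -> int:
    """Smallest number of levels h such that fanout**h >= rows (simplified model)."""
    if rows < 1 or fanout < 2:
        raise ValueError("rows must be >= 1 and fanout must be >= 2")
    height, capacity = 1, fanout
    while capacity < rows:
        capacity *= fanout  # one more level multiplies the reachable entries
        height += 1
    return height


# Hypothetical fanout; each extra level is roughly one more page to read per lookup
# when the page is not already cached.
for rows in (10**6, 10**8, 10**10):
    print(rows, btree_height(rows, fanout=500))
```

Under this model the depth grows only logarithmically, but every added level applies to every lookup, which is why keeping each table below a size threshold matters.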
In terms of availability, stateless services can be scaled out freely at low cost, which inevitably pushes the system's load onto the database in the end. A single data node, or a simple master-slave architecture, finds that load harder and harder to bear. Database availability has become critical to the whole system.
In terms of operation and maintenance cost, once the data in a database instance exceeds a threshold, the pressure on the DBA increases. The time needed for backup and recovery grows with the amount of data and becomes harder and harder to control. As a rule of thumb, keeping the data in a single database instance within 1 TB is considered reasonable.
Because traditional relational databases cannot meet the needs of Internet scenarios, there have been more and more attempts to store data in NoSQL systems that support distribution natively. However, NoSQL's incompatibility with SQL and its immature ecosystem have kept it from landing a decisive blow in its contest with relational databases, whose position remains unshaken.
Data sharding, also called data fragmentation, means distributing the data held in a single database across multiple databases or tables along some dimension, in order to relieve performance bottlenecks and improve availability. For relational databases, the effective way to shard is to split databases and split tables. Both avoid the query bottleneck that appears when the amount of data exceeds a tolerable threshold. In addition, database sharding spreads out access that would otherwise hit a single database node. Table sharding cannot relieve the pressure on the database itself, but because the tables stay in the same database, it makes it possible to turn distributed transactions into local transactions wherever possible; once updates span databases, distributed transactions tend to complicate matters. A multi-master, multi-slave sharding layout avoids single points of data and so improves the availability of the data architecture.
Splitting data across databases and tables keeps the amount of data in each table below the threshold, and channeling traffic across them absorbs high access volume. Together these are an effective way to handle high concurrency and massive data. Data sharding is divided into vertical sharding and horizontal sharding.
Vertical sharding
Splitting by business is called vertical sharding, also known as vertical splitting. Its core idea is a dedicated database for each purpose. Before the split, one database consists of many tables, each serving a different business. After the split, the tables are grouped by business and distributed to different databases, which spreads the load across those databases. The figure below shows a scheme that, based on business needs, vertically shards the user table and the order table into different databases.
Vertical sharding often requires changes to the architecture and design. Generally speaking, it cannot keep up with the rapidly changing requirements of Internet businesses, and it does not truly solve the single-node bottleneck. It can ease the problems caused by data volume and traffic, but it cannot cure them. If a table still holds more data than a single node can carry after vertical sharding, it needs further processing by horizontal sharding.
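Vertical sharding can be pictured as a lookup from table to database. The following is a minimal sketch under assumed names: `t_order`, `user_db`, and `order_db` are hypothetical and only illustrate the user table and order table living in separate databases.

```python
# Hypothetical mapping: each business table is owned by exactly one database.
TABLE_TO_DATABASE = {
    "t_user": "user_db",    # user business
    "t_order": "order_db",  # order business (table and database names are assumptions)
}


def route_table(table: str) -> str:
    """Return the database that owns the given table after a vertical split."""
    try:
        return TABLE_TO_DATABASE[table]
    except KeyError:
        raise LookupError(f"no database configured for table {table!r}") from None


print(route_table("t_user"))
print(route_table("t_order"))
```

The routing key here is the table name, not the row, so a single table that keeps growing still ends up on one node. That is the limitation horizontal sharding addresses.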
Horizontal sharding
Horizontal sharding is also called horizontal splitting. Unlike vertical sharding, it does not classify data by business logic. Instead it uses one field (or several fields) to spread data across multiple databases or tables according to a rule, so that each shard holds only part of the data. For example, when sharding by primary key, records with an even primary key go to database (or table) 0 and records with an odd primary key go to database (or table) 1, as shown below.
select * from t_user where id=1
select * from t_user where id=2
In theory, horizontal sharding breaks through the data processing bottleneck of a single machine and can be scaled out relatively freely, which makes it the standard solution for data sharding.
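The even/odd rule described above amounts to taking the primary key modulo the number of shards. A minimal sketch follows; the database names `ds_0` and `ds_1` are assumptions made for illustration.

```python
SHARD_COUNT = 2  # two databases (or tables), as in the even/odd example


def shard_index(user_id: int) -> int:
    """Even ids map to shard 0, odd ids to shard 1."""
    return user_id % SHARD_COUNT


def route_query(user_id: int):
    """Return (database name, parameterized SQL, parameters) for a lookup by id."""
    database = f"ds_{shard_index(user_id)}"  # hypothetical naming scheme
    return database, "select * from t_user where id = ?", (user_id,)


for user_id in (1, 2):
    database, sql, params = route_query(user_id)
    print(database, sql, params)
```

With this rule the query for `id=1` is sent to shard 1 and the query for `id=2` to shard 0. Modulo routing is only one possible rule; range-based or hash-based rules follow the same pattern.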
Challenges
Although data sharding solves the problems of performance, availability, and single-node backup and recovery, the distributed architecture that brings these benefits also introduces new problems.
With data scattered across shards, one of the most important challenges for application developers and database administrators is simply operating on the database: they need to know which specific database and which table holds the data they want.
Another challenge is that SQL which runs correctly on a single-node database does not necessarily run correctly on a sharded one. For example, table sharding changes table names, and operations such as pagination, sorting, and aggregation with grouping can be handled incorrectly.
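Pagination is a concrete case. A sketch of the problem, using plain lists in place of shards: to return rows `offset` to `offset + limit` in global order, each shard has to return its first `offset + limit` rows, and the merged stream is then trimmed. Forwarding the original offset to every shard gives a wrong page. The function names are hypothetical.

```python
import heapq
from itertools import islice


def merged_page(shards, offset, limit):
    """Correct: ask each shard for its first offset+limit rows, merge, then skip offset."""
    partial = [sorted(rows)[: offset + limit] for rows in shards]
    return list(islice(heapq.merge(*partial), offset, offset + limit))


def naive_page(shards, offset, limit):
    """Incorrect: forwards the same offset and limit to every shard unchanged."""
    partial = [sorted(rows)[offset: offset + limit] for rows in shards]
    return list(islice(heapq.merge(*partial), limit))


# Ids 1..8 split by the even/odd rule: shard 0 holds even ids, shard 1 odd ids.
shards = [[2, 4, 6, 8], [1, 3, 5, 7]]
print(merged_page(shards, offset=2, limit=2))  # [3, 4]
print(naive_page(shards, offset=2, limit=2))   # [5, 6], not the rows a single database would return
```

The correct variant reads more rows from each shard than the page needs, which is the price of a globally correct order.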
Cross-database transactions are also a thorny problem for a distributed database cluster. Sensible use of table sharding reduces the amount of data per table while letting the application rely on local transactions wherever possible; keeping related tables in the same database effectively avoids the trouble that distributed transactions bring. Where cross-database transactions cannot be avoided, some businesses still need transactional consistency. XA-based distributed transactions, however, perform too poorly under high concurrency to meet that need and have not been adopted at scale by the large Internet companies. Most of them use flexible transactions with eventual consistency in place of strongly consistent transactions.
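The benefit of keeping related tables in one database can be shown with a local transaction. This sketch uses Python's built-in `sqlite3` with an in-memory database; the table names `t_order_0` and `t_order_item_0` and the helper `create_order` are assumptions for illustration.

```python
import sqlite3

# Two sharded tables that live in the SAME database, so one local transaction covers both.
conn = sqlite3.connect(":memory:")
conn.execute("create table t_order_0 (id integer primary key, status text)")
conn.execute("create table t_order_item_0 (id integer primary key, order_id integer not null)")
conn.commit()


def create_order(conn, order_id, item_ids):
    """Insert an order and its items atomically."""
    # Used as a context manager, the connection commits on success
    # and rolls back if an exception is raised inside the block.
    with conn:
        conn.execute("insert into t_order_0 (id, status) values (?, ?)", (order_id, "NEW"))
        conn.executemany(
            "insert into t_order_item_0 (id, order_id) values (?, ?)",
            [(item_id, order_id) for item_id in item_ids],
        )


create_order(conn, 1, [10, 11])
try:
    create_order(conn, 2, [12, 10])  # item id 10 already exists, so the insert fails
except sqlite3.IntegrityError:
    pass  # order 2 and item 12 were rolled back together

print(conn.execute("select count(*) from t_order_0").fetchone()[0])
print(conn.execute("select count(*) from t_order_item_0").fetchone()[0])
```

If the two tables lived in different databases, a single connection could not undo both inserts, and the application would need a distributed or flexible transaction instead.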
Goal
The goal is to make the effects of database and table sharding as transparent as possible, so that users can work with a horizontally sharded database cluster as if it were a single database.
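What transparency means for the caller can be sketched with a small facade. The class `ShardedUserTable` is hypothetical and uses in-memory `sqlite3` databases as stand-ins for real shards; it is an illustration of the idea, not how any particular middleware is implemented.

```python
import sqlite3


class ShardedUserTable:
    """Hypothetical facade: callers see one t_user table, rows live in several databases."""

    def __init__(self, shard_count=2):
        self._shards = [sqlite3.connect(":memory:") for _ in range(shard_count)]
        for conn in self._shards:
            conn.execute("create table t_user (id integer primary key, name text)")

    def _shard_for(self, user_id):
        # Same modulo rule as the even/odd example.
        return self._shards[user_id % len(self._shards)]

    def insert(self, user_id, name):
        with self._shard_for(user_id) as conn:  # commits on success
            conn.execute("insert into t_user (id, name) values (?, ?)", (user_id, name))

    def get(self, user_id):
        cursor = self._shard_for(user_id).execute(
            "select id, name from t_user where id = ?", (user_id,)
        )
        return cursor.fetchone()

    def all_ordered(self):
        # Queries without a sharding key must visit every shard and merge the results.
        rows = []
        for conn in self._shards:
            rows.extend(conn.execute("select id, name from t_user"))
        return sorted(rows)


users = ShardedUserTable()
for user_id, name in [(1, "a"), (2, "b"), (3, "c")]:
    users.insert(user_id, name)
print(users.get(2))
print(users.all_ordered())
```

The caller never names a shard; routing and result merging stay inside the facade.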