Row Storage and Column Storage in Databases
2022-07-28 21:30:00 【JavaShark】
Many people first meet databases through relational databases, where data is stored in tables and each row represents one record. This is in fact typical row storage (row-based store): the table is stored on disk partitions row by row.
Some databases also support column storage (column-based store), which stores the table on disk partitions column by column.
Comparison of storage methods
The figure below shows the difference between the two:

As the figure shows, with row storage the attribute values of one record are stored in adjacent space, followed by the attribute values of the next record.
With column storage, all values of a single attribute are stored in adjacent space. That is, all the data in one column is stored contiguously, and each attribute has its own space.
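The two layouts can be sketched in a few lines of Python. This is only an illustration with a made-up three-column table (`id`, `name`, `age`), not how any particular database arranges its pages:

```python
# Illustrative only: a tiny made-up table laid out in row order and in column order.
rows = [
    (1, "Ann", 30),
    (2, "Bob", 25),
    (3, "Cho", 41),
]


def row_layout(records):
    """Row store: all attribute values of one record sit next to each other."""
    return [value for record in records for value in record]


def column_layout(records):
    """Column store: all values of one attribute sit next to each other."""
    return [value for column in zip(*records) for value in column]


print(row_layout(rows))     # [1, 'Ann', 30, 2, 'Bob', 25, 3, 'Cho', 41]
print(column_layout(rows))  # [1, 2, 3, 'Ann', 'Bob', 'Cho', 30, 25, 41]
```

The same nine values appear in both lists; only their order, and therefore which values end up as neighbours on disk, differs.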
At this point, think about which of the two is better suited to queries and which is better suited to modifications.
Comparison of data writing:
1) A row store writes a record in one operation. Relying on the operating system's file system, the write either succeeds or fails as a whole, so the integrity of the data can be determined.
2) A column store has to split each record into individual columns before saving it, so it performs noticeably more writes than a row store. Add the time the disk head spends moving and positioning, and the actual cost is even higher. Row storage therefore has a clear advantage for writes.
3) Modifying data is also a write operation, so row storage has the advantage for data modification as well.
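The write-side difference can be sketched with a toy model that simply counts append operations. The `RowStore` and `ColumnStore` classes below are hypothetical names for this illustration; real engines buffer and batch writes, so the counts show only the shape of the cost, not actual I/O:

```python
class RowStore:
    """Toy row store: each insert appends one whole record."""

    def __init__(self):
        self.records = []
        self.appends = 0

    def insert(self, record):
        self.records.append(tuple(record))
        self.appends += 1  # one append per inserted row


class ColumnStore:
    """Toy column store: each insert is split across one list per column."""

    def __init__(self, column_count):
        self.columns = [[] for _ in range(column_count)]
        self.appends = 0

    def insert(self, record):
        for column, value in zip(self.columns, record):
            column.append(value)
            self.appends += 1  # one append per column for every inserted row


row_store = RowStore()
column_store = ColumnStore(column_count=3)
for record in [(1, "Ann", 30), (2, "Bob", 25)]:
    row_store.insert(record)
    column_store.insert(record)

print(row_store.appends)     # 2
print(column_store.appends)  # 6
```

In this model, inserting two three-column records costs two appends in the row store and six in the column store, and the gap grows with the number of columns.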
Comparison of data reading:
1) A row store usually reads a whole row at a time. If only a few columns are needed, the rest are redundant. To shorten processing time, the redundant columns are usually eliminated in memory.
2) Each read from a column store fetches part or all of a single column, so there is no redundant data. The values being looked up are stored contiguously, which makes column storage especially suitable for projection.
3) Data distribution in the two storage models. The data in each column of a column store is homogeneous, so there is no ambiguity. For example, if the data type of a column is integer (int), its data set must consist of integers, which makes parsing very easy. Row storage is much more complicated by comparison: one record holds several data types, so parsing has to switch between them frequently, which costs a lot of CPU and increases parsing time. The parsing process of column storage is therefore better suited to analyzing big data.
4) Data compression and the read performance it brings. Data in the same column has a consistent type, so column storage lends itself to compression, and different columns can use different compression algorithms. Compressed storage improves I/O performance.
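Two of the read-side points, projection and compression, can be sketched as follows. The table (`id`, `age`, `status`, each stored as a 4-byte integer) and the helper names are assumptions made for this illustration, and run-length encoding stands in for whatever compression algorithm a real column store applies:

```python
import struct
from itertools import groupby

# Hypothetical table: (id, age, status), every value stored as a 4-byte integer.
records = [(i, 20 + i % 3, 1 if i < 6 else 2) for i in range(10)]
INT = struct.Struct("<i")

# Row layout: one byte string holding whole records back to back.
row_bytes = b"".join(INT.pack(v) for record in records for v in record)
# Column layout: one byte string per column.
column_bytes = [b"".join(INT.pack(v) for v in column) for column in zip(*records)]


def bytes_scanned_row_store(wanted_columns):
    # Whole rows are fetched; unwanted columns are discarded later in memory,
    # so the amount scanned does not depend on wanted_columns.
    return len(row_bytes)


def bytes_scanned_column_store(wanted_columns):
    # Only the requested columns are fetched.
    return sum(len(column_bytes[i]) for i in wanted_columns)


def run_length_encode(values):
    """Collapse runs of equal neighbours into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


status_column = [record[2] for record in records]

print(bytes_scanned_row_store([1]))      # 120
print(bytes_scanned_column_store([1]))   # 40
print(run_length_encode(status_column))  # [(1, 6), (2, 4)]
```

Projecting one column out of three scans a third of the bytes in the column layout, and the homogeneous `status` column collapses from ten values to two runs. Neither effect is available when values of different attributes are interleaved row by row.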
Comparison of advantages and disadvantages
Choosing the storage type is the first step of table definition design, and the customer's business type is the main factor that determines it. Row and column storage models each have advantages and disadvantages, so choose according to the actual situation.
The table below compares the advantages, disadvantages, and applicable scenarios of row storage and column storage:
| Row storage | Column storage |
Advantages | Data is kept together. INSERT/UPDATE is easy. | |
Disadvantages | A selection reads all the data even if it involves only a few columns. | |
Applicable scenarios | | |
Row storage and column storage experiments
openGauss supports hybrid row and column storage, and you can specify the storage method when creating a table. Let's run an experiment.
Experimental environment: Huawei Cloud server + openGauss Enterprise Edition 3.0.0 + openEuler 20.03
Create a row-store table custom1 and a column-store table custom2, and insert 500,000 records.
openGauss=# create table custom1 (id integer,name varchar2(20));
CREATE TABLE
openGauss=# create table custom2 (id integer,name varchar2(20)) with (orientation = column);
CREATE TABLE
openGauss=# insert into custom1 select n,'testtt'||n from generate_series(1,500000) n;
INSERT 0 500000
openGauss=# insert into custom2 select * from custom1;
INSERT 0 500000
Now look at the storage space of the two tables by comparing the Size column. The column-store table takes far less space than the row-store table: 3104 kB versus 24 MB, or roughly 1/8 of the row-store table's size.
openGauss=# \d+
List of relations
Schema | Name | Type | Owner | Size | Storage | Description
--------+------------+-------+-------+------------+--------------------------------------+-------------
public | custom1 | table | omm | 24 MB | {orientation=row,compression=no} |
public | custom2 | table | omm | 3104 kB | {orientation=column,compression=low} |
Compare the time taken to insert one new record. The column-store table is slightly slower.
openGauss=# explain analyze insert into custom1 values(1,'zhang3');
QUERY PLAN
-----------------------------------------------------------------------------------------------
[Bypass]
Insert on custom1 (cost=0.00..0.01 rows=1 width=0) (actual time=0.059..0.060 rows=1 loops=1)
-> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1)
Total runtime: 0.135 ms
(4 rows)
openGauss=# explain analyze insert into custom2 values(1,'zhang3');
QUERY PLAN
-----------------------------------------------------------------------------------------------
Insert on custom2 (cost=0.00..0.01 rows=1 width=0) (actual time=0.119..0.120 rows=1 loops=1)
-> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.002 rows=1 loops=1)
Total runtime: 0.207 ms
(3 rows)
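To put the two measurements side by side, the short calculation below works out the ratios from the figures printed in the session output. Both inputs are single observations (the size is rounded by the client and the timing comes from one run), so treat the results as rough indications rather than a benchmark:

```python
# Figures copied from the session output above.
row_size_kb = 24 * 1024    # custom1: 24 MB, as rounded in the Size column
column_size_kb = 3104      # custom2: 3104 kB
row_insert_ms = 0.135      # Total runtime of the insert into custom1
column_insert_ms = 0.207   # Total runtime of the insert into custom2

size_ratio = row_size_kb / column_size_kb
insert_ratio = column_insert_ms / row_insert_ms

print(f"row-store table is {size_ratio:.1f}x larger")         # 7.9x
print(f"column-store insert took {insert_ratio:.2f}x as long")  # 1.53x
```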
Finally, drop the test tables.
openGauss=# drop table custom1;
DROP TABLE
openGauss=# drop table custom2;
DROP TABLE
Interested readers can test more scenarios themselves, such as creating large, wide tables or updating tables.
Selection suggestions
Update frequency: if the data is updated frequently, choose a row-store table.
Insertion frequency: for frequent small inserts, choose a row-store table. For inserting a large amount of data at once, choose a column-store table.
Number of columns in the table: in general, if the table has many fields (a wide table) and queries involve only a few of them, column storage is a good fit. If the table has few fields and queries read most of them, row storage is the better choice.
Number of columns queried: if each query involves only a few columns of the table (less than 50% of the total number of columns), choose a column-store table. (Don't ask what the remaining columns are for. If the client says they are useful, they are useful.)
Compression ratio: a column-store table compresses better than a row-store table, but a higher compression ratio consumes more CPU resources.
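The rules of thumb above can be written down as a small helper. This is only a sketch: the function name `suggest_orientation`, its parameters, and the order in which the rules are checked are assumptions made for this illustration, and it leaves out bulk loading and compression ratio. It is not part of openGauss.

```python
def suggest_orientation(frequent_updates, frequent_small_inserts,
                        total_columns, columns_per_query):
    """Return 'row' or 'column' following the rules of thumb listed above.

    Assumption: write-heavy workloads are checked first, so they win over
    the column-count rule when both apply.
    """
    if frequent_updates or frequent_small_inserts:
        return "row"
    # Queries that touch fewer than half of the columns favour column storage.
    if columns_per_query / total_columns < 0.5:
        return "column"
    return "row"


print(suggest_orientation(True, False, 40, 3))   # row
print(suggest_orientation(False, False, 40, 3))  # column
print(suggest_orientation(False, False, 4, 4))   # row
```

A real decision would also weigh the constraints of column-store tables and measurements taken on representative data.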
Notes
Because of its special storage format, a column-store table comes with many constraints. For example, it does not support arrays, generated columns, global temporary tables, or foreign keys, and it supports fewer data types than a row-store table. Check the documentation of the database you are using.