A look at row storage and column storage in databases

2022-07-28 21:30:00 【JavaShark】

Many people first meet databases through relational databases, where data is stored in tables and each row represents one record. This is typical row storage (row-based store): the table is laid out on disk row by row.

Some databases also support column storage (column-based store), which lays the table out on disk column by column.

Comparison of storage methods

The difference between the two is shown in the figure below ：

As the figure shows, with row storage the attribute values of one record are stored in adjacent space, followed by the attribute values of the next record.

With column storage, all values of a single attribute are stored in adjacent space. That is, each column's data is stored contiguously, and each attribute has its own space.
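The two layouts can be sketched in a few lines of Python. This is only an illustration, not how any particular database formats its pages: the three-column table (id, age, score) and the fixed-width encoding are assumptions made for the example.

```python
import struct

# Hypothetical three-column table: (id, age, score).
rows = [(1, 30, 88.5), (2, 41, 92.0), (3, 27, 75.25)]

# Little-endian, no padding: int32, int32, float64 -> 16 bytes per record.
ROW_FORMAT = "<iid"


def to_row_layout(rows):
    """Serialize record by record: id1 age1 score1 id2 age2 score2 ..."""
    return b"".join(struct.pack(ROW_FORMAT, *r) for r in rows)


def to_column_layout(rows):
    """Serialize attribute by attribute: id1 id2 ... age1 age2 ... score1 ..."""
    ids, ages, scores = zip(*rows)
    n = len(rows)
    return (
        struct.pack(f"<{n}i", *ids)
        + struct.pack(f"<{n}i", *ages)
        + struct.pack(f"<{n}d", *scores)
    )


row_bytes = to_row_layout(rows)
col_bytes = to_column_layout(rows)

# Same data and same total size, but a different ordering of the bytes.
print(len(row_bytes), len(col_bytes))
# Row layout: the second record starts right after the first one.
print(struct.unpack_from(ROW_FORMAT, row_bytes, 16))
# Column layout: all ids sit together at the front.
print(struct.unpack_from("<3i", col_bytes, 0))
```

Reading one whole record is a single contiguous slice of the row layout, while reading one whole attribute is a single contiguous slice of the column layout.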

At this point, consider which of the two is better suited to queries and which to modifications.

Comparison of data writes:

1) A row-store write completes in a single operation. Because the write goes through the operating system's file system, it either succeeds or fails as a whole, so the integrity of the data can be determined.

2) Column storage has to split each record into individual columns before saving it, so it performs significantly more writes than row storage. Add the time the disk head spends moving and positioning, and the actual cost is greater still. Row storage therefore has a clear advantage for writes.

3) Modifying data is itself a write, so row storage also has the advantage for modifications.
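The write-count argument in point 2 can be made concrete with a toy model. The `RowStore` and `ColumnStore` classes below are hypothetical and only count how many separate storage regions each insert touches; real engines batch, buffer, and reorganize writes, so the numbers illustrate the reasoning rather than measure anything.

```python
class RowStore:
    """Toy row store: one region that holds whole records."""

    def __init__(self):
        self.heap = []
        self.writes = 0

    def insert(self, record):
        self.heap.append(tuple(record))
        self.writes += 1  # the whole record lands in one place


class ColumnStore:
    """Toy column store: one region per attribute."""

    def __init__(self, n_columns):
        self.columns = [[] for _ in range(n_columns)]
        self.writes = 0

    def insert(self, record):
        for column, value in zip(self.columns, record):
            column.append(value)
            self.writes += 1  # each attribute lands in its own region


row_store = RowStore()
column_store = ColumnStore(n_columns=5)

for i in range(1000):
    record = (i, f"name{i}", i % 7, i * 1.5, i % 2 == 0)
    row_store.insert(record)
    column_store.insert(record)

print(row_store.writes)     # one write per record
print(column_store.writes)  # one write per attribute of every record
```

In this model the column store performs as many writes per record as the table has columns, which is the gap the comparison above describes.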

Comparison of data reads:

1) Row storage usually reads a row out in full. If only a few columns are needed, the rest are redundant. To shorten processing time, the redundant columns are usually discarded in memory.

2) Column storage reads part or all of a column at a time, so there is no redundant data. The content being searched is stored contiguously, which makes column storage especially well suited to projection.

3) Data distribution of the two storage types. Each column in column storage holds homogeneous data, so there is no ambiguity. For example, if a column's data type is integer (int), its data set must consist entirely of integers, which makes parsing very easy. Row storage is much more complicated: one record holds several data types, so parsing requires frequent conversion between them, which consumes a lot of CPU and lengthens parsing time. The parsing process of column storage is therefore better suited to analyzing big data.

4) Data compression and read performance. Data in the same column has a consistent type, so column storage is well suited to compression, and different columns can use different compression algorithms. Compressed storage in turn improves I/O performance.
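Points 3 and 4 can be illustrated with run-length encoding, one simple compression scheme that benefits from homogeneous, adjacent values. The sketch below is an assumption-laden example: the two-column table and the choice of run-length encoding are made up for illustration, and the article does not say which algorithms a given database uses.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive equal values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


# Hypothetical table (id, city) with a low-cardinality city column.
rows = [(i, "Beijing" if i < 500 else "Shanghai") for i in range(1000)]

# Column layout: the city values are adjacent, so they form two long runs.
city_column = [city for _, city in rows]
column_runs = run_length_encode(city_column)

# Row layout: ids and cities interleave, so no two neighbours are equal.
row_stream = [value for row in rows for value in row]
row_runs = run_length_encode(row_stream)

print(column_runs)    # [('Beijing', 500), ('Shanghai', 500)]
print(len(row_runs))  # 2000
```

The same 1,000 city values shrink to two pairs when stored as a column, while the interleaved row stream does not compress at all under this scheme.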

Comparison of advantages and disadvantages

Choosing the storage type is the first step in designing a table definition, and the customer's business type is the main factor that determines it. Row and column storage models each have advantages and disadvantages, so choose according to the actual situation.

The following table compares the advantages, disadvantages, and applicable scenarios of row and column storage:

	Row storage	Column storage
Advantages	The data of a record is kept together. INSERT/UPDATE is easy.	Only the columns involved in a query are read. Projection is very efficient. Any column can serve as an index.
Disadvantages	A selection reads all of the data even if it involves only a few columns.	After selection completes, the selected columns have to be reassembled. INSERT/UPDATE is more cumbersome. Not suited to point queries.
Applicable scenarios	Point queries (simple index-based queries that return few records). Scenarios with many insert, delete, and update operations.	Statistical analysis queries (OLAP; for example, data warehouse workloads, which run many aggregations on such tables, involve few columns, and have many join and grouping operations). Ad hoc queries (the query conditions are not known in advance, so a row-store table scan can hardly use an index).

Row storage and column storage experiments

openGauss supports hybrid row-column storage, and the storage method can be specified when a table is created. Let's run an experiment.

Experimental environment: Huawei Cloud server + openGauss Enterprise Edition 3.0.0 + openEuler 20.03

Create a row-store table custom1 and a column-store table custom2, and insert 500,000 records.

openGauss=# create table custom1 (id integer,name varchar2(20)); 
CREATE TABLE 
openGauss=# create table custom2 (id integer,name varchar2(20)) with (orientation = column); 
CREATE TABLE 
openGauss=# insert into custom1 select n,'testtt'||n from generate_series(1,500000) n; 
INSERT 0 500000 
openGauss=# insert into custom2 select * from custom1; 
INSERT 0 500000

Now look at the storage space of the two tables by comparing the Size column. The column-store table is much smaller than the row-store table: 3104 kB against 24 MB, or roughly one-eighth of the space.

openGauss=# \d+ 
                                           List of relations 
 Schema |    Name    | Type  | Owner |    Size    |               Storage                | Description 
--------+------------+-------+-------+------------+--------------------------------------+------------- 
 public | custom1    | table | omm   | 24 MB      | {orientation=row,compression=no}     | 
 public | custom2    | table | omm   | 3104 kB    | {orientation=column,compression=low} |

Next, compare the time taken to insert a single new record. The column-store table is slightly slower (0.207 ms against 0.135 ms in this run).

openGauss=# explain analyze insert into custom1 values(1,'zhang3'); 
                                          QUERY PLAN 
----------------------------------------------------------------------------------------------- 
 [Bypass] 
 Insert on custom1  (cost=0.00..0.01 rows=1 width=0) (actual time=0.059..0.060 rows=1 loops=1) 
   ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1) 
 Total runtime: 0.135 ms 
(4 rows) 
 
openGauss=# explain analyze insert into custom2 values(1,'zhang3'); 
                                          QUERY PLAN 
----------------------------------------------------------------------------------------------- 
 Insert on custom2  (cost=0.00..0.01 rows=1 width=0) (actual time=0.119..0.120 rows=1 loops=1) 
   ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.002 rows=1 loops=1) 
 Total runtime: 0.207 ms 
(3 rows)

Finally, delete the test tables.

openGauss=# drop table custom1; 
DROP TABLE 
openGauss=# drop table custom2; 
DROP TABLE

Interested readers can test more scenarios themselves, such as creating large, wide tables or updating tables.

Selection suggestions

Update frequency: if the data is updated frequently, choose a row-store table.
Insertion frequency: for frequent small inserts, choose a row-store table. For inserting a large amount of data at one time, choose a column-store table.
Number of columns in the table: in general, if the table has many fields (a wide table) and queries involve only a few of them, column storage is suitable. If the table has few fields and queries touch most of them, row storage is the better choice.
Number of columns queried: if each query involves only a few columns of the table (less than 50% of the total), choose a column-store table. (Don't ask what the remaining columns are for; if the client says they are useful, they are useful.)
Compression ratio: column-store tables compress better than row-store tables, but a high compression ratio consumes more CPU resources.
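The rules of thumb above can be folded into a small helper as a starting point. The function name `suggest_orientation`, its parameters, and especially the order in which the rules are applied are assumptions made for this sketch; the list above does not rank the criteria, and a real decision should also weigh compression needs and the constraints of the specific database.

```python
def suggest_orientation(frequent_updates, frequent_small_inserts,
                        total_columns, columns_per_query):
    """Return 'row' or 'column' based on the rules of thumb listed above.

    Assumption: write-heavy workloads take priority over the read pattern.
    """
    # Frequent updates or frequent small inserts favour row storage.
    if frequent_updates or frequent_small_inserts:
        return "row"
    # Queries touching fewer than 50% of the columns favour column storage.
    if total_columns > 0 and columns_per_query / total_columns < 0.5:
        return "column"
    # Few fields, most of them queried: row storage.
    return "row"


# A wide, bulk-loaded reporting table where queries read 3 of 50 columns.
print(suggest_orientation(False, False, total_columns=50, columns_per_query=3))
# A narrow table that is updated constantly.
print(suggest_orientation(True, False, total_columns=4, columns_per_query=4))
```

Treat the result as a first guess to confirm with an experiment like the one in the previous section.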

Notes

Because of its special storage format, column storage comes with many constraints. For example, column-store tables do not support arrays, generated columns, global temporary tables, or foreign keys, and they support fewer data types than row storage. Check the documentation of the database you are using.

Copyright notice
This article was written by [JavaShark]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/209/202207281946046167.html