Talk about row storage and column storage in databases
2022-07-28 21:30:00 【JavaShark】
When many people first learn about databases, they learn about relational databases, where data is stored in tables and each row represents a record. This is in fact a typical row store (row-based store): tables are stored on disk partitions row by row.
Some databases also support column storage (column-based store), which stores tables on disk partitions column by column.
Comparison of storage methods
The difference between the two is shown in the figure below:

As the figure shows, in a row store the attribute values of one record are stored in adjacent space, followed by the attribute values of the next record.
In a column store, all values of a single attribute are stored in adjacent space; that is, all the data in one column is stored contiguously, and each attribute occupies its own separate space.
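To make the two layouts concrete, here is a minimal Python sketch that lays out the same three records both ways. It is a toy model, not how any real engine such as openGauss organizes disk pages: a Python list stands in for contiguous space, and the function names `to_row_layout` and `to_column_layout` and the sample records are hypothetical.

```python
# Toy model of the two physical layouts (illustration only; a list stands in
# for contiguous disk space, and no real engine's page format is modeled).
from typing import Any, List, Tuple

Record = Tuple[Any, ...]

def to_row_layout(records: List[Record]) -> List[Any]:
    """Row store: all attribute values of one record sit next to each other,
    followed by the values of the next record."""
    flat: List[Any] = []
    for rec in records:
        flat.extend(rec)
    return flat

def to_column_layout(records: List[Record]) -> List[List[Any]]:
    """Column store: all values of one attribute are contiguous, and each
    attribute gets its own separate area."""
    if not records:
        return []
    return [list(col) for col in zip(*records)]

rows = [(1, "a", 10), (2, "b", 20), (3, "c", 30)]
print(to_row_layout(rows))     # [1, 'a', 10, 2, 'b', 20, 3, 'c', 30]
print(to_column_layout(rows))  # [[1, 2, 3], ['a', 'b', 'c'], [10, 20, 30]]
```

Reading one record is a single contiguous slice in the first layout, while reading one attribute is a single contiguous list in the second; that asymmetry drives the write and read comparisons that follow.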
Here, consider which of the two is better suited to queries and which is better suited to modifications.
Comparison of data writing:
1) A write to a row store is completed in one operation. Because writing relies on the operating system's file system, the write either succeeds or fails as a whole, so data integrity can be ensured.
2) Because a column store must split each row record into separate columns, it performs significantly more writes than a row store, and the time the disk head spends moving and seeking on the platter makes the actual cost even greater. Row storage therefore has a clear advantage for writes.
3) Data modification is essentially also a write, so row storage has the advantage for modifications as well.
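The write fan-out described above can be sketched with a toy model that simply counts how many separate storage areas each INSERT touches. The classes `RowStore` and `ColumnStore` and the sample data are hypothetical; the model assumes one storage area for the whole row store and one per column for the column store, and it ignores the buffering and batching that real engines apply.

```python
# Toy model of the write path: count how many separate storage areas an
# INSERT touches. Assumption: the row store appends each record to one area,
# while the column store keeps one area per column. Real engines buffer and
# batch writes, so this only illustrates the per-record fan-out.
from typing import Any, Dict, List, Tuple

class RowStore:
    def __init__(self) -> None:
        self.area: List[Tuple[Any, ...]] = []
        self.writes = 0

    def insert(self, record: Tuple[Any, ...]) -> None:
        self.area.append(record)  # the whole record lands in one place
        self.writes += 1

class ColumnStore:
    def __init__(self, columns: List[str]) -> None:
        self.areas: Dict[str, List[Any]] = {c: [] for c in columns}
        self.writes = 0

    def insert(self, record: Tuple[Any, ...]) -> None:
        if len(record) != len(self.areas):
            raise ValueError("record width does not match the column count")
        # The record is split up and each value goes to its own column area.
        for column, value in zip(self.areas, record):
            self.areas[column].append(value)
            self.writes += 1

row_store = RowStore()
column_store = ColumnStore(["id", "name", "age"])
for rec in [(1, "zhang3", 30), (2, "li4", 25)]:
    row_store.insert(rec)
    column_store.insert(rec)

print(row_store.writes, column_store.writes)  # 2 6
```

With three columns, each inserted record costs one write in the row model and three in the column model; an UPDATE that rewrites a record fans out the same way.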
Comparison of data reading:
1) A row store usually reads a whole row at a time. If only a few columns are needed, the other columns are redundant; to keep processing time short, these redundant columns are usually eliminated in memory.
2) A column store reads a segment of a column, or the whole column, at a time, so there are no redundant columns. Because the data it needs is stored contiguously, it is especially suitable for projection.
3) Data distribution in the two storage types. Because each column in a column store holds data of a single type, there is no ambiguity. For example, if a column's data type is integer (int), every value in it must be an integer, which makes parsing very easy. Row storage is much more complicated by comparison: a single row record holds values of many types, so parsing requires frequent conversion among them. This consumes a lot of CPU and increases parsing time. Column storage is therefore better suited to big-data analysis.
4) Better compression improves read performance. Data in the same column has a consistent type, so column storage lends itself to compression, and different columns can use different compression algorithms. Compressed storage in turn improves I/O performance.
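The projection and compression points can be illustrated with another toy Python model. Everything here is an assumption for illustration: the 1,000 records and the `status` column are synthetic, a list stands in for contiguous storage, and run-length encoding (via the hypothetical `rle` helper) stands in for whichever compression algorithm a real column store chooses for a given column.

```python
# Toy illustration of projection and compression in the two layouts.
# Assumptions: synthetic data, a list models contiguous storage, and
# run-length encoding (RLE) stands in for a real columnar codec.
from itertools import groupby
from typing import Any, List, Tuple

def rle(values: List[Any]) -> List[Tuple[Any, int]]:
    """Run-length encode: collapse consecutive equal values into (value, count)."""
    return [(v, sum(1 for _ in grp)) for v, grp in groupby(values)]

# 1,000 synthetic records (id, status, score). Status has only 2 distinct
# values and arrives in long runs, as it often does in batch-loaded data.
records = [(i, "active" if i <= 600 else "closed", i % 7) for i in range(1, 1001)]

row_layout = [v for rec in records for v in rec]      # id, status, score, id, ...
status_column = [status for _, status, _ in records]  # one attribute, all str

# Projection "SELECT status": the row layout must be scanned in full
# (3,000 values), while the column layout holds only the 1,000 needed.
print(len(row_layout), len(status_column))  # 3000 1000

# Compression: the homogeneous column collapses into 2 runs, while the
# interleaved row layout has no adjacent repeats to exploit.
print(len(rle(status_column)))  # 2
print(len(rle(row_layout)))     # 3000
```

RLE is used here only because its effect is easy to see; the point is that values of one type stored side by side expose redundancy that interleaving hides, which is why a column store can pick a codec per column.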
Comparison of advantages and disadvantages
Choosing a table's storage type is the first step in table definition design, and the customer's business type is the main factor that determines it. Row and column storage models each have their own advantages and disadvantages, so choose according to the actual situation.
The table below compares the advantages, disadvantages, and applicable scenarios of row and column storage:
Aspect | Row storage | Column storage
Advantages | Data is kept together; INSERT/UPDATE is easy. |
Disadvantages | Even for a selection (Selection) that involves only a few columns, all the data is read. |
Applicable scenarios | |
Row storage and column storage experiments
openGauss supports hybrid row-column storage, and you can specify the storage orientation when creating a table. Let's run an experiment.
Experimental environment: Huawei Cloud server + openGauss Enterprise Edition 3.0.0 + openEuler 20.03
Create a row-store table custom1 and a column-store table custom2, and insert 500,000 records.
openGauss=# create table custom1 (id integer,name varchar2(20));
CREATE TABLE
openGauss=# create table custom2 (id integer,name varchar2(20)) with (orientation = column);
CREATE TABLE
openGauss=# insert into custom1 select n,'testtt'||n from generate_series(1,500000) n;
INSERT 0 500000
openGauss=# insert into custom2 select * from custom1;
INSERT 0 500000
Now compare the storage space of the two tables in the Size column. The column-store table takes far less space than the row-store table: 3104 kB versus 24 MB (about 24,576 kB), or roughly 1/8 of the row-store table's space.
openGauss=# \d+
                                               List of relations
 Schema |    Name    | Type  | Owner |    Size    |               Storage                | Description
--------+------------+-------+-------+------------+--------------------------------------+-------------
 public | custom1    | table | omm   | 24 MB      | {orientation=row,compression=no}     |
 public | custom2    | table | omm   | 3104 kB    | {orientation=column,compression=low} |
Comparing the time taken to insert one new record, the column-store table is a little slower (total runtime 0.207 ms versus 0.135 ms).
openGauss=# explain analyze insert into custom1 values(1,'zhang3');
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 [Bypass]
 Insert on custom1  (cost=0.00..0.01 rows=1 width=0) (actual time=0.059..0.060 rows=1 loops=1)
   ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1)
 Total runtime: 0.135 ms
(4 rows)

openGauss=# explain analyze insert into custom2 values(1,'zhang3');
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Insert on custom2  (cost=0.00..0.01 rows=1 width=0) (actual time=0.119..0.120 rows=1 loops=1)
   ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.002 rows=1 loops=1)
 Total runtime: 0.207 ms
(3 rows)
Finally, drop the test tables.
openGauss=# drop table custom1;
DROP TABLE
openGauss=# drop table custom2;
DROP TABLE
Interested readers can test more scenarios themselves, such as creating large wide tables or running UPDATEs on the tables.
Selection suggestions
Update frequency: if the data is updated frequently, choose a row-store table.
Insert frequency: for frequent small inserts, choose a row-store table; for inserting a large amount of data at once, choose a column-store table.
Number of columns in the table: in general, if the table has many fields (a wide table) and queries involve only a few of them, column storage is suitable. If the table has few fields and queries read most of them, row storage is the better choice.
Number of columns queried: if every query involves only a few of the table's columns (fewer than 50% of the total), choose a column-store table. (Don't ask what the remaining columns are for; if the client says they are useful, they are useful.)
Compression ratio: column-store tables achieve a higher compression ratio than row-store tables, but a higher compression ratio consumes more CPU resources.
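The suggestions above can be condensed into a rough checklist. The following Python sketch is one hypothetical way to do that, not an openGauss feature: the `Workload` fields and the `suggest_orientation` function are invented for illustration, each suggestion casts one vote, the wide-table and columns-queried rules are folded into a single "fewer than 50% of columns per query" vote, and ties fall back to row storage. It does not model the CPU cost of compression or any feature restrictions of column-store tables.

```python
# Hypothetical rule-of-thumb helper (not an openGauss feature).
# Each suggestion casts one vote; ties fall back to row storage.
from dataclasses import dataclass

@dataclass
class Workload:
    frequent_updates: bool        # data is modified often
    frequent_small_inserts: bool  # many single-row or small-batch inserts
    bulk_loads: bool              # large amounts of data inserted at once
    total_columns: int            # number of columns in the table
    columns_per_query: int        # columns a typical query touches

def suggest_orientation(w: Workload) -> str:
    """Return 'row' or 'column' by tallying the article's suggestions."""
    if w.total_columns <= 0:
        raise ValueError("total_columns must be positive")
    row_votes = 0
    column_votes = 0
    if w.frequent_updates:
        row_votes += 1
    if w.frequent_small_inserts:
        row_votes += 1
    if w.bulk_loads:
        column_votes += 1
    # Queries touching fewer than 50% of the columns favor column storage.
    if w.columns_per_query / w.total_columns < 0.5:
        column_votes += 1
    else:
        row_votes += 1
    return "column" if column_votes > row_votes else "row"

analytics = Workload(False, False, True, total_columns=100, columns_per_query=5)
oltp = Workload(True, True, False, total_columns=10, columns_per_query=8)
print(suggest_orientation(analytics), suggest_orientation(oltp))  # column row
```

A simple vote keeps each suggestion visible and easy to reweight; in practice you would still confirm the choice by testing your own workload, as in the experiment above.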
Precautions
Because of their special storage method, column-store tables come with many restrictions. For example, column-store tables do not support arrays, generated columns, global temporary tables, or foreign keys, and they support fewer data types than row-store tables. Check the documentation of the corresponding database for details.