Row storage and column storage in databases
2022-07-28 21:30:00 【JavaShark】
Many people first meet databases through relational databases, where data is stored in tables and each row represents one record. This is the typical row-based store: tables are written to disk partitions row by row.
Some databases also support a column-based store, which writes tables to disk partitions column by column.
Comparison of storage methods
The difference between the two is shown in the figure below:
As the figure shows, in row storage the attribute values of one record are stored in adjacent space, followed by the attribute values of the next record.
In column storage, all values of a single attribute are stored in adjacent space. That is, the data of one column is stored contiguously, and each attribute has its own space.
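The two layouts described above can be sketched in a few lines of Python. This is only a toy in-memory model: the table, its values, and the function names `row_layout` and `column_layout` are made up for illustration and say nothing about how any particular database arranges its pages.

```python
# Toy table: three records with the attributes (id, name, age).
# All values are invented for illustration.
records = [
    (1, "alice", 30),
    (2, "bob", 25),
    (3, "carol", 41),
]


def row_layout(rows):
    """Row storage: all attribute values of one record sit next to each other."""
    return [value for row in rows for value in row]


def column_layout(rows):
    """Column storage: all values of one attribute sit next to each other."""
    return [value for column in zip(*rows) for value in column]


print(row_layout(records))
# [1, 'alice', 30, 2, 'bob', 25, 3, 'carol', 41]
print(column_layout(records))
# [1, 2, 3, 'alice', 'bob', 'carol', 30, 25, 41]
```

In the first list one record can be read from a single contiguous slice. In the second list one attribute can.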
At this point, consider which of the two is better suited to queries and which is better suited to modifications.
Comparison on data writing:
1) A write to a row store completes in a single operation. Because the write goes through the operating system's file system, the write as a whole either succeeds or fails, so data integrity can be determined.
2) A column store has to split each record into individual columns before saving, so it performs noticeably more writes than a row store. Adding the time the disk head needs to move and position itself, the actual time cost is even greater. Row storage therefore has a clear advantage for writes.
3) Data modification is also a write process, so row storage has the advantage there as well.
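The write-count argument can be made concrete with a small sketch. It assumes a toy in-memory model in which each Python list stands for one separate storage area; the helper names `append_row_store` and `append_column_store` are hypothetical, and real engines batch and buffer writes in ways this model ignores.

```python
def append_row_store(storage, record):
    """Row store: one append places the whole record contiguously."""
    storage.append(tuple(record))
    return 1  # one storage area touched


def append_column_store(columns, record):
    """Column store: the record is split and each value goes to its own column."""
    for column, value in zip(columns, record):
        column.append(value)
    return len(columns)  # one storage area touched per column


row_table = []
column_table = [[], []]  # one list per column: id, name
record = (1, "zhang3")

print(append_row_store(row_table, record))        # 1
print(append_column_store(column_table, record))  # 2
```

In this model the cost of inserting a single record grows with the number of columns in the column store, while it stays constant in the row store.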
Comparison on data reading:
1) A row store usually reads a row in full. If only a few columns are needed, the remaining columns are redundant. To shorten processing time, these redundant columns are usually discarded in memory.
2) A column store reads part or all of a column at a time, so there is no redundancy. The data being looked up is stored contiguously, which makes column storage especially well suited to projection.
3) Data distribution differs between the two. Every column in a column store holds homogeneous data, so there is no ambiguity: if a column's type is integer (int), its data set consists only of integers, which makes parsing easy. Row storage is more complicated, because one record contains several data types and parsing must convert between them frequently. This consumes a lot of CPU and lengthens parsing time. Column storage is therefore better suited to analyzing large data sets.
4) Compression improves read performance. Data in the same column has a consistent type, so column storage is well suited to compression, and different columns can use different compression algorithms. Compressed storage improves I/O performance.
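Points 1), 2) and 4) can be illustrated with a small sketch. It assumes a made-up five-row table and counts how many stored values a projection has to touch in each layout, then applies run-length encoding to one column. Run-length encoding is chosen here only because it is easy to follow; the source does not say which compression algorithms a given database uses, and all names below are hypothetical.

```python
from itertools import groupby

# Made-up table with the columns (id, region, amount).
rows = [
    (1, "north", 120),
    (2, "north", 80),
    (3, "north", 95),
    (4, "south", 60),
    (5, "south", 75),
]
# The same data arranged column by column.
columns = {
    "id": [r[0] for r in rows],
    "region": [r[1] for r in rows],
    "amount": [r[2] for r in rows],
}


def sum_amount_row_store(rows):
    """Visit every full row; the unused fields are read and then discarded."""
    total, values_touched = 0, 0
    for row in rows:
        values_touched += len(row)
        total += row[2]
    return total, values_touched


def sum_amount_column_store(columns):
    """Read only the one column the query projects."""
    values = columns["amount"]
    return sum(values), len(values)


def run_length_encode(values):
    """Collapse consecutive equal values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


print(sum_amount_row_store(rows))            # (430, 15)
print(sum_amount_column_store(columns))      # (430, 5)
print(run_length_encode(columns["region"]))  # [('north', 3), ('south', 2)]
```

Both layouts return the same sum, but the row layout touches all 15 stored values while the column layout touches 5. The `region` column shrinks to two runs because equal values sit next to each other, which would not be the case if they were interleaved with the other fields of each row.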
Comparison of advantages and disadvantages
Choosing the storage type is the first step in designing a table definition, and the customer's business type is the main factor in that choice. The row and column storage models each have advantages and disadvantages, so choose according to the actual situation.
The table below compares the advantages, disadvantages, and applicable scenarios of row storage and column storage:
 | Row storage | Column storage
Advantages | The data is kept together. INSERT/UPDATE is easy. |
Disadvantages | A selection reads all the data even if only a few columns are involved. |
Applicable scenarios | |
Row storage and column storage experiments
openGauss supports hybrid row and column storage, and you can specify the storage method when creating a table. Let's run an experiment.
Experimental environment: Huawei Cloud server + openGauss Enterprise Edition 3.0.0 + openEuler 20.03
Create the row-store table custom1 and the column-store table custom2, then insert 500,000 records.
openGauss=# create table custom1 (id integer,name varchar2(20));
CREATE TABLE
openGauss=# create table custom2 (id integer,name varchar2(20)) with (orientation = column);
CREATE TABLE
openGauss=# insert into custom1 select n,'testtt'||n from generate_series(1,500000) n;
INSERT 0 500000
openGauss=# insert into custom2 select * from custom1;
INSERT 0 500000
Now compare the storage space of the two tables in the Size column. The column-store table takes far less space than the row-store table: 3104 kB versus 24 MB, or roughly one eighth.
openGauss=# \d+
List of relations
Schema | Name | Type | Owner | Size | Storage | Description
--------+------------+-------+-------+------------+--------------------------------------+-------------
public | custom1 | table | omm | 24 MB | {orientation=row,compression=no} |
public | custom2 | table | omm | 3104 kB | {orientation=column,compression=low} |
Next, compare the time needed to insert one new record. The column-store table is slightly slower (0.207 ms versus 0.135 ms total runtime).
openGauss=# explain analyze insert into custom1 values(1,'zhang3');
QUERY PLAN
-----------------------------------------------------------------------------------------------
[Bypass]
Insert on custom1 (cost=0.00..0.01 rows=1 width=0) (actual time=0.059..0.060 rows=1 loops=1)
-> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1)
Total runtime: 0.135 ms
(4 rows)
openGauss=# explain analyze insert into custom2 values(1,'zhang3');
QUERY PLAN
-----------------------------------------------------------------------------------------------
Insert on custom2 (cost=0.00..0.01 rows=1 width=0) (actual time=0.119..0.120 rows=1 loops=1)
-> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.002 rows=1 loops=1)
Total runtime: 0.207 ms
(3 rows)
Finally, drop the test tables.
openGauss=# drop table custom1;
DROP TABLE
openGauss=# drop table custom2;
DROP TABLE
Interested readers can test more scenarios on their own, for example creating large, wide tables or running UPDATE against both table types.
Selection suggestions
Update frequency: if the data is updated frequently, choose a row-store table.
Insert frequency: for frequent small inserts, choose a row-store table. For inserting a large amount of data at one time, choose a column-store table.
Number of columns in the table: in general, if the table has many fields (a wide table) and queries involve only a few of them, column storage is a good fit. If the table has few fields and queries read most of them, row storage is the better choice.
Number of columns queried: if every query involves only a few columns of the table (fewer than 50% of the total number of columns), choose a column-store table. (Don't ask what the remaining columns are for; if the client says they are useful, they are useful.)
Compression ratio: a column-store table compresses better than a row-store table, but a high compression ratio consumes more CPU resources.
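The suggestions above can be condensed into a small decision helper. This is a rough sketch only: the function `suggest_orientation`, its parameters, and the order in which the rules are checked are assumptions made for illustration, and the 50% threshold is simply the figure quoted above. Real choices should also weigh data volume and compression cost.

```python
def suggest_orientation(frequent_updates, frequent_small_inserts,
                        total_columns, columns_per_query):
    """Return "row" or "column" following the rules of thumb in the text."""
    # Frequent updates or frequent small inserts favor a row-store table.
    if frequent_updates or frequent_small_inserts:
        return "row"
    # Queries touching fewer than half of the columns favor a column-store table.
    if total_columns > 0 and columns_per_query / total_columns < 0.5:
        return "column"
    # Few columns, most of them queried: stay with row storage.
    return "row"


print(suggest_orientation(True, False, 40, 3))   # row
print(suggest_orientation(False, False, 40, 3))  # column
print(suggest_orientation(False, False, 4, 4))   # row
```

Checking the write pattern first reflects the point made earlier that row storage has the advantage for writes. A workload that mixes heavy updates with narrow analytical queries needs a closer look than this helper can give.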
Notes
Because of its special storage method, column storage comes with many constraints. For example, column-store tables do not support arrays, generated columns, global temporary tables, or foreign keys, and they support fewer data types than row-store tables. Check the documentation of the database you are using.