Memory-Optimized Table (MOT) Management

Durability (also known as disk persistence) refers to long-term data protection. Durability means that stored data does not suffer any form of degradation or corruption, so data is never lost or damaged. Durability ensures that the data and the MOT engine return to a consistent state after a planned shutdown (for example, for maintenance) or an unplanned crash (for example, a power failure).

Memory storage is volatile: it needs power to retain the stored information. Disk storage, by contrast, is non-volatile: it does not need power to retain the stored information, so a power failure does not put that information at risk. MOT uses both types of storage. All data is kept in memory. Transactional changes are persisted to disk, and MOT checkpoints are taken frequently and regularly, so that data can be recovered after a shutdown.

Users must ensure that there is enough disk space for logging and checkpointing operations. Placing checkpoints on a separate drive improves performance by reducing the disk I/O load.

See MOT Key Technologies to learn how durability is implemented in the MOT engine.

To configure durability:

To ensure strict consistency, set the synchronous_commit parameter in the postgresql.conf configuration file to on.

The MOT WAL redo log and MOT checkpoints provide durability, as described below.

1.1 MOT Logging: WAL Redo Log

To ensure durability, MOT is fully integrated with the openGauss write-ahead logging (WAL) mechanism and persists WAL records through the openGauss XLOG interface. This means that every addition, update and deletion of an MOT record is recorded in the WAL. As a result, the latest data state can be regenerated and recovered from this non-volatile log. For example, if three rows are added to a table, two are deleted and one is updated, six entries are recorded in the log.

MOT log records are written to the same WAL as the records of other openGauss disk-based tables.

MOT logs only operations performed in the transaction commit phase.

MOT logs only the updated delta records, to minimize the amount of data written to disk.

During recovery, data is loaded from the last known checkpoint or from a specific checkpoint. The WAL redo log is then used to apply the data changes that occurred after that point.

The WAL redo log keeps all table row modifications until a checkpoint is performed (described below). The log can then be truncated, which reduces recovery time and saves disk space.

Note: Place the log files on a low-latency drive so that the log I/O device does not become a bottleneck.
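The following Python sketch models the logging rules above. Row operations are buffered privately while a transaction runs. Redo entries reach the log only in the commit phase, with one entry per inserted, updated or deleted row. This is a toy model, not MOT's implementation:
- The class names are hypothetical.
- The update payload stands for the changed column values only.
- The real engine writes through the openGauss XLOG interface, not into a Python list.

```python
from dataclasses import dataclass, field


@dataclass
class Transaction:
    """Buffers row operations privately until commit (toy model)."""
    txn_id: int
    ops: list = field(default_factory=list)

    def insert(self, key, row):
        self.ops.append(("INSERT", key, row))

    def update(self, key, delta):
        # Only the changed columns (the delta) are kept for the redo entry.
        self.ops.append(("UPDATE", key, delta))

    def delete(self, key):
        self.ops.append(("DELETE", key, None))


class RedoLog:
    """Append-only stand-in for the WAL; nothing is written before commit."""

    def __init__(self):
        self.entries = []

    def commit(self, txn):
        # Commit phase: one redo entry per row operation of the transaction.
        for op, key, payload in txn.ops:
            self.entries.append((txn.txn_id, op, key, payload))

    def truncate(self):
        # Allowed once a checkpoint has persisted every row.
        self.entries.clear()


wal = RedoLog()
t1 = Transaction(1)
for k in range(3):
    t1.insert(k, {"qty": k})
t1.delete(0)
t1.delete(1)
t1.update(2, {"qty": 42})
wal.commit(t1)

t2 = Transaction(2)
t2.insert(9, {"qty": 9})  # never committed, so never logged
print(len(wal.entries))   # 6
```

The uncommitted transaction leaves no trace in the log. Buffering until commit keeps aborted work out of the WAL entirely.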

1.2 MOT Logging Types

Two synchronous transaction logging options and one asynchronous transaction logging option are supported. The standard openGauss disk engine supports these options as well. MOT also supports synchronous group commit logging with NUMA-aware optimization, as described below.

Depending on your configuration, one of the following types of logging is used:

Synchronous redo logging
The synchronous redo logging option is the simplest and strictest redo logger. When a client application commits a transaction, the transaction redo entries are recorded in the WAL redo log as follows:
1. While a transaction is in progress, it is stored in MOT memory.
2. When the transaction finishes, the client application sends a Commit command. The transaction is locked and then written to the WAL redo log on disk. While the transaction log entries are being written, the client application is still waiting for a response.
3. As soon as the transaction's entire buffer has been written to the log, the data in memory is changed and the transaction is committed. The client application is then notified that the transaction is complete.
  Summary
  The synchronous redo logging option is the safest and strictest option. It keeps the client application fully synchronized with the WAL redo log entries of each transaction as it commits. This guarantees complete durability and consistency with no data loss. It also prevents a client application from marking a transaction as successful before the transaction has been persisted to disk.
  The downside is speed: it is the slowest of the three logging mechanisms. The client application must wait until all data has been written to disk, and the frequent disk writes slow down the database.
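A minimal Python sketch of the three steps above. It only records the order of events, to make the guarantee visible: the acknowledgment comes after the log write, and every commit costs its own log write. The class and event names are hypothetical. Real locking, buffering and fsync are omitted.

```python
class SyncRedoLogger:
    """Synchronous redo logging: log write first, then memory, then the ack."""

    def __init__(self):
        self.events = []    # ordered trace of the commit protocol
        self.disk_log = []  # stand-in for the WAL on disk
        self.memory = {}    # stand-in for MOT in-memory rows

    def commit(self, txn_id, changes):
        self.events.append(("lock", txn_id))
        # Step 2: write and flush this transaction's redo entries;
        # the client is still blocked at this point.
        self.disk_log.append((txn_id, dict(changes)))
        self.events.append(("flush", txn_id))
        # Step 3: only after the flush are the in-memory rows changed.
        self.memory.update(changes)
        self.events.append(("apply", txn_id))
        self.events.append(("ack", txn_id))


logger = SyncRedoLogger()
logger.commit(1, {"a": 1})
logger.commit(2, {"b": 2})
print([e for e, t in logger.events if t == 1])  # ['lock', 'flush', 'apply', 'ack']
print(len(logger.disk_log))                     # 2 log writes for 2 commits
```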
Group synchronous redo logging
The group synchronous redo logging option is very similar to the synchronous option. It also ensures complete durability with no data loss, and keeps the client application fully synchronized with the WAL redo log entries. The difference is that it writes groups of transaction redo entries to the WAL redo log on disk at the same time, instead of writing each transaction as it commits. This reduces the number of disk I/Os and therefore improves performance, especially under heavy workloads.
The MOT engine performs synchronous group commit logging with NUMA-aware optimization. It automatically groups transactions according to the NUMA socket of the core on which each transaction runs.
For more information about NUMA-aware memory access, see NUMA-aware Allocation and Affinity.
When a transaction commits, a group of entries is recorded in the WAL redo log as follows:
1. While a transaction is in progress, it is stored in memory. The MOT engine groups transactions into buckets according to the NUMA socket of the core on which each transaction runs. All transactions running on the same socket are grouped together, and multiple groups are filled in parallel.
  Writing transactions to the WAL is more efficient this way, because all the buffers from the same socket are written to disk together.
  Note: Each thread runs on a single core/CPU that belongs to a single socket. Each thread writes only to the group of the socket its core belongs to.
2. When a transaction finishes and the client application sends a Commit command, the transaction redo log entries are serialized together with those of the other transactions in the same group.
3. When a group meets the configured criteria, its transactions are written to the WAL on disk. The criteria are a number of committed transactions or a timeout period, as described in the Redo Log (MOT) section. While these log entries are being written, the client applications that issued the commit requests are waiting for a response.
4. As soon as all the transaction buffers in the NUMA-aware group have been written to the log, every transaction in the group applies its changes to the memory store. The clients are then notified that these transactions are complete.
  Summary
  The group synchronous redo logging option is an extremely safe and strict logging option. It keeps client applications fully synchronized with the WAL redo log entries, which guarantees complete durability and consistency with no data loss. It also prevents a client application from marking a transaction as successful before the transaction has been persisted to disk.
  This option performs fewer disk writes than the synchronous option, so it may be faster. The downside is that transactions stay locked for longer. They remain locked until all the transactions in the same NUMA memory group have been written to the WAL redo log on disk.
  Whether to use this option depends on the transaction workload. Systems with many transactions benefit from it. It is not recommended for systems with few transactions, because their number of disk writes is low anyway.
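The sketch below models NUMA-aware grouping in Python under two simplifying assumptions:
- Cores are numbered contiguously per socket, so `core // cores_per_socket` gives the socket.
- A group is flushed only when it reaches a fixed size. The timeout threshold from the configuration section is left out.

All names are hypothetical. The sketch shows one log write per group, and that clients are acknowledged only after their group's write.

```python
from collections import defaultdict


class NumaGroupCommit:
    """One pending commit group per NUMA socket, flushed at group_size."""

    def __init__(self, cores_per_socket, group_size):
        self.cores_per_socket = cores_per_socket
        self.group_size = group_size
        self.pending = defaultdict(list)  # socket -> txn ids awaiting the log write
        self.flushes = []                 # (socket, txn ids) written in one I/O
        self.acked = []                   # txns whose clients were notified

    def socket_of(self, core):
        # Assumption: cores are numbered contiguously within each socket.
        return core // self.cores_per_socket

    def commit(self, txn_id, core):
        # Step 2: the redo entries join the group of the core's socket.
        sock = self.socket_of(core)
        self.pending[sock].append(txn_id)
        # Step 3: a full group is written to the WAL in a single write.
        if len(self.pending[sock]) >= self.group_size:
            group = self.pending.pop(sock)
            self.flushes.append((sock, group))
            # Step 4: memory is updated and every client in the group is acked.
            self.acked.extend(group)


grouper = NumaGroupCommit(cores_per_socket=4, group_size=4)
for txn_id in range(8):
    grouper.commit(txn_id, core=txn_id)  # txn i runs on core i
grouper.commit(8, core=1)                # waits: socket 0's new group has 1 entry
print(grouper.flushes)    # [(0, [0, 1, 2, 3]), (1, [4, 5, 6, 7])]
print(8 in grouper.acked)  # False
```

Nine commits cost two log writes. Transaction 8 stays unacknowledged until its group fills, which is the longer lock time the summary mentions.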
Asynchronous redo logging
The asynchronous redo logging option is the fastest logging method, but it does not protect against data loss. Data still in the buffer that has not yet been written to disk may be lost after a power failure or database crash. When a client application commits a transaction, the transaction redo entries are recorded in internal buffers and written to disk at preconfigured intervals. The client application does not wait for the data to reach disk; it moves on to the next transaction. This is why asynchronous redo logging is the fastest.
When a client application commits a transaction, the transaction redo entries are recorded in the WAL redo log as follows:
1. While a transaction is in progress, it is stored in MOT memory.
2. When the transaction finishes and the client application sends a Commit command, the transaction redo entries are written to an internal buffer but not yet to disk. The MOT in-memory data is then changed, and the client application is notified that the transaction has been committed.
3. A redo log thread running in the background collects all the buffered redo log entries at preconfigured intervals and writes them to disk.
  Summary
  The asynchronous redo logging option is the fastest logging option because client applications do not wait for data to be written to disk. It also groups the redo entries of many transactions and writes them together, which reduces the number of disk I/Os that would slow down the MOT engine.
  The downside is that it does not guarantee against data loss after a crash or failure. Data that was committed but not yet written to disk is not durable at commit time and cannot be recovered after a failure. This option is best suited to applications that are willing to sacrifice data recovery (consistency) for performance.
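To make the durability trade-off concrete, this Python sketch acknowledges commits immediately. Buffered entries move to "disk" only when a background flush runs, and a simulated crash discards whatever is still buffered. To keep the example deterministic, the interval-driven redo thread is reduced to an explicit `background_flush()` call. All names are hypothetical.

```python
class AsyncRedoLogger:
    """Asynchronous redo logging: acknowledge first, flush later."""

    def __init__(self):
        self.buffer = []    # internal redo buffer (volatile)
        self.disk_log = []  # durable WAL
        self.acked = []     # transactions reported as committed

    def commit(self, txn_id):
        self.buffer.append(txn_id)  # step 2: buffered, not yet on disk
        self.acked.append(txn_id)   # the client is notified immediately

    def background_flush(self):
        # Step 3: the redo thread moves buffered entries to disk.
        self.disk_log.extend(self.buffer)
        self.buffer.clear()

    def crash(self):
        # Volatile buffer contents are lost; only flushed entries survive.
        self.buffer.clear()
        return list(self.disk_log)


redo = AsyncRedoLogger()
for txn_id in (1, 2, 3):
    redo.commit(txn_id)
redo.background_flush()  # the periodic redo thread runs once
for txn_id in (4, 5):
    redo.commit(txn_id)  # acknowledged, but only buffered
print(redo.crash())      # [1, 2, 3]: transactions 4 and 5 are lost
print(redo.acked)        # [1, 2, 3, 4, 5]
```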

1.3 Configuring Logging

The standard openGauss disk engine supports two synchronous transaction logging options and one asynchronous transaction logging option.

To configure logging:

The synchronous_commit parameter (on = synchronous) in the postgresql.conf configuration file specifies whether synchronous or asynchronous transaction logging is performed.
In the redo log section of the mot.conf configuration file, set the enable_redo_log parameter to True.

If synchronous transaction logging is selected (synchronous_commit = on, as described above), use the enable_group_commit parameter in the mot.conf configuration file to choose between the Group Synchronous Redo Logging option and the Synchronous Redo Logging option. If you choose Group Synchronous Redo Logging, you must also define the following thresholds in the mot.conf file. They determine when a group of transactions is recorded in the WAL:

group_commit_size: The number of committed transactions in a group. For example, 16 means that once 16 transactions in the same group have been committed by their client applications, an entry is written to the WAL redo log on disk for each of the 16 transactions.
group_commit_timeout: A timeout period in milliseconds. For example, 10 means that after 10 milliseconds, an entry is written to the WAL redo log on disk for each transaction in the same group that was committed by its client application within the last 10 milliseconds.

Note: For more information about configuration, see Redo Log (MOT).
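The two group commit thresholds can be read as "flush the group when either limit is reached first". The following Python sketch encodes that reading. It is an assumption for illustration, not MOT's actual scheduler. Only the parameter names group_commit_size and group_commit_timeout come from mot.conf. An injectable clock keeps the sketch testable.

```python
import time


class GroupCommitTrigger:
    """Flush a pending group on size or on timeout, whichever comes first."""

    def __init__(self, group_commit_size=16, group_commit_timeout_ms=10,
                 clock=time.monotonic):
        self.size = group_commit_size
        self.timeout_s = group_commit_timeout_ms / 1000.0
        self.clock = clock
        self.pending = []
        self.opened_at = None  # time at which the current group got its first txn

    def add(self, txn_id):
        """Register a committed txn; return the group if the size limit is hit."""
        if not self.pending:
            self.opened_at = self.clock()
        self.pending.append(txn_id)
        if len(self.pending) >= self.size:
            return self._take()
        return None

    def poll(self):
        """Return the group if it has waited at least group_commit_timeout."""
        if self.pending and self.clock() - self.opened_at >= self.timeout_s:
            return self._take()
        return None

    def _take(self):
        group, self.pending, self.opened_at = self.pending, [], None
        return group


now = [0.0]  # fake clock in seconds
trig = GroupCommitTrigger(16, 10, clock=lambda: now[0])
full = None
for i in range(16):
    full = trig.add(i)
print(len(full))    # 16: size threshold reached
trig.add(100)
now[0] = 0.005
print(trig.poll())  # None: only 5 ms have passed
now[0] = 0.011
print(trig.poll())  # [100]: timeout reached
```

In a real engine, poll() would run on a timer. The size check runs inline on every commit, so a busy group never waits for the timeout.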

1.4 MOT Checkpoints

A checkpoint is a point in time at which all table row data is saved to files on persistent storage, creating a complete and durable image of the database. It is a snapshot of the data at that point in time.

Checkpoints reduce the number of WAL redo log entries that must be replayed to ensure durability, which shortens database recovery time. Checkpoints also reduce the storage space needed to keep all the log entries.

Without checkpoints, recovering the database would require replaying all WAL redo entries from the very beginning. Depending on the number of records in the database, this could take days or weeks. A checkpoint records the current state of the database and allows older redo entries to be discarded.

Checkpoints are essential in recovery scenarios, especially cold start. First, data is loaded from the last known checkpoint or from a specific checkpoint. Then the WAL is used to apply the data changes that occurred since then.

For example, if the same table row is modified 100 times, 100 entries are recorded in the log. A checkpoint records that row only once, regardless of how many times it was modified. After a checkpoint has been recorded, recovery can start from it, and only the WAL redo log entries written after the checkpoint need to be replayed.
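The 100-modification example can be worked through with a small Python sketch. It introduces two simplifications that do not describe MOT's checkpoint format:
- Each WAL entry carries a monotonically increasing log sequence number (LSN).
- The checkpoint records the LSN up to which it is complete.

```python
def recover(checkpoint_rows, checkpoint_lsn, wal):
    """Load the checkpoint image, then replay only WAL entries newer than it."""
    rows = dict(checkpoint_rows)
    replayed = 0
    for lsn, key, value in wal:
        if lsn <= checkpoint_lsn:
            continue  # already reflected in the checkpoint image
        rows[key] = value
        replayed += 1
    return rows, replayed


# The same row modified 100 times produces 100 log entries (LSN 1..100).
wal = [(lsn, "row1", lsn) for lsn in range(1, 101)]
# A checkpoint taken at LSN 100 stores the row once, with its latest value.
checkpoint = {"row1": 100}
# Five more modifications happen after the checkpoint.
wal += [(lsn, "row1", lsn) for lsn in range(101, 106)]

print(recover({}, 0, wal))            # ({'row1': 105}, 105): full replay
print(recover(checkpoint, 100, wal))  # ({'row1': 105}, 5): checkpoint + 5 entries
# After the checkpoint, entries with LSN <= 100 can be truncated.
wal = [e for e in wal if e[0] > 100]
print(len(wal))                       # 5
```

Both recovery paths reach the same state. The checkpoint path replays 5 entries instead of 105, and the log shrinks by the same amount.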

2. MOT Recovery

The main goal of MOT recovery is to return the data and the MOT engine to a consistent state after a planned shutdown (for example, for maintenance) or an unplanned crash (for example, after a power failure).

MOT recovery is performed automatically together with the recovery of the rest of the openGauss database. It is fully integrated into the openGauss recovery process (also called a cold start).

MOT recovery consists of two phases:

Checkpoint recovery: Data is recovered from the latest checkpoint file on disk. This requires loading the data into in-memory rows and creating the indexes.

WAL redo log recovery: After the checkpoint data has been loaded, the records added to the log after the checkpoint must be replayed. This recovers the most recent data, which the checkpoint did not capture, from the WAL redo log.

openGauss manages and triggers WAL redo log recovery.

To configure recovery:
Although WAL recovery is performed serially, checkpoint recovery can be configured to run multi-threaded, that is, with multiple worker threads running in parallel.
Set the checkpoint_recovery_workers parameter in the mot.conf file, as described in Recovery (MOT).
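A Python sketch of the two-phase flow with the parallelism split described above. Checkpoint segments are loaded by a pool of workers, whose size stands in for checkpoint_recovery_workers. The WAL is then replayed serially in log order. The sketch makes these assumptions:
- The checkpoint is split into segments covering disjoint keys.
- Index creation is omitted.
- ThreadPoolExecutor stands in for MOT's own recovery threads.

```python
from concurrent.futures import ThreadPoolExecutor


def load_segment(segment):
    # Stand-in for reading one checkpoint file into in-memory rows.
    return dict(segment)


def cold_start_recovery(segments, wal, checkpoint_recovery_workers=3):
    rows = {}
    # Phase 1: checkpoint recovery, parallel across worker threads.
    with ThreadPoolExecutor(max_workers=checkpoint_recovery_workers) as pool:
        for part in pool.map(load_segment, segments):
            rows.update(part)  # segments hold disjoint keys, so order is irrelevant
    # Phase 2: WAL redo replay, serial and in log order.
    for key, value in wal:
        if value is None:
            rows.pop(key, None)  # a delete
        else:
            rows[key] = value    # an insert or update
    return rows


segments = [[("a", 1), ("b", 2)], [("c", 3)], [("d", 4)]]
wal = [("b", 20), ("a", None), ("e", 5)]
print(cold_start_recovery(segments, wal))
# {'b': 20, 'c': 3, 'd': 4, 'e': 5}
```

Replay stays serial because later WAL entries can depend on earlier ones for the same key. Checkpoint segments with disjoint keys have no such ordering constraint.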

3. MOT Replication and High Availability

MOT is integrated into openGauss and uses or supports its replication and high availability features. As a result, MOT natively supports both synchronous and asynchronous replication.

The openGauss gs_ctl tool is used for availability control and database operations. This includes gs_ctl switchover, gs_ctl failover, gs_ctl build and so on.

For more information, see the openGauss Tool Reference.

To configure replication and high availability:
See the related openGauss documentation.

4. MOT Memory Management

For planning and fine-tuning, see MOT Memory and Storage Planning and MOT Configuration.

5. MOT VACUUM Cleanup

Use VACUUM for garbage collection and, optionally, to analyze the database, as follows.

【Postgres】
In Postgres, VACUUM reclaims the storage occupied by dead tuples. In normal Postgres operation, tuples that are deleted or made obsolete by an update are not physically removed from their table. They remain until a VACUUM is run. VACUUM must therefore be run periodically, especially on frequently updated tables.
【MOT Extension】
MOT tables do not need periodic VACUUM operations, because new tuples reuse invalid and empty tuples. A VACUUM operation is needed only when the size of an MOT table has dropped significantly and is not expected to return to its original size.
For example, consider an application that periodically (such as once a week) deletes a large amount of table data while inserting new data over several days, not necessarily the same number of rows. In such cases, VACUUM can be used.
A VACUUM operation on an MOT table is always converted to VACUUM FULL with an exclusive table lock.
Supported syntax and restrictions
VACUUM is invoked with the standard syntax:
```
VACUUM [FULL | ANALYZE] [ table ]; 
```
Only the FULL and ANALYZE VACUUM types are supported. A VACUUM operation can only be applied to an entire MOT table.
The following Postgres VACUUM options are not supported:
- FREEZE
- VERBOSE
- Column specification
- LAZY mode (partial table scan)
In addition, the following feature is not supported:
- AUTOVACUUM
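The syntax rules above can be captured in a small Python checker, for example in a migration script that screens VACUUM statements before they are sent to MOT tables. It is a sketch built only from the restrictions listed here, not an openGauss parser. It rejects FREEZE, VERBOSE and any parenthesized list (a column specification or an option list). It accepts at most one of FULL or ANALYZE, followed by at most one table name. AUTOVACUUM is a server feature rather than statement syntax, so it is outside the checker's scope.

```python
import re

SUPPORTED_KINDS = {"FULL", "ANALYZE"}
UNSUPPORTED_OPTIONS = {"FREEZE", "VERBOSE"}


def check_mot_vacuum(stmt):
    """Return (ok, reason) for a VACUUM statement under the MOT restrictions."""
    m = re.fullmatch(r"\s*VACUUM\b(.*?)\s*;?\s*", stmt, re.IGNORECASE | re.DOTALL)
    if m is None:
        return False, "not a VACUUM statement"
    body = m.group(1)
    if "(" in body:
        return False, "column lists and parenthesized options are not supported"
    words = body.split()
    for word in words:
        if word.upper() in UNSUPPORTED_OPTIONS:
            return False, word.upper() + " is not supported"
    # VACUUM [FULL | ANALYZE] [ table ]
    if words and words[0].upper() in SUPPORTED_KINDS:
        words = words[1:]
    if len(words) > 1:
        return False, "expected at most one table name"
    return True, "ok"


print(check_mot_vacuum("VACUUM FULL customer;"))    # (True, 'ok')
print(check_mot_vacuum("VACUUM FREEZE customer;"))  # (False, 'FREEZE is not supported')
```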

6. MOT Statistics

Statistics are intended mainly for performance analysis and debugging. They are usually not enabled in production environments and are off by default. They are used mostly by database developers and less often by database users.

Enabling statistics has some impact on performance, particularly on the server. The impact on users is negligible.

Statistics are saved in the database server log. The log is located in the data folder and is named postgresql-DATE-TIME.log.

For detailed configuration options, see Statistics (MOT).

7. MOT Monitoring

All monitoring syntax for Postgres-based FDW tables is supported, including the table and index sizes described below. Dedicated functions are also available for monitoring MOT memory consumption, covering MOT global memory, MOT local memory and individual client sessions.

7.1 Table and Index Sizes

Table and index sizes can be monitored by querying pg_relation_size.

For example:

Data size

```
select pg_relation_size('customer');
```

Index

```
select pg_relation_size('customer_pkey');
```

7.2 MOT Global Memory Details

Check the size of MOT global memory, which consists mainly of data and indexes.

```
select * from mot_global_memory_detail();
```

The result is as follows:

```
 numa_node | reserved_size |  used_size
-----------+---------------+-------------
        -1 |  194716368896 | 25908215808
         0 |     446693376 |   446693376
         1 |     452984832 |   452984832
         2 |     452984832 |   452984832
         3 |     452984832 |   452984832
         4 |     452984832 |   452984832
         5 |     364904448 |   364904448
         6 |     301989888 |   301989888
         7 |     301989888 |   301989888
```

where:

-1 is the total memory.
0–7 are the NUMA memory nodes.
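When a monitoring script collects these figures, the raw byte counts are easier to read as GiB and as a utilization ratio. A minimal Python sketch, assuming the output has been captured as psql-style text like the result shown above. The parser and formatting are illustrative and not part of openGauss. Because the local memory function returns the same columns, the same parser should work for its output too.

```python
def parse_memory_detail(text):
    """Parse 'numa_node | reserved_size | used_size' text into {node: (reserved, used)}."""
    rows = {}
    for line in text.strip().splitlines():
        parts = [p.strip() for p in line.split("|")]
        # Skip the header and the dashed separator line.
        if len(parts) != 3 or not parts[0].lstrip("-").isdigit():
            continue
        rows[int(parts[0])] = (int(parts[1]), int(parts[2]))
    return rows


GIB = 1024 ** 3

sample = """
 numa_node | reserved_size |  used_size
-----------+---------------+-------------
        -1 |  194716368896 | 25908215808
         0 |     446693376 |   446693376
"""
rows = parse_memory_detail(sample)
reserved, used = rows[-1]  # node -1 is the total
print(f"{used / GIB:.2f} of {reserved / GIB:.2f} GiB used ({used / reserved:.1%})")
# 24.13 of 181.34 GiB used (13.3%)
```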

7.3 MOT Local Memory Details

Check the size of MOT local memory, which includes session memory.

```
select * from mot_local_memory_detail();
```

The result is as follows:

```
 numa_node | reserved_size | used_size
-----------+---------------+-----------
        -1 |     144703488 | 144703488
         0 |      25165824 |  25165824
         1 |      25165824 |  25165824
         2 |      18874368 |  18874368
         3 |      18874368 |  18874368
         4 |      18874368 |  18874368
         5 |      12582912 |  12582912
         6 |      12582912 |  12582912
         7 |      12582912 |  12582912
```

where:

-1 is the total memory.
0–7 are the NUMA memory nodes.

7.4 Session Memory

Memory for session management is taken from MOT local memory.

The memory usage of all active sessions (connections) can be queried as follows:

```
select * from mot_session_memory_detail();
```

The result is as follows:

```
           sessid           | total_size | free_size | used_size
----------------------------+------------+-----------+-----------
 1591175063.139755603855104 |    6291456 |   1800704 |   4490752
```

where:

total_size: the memory allocated to the session.
free_size: the unused memory.
used_size: the memory in use.
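For the session view, a similar Python sketch parses each row and derives utilization. It assumes that used_size = total_size - free_size. This holds for the sample row above (6291456 - 1800704 = 4490752), and the sketch flags rows where it does not. The function name is hypothetical.

```python
def parse_session_memory(text):
    """Return one dict per session row of mot_session_memory_detail() output."""
    sessions = []
    for line in text.strip().splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) != 4 or not parts[1].isdigit():
            continue  # header or separator line
        total, free, used = (int(p) for p in parts[1:])
        sessions.append({
            "sessid": parts[0],
            "total_size": total,
            "free_size": free,
            "used_size": used,
            "consistent": total - free == used,
            "utilization": used / total if total else 0.0,
        })
    return sessions


sample = """
           sessid           | total_size | free_size | used_size
----------------------------+------------+-----------+-----------
 1591175063.139755603855104 |    6291456 |   1800704 |   4490752
"""
for s in parse_session_memory(sample):
    print(s["sessid"], s["consistent"], f"{s['utilization']:.1%}")
# 1591175063.139755603855104 True 71.4%
```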

A DBA can check the local memory used by the current session with the following query:

```
select * from mot_session_memory_detail()
 where sessid = pg_current_sessionid();
```

The result has the same columns as the previous query, limited to the row for the current session.

8. MOT Error Messages

Errors can arise in a variety of scenarios. All errors are recorded in the database server log file. In addition, user-related errors are returned to the user as part of the response to a query, transaction, stored procedure execution or database administration operation.

Errors reported in the server log include the function, entity, context, error message, error description and severity.
Errors reported to the user are translated into standard PostgreSQL error codes and may carry an MOT-specific message and description.

The error messages, error descriptions and error codes are listed below. The error code is an internal code and is neither logged nor returned to the user.

8.1 Errors Written to the Log File

All errors are recorded in the database server log file. The following table lists the errors that are written to the database server log file but not returned to the user. The log is located in the data folder and is named postgresql-DATE-TIME.log.

Table 1 Errors written only to the log file

Log message	Internal error code
Error code denoting success	MOT_NO_ERROR 0
Out of memory	MOT_ERROR_OOM 1
Invalid configuration	MOT_ERROR_INVALID_CFG 2
Invalid argument passed to function	MOT_ERROR_INVALID_ARG 3
System call failed	MOT_ERROR_SYSTEM_FAILURE 4
Resource limit reached	MOT_ERROR_RESOURCE_LIMIT 5
Internal logic error	MOT_ERROR_INTERNAL 6
Resource unavailable	MOT_ERROR_RESOURCE_UNAVAILABLE 7
Unique violation	MOT_ERROR_UNIQUE_VIOLATION 8
Invalid memory allocation size	MOT_ERROR_INVALID_MEMORY_SIZE 9
Index out of range	MOT_ERROR_INDEX_OUT_OF_RANGE 10
Error code unknown	MOT_ERROR_INVALID_STATE 11
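Because only the message text appears in the log, a log-triage script might map messages back to the internal names in Table 1. The following Python sketch does that with a simple substring match. It assumes the message text appears verbatim somewhere in a log line and makes no assumption about the rest of the line's format. The enum values are copied from the table; the helper names are hypothetical.

```python
from enum import IntEnum


class MotError(IntEnum):
    """Internal MOT error codes from Table 1."""
    MOT_NO_ERROR = 0
    MOT_ERROR_OOM = 1
    MOT_ERROR_INVALID_CFG = 2
    MOT_ERROR_INVALID_ARG = 3
    MOT_ERROR_SYSTEM_FAILURE = 4
    MOT_ERROR_RESOURCE_LIMIT = 5
    MOT_ERROR_INTERNAL = 6
    MOT_ERROR_RESOURCE_UNAVAILABLE = 7
    MOT_ERROR_UNIQUE_VIOLATION = 8
    MOT_ERROR_INVALID_MEMORY_SIZE = 9
    MOT_ERROR_INDEX_OUT_OF_RANGE = 10
    MOT_ERROR_INVALID_STATE = 11


LOG_MESSAGES = {
    "Error code denoting success": MotError.MOT_NO_ERROR,
    "Out of memory": MotError.MOT_ERROR_OOM,
    "Invalid configuration": MotError.MOT_ERROR_INVALID_CFG,
    "Invalid argument passed to function": MotError.MOT_ERROR_INVALID_ARG,
    "System call failed": MotError.MOT_ERROR_SYSTEM_FAILURE,
    "Resource limit reached": MotError.MOT_ERROR_RESOURCE_LIMIT,
    "Internal logic error": MotError.MOT_ERROR_INTERNAL,
    "Resource unavailable": MotError.MOT_ERROR_RESOURCE_UNAVAILABLE,
    "Unique violation": MotError.MOT_ERROR_UNIQUE_VIOLATION,
    "Invalid memory allocation size": MotError.MOT_ERROR_INVALID_MEMORY_SIZE,
    "Index out of range": MotError.MOT_ERROR_INDEX_OUT_OF_RANGE,
    "Error code unknown": MotError.MOT_ERROR_INVALID_STATE,
}


def classify_log_line(line):
    """Return the MotError whose message occurs in the line, or None."""
    # Try longer messages first so a short message never shadows a longer one.
    for message in sorted(LOG_MESSAGES, key=len, reverse=True):
        if message in line:
            return LOG_MESSAGES[message]
    return None


print(classify_log_line("... Out of memory ...").name)  # MOT_ERROR_OOM
```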

8.2 Errors Returned to the User

The following table lists the errors that are written to the database server log file and also returned to the user.

MOT uses return codes (RCs) to return standard Postgres error codes to the database envelope. Some RCs cause an error message to be generated for the user who is interacting with the database.

MOT returns Postgres codes (see below) from within the engine to the database envelope, which reacts to them according to standard Postgres behavior.

Note: %s, %u, %lu and %d in the messages are replaced by the corresponding error details (such as a query, a table name or other information).
%s: string
%u: number
%lu: number
%d: number

Table 2 Errors returned to the user and recorded in the log file

Short message / long description returned to the user	Postgres code	Internal error code
Success. Denotes success	ERRCODE_SUCCESSFUL_COMPLETION	RC_OK = 0
Failure. Unknown error has occurred.	ERRCODE_FDW_ERROR	RC_ERROR = 1
Unknown error has occurred. Denotes aborted operation.	ERRCODE_FDW_ERROR	RC_ABORT
Column definition of %s is not supported. Column type %s is not supported yet.	ERRCODE_INVALID_COLUMN_DEFINITION	RC_UNSUPPORTED_COL_TYPE
Column definition of %s is not supported. Column type Array of %s is not supported yet.	ERRCODE_INVALID_COLUMN_DEFINITION	RC_UNSUPPORTED_COL_TYPE_ARR
Column size %d exceeds max tuple size %u. Column definition of %s is not supported.	ERRCODE_FEATURE_NOT_SUPPORTED	RC_EXCEEDS_MAX_ROW_SIZE
Column name %s exceeds max name size %u. Column definition of %s is not supported.	ERRCODE_INVALID_COLUMN_DEFINITION	RC_COL_NAME_EXCEEDS_MAX_SIZE
Column size %d exceeds max size %u. Column definition of %s is not supported.	ERRCODE_INVALID_COLUMN_DEFINITION	RC_COL_SIZE_INVLALID
Cannot create table. Cannot add column %s; as the number of declared columns exceeds the maximum declared columns.	ERRCODE_FEATURE_NOT_SUPPORTED	RC_TABLE_EXCEEDS_MAX_DECLARED_COLS
Cannot create index. Total column size is greater than maximum index size %u.	ERRCODE_FDW_KEY_SIZE_EXCEEDS_MAX_ALLOWED	RC_INDEX_EXCEEDS_MAX_SIZE
Cannot create index. Total number of indexes for table %s is greater than the maximum number of indexes allowed %u.	ERRCODE_FDW_TOO_MANY_INDEXES	RC_TABLE_EXCEEDS_MAX_INDEXES
Cannot execute statement. Maximum number of DDLs per transaction reached the maximum %u.	ERRCODE_FDW_TOO_MANY_DDL_CHANGES_IN_TRANSACTION_NOT_ALLOWED	RC_TXN_EXCEEDS_MAX_DDLS
Unique constraint violation. Duplicate key value violates unique constraint "%s". Key %s already exists.	ERRCODE_UNIQUE_VIOLATION	RC_UNIQUE_VIOLATION
Table "%s" does not exist.	ERRCODE_UNDEFINED_TABLE	RC_TABLE_NOT_FOUND
Index "%s" does not exist.	ERRCODE_UNDEFINED_TABLE	RC_INDEX_NOT_FOUND
Unknown error has occurred.	ERRCODE_FDW_ERROR	RC_LOCAL_ROW_FOUND
Unknown error has occurred.	ERRCODE_FDW_ERROR	RC_LOCAL_ROW_NOT_FOUND
Unknown error has occurred.	ERRCODE_FDW_ERROR	RC_LOCAL_ROW_DELETED
Unknown error has occurred.	ERRCODE_FDW_ERROR	RC_INSERT_ON_EXIST
Unknown error has occurred.	ERRCODE_FDW_ERROR	RC_INDEX_RETRY_INSERT
Unknown error has occurred.	ERRCODE_FDW_ERROR	RC_INDEX_DELETE
Unknown error has occurred.	ERRCODE_FDW_ERROR	RC_LOCAL_ROW_NOT_VISIBLE
Memory is temporarily unavailable.	ERRCODE_OUT_OF_LOGICAL_MEMORY	RC_MEMORY_ALLOCATION_ERROR
Unknown error has occurred.	ERRCODE_FDW_ERROR	RC_ILLEGAL_ROW_STATE
Null constraint violated. NULL value cannot be inserted into non-null column %s at table %s.	ERRCODE_FDW_ERROR	RC_NULL_VIOLATION
Critical error. Critical error: %s.	ERRCODE_FDW_ERROR	RC_PANIC
A checkpoint is in progress – cannot truncate table.	ERRCODE_FDW_OPERATION_NOT_SUPPORTED	RC_NA
Unknown error has occurred.	ERRCODE_FDW_ERROR	RC_MAX_VALUE
<recovery message>	ERRCODE_CONFIG_FILE_ERROR	-
<recovery message>	ERRCODE_INVALID_TABLE_DEFINITION	-
Memory engine – Failed to perform commit prepared.	ERRCODE_INVALID_TRANSACTION_STATE	-
Invalid option <option name>	ERRCODE_FDW_INVALID_OPTION_NAME	-
Invalid memory allocation request size.	ERRCODE_INVALID_PARAMETER_VALUE	-
Memory is temporarily unavailable.	ERRCODE_OUT_OF_LOGICAL_MEMORY	-
Could not serialize access due to concurrent update.	ERRCODE_T_R_SERIALIZATION_FAILURE	-
Alter table operation is not supported for memory table. Cannot create MOT tables while incremental checkpoint is enabled. Re-index is not supported for memory tables.	ERRCODE_FDW_OPERATION_NOT_SUPPORTED	-
Allocation of table metadata failed.	ERRCODE_OUT_OF_MEMORY	-
Database with OID %u does not exist.	ERRCODE_UNDEFINED_DATABASE	-
Value exceeds maximum precision: %d.	ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE	-
You have reached a maximum logical capacity %lu of allowed %lu.	ERRCODE_OUT_OF_LOGICAL_MEMORY	-
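One of the user-facing errors above is "Could not serialize access due to concurrent update" (ERRCODE_T_R_SERIALIZATION_FAILURE, SQLSTATE 40001 in PostgreSQL). It is typically transient, so client code often retries the whole transaction. Errors such as a unique violation (SQLSTATE 23505) should be surfaced instead. The Python sketch below shows that pattern under these assumptions:
- The DatabaseError class carrying the SQLSTATE is hypothetical. A real driver exposes the SQLSTATE differently; psycopg2, for example, uses the `pgcode` attribute.
- The retry count and backoff are arbitrary choices.
- `txn_fn` must run the entire transaction, from BEGIN to COMMIT.

```python
import time

SERIALIZATION_FAILURE = "40001"  # SQLSTATE of ERRCODE_T_R_SERIALIZATION_FAILURE


class DatabaseError(Exception):
    """Hypothetical driver error that carries a SQLSTATE."""

    def __init__(self, sqlstate, message):
        super().__init__(message)
        self.sqlstate = sqlstate


def run_with_retry(txn_fn, max_attempts=3, backoff_s=0.05):
    """Run txn_fn, re-running it only when it fails with a serialization failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return txn_fn()
        except DatabaseError as exc:
            if exc.sqlstate != SERIALIZATION_FAILURE or attempt == max_attempts:
                raise  # not retryable, or out of attempts
            time.sleep(backoff_s * attempt)  # simple linear backoff
```

Retrying the whole transaction, rather than the failed statement, matters because a serialization failure aborts the transaction. The earlier statements' effects are gone and must be redone.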