
MySQL Interview Notes

2022-06-12 16:25:00 Future shadow

MySQL Underlying Implementation Principles

Query structure

# Approach 1:
SELECT ...,....,...
FROM ...,...,....
WHERE multi-table join conditions
AND filter conditions without group functions
GROUP BY ...,...
HAVING filter conditions containing group functions
ORDER BY ... ASC/DESC
LIMIT ...,...

# Approach 2:
SELECT ...,....,...
FROM ... JOIN ...
ON multi-table join conditions
JOIN ...
ON ...
WHERE filter conditions without group functions
AND/OR filter conditions without group functions
GROUP BY ...,...
HAVING filter conditions containing group functions
ORDER BY ... ASC/DESC
LIMIT ...,...
# Where:
#(1) FROM: which tables to select from
#(2) ON: join conditions that eliminate the Cartesian product in multi-table queries
#(3) WHERE: conditions for filtering rows from the tables
#(4) GROUP BY: grouping
#(5) HAVING: further filtering on the aggregated results
#(6) ORDER BY: sorting
#(7) LIMIT: pagination

SELECT Execution order

FROM -> WHERE -> GROUP BY -> HAVING -> SELECT fields -> DISTINCT -> ORDER BY -> LIMIT

SELECT DISTINCT player_id, player_name, count(*) as num # step 5
FROM player JOIN team ON player.team_id = team.team_id # step 1
WHERE height > 1.80 # step 2
GROUP BY player.team_id # step 3
HAVING num > 2 # step 4
ORDER BY num DESC # step 6
LIMIT 2 # step 7

As a SELECT statement executes these steps, each step produces a virtual table, which is then passed as input to the next step

SQL execution principles

A SELECT statement executes the FROM step first. At this stage, if the query joins multiple tables, the following sub-steps are performed

  • 1. First compute the Cartesian product via CROSS JOIN, producing virtual table vt1-1 (vt = virtual table)
  • 2. Filter through ON, on the basis of vt1-1, producing virtual table vt1-2
  • 3. Add outer rows. With a left join, right join, or full join, outer rows are involved: they are added on top of vt1-2, producing virtual table vt1-3

Having obtained the raw data of the queried tables (virtual table vt1), the WHERE stage filters it to produce virtual table vt2

Next come the GROUP BY and HAVING stages, which group vt2 and filter the groups, producing intermediate virtual tables vt3 and vt4

Then the SELECT and DISTINCT stages produce intermediate virtual tables vt5-1 and vt5-2

The ORDER BY stage sorts on the specified fields, producing virtual table vt6

The LIMIT stage takes out the specified rows, producing virtual table vt7

Storage engines

InnoDB engine

The default engine since MySQL 5.5

  • The default transactional engine, designed to handle a large number of short-lived transactions, ensuring complete commit and rollback
  • Caches both indexes and actual data, so memory requirements are high, and memory size has a decisive impact on performance
  • Designed for maximum performance when processing huge volumes of data

MyISAM engine

The default storage engine before MySQL 5.5

  • Provides a large number of features, including full-text indexing, compression, and spatial functions. Does not support transactions or row-level locks, and cannot recover safely after a crash
  • Fast access; suited to applications with no transactional integrity requirements or dominated by SELECT and INSERT
  • Stores data statistics (such as the total row count) as constants in extra storage

Archive engine

For data archiving

| Feature | Support |
| --- | --- |
| Compressed data | Supported |
| Backup / point-in-time recovery (implemented in the server, not in the storage engine) | Supported |
| Geospatial data types | Supported |
| Encrypted data (implemented in the server) | Supported |
| Update of data dictionary statistics | Supported |
| Locking granularity | Row lock |
| Data caching | Not supported |
| Foreign keys | Not supported |
| Full-text search indexes | Not supported |
| Clustered indexes | Not supported |
| Geospatial indexes | Not supported |
| Hash indexes | Not supported |
| Index caches | Not supported |
| B-tree indexes | Not supported |
| MVCC | Not supported |
| Storage limits | None |
| Transactions | Not supported |
| Cluster database support | Not supported |

Memory engine

Tables live in memory; the logical medium is RAM, so responses are fast. Data is lost when the mysqld daemon crashes, and stored data must use a fixed-length row format

Features:

  • Supports both hash indexes and B+ tree indexes
  • An order of magnitude faster than MyISAM
  • Table size depends mainly on two parameters: max_rows (specified at table creation) and max_heap_table_size (default 16MB)
  • Data files and index files are stored separately
  • Data is easily lost and has a short life cycle

Usage scenarios:

  • The target data set is small but accessed frequently. Data lives in memory, so an overly large data set causes memory overflow
  • The data is temporary and must be available immediately
  • It does not matter if the data stored in a Memory table is suddenly lost
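A minimal sketch of creating a MEMORY table with the parameters mentioned above (the table name and columns are illustrative, not from the original):

# Hypothetical session-cache table using the Memory engine
CREATE TABLE t_session_cache (
    session_id VARCHAR(64) NOT NULL,
    user_id    INT         NOT NULL,
    PRIMARY KEY (session_id) USING HASH   # MEMORY tables use HASH indexes by default
) ENGINE = MEMORY MAX_ROWS = 100000;

# Raise the per-table size cap from the 16MB default (affects tables created afterwards)
SET max_heap_table_size = 64 * 1024 * 1024;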

Other engines

  • Merge engine: manages collections made up of multiple MyISAM tables
  • NDB engine: the storage engine dedicated to MySQL clusters, also called the NDB Cluster storage engine, mainly used in MySQL Cluster distributed environments

Comparison of common engines

| Feature | MyISAM | InnoDB | MEMORY | MERGE | NDB |
| --- | --- | --- | --- | --- | --- |
| Storage limits | Yes | 64TB | Yes | No | Yes |
| Transaction safety | - | Supported | - | - | - |
| Locking mechanism | Table lock | Row lock | Table lock | Table lock | Row lock |
| B-tree indexes | Supported | Supported | Supported | Supported | Supported |
| Hash indexes | - | - | Supported | - | Supported |
| Full-text indexes | Supported | - | - | - | - |
| Clustered indexes | - | Supported | - | - | - |
| Data caching | - | Supported | Supported | - | Supported |
| Index caching | Caches indexes, not data | Caches indexes and data | Supported | Supported | Supported |
| Data compression | Supported | - | - | - | - |
| Space usage | Low | High | N/A | Low | Low |
| Memory usage | Low | High | Medium | Low | High |
| Bulk insert speed | High | Low | High | High | High |
| Foreign keys | - | Supported | - | - | - |
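To see which engines a server supports and to switch a table's engine, a small sketch (the table name is illustrative):

SHOW ENGINES;                          # list available storage engines and the default
SHOW TABLE STATUS LIKE 't_player'\G    # the Engine column shows the table's engine
ALTER TABLE t_player ENGINE = InnoDB;  # rebuild the table under a different engine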

More on InnoDB tables

Advantages of InnoDB tables

  • Convenient to operate, improves database performance, low maintenance cost
  • If the server crashes because of hardware or software, no extra action is needed after restart: InnoDB crash recovery automatically finalizes changes committed before the crash, undoes work that was not committed, and then resumes from the crash point
  • The InnoDB storage engine maintains a buffer pool in main memory, where frequently used data is handled directly in memory. This caching applies to many kinds of information and speeds up processing
  • InnoDB not only supports current reads and writes, it also buffers changed data before streaming it to disk
  • Indexes can be created or dropped without affecting performance or availability
  • When handling large data volumes, InnoDB can exploit two or more CPUs to achieve maximum performance

InnoDB and the ACID model

1. Atomicity: mainly involves InnoDB transactions. Related MySQL features include:

  • The autocommit setting
  • The COMMIT statement
  • The ROLLBACK statement
  • Operating on table data in the INFORMATION_SCHEMA database

2. Consistency: mainly involves InnoDB's internal processing that protects data from crashes. Related MySQL features include:

  • The InnoDB doublewrite buffer
  • InnoDB crash recovery

3. Isolation: applies at the transaction level. Related MySQL features include:

  • The autocommit setting
  • The SET TRANSACTION ISOLATION LEVEL statement
  • The low-level details of InnoDB locking

4. Durability: the durability aspect of the ACID model involves MySQL software features interacting with the hardware configuration. Because hardware configurations are complex and diverse, there are no concrete rules for durability. Related MySQL features include (see the sketch after this list):

  • The InnoDB doublewrite buffer, configured via the innodb_doublewrite option
  • The innodb_flush_log_at_trx_commit option
  • The sync_binlog option
  • The innodb_file_per_table option
  • The write cache of the storage device
  • The battery-backed cache of the storage device
  • The operating system running MySQL
  • Uninterrupted power supply
  • The backup strategy
  • For distributed or hosted applications, above all the location of the hardware and the network conditions
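Several of the durability knobs above can be inspected and set at runtime. A minimal sketch of the "safest" settings (whether they suit a given workload is a performance trade-off):

SET GLOBAL innodb_flush_log_at_trx_commit = 1;  # fsync the redo log on every commit
SET GLOBAL sync_binlog = 1;                     # fsync the binlog on every commit
SHOW VARIABLES LIKE 'innodb_doublewrite';       # check whether the doublewrite buffer is on
SHOW VARIABLES LIKE 'innodb_file_per_table';    # one .ibd tablespace file per table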

InnoDB architecture

  • Buffer pool: a portion of main memory used to cache table and index data in use. The buffer pool lets frequently used data be fetched directly from memory, speeding up processing
  • Change buffer: a special data structure that caches changes to secondary index pages when the affected pages are not in the buffer pool. When those index pages are later loaded into the buffer pool by other read operations, the cached changes are merged in
  • Adaptive hash index: with a suitable workload and enough memory, lets InnoDB behave like an in-memory database, without sacrificing transactional reliability or performance
  • Redo log buffer: holds data to be written to the redo log; the log files are flushed to disk periodically. A large redo log buffer lets large transactions run without writing to disk before commit
  • System tablespace: contains the InnoDB data dictionary, the doublewrite buffer, the change buffer, and undo logs, and may also contain table and index data. Shared across tables, the system tablespace is considered a shared tablespace
  • Doublewrite buffer: located in the system tablespace, used for writing data pages flushed from the buffer pool. Only after pages have been flushed to and written into the doublewrite buffer does InnoDB write them to their proper positions
  • Undo log: the collection of undo records belonging to a transaction, containing the information needed to undo that transaction's latest changes
  • File-per-table tablespaces: each is created in its own data file rather than in the system tablespace. Each such tablespace consists of a single .ibd data file, created by default in the database directory
  • General tablespaces: shared InnoDB tablespaces created with the CREATE TABLESPACE syntax. They can be created outside the MySQL data directory, can hold multiple tables, and support tables of all row formats
  • Undo tablespaces: consist of one or more files containing undo logs
  • Temporary tablespaces: user-created temporary tables and disk-based internal temporary tables are both created in temporary tablespaces
  • Redo log: a disk-based data structure used during crash recovery to correct data. During normal operation, the redo log encodes requests that change InnoDB table data. After an unexpected crash, changes that did not finish are replayed automatically during initialization

Transactions

Transaction: a logical unit made up of a group of operations, taking data from one state to another

Transaction principle: ensure that all operations in a transaction execute as a single unit of work

  • When a transaction performs multiple operations, either all of them are committed (commit) and the changes are saved permanently,
  • or the database management system (DBMS) abandons all changes and the transaction rolls back to its initial state

The ACID properties of transactions

  • Atomicity: a transaction is an indivisible unit of work; either all its modifications succeed, or it fails and everything is rolled back
    • Guaranteed by the undo log (rollback log)
  • Consistency: before and after the transaction, the data moves from one legal state to another legal state
    • Guaranteed by durability + atomicity + isolation
  • Isolation: the execution of one transaction must not be interfered with by other transactions; the operations and data used inside a transaction are isolated from other concurrent transactions
    • Guaranteed by MVCC (multi-version concurrency control) or the locking mechanism
  • Durability: once a transaction is committed, its changes to the data in the database are permanent; subsequent operations and database failures must not affect them
    • Guaranteed by the redo log
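A minimal transfer sketch that exercises atomicity, assuming a hypothetical account(id, balance) table:

START TRANSACTION;
UPDATE account SET balance = balance - 100 WHERE id = 1;  # debit one account
UPDATE account SET balance = balance + 100 WHERE id = 2;  # credit the other
COMMIT;      # both changes become permanent together
# ROLLBACK;  # ...or, on any error, both changes are undone together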

Transaction states

  • Active: the database operations of the transaction are in progress
  • Partially committed: the last operation of the transaction has completed, but since all operations were performed in memory, their effects have not yet been flushed to disk
  • Failed: while in the active or partially committed state, the transaction hits an error (a database error, an operating-system error, a power failure) and cannot continue, or its execution is deliberately stopped
  • Aborted: a partially executed transaction has entered the failed state; the modifications it made must be restored to the state before the transaction ran, undoing its effects on the database. This undo process is called rollback; once the rollback completes and the database is back to its prior state, the transaction is in the aborted state
  • Committed: once a transaction in the partially committed state has synchronized all its modified data to disk, it is in the committed state

Problems caused by concurrent transactions

The MySQL server allows multiple clients to connect, which means MySQL may process multiple transactions at the same time

When multiple transactions are processed simultaneously, dirty reads, non-repeatable reads, and phantom reads may occur

  • Dirty read: reading the uncommitted data of another transaction
  • Non-repeatable read: two reads of the same row within one transaction return different data
  • Phantom read: two reads of the same range within one transaction return a different number of rows

Ordered by severity: dirty read > non-repeatable read > phantom read

Transaction isolation levels

The SQL standard defines four isolation levels to avoid these phenomena; the higher the isolation level, the lower the efficiency

  • Read uncommitted: changes made by a transaction are visible to other transactions even before it commits
  • Read committed: changes made by a transaction become visible to other transactions only after it commits
  • Repeatable read: the data seen during a transaction is always consistent with what was seen when the transaction started; this is InnoDB's default isolation level
  • Serializable: read-write locks are placed on records; when multiple transactions read and write the same record and a read-write conflict arises, the later transaction must wait for the earlier one to finish before proceeding

For each isolation level, the phenomena that concurrent transactions may exhibit are as follows (the original table image is lost; this is the standard SQL mapping):

| Isolation level | Dirty read | Non-repeatable read | Phantom read |
| --- | --- | --- | --- |
| Read uncommitted | Possible | Possible | Possible |
| Read committed | Not possible | Possible | Possible |
| Repeatable read | Not possible | Not possible | Possible |
| Serializable | Not possible | Not possible | Not possible |

How are these four isolation levels achieved?

  • Read uncommitted: the latest data is read directly

  • Read committed and repeatable read: implemented via the Read View, which can be understood as a snapshot

    • Read committed regenerates a Read View before each statement executes
    • Repeatable read generates one Read View when the transaction starts and uses it for the whole transaction
  • Serializable: read-write locks are added
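A minimal two-session sketch of the difference, assuming a hypothetical t_user(id, name) table containing one row ('Alice'):

# Session A
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN;
SELECT name FROM t_user WHERE id = 1;   # returns 'Alice'

# Session B
UPDATE t_user SET name = 'Bob' WHERE id = 1;
COMMIT;

# Session A, still inside the same transaction
SELECT name FROM t_user WHERE id = 1;   # still 'Alice' under REPEATABLE READ
                                        # (would already be 'Bob' under READ COMMITTED)
COMMIT;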

How the Read View works in MVCC

A Read View has four important fields:

  • The transaction id of the transaction that created the Read View
  • m_ids: the list of ids of the transactions active in the database when the Read View was created
  • min_trx_id: the smallest transaction id among the active transactions when the Read View was created
  • max_trx_id: not the maximum value in m_ids, but the id that the database should assign to the next transaction at the moment the Read View is created, i.e. the largest transaction id so far in the system, plus 1

For tables using the InnoDB storage engine, each clustered index record contains the following two hidden columns:

  • trx_id: when a transaction changes a clustered index record, that transaction's id is recorded in the trx_id hidden column
  • roll_pointer: every time a clustered index record is changed, the old version is written to the undo log; this hidden column is a pointer to each old version, through which the record as it was before modification can be found

After a Read View is created, a record's trx_id falls into one of three cases:

  • trx_id < min_trx_id: this version of the record was generated by a transaction that committed before the Read View was created, so it is visible to the current transaction
  • trx_id >= max_trx_id: this version was generated by a transaction that started only after the Read View was created, so it is not visible to the current transaction
  • min_trx_id <= trx_id < max_trx_id: check whether trx_id is in the Read View's m_ids list
    • In the list: the transaction that generated this version is still active (it has not committed yet), so this version is not visible to the current transaction
    • Not in the list: the transaction that generated this version has committed, so this version is visible to the current transaction

Controlling how concurrent transactions access the same record through such a version chain is called MVCC (multi-version concurrency control)
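The active transactions that would populate a new Read View's m_ids can be observed through information_schema; a small sketch:

# Transactions currently active in InnoDB, i.e. the candidates for a new Read View's m_ids
SELECT trx_id, trx_state, trx_started
FROM information_schema.INNODB_TRX;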

Indexes

MySQL's official definition: a data structure that helps MySQL obtain data efficiently

In the MySQL architecture, indexes and data reside in the storage engine layer

Advantages and disadvantages of indexes

Advantages

  • Improves data retrieval efficiency and reduces the IO cost of the database; the main reason for creating indexes
  • A unique index guarantees the uniqueness of each row of data in the table
  • Helps with referential integrity of data by accelerating joins between tables
  • Significantly reduces grouping and sorting time in queries, reducing CPU consumption

Disadvantages

  • Creating and maintaining indexes costs time
  • Indexes occupy disk space. Besides the space taken by the table data, each index takes a certain amount of physical space, stored on disk
  • They slow down table updates: when data in the table is inserted, deleted, or modified, the indexes must also be maintained dynamically

When indexes are (not) needed

When indexes are needed

  • The field has a uniqueness constraint, such as a product code
  • Fields frequently used in WHERE query conditions; if the query condition involves more than one field, a composite index can be created
  • Fields frequently used in GROUP BY and ORDER BY, so queries do not need to sort again, because the records in the B+Tree are already sorted after indexing

When indexes are not needed

  • Fields not used in WHERE, GROUP BY, or ORDER BY. The value of an index is fast positioning; a field never used for positioning usually needs no index
  • Fields with a lot of duplicate data, such as a gender field holding only male and female. MySQL has a query optimizer: when it finds that one value accounts for a large percentage of the table's rows, it generally ignores the index and performs a full table scan
  • When the table has too little data
  • Frequently updated fields: because the B+Tree ordering must be maintained, frequently rebuilding the index takes time, and this process hurts database performance
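Creating and dropping the kinds of indexes discussed above; a minimal sketch with illustrative table and column names:

CREATE INDEX idx_height ON player(height);                 # ordinary single-column index
CREATE UNIQUE INDEX uk_product_code ON product(code);      # uniqueness constraint + index
CREATE INDEX idx_team_height ON player(team_id, height);   # composite (joint) index
DROP INDEX idx_height ON player;                           # remove an index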

Classification of indexes

Indexes can be classified from four angles

  • Data structure: B+tree index, hash index, full-text index
  • Physical storage: clustered index (primary key index), secondary index (auxiliary index)
  • Field attribute: primary key index, unique index, ordinary index, prefix index
  • Number of fields: single-column index, composite index

Classification by data structure

From the data structure perspective, MySQL's common indexes are B+Tree indexes, hash indexes, and full-text indexes

The index types supported by each storage engine are not necessarily the same; below are the index types supported by MySQL's common storage engines

InnoDB became the default MySQL storage engine in MySQL 5.5, and the B+Tree index is the index type most used by MySQL storage engines

When a table is created, the InnoDB storage engine chooses different columns as the clustered index key depending on the situation:

  • If there is a primary key, the primary key is used as the clustered index key by default
  • If there is no primary key, the first unique column containing no NULL values is chosen as the clustered index key
  • In the absence of both, InnoDB automatically generates an implicit auto-increment id column as the clustered index key

All other indexes are secondary indexes, also called auxiliary or non-clustered indexes. Both the default primary key index and secondary indexes use the B+Tree structure

The leaf nodes of the primary key index store the actual data; the leaf nodes of a secondary index store primary key values

Table lookup (going back to the table): first search the secondary index B+Tree for the indexed value, find the corresponding leaf node, and obtain the primary key value; then search the primary key index B+Tree for the corresponding leaf node to get the whole row. This process is called going back to the table: two B+Trees are searched to find the data

Covering index: when the data to be queried can already be found in the leaf nodes of the secondary index B+Tree (for example, only the primary key value is needed), there is no need to search the primary key index. This is called a covering index: only one B+Tree is searched to find the data
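Whether a query is covered can be checked with EXPLAIN: "Using index" in the Extra column means no table lookup was needed. A sketch assuming an illustrative product table with a secondary index on product_no:

# id is the primary key, so it is stored in the secondary index's leaf nodes:
# this query is satisfied entirely by the index on product_no (covering index)
EXPLAIN SELECT id, product_no FROM product WHERE product_no = '0002';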

Why does MySQL InnoDB choose the B+Tree data structure for indexes?

B+Tree vs B-tree
A B+Tree stores data only in leaf nodes, while a B-tree also stores data in non-leaf nodes. A single B+Tree node therefore holds less data, so for the same number of disk I/Os, more index nodes can be read. In addition, B+Tree leaf nodes are linked by a doubly linked list, which suits the range-based sequential lookups common in MySQL; a B-tree cannot do this

B+Tree vs binary tree
For a B+Tree with N leaf nodes, the search complexity is O(log_d N), where d is the maximum number of children a node is allowed to have.
In practice, d is greater than 100, which keeps the height of the B+Tree at around 3~4 levels even when the data reaches tens of millions of rows; a single data lookup then needs only 3~4 disk I/O operations to reach the target data.
Each node of a binary tree has at most 2 children, so the search complexity is O(log2 N), much higher than that of a B+Tree; retrieving the target data through a binary tree therefore requires many more disk I/Os

B+Tree vs hash
Hash is extremely efficient for equality queries, with O(1) search complexity, but it is unsuitable for range queries. This is why B+Tree indexes have a much broader range of applicable scenarios than hash indexes

Classification by physical storage

From the physical storage perspective, indexes divide into clustered indexes (primary key index) and secondary indexes (auxiliary indexes)

The difference between the two:

  • The leaf nodes of the primary key index B+Tree store the actual data; all complete user records sit in the leaf nodes of the primary key index B+Tree
  • The leaf nodes of a secondary index B+Tree store primary key values, not the actual data

When a secondary index is used in a query and the requested data can be found in the secondary index itself, there is no need to go back to the table; this process is called a covering index. If the requested data is not in the secondary index, the secondary index is searched first to find the corresponding leaf node and obtain the primary key value, and then the primary key index is searched to get the data; this process is called going back to the table

Classification by field attribute

From the field attribute perspective, indexes divide into primary key indexes, unique indexes, ordinary indexes, and prefix indexes

  • Primary key index: built on the primary key field, usually created together with the table; a table has at most one primary key index, and the index column cannot contain NULL values
  • Unique index: built on UNIQUE fields; a table may have several unique indexes, the values of the index column must be unique, but NULL values are allowed
  • Ordinary index: built on ordinary fields, with no requirement that the field be a primary key or UNIQUE
  • Prefix index: an index on the first few characters of a character-type field rather than on the whole field; it can be built on columns of type char, varchar, binary, and varbinary. The purpose of a prefix index is to reduce the storage space the index occupies and improve query efficiency

Classification by number of fields

By the number of fields, indexes divide into single-column indexes and composite indexes (combined indexes)

Single-column index: an index built on a single column, such as the primary key index

Composite index: an index built on multiple columns

Leftmost matching principle: matching against the index proceeds in a leftmost-first way

The premise for using the index is that the key is ordered

When the leftmost matching of a composite index stops: on a range query (>, <, between, like), matching stops; the range column itself can use the composite index, but the columns after the range column cannot, as the sketch below shows
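A sketch of the leftmost matching principle and the range-query stop, assuming a composite index on (a, b):

SELECT * FROM t WHERE a = 1 AND b = 2;  # uses the (a, b) index fully
SELECT * FROM t WHERE a = 1;            # uses the index (leftmost column present)
SELECT * FROM t WHERE b = 2;            # cannot use the index (leftmost column missing)
SELECT * FROM t WHERE a > 1 AND b = 2;  # a uses the index; matching stops at the range,
                                        # so b cannot use it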

Index condition pushdown optimization: while traversing the composite index, the fields covered by the composite index are checked first and unqualified records filtered out directly, reducing the number of table lookups

Improving index efficiency: when building a composite index, put the fields with higher selectivity first, so that highly selective fields are more likely to be used by more SQL statements

Selectivity (degree of discrimination) is the number of distinct values in a column divided by the total number of rows in the table:

$$\text{selectivity} = \frac{\operatorname{distinct}(column)}{\operatorname{count}(*)}$$
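The formula translates directly into SQL; a sketch with a hypothetical t_user table:

# Values near 1 mean high selectivity (a good index candidate);
# a gender column would score close to 0
SELECT COUNT(DISTINCT gender) / COUNT(*) AS selectivity FROM t_user;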

Index optimization

Common index optimization methods: prefix index optimization, covering index optimization, auto-incrementing primary key indexes, preventing index invalidation

Prefix index optimization

Using the first few characters of a string field to build the index reduces the index field size, lets an index page store more index values, and effectively improves index query speed

Limitations of prefix indexing:

  • ORDER BY cannot use the prefix index
  • A prefix index cannot be used as a covering index
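A sketch of building a prefix index and checking how selective a given prefix length is (table and column names illustrative):

# How selective are the first 10 characters? A value close to the full column's
# selectivity means 10 characters are enough
SELECT COUNT(DISTINCT LEFT(name, 10)) / COUNT(*) FROM t_user;

CREATE INDEX idx_name_prefix ON t_user(name(10));  # index only the first 10 characters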

Covering index optimization

Covering index: all the fields queried by the SQL can be found on the leaf nodes of the index B+Tree; the query obtains the records from the secondary index without consulting the clustered index, avoiding the table lookup

The benefit of covering indexes: no need to query all the information of the whole row, which saves a lot of I/O operations

Primary key indexes should auto-increment

When creating tables, set the primary key index to auto-increment by default

An InnoDB primary key index is a clustered index by default; the data is stored on the leaf nodes of the B+Tree. Records in the same leaf node are stored in primary key order; whenever a new row is inserted, the database inserts it into the corresponding leaf node according to its primary key value

Using an auto-increment primary key: each newly inserted row is appended in order to the current index node without moving existing data; when the page is full, a new page is opened automatically. Every insert of a new record is an append operation that never needs to move existing data, so this way of inserting data is very efficient

Using a non-auto-increment primary key: each inserted key value is effectively random, so every new row may need to go into the middle of an existing data page, forcing other data to move to make room for the insertion, and sometimes even requiring data to be copied from one page to another. This situation is usually called page splitting. Page splits cause a lot of memory fragmentation and a loosely packed index structure, which hurts query efficiency

Index columns are best declared NOT NULL

  • NULL values in index columns make the optimizer's index choice more complicated and harder to optimize. For example, when computing index statistics, count omits rows whose value is NULL
  • A NULL value is meaningless yet occupies physical space, causing storage overhead: InnoDB's default COMPACT row format uses 1 byte of space to store the NULL value list

Prevent index invalidation

Having an index does not mean the query will use it, so we should keep in mind what can cause indexes to fail, and avoid writing query statements that invalidate them

Common situations where indexes fail:

  • Left fuzzy matching or both-sides fuzzy matching, i.e. like '%xx' or like '%xx%': both invalidate the index
  • Computation, functions, or type conversion applied to the index column in the query condition: all of these invalidate the index
  • A composite index must follow the leftmost matching principle to be used correctly; otherwise the index fails
  • In a WHERE clause, if the condition column before OR is an index column but the condition column after OR is not, the index fails
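The four failure patterns in SQL form, assuming an index on name and no index on age (all names illustrative):

SELECT * FROM t_user WHERE name LIKE '%lin';          # leading wildcard: index fails
SELECT * FROM t_user WHERE LENGTH(name) = 6;          # function on the index column: fails
SELECT * FROM t_user WHERE CONCAT(name, '!') = 'a!';  # computation on the column: fails
SELECT * FROM t_user WHERE name = 'lin' OR age = 18;  # OR with a non-indexed column: fails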

Locks

Types of locks

By locking scope, locks divide into: global locks, table-level locks, and row-level locks

Global lock

Acquire the global lock -> flush tables with read lock
After execution, the entire database becomes read-only; other threads performing the following operations will block:
	- inserting, deleting, or updating data, e.g. insert, delete, update statements
	- changing table structure, e.g. alter table, drop table statements
Release the global lock -> unlock tables

Application scenario of the global lock?
Mainly full-database logical backup: during the backup, updates to data or table structure cannot make the backup file's data differ from what is expected

Drawback of the global lock?
The entire database is read-only. If the database holds a lot of data, the backup takes a long time, during which the business can only read data and not update it, stalling the business

Using the global lock affects the business; how to avoid it?
If the database engine supports transactions at the repeatable read isolation level, start a transaction before backing up. A Read View is created first and used throughout the transaction's execution; thanks to MVCC support, the business can still update data during the backup
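In practice this is what mysqldump's --single-transaction option does: it opens a REPEATABLE READ transaction for the backup instead of taking the global lock (database name illustrative):

# Consistent InnoDB backup without flush tables with read lock
mysqldump --single-transaction -u root -p mydb > mydb_backup.sql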

Table-level locks

MySQL has these table-level locks: table locks; metadata locks (MDL); intention locks; AUTO-INC locks

Table locks:
table-level shared lock (read lock) -> lock tables t_student read;
table-level exclusive lock (write lock) -> lock tables t_student write;
Besides restricting reads and writes by other threads, a table lock also restricts the subsequent reads and writes of the thread that took it
Try to avoid table locks on InnoDB tables: their granularity is too coarse and will hurt concurrency performance

Metadata locks (MDL):
MDL does not need to be used explicitly; it is added automatically when we operate on database tables
	- a CRUD operation on a table adds an MDL read lock
	- a change to a table's structure adds an MDL write lock
MDL ensures that while users execute CRUD operations, other threads are prevented from changing the table structure
	- while a thread is executing a select statement (holding an MDL read lock), another thread that wants to change the table structure (applying for the MDL write lock) blocks until the select statement finishes (releasing the MDL read lock)
	- while a thread is changing a table's structure (holding the MDL write lock), other threads executing CRUD operations (applying for an MDL read lock) block until the structure change completes (releasing the MDL write lock)
MDL needs no explicit call, so when is it released? -> MDL is released only when the transaction commits; it is held for the duration of the transaction
Why do later queries applying for read locks block while a thread is still waiting for the MDL write lock? -> MDL lock requests form a queue, and write lock requests in the queue have higher priority than read lock requests

Intention locks:
Purpose of intention locks: quickly determine whether any record in the table is locked
	- before adding a shared lock to some records in an InnoDB table, an intention shared lock on the table is needed
	- before adding an exclusive lock to some records in an InnoDB table, an intention exclusive lock on the table is needed
	- without intention locks, taking a table exclusive lock would require traversing all records in the table to check for record exclusive locks, which is slow
	- with intention locks, taking a table exclusive lock only requires checking the table's intention exclusive lock; no traversal of the table's records is needed
Intention locks are table-level locks; intention locks do not conflict with each other; they conflict with table locks; they do not conflict with row-level locks

AUTO-INC locks:
A special table-level locking mechanism: the lock is not held until the transaction commits, but is released immediately after the insert statement finishes
While one transaction holds the AUTO-INC lock, other transactions inserting into the table are blocked, which guarantees that a column decorated with AUTO_INCREMENT grows continuously as rows are inserted
Starting from MySQL 5.1.22, the InnoDB storage engine provides a lightweight lock for assigning auto-increment values
 	- when innodb_autoinc_lock_mode = 0, only the AUTO-INC lock is used
 	- when innodb_autoinc_lock_mode = 2, only the lightweight lock is used
 	- when innodb_autoinc_lock_mode = 1, the two are mixed: the lightweight lock is used when the number of rows to insert is known, the AUTO-INC lock when it is not

Row-level locks

Row-level locks fall into three categories:
	- Record Lock: locks a single record
	- Gap Lock: locks a range, excluding the record itself
	- Next-Key Lock: the combination Record Lock + Gap Lock, locking a range including the record itself
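Row-level locks are typically taken by locking reads; a sketch (the t_user table is illustrative):

BEGIN;
SELECT * FROM t_user WHERE id = 1 FOR SHARE;   # shared (S) lock; LOCK IN SHARE MODE before 8.0
SELECT * FROM t_user WHERE id = 1 FOR UPDATE;  # exclusive (X) lock
COMMIT;                                        # row locks are released at commit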

Locking rules

Row-level locking rules are complex; different scenarios take different lock forms, and the next-key lock degenerates into a record lock or a gap lock in some scenarios

Locking rules may differ between versions; the following applies to MySQL 8.0.26

  • Equality query on a unique index
    • the queried record exists: degenerates into a record lock
    • the queried record does not exist: degenerates into a gap lock
  • Equality query on a non-unique index
    • the queried record exists: an additional gap lock is taken
    • the queried record does not exist: degenerates into a gap lock

Range queries on unique and non-unique indexes differ in their locking rules in that:

  • a unique index degenerates into gap locks and record locks when certain conditions are met
  • a non-unique index does not degenerate
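In MySQL 8.0 the locks actually held can be inspected while a transaction is open; a sketch:

# LOCK_MODE shows e.g. X,REC_NOT_GAP (record lock), X,GAP (gap lock), X (next-key lock)
SELECT engine_transaction_id, lock_type, lock_mode, lock_data
FROM performance_schema.data_locks;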

Logs

Classification of logs

MySQL has different log types for storing different kinds of logs: the slow query log, the general query log, the error log, and the binary log. MySQL 8 added two more: the relay log and the data definition statement log

  • Slow query log: records all queries whose execution time exceeds long_query_time, making query optimization convenient
  • General query log: records the start and end time of all connections, along with every instruction sent to the database server; helpful for reconstructing what actually happened, locating problems, and auditing database operations
  • Error log: records problems starting, running, or stopping the MySQL service; convenient for understanding the server's state so as to maintain it
  • Binary log: records all statements that change data; used for data synchronization between master and slave servers and for lossless recovery of data after a server failure
  • Relay log: used in the master-slave architecture; an intermediate file in which the slave stores the content of the master's binary log. The slave reads the relay log to replay the master's operations
  • Data definition statement log: records metadata operations performed by data definition statements

Except for the binary log, the other logs are text files. By default, all logs are created in the MySQL data directory
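Enabling the slow query log at runtime; a minimal sketch (the threshold value is illustrative):

SET GLOBAL slow_query_log = ON;        # start recording slow queries
SET GLOBAL long_query_time = 1;        # log queries running longer than 1 second
SHOW VARIABLES LIKE 'slow_query_log%'; # confirm the setting and the log file location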

Binary log

Three formats

The three binlog formats: Row, Statement, Mixed

  • Row: the default format; does not record SQL statement context, only which record was modified and how
    • advantage: row-level log content records the details of every row modification very clearly, and there are no cases where calls to stored procedures, functions, or triggers fail to replicate correctly under particular circumstances
  • Statement: every SQL statement that modifies data is recorded in the binlog
    • advantage: no need to record every row change, which reduces binlog volume, saves IO, and improves performance
  • Mixed: provided since version 5.1.8; mixes the two formats
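Checking and switching the binlog format; a minimal sketch:

SHOW VARIABLES LIKE 'binlog_format';  # ROW / STATEMENT / MIXED
SET GLOBAL binlog_format = 'ROW';     # applies to sessions opened afterwards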

Write mechanism

During transaction execution, the log is first written to the binlog cache; at transaction commit, the binlog cache is written into the binlog file. A transaction's binlog cannot be split apart: it must be written in one piece, so the system allocates one block of binlog cache memory per thread

The timing of write and fsync is controlled by the sync_binlog parameter, whose default is 0

  • 0: each transaction commit only does write, leaving the system to decide when to execute fsync. Performance improves, but if the machine goes down, the binlog still in the page cache is lost
  • 1: every transaction commit executes fsync, just like the redo log flushing process
  • N: every transaction commit does write, but fsync happens only after N transactions accumulate

When there is an IO bottleneck, setting sync_binlog to a larger value can improve performance; likewise, if the machine goes down, the binlog of the most recent N transactions is lost

binlog vs redo log

  • redo log: a physical log, generated by the InnoDB storage engine layer; the recorded content is "what change was made on which data page"
  • binlog: a logical log, from the MySQL Server layer; the recorded content is "the original logic of a statement"

Two-phase commit

While an update statement executes, two logs are written, the redo log and the binlog, on the basis of the transaction

The redo log can be written continuously during transaction execution, whereas the binlog is written only when the transaction commits

To solve the problem of logical consistency between the two logs, the InnoDB storage engine uses a two-phase commit scheme

Master-slave replication

Purposes

  • Read / write separation
  • Data backup
  • High availability

Principle

In essence, master-slave synchronization is data synchronization based on the binlog. The replication process involves 3 threads: one master thread and two slave threads

  • Binary log dump thread (Binlog dump thread): a master thread. When a slave thread connects, the master sends the binary log to the slave; while the master reads events from the binlog, it locks the binlog and releases the lock when reading finishes
  • The slave's I/O thread connects to the master and requests updates to the binlog. The slave's I/O thread then reads the updated portions of the binlog that the master's dump thread sends, and copies them into the local relay log
  • The slave's SQL thread reads the relay log on the slave and executes the events in it, keeping the slave's data synchronized with the master's

Simply put, there are three steps:

  1. The Master records write operations to its binlog
  2. The Slave copies the Master's binlog into its relay log
  3. The Slave replays the events in the relay log, applying the changes to its own database. MySQL replication is asynchronous and serialized; after a restart, it resumes replication from the recorded position
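The three threads are wired up on the slave with CHANGE MASTER TO. A hedged sketch with illustrative host, credentials, and binlog coordinates (MySQL 8.0.23+ renames this statement to CHANGE REPLICATION SOURCE TO):

CHANGE MASTER TO
    MASTER_HOST = '192.168.1.10',
    MASTER_USER = 'repl',
    MASTER_PASSWORD = '******',
    MASTER_LOG_FILE = 'binlog.000001',
    MASTER_LOG_POS = 4;
START SLAVE;            # starts the slave I/O thread and SQL thread
SHOW SLAVE STATUS\G     # Slave_IO_Running / Slave_SQL_Running; Seconds_Behind_Master shows lag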

Data consistency in master-slave synchronization

Requirements of master-slave synchronization

  • The data in the read library and the write library must be consistent
  • Writes must go to the write library
  • Reads must go to the read library

Causes of master-slave delay

Under normal network conditions, the main source of master-slave delay is the time difference between the standby receiving the binlog and finishing executing the transaction

  • The slave's machine performance is worse than the master's
  • The slave is under heavy pressure
  • Large transactions are executing

The direct manifestation of master-slave delay: the slave consumes the relay log more slowly than the master produces the binlog

Ways to reduce master-slave delay

  • Reduce the probability of concurrent large multithreaded transactions; optimize business logic
  • Optimize SQL and avoid slow SQL; reduce batch operations, which are better done by scripts in an update-then-sleep fashion
  • Improve the slave machine's configuration to narrow the efficiency gap between the master writing the binlog and the slave reading it
  • Keep links short: place the master and slave servers as close together as possible, increase port bandwidth, and reduce the network latency of binlog transmission
  • Force strongly real-time business reads to the master, using the slave only for disaster recovery and backup

Solving the data consistency problem

If the operated data lives in a single database, a write lock on the record during updates prevents inconsistent reads; but then the slave serves only as a backup and achieves no read/write separation or sharing of the master's read pressure

With read/write separation, solving master-slave data inconsistency means solving the problem of data replication between master and slave. Ordered from weak to strong data consistency, there are 3 replication modes: asynchronous replication, semi-synchronous replication, and group replication

Asynchronous replication

Semi-synchronous replication

Group replication

Neither asynchronous replication nor semi-synchronous replication can ultimately guarantee data consistency

Group replication, MGR (MySQL Group Replication), is a new data replication technology introduced in MySQL 5.7.17, based on Paxos-protocol state machine replication

How does MGR work?
Multiple nodes are combined into one replication group. A read-write (RW) transaction must pass through the consensus protocol layer: it can commit only after more than half of the nodes (N/2+1) agree; a read-only (RO) transaction needs no in-group agreement and simply COMMITs.

A replication group contains multiple nodes, each maintaining its own copy of the data; the consensus protocol layer implements atomic messaging and globally ordered messages, thereby guaranteeing data consistency within the group

References

MySQL database tutorial, from MySQL installation to advanced MySQL (bilibili)

图解MySQL (Illustrated MySQL) | 小林coding (xiaolincoding.com)
