
12 MySQL interview questions that you must chew through to enter Alibaba

2022-07-05 14:49:00 InfoQ

Limited by space, this article covers only 12 classic MySQL interview questions. Questions on other stacks such as Redis, the SSM framework, algorithms, and computer networks will be added in later updates. My personal collection of 1000+ interview questions is placed at the end of the article for everyone to take for free; readers interviewing soon who need practice questions can jump straight to the end to get it.

1. Can you explain the difference between MyISAM and InnoDB?

The MyISAM engine was the default before version 5.1. It supports full-text search, compression, spatial functions, and so on, but it does not support transactions or row-level locks, so it is generally used in scenarios with many reads and few writes. MyISAM also does not support foreign keys, and it stores index and data separately.

InnoDB is based on a clustered index. Unlike MyISAM, it supports transactions and foreign keys, supports high concurrency through MVCC, and stores index and data together.

2. What kinds of indexes does MySQL have, and what are clustered and nonclustered indexes?

By data structure, indexes are mainly B+ tree indexes and hash indexes.

Suppose we have a table with the following structure:

create table user(
 id int(11) not null,
 age int(11) not null,
 primary key(id),
 key(age)
);

A B+ tree is an ordered storage structure with smaller keys on the left and larger on the right. Internal nodes contain only the id index column, while leaf nodes contain both the index column and the row data. This way of storing data together with the index is called a clustered index, and a table can have only one. If no primary key is defined, InnoDB chooses a unique non-null index instead; if there is none either, it implicitly defines a primary key as the clustered index.

That is the storage structure of the primary-key clustered index. So what does a nonclustered index look like? A nonclustered (secondary) index stores the primary key id value in its leaf nodes; this differs from MyISAM, where what is stored is the address of the data.

Finally, the key difference between clustered and nonclustered indexes in InnoDB and MyISAM: InnoDB secondary indexes store the primary key value in their leaves, while MyISAM indexes store the address of the row, since MyISAM keeps index and data separate.


3. Do you know what covering indexes and back-to-table lookups are?

A covering index means that, for a given query, an index contains (covers) all the fields the query needs, so the query can be answered from the index alone without going back to the table. A back-to-table lookup is the opposite case: the secondary index yields the primary key, which is then used to fetch the full row from the clustered index.

To determine whether a query is covered, just EXPLAIN the SQL statement and check whether the Extra column says "Using index".

Taking the user table above as an example, suppose we add a name field and then try some queries:

explain select * from user where age=1; -- name cannot be fetched from the index, so a back-to-table lookup is needed
explain select id,age from user where age=1; -- everything can be returned directly from the index
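To make the covered vs. back-to-table distinction concrete, here is a toy model (dicts standing in for B+ trees, with made-up sample rows; this is an illustration, not InnoDB's actual implementation):

```python
# Toy model of the two EXPLAIN results above.
clustered = {  # primary key -> full row (the clustered index)
    1: {"id": 1, "age": 1, "name": "a"},
    2: {"id": 2, "age": 1, "name": "b"},
}
secondary_age = {1: [1, 2]}  # age -> primary keys (the secondary index on age)

def query(fields, age):
    """Return (rows, back_to_table_count) for SELECT fields ... WHERE age=..."""
    back_to_table = 0
    rows = []
    for pk in secondary_age.get(age, []):
        if set(fields) <= {"id", "age"}:   # covered: the index alone is enough
            rows.append({"id": pk, "age": age})
        else:                              # not covered: fetch from the clustered index
            back_to_table += 1
            rows.append({f: clustered[pk][f] for f in fields})
    return rows, back_to_table
```

In the model, selecting only id and age never touches the clustered index, mirroring the "Using index" case, while asking for name forces one clustered-index fetch per matching row.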

4. What types of locks are there?

MySQL locks are divided into shared locks and exclusive locks, also called read locks and write locks.

A read lock is shared and can be acquired with lock in share mode; while it is held, the row can be read but not written.

A write lock is exclusive: it blocks other write locks and read locks. By granularity, locks can be divided into table locks and row locks.

A table lock locks the entire table and blocks all other users' read and write operations on that table; for example, the table is locked while ALTER modifies the table structure.

Row-level locking strategies can be divided into optimistic locking and pessimistic locking. Pessimistic locking can be implemented with for update, while optimistic locking is implemented with a version number.
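As a sketch of the version-number idea (plain Python simulating one row; in SQL the same check-and-bump would be a single UPDATE ... WHERE id = ? AND version = ?, with the affected-row count telling the caller whether it won):

```python
# Simulated optimistic locking. The update only applies if the version we
# read earlier is still current; otherwise the caller must re-read and retry.
rows = {1: {"name": "Zhang San", "version": 0}}  # made-up data

def optimistic_update(row_id, new_name, expected_version):
    row = rows[row_id]
    if row["version"] != expected_version:
        return False  # another writer committed first: lost the race
    row["name"] = new_name
    row["version"] += 1
    return True
```

Two writers that both read version 0 race: the first update succeeds and bumps the version; the second sees a stale version, fails, and must re-read before retrying.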

5. Can you talk about the basic characteristics of transactions and the isolation levels?

The basic characteristics of a transaction, ACID, are:

Atomicity
means that either all the operations in a transaction succeed, or none of them take effect.

Consistency
means the database always moves from one consistent state to another. For example, if A transfers 100 yuan to B and the system crashes mid-execution, A will not lose the 100 yuan, because the transaction was not committed and the changes were never saved to the database.

Isolation
means that a transaction's modifications are invisible to other transactions until it finally commits.

Durability
means that once a transaction is committed, its changes are permanently saved in the database.

Isolation has 4 levels, namely:

read uncommitted
Reading uncommitted data: you may read data that other transactions have not committed, which is called a dirty read.

The user should have read age=10 for the user with id=1, but instead reads another transaction's uncommitted change and gets age=20; that is a dirty read.

read committed
Reading committed data: this solves the dirty-read problem, since only committed transactions are ever read, but two reads within the same transaction can return inconsistent results, which is called a non-repeatable read.

A transaction reads the user with id=1 and gets age=10; reading again later, after another transaction commits a change, it gets age=20. The same query returning different results within one transaction is a non-repeatable read.
repeatable read
Repeatable reads: this is MySQL's default level; every read within a transaction returns the same result, but phantom reads are still possible.

serializable
Serialized: rarely used in practice; it locks every row of data it reads, which leads to lots of timeout and lock-contention problems.

6. So what is a phantom read, and what is MVCC?

To talk about phantom reads, first understand MVCC. MVCC is multi-version concurrency control; in essence, it keeps snapshots of the data as of particular points in time.

Conceptually, every row hides two extra columns: the version number at which it was created, and the version number at which it expired (was deleted). Each time a new transaction starts, the version number increments.

Take the user table above again and suppose we insert two rows; each row version records the version at which it was created.

Now suppose Xiao Ming runs a query, at which point current_version=3:

select * from user where id<=3;

Meanwhile, Xiao Hong opens a transaction and modifies the record with id=1, with current_version=4:

update user set name='Zhang Sansan' where id=1;

After this executes successfully, the old version of the id=1 row is marked as deleted at version 4, and a new version, 'Zhang Sansan', is created at version 4.

If Xiao Hei then deletes the row with id=2, with current_version=5, that row's delete version is set to 5.
Because the MVCC rule is to find row versions whose creation version is less than or equal to the current transaction's version, and whose deletion version is either empty or greater than the current transaction's version, Xiao Ming's query really runs as:

select * from user where id<=3 and create_version<=3 and (delete_version>3 or delete_version is null);

So Xiao Ming still finds id=1 with the name 'Zhang San', and can still find id=2. This is how MVCC guarantees that the data a transaction reads either existed before the transaction started, or was inserted or modified by the transaction itself.
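That WHERE clause is really just a visibility predicate, which can be written out directly (Python sketch; the initial create versions 1 and 2 below are assumed for illustration):

```python
def visible(row, current_version):
    """MVCC visibility: a row version can be seen iff it was created at or
    before the reader's version and not yet deleted as of that version."""
    return (row["create_version"] <= current_version
            and (row["delete_version"] is None
                 or row["delete_version"] > current_version))

# Row versions after Xiao Hong's update (v4) and Xiao Hei's delete (v5):
versions = [
    {"id": 1, "create_version": 1, "delete_version": 4},     # old 'Zhang San' version
    {"id": 1, "create_version": 4, "delete_version": None},  # new 'Zhang Sansan' version
    {"id": 2, "create_version": 2, "delete_version": 5},
]
```

Evaluating this at Xiao Ming's current_version=3 shows exactly the behavior described: the old id=1 version and the id=2 row are visible, while Xiao Hong's new version is not.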

With the MVCC principle understood, phantom reads become much easier to explain. Take a common scenario: during user registration we check whether a username exists and insert it if not, and suppose the username has a unique index.

  • Xiao Ming opens a transaction (current_version=6) and queries for a record named 'Wang Wu'; it does not exist.
  • Xiao Hong opens a transaction (current_version=7) and inserts a record with that name.
  • Xiao Ming then tries to insert a record named 'Wang Wu', hits a unique-index conflict, and cannot insert, even though his own query just told him the name was free. That is a phantom read.

7. So what guarantees ACID?

A - Atomicity is guaranteed by the undo log, which records the information needed for rollback; when a transaction rolls back, the SQL that already executed successfully is undone.

C - Consistency is generally guaranteed at the code level.

I - Isolation is guaranteed by MVCC.

D - Durability is guaranteed by memory plus the redo log: MySQL modifies the data in memory and also records the operation in the redo log; when the transaction commits, the redo log is flushed to disk, and after a crash, recovery can replay from the redo log.
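The write-ahead idea behind the redo log can be sketched with a toy key-value store (this mimics only the replay principle, not MySQL's actual page-oriented redo format):

```python
# Toy write-ahead log: every change is appended to a durable log before it
# counts as committed; recovery replays the log into a fresh memory image.
redo_log = []   # stands in for the on-disk redo log
memory = {}     # stands in for the in-memory buffer pool

def commit(key, value):
    redo_log.append((key, value))  # flushed to disk at commit time
    memory[key] = value

def recover():
    """Simulate a restart after a crash: memory is lost, the log survives."""
    recovered = {}
    for key, value in redo_log:
        recovered[key] = value
    return recovered
```

After a simulated crash wipes `memory`, replaying `redo_log` restores every committed change, which is exactly why a crashed instance can recover from the redo log.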

8. Do you know what a gap lock is?

Gap locks exist only at the repeatable read level; it is the combination of MVCC and gap locks that solves the phantom-read problem. Using the user table again, suppose it currently holds records with age values 10, 20, and 30.

In one session, we execute:

begin;
select * from user where age=20 for update;

Then, in another session:

begin;
insert into user(age) values(10); # succeeds
insert into user(age) values(11); # fails
insert into user(age) values(20); # fails
insert into user(age) values(21); # fails
insert into user(age) values(30); # fails

Only 10 can be inserted successfully, because based on the gaps in the table, MySQL automatically generates intervals for us (open on the left, closed on the right):

(-∞, 10], (10, 20], (20, 30], (30, +∞)

Because a record with age 20 exists, the intervals (10, 20] and (20, 30] are locked, and nothing inside them can be inserted or deleted.

What if the query had been for 21? Then, based on 21, MySQL would locate the range (20, 30) (in that case both ends are open).

Note that on a unique index, an equality lookup takes no gap lock.
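The interval bookkeeping above can be sketched as follows (Python with hypothetical helper names; InnoDB's real next-key locking is internal, and this sketch assumes the locked value actually exists in the index, as age=20 does here):

```python
import bisect

def next_key_ranges(existing, locked_value):
    """The (open, closed] intervals covered by a next-key lock on
    locked_value, given the index values already in the table."""
    keys = sorted(existing)
    i = bisect.bisect_left(keys, locked_value)
    prev_key = keys[i - 1] if i > 0 else float("-inf")
    next_key = keys[i + 1] if i + 1 < len(keys) else float("inf")
    return [(prev_key, locked_value), (locked_value, next_key)]

def insert_blocked(existing, locked_value, candidate):
    """Would inserting candidate collide with the locked intervals?"""
    return any(lo < candidate <= hi
               for lo, hi in next_key_ranges(existing, locked_value))
```

With existing values [10, 20, 30] and a lock on 20, the locked intervals come out as (10, 20] and (20, 30], matching the success/failure pattern of the inserts above.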

9. After sharding tables, how do you guarantee the uniqueness of IDs?

Because our primary keys auto-increment by default, after splitting into multiple tables the primary keys will collide across tables. There are a few ways to think about it:

  • Set an increment step. For example, with 1024 tables, use 1024 as the base step (each table starting at a different offset), so primary keys landing in different tables never collide.
  • Distributed IDs: implement your own distributed ID generation algorithm, or use an open-source one such as snowflake.
  • Stop using the primary key as the query key after sharding. Instead, add a field to each table that serves as a unique business key, for example a unique order number; whichever table a row ends up in, lookups go by order number, and the same goes for updates.
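As a sketch of the snowflake approach (using the widely described 41-bit timestamp / 10-bit worker id / 12-bit sequence layout; the custom epoch below is an assumption, and clock rollback is deliberately ignored):

```python
import threading
import time

class SnowflakeSketch:
    """Sketch of a snowflake-style 64-bit ID: 41-bit millisecond timestamp,
    10-bit worker id, 12-bit per-millisecond sequence."""
    EPOCH = 1640995200000  # 2022-01-01 UTC, an arbitrary custom epoch

    def __init__(self, worker_id):
        assert 0 <= worker_id < 1024, "worker id must fit in 10 bits"
        self.worker_id = worker_id
        self.last_ms = -1
        self.seq = 0
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            now = int(time.time() * 1000)
            if now == self.last_ms:
                self.seq = (self.seq + 1) & 0xFFF
                if self.seq == 0:  # 4096 ids issued this millisecond: spin to the next
                    while now <= self.last_ms:
                        now = int(time.time() * 1000)
            else:
                self.seq = 0
            self.last_ms = now
            return ((now - self.EPOCH) << 22) | (self.worker_id << 12) | self.seq
```

A production generator must also handle the clock moving backwards and worker-id assignment; this sketch only shows the bit layout and per-millisecond sequencing that make IDs unique across shards.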

10. What is the order of magnitude of your data? How do you split databases and tables?

First of all, splitting databases and tables is done vertically and horizontally; generally speaking, you split vertically first and then horizontally.

Vertical database splitting

With the way services are split into microservices today, the vertical database split usually comes along with it.

Vertical table splitting

If a table has many fields, split out the rarely used fields, the large ones, and so on into separate tables.

Horizontal table splitting

First, use the business scenario to decide which field to shard on (the sharding_key). For example, say we now take 10 million orders a day, and most of our traffic comes from the consumer side, so we can use user_id as the sharding_key. Order queries only need to cover the most recent 3 months, with older orders archived; 3 months is about 900 million rows, which can be split into 1024 tables, so each table holds roughly 1 million rows.

For example, if the user id is 100, we compute hash(100), then take it modulo 1024, and the result tells us which table the row lands in.
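That routing step can be sketched like this (crc32 is chosen here as a stable hash, since Python's built-in hash() is randomized per process; the table count 1024 follows the example above):

```python
import zlib

def shard_for(sharding_key, tables=1024):
    """Map a sharding_key (e.g. a user_id) to a table index in 0..tables-1."""
    return zlib.crc32(str(sharding_key).encode("utf-8")) % tables
```

All rows for user 100 then go to table number `shard_for(100)`, and any later query carrying user_id can be routed to exactly one table instead of scanning all 1024.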

11. After sharding, how do you handle queries that don't go through the sharding_key?

  • Build a mapping table. For example, how should merchants query their order list? Without user_id, wouldn't the query have to scan every table? So we keep a mapping table that stores the merchant-to-user relationship: when querying, first get the merchant's user list from the mapping table, then query by user_id.
  • Build a wide table. Generally speaking, merchants do not need strongly real-time data, so for something like the merchant order list you can sync the order table into an offline (or real-time) data warehouse, build a wide table on top of it, and serve queries from something like es.
  • If the data volume is small, for example some back-office queries, you can also scan all the tables with multiple threads and aggregate the results, or do it asynchronously. For example:

List<Callable<List<User>>> taskList = Lists.newArrayList();
for (int shardingIndex = 0; shardingIndex < 1024; shardingIndex++) {
    final int index = shardingIndex; // a lambda may only capture an (effectively) final variable
    taskList.add(() -> userMapper.getProcessingAccountList(index));
}
List<List<User>> list = null;
try {
    list = taskExecutor.executeTask(taskList);
} catch (Exception e) {
    // handle the failure, e.g. log it and retry
}

public class TaskExecutor {
    public <T> List<T> executeTask(Collection<? extends Callable<T>> tasks) throws Exception {
        List<T> result = Lists.newArrayList();
        List<Future<T>> futures = ExecutorUtil.invokeAll(tasks);
        for (Future<T> future : futures) {
            result.add(future.get());
        }
        return result;
    }
}

12. Can you explain how MySQL master-slave synchronization is done?

First, the principle of MySQL master-slave synchronization:

  • After the master commits a transaction, it writes the change to its binlog.
  • The slave connects to the master and requests the binlog.
  • The master creates a dump thread and pushes the binlog to the slave.
  • The slave starts an IO thread that reads the binlog synced from the master and records it in the relay log.
  • The slave starts another sql thread that reads events from the relay log and replays them on the slave, completing synchronization.
  • The slave records its own binlog.

Because MySQL's default replication mode is asynchronous, the master does not care whether the slave has processed the log after sending it. This causes a problem: if the master crashes before the slave finishes processing, and the slave is then promoted to master, those log entries are lost. Two concepts address this.

Full synchronous replication

After writing the binlog, the master forcibly synchronizes the log to the slaves and returns to the client only after all slaves have executed it; obviously this seriously hurts performance.

Semi-synchronous replication

Unlike full sync, the logic of semi-synchronous replication is this: when a slave has successfully written the log, it returns an ACK to the master, and the master considers the write complete once it receives confirmation from at least one slave.

13. How do you deal with master-slave lag?

  • For the business scenarios that require it, force both read and write requests to the master.
  • Read from the slave first, and if the data is not there, do a second query against the master.

That is all for this article. I have collected many of the questions frequently asked in interviews and will keep updating them; readers who need the PDF can forward this article and follow to receive it.


Copyright notice
This article was created by [InfoQ]. When reposting, please include a link to the original. Thank you.
https://yzsam.com/2022/186/202207051429167126.html