IN vs. NOT IN and EXISTS vs. NOT EXISTS in SQL: differences and performance analysis

2022-06-12 04:34:00 【Python's path to becoming a God】

1. IN and EXISTS

IN performs a hash join between the outer and inner tables, whereas EXISTS loops over the outer table and queries the inner table on every iteration. The long-held belief that EXISTS is always more efficient than IN is therefore inaccurate.

If the two tables in the query are about the same size, IN and EXISTS perform about the same. If one table is smaller and the other larger, use EXISTS when the subquery table is the larger one and IN when the subquery table is the smaller one.

For example: table A (small table), table B (large table)

select * from A where cc in(select cc from B)  --> inefficient; uses the index on column cc of table A

select * from A where exists(select cc from B where cc=A.cc)  --> efficient; uses the index on column cc of table B

Conversely:

select * from B where cc in(select cc from A)  --> efficient; uses the index on column cc of table B

select * from B where exists(select cc from A where cc=B.cc)  --> inefficient; uses the index on column cc of table A
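As a rough check that the two forms are interchangeable in what they return (as opposed to how fast they run), the sketch below builds a small table A and a larger table B in an in-memory SQLite database and runs both queries. The row values, the index names idx_a_cc and idx_b_cc, and the use of SQLite through Python are assumptions made for illustration; the plan an optimizer actually picks depends on the database and its version.

```python
import sqlite3

# In-memory database; A is the small table, B the larger one (illustrative data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (cc INTEGER);
    CREATE TABLE B (cc INTEGER);
    CREATE INDEX idx_a_cc ON A (cc);
    CREATE INDEX idx_b_cc ON B (cc);
""")
conn.executemany("INSERT INTO A VALUES (?)", [(i,) for i in range(0, 20, 2)])
conn.executemany("INSERT INTO B VALUES (?)", [(i,) for i in range(0, 1000, 3)])

in_sql = "SELECT cc FROM A WHERE cc IN (SELECT cc FROM B) ORDER BY cc"
exists_sql = ("SELECT cc FROM A WHERE EXISTS "
              "(SELECT 1 FROM B WHERE B.cc = A.cc) ORDER BY cc")

in_rows = conn.execute(in_sql).fetchall()
exists_rows = conn.execute(exists_sql).fetchall()
print(in_rows == exists_rows)  # both forms return the same rows

# Print the plan SQLite chose for each form; the text varies by SQLite version.
for sql in (in_sql, exists_sql):
    for step in conn.execute("EXPLAIN QUERY PLAN " + sql):
        print(step)
```

Comparing the printed plans on your own database is usually more informative than a fixed rule about table sizes.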

2. NOT IN and NOT EXISTS

NOT IN is not logically equivalent to NOT EXISTS. If you misuse NOT IN, your program may end up with a serious bug. Consider the following example:

create table #t1(c1 int,c2 int);

create table #t2(c1 int,c2 int);

insert into #t1 values(1,2);

insert into #t1 values(1,3);

insert into #t2 values(1,2);

insert into #t2 values(1,null);

 

select * from #t1 where c2 not in(select c2 from #t2);  --> result: no rows

select * from #t1 where not exists(select 1 from #t2 where #t2.c2=#t1.c2);  --> result: 1  3

As you can see, NOT IN returns an unexpected result set, which is a logic error. The execution plans of the two SELECT statements also differ: the latter uses hash_aj. So try to avoid NOT IN (it uses a plain subquery) and prefer NOT EXISTS (it uses a correlated subquery).

If any row returned by the subquery contains a NULL, the NOT IN query returns no rows. If the subquery column has a NOT NULL constraint, you can use NOT IN, and you can hint it to use a hash_aj or merge_aj join.
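The NULL behaviour described here can be reproduced with a few lines of Python and SQLite. This is a minimal sketch: the tables are named t1 and t2 rather than #t1 and #t2 because the # prefix is temp-table syntax that SQLite does not use, and the third query (filtering NULLs out of the subquery) is an added illustration of one common way to make NOT IN safe.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (c1 INT, c2 INT);
    CREATE TABLE t2 (c1 INT, c2 INT);
    INSERT INTO t1 VALUES (1, 2), (1, 3);
    INSERT INTO t2 VALUES (1, 2), (1, NULL);
""")

# NOT IN against a subquery that returns a NULL: no row can qualify.
not_in_rows = conn.execute(
    "SELECT * FROM t1 WHERE c2 NOT IN (SELECT c2 FROM t2)"
).fetchall()

# NOT EXISTS only asks whether a matching row exists, so NULLs never match.
not_exists_rows = conn.execute(
    "SELECT * FROM t1 WHERE NOT EXISTS "
    "(SELECT 1 FROM t2 WHERE t2.c2 = t1.c2)"
).fetchall()

# Filtering NULLs out of the subquery makes NOT IN agree with NOT EXISTS here.
filtered_rows = conn.execute(
    "SELECT * FROM t1 WHERE c2 NOT IN "
    "(SELECT c2 FROM t2 WHERE c2 IS NOT NULL)"
).fetchall()

print(not_in_rows)      # []
print(not_exists_rows)  # [(1, 3)]
print(filtered_rows)    # [(1, 3)]
```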

If the query uses NOT IN, both the inner and outer tables are scanned in full and no index is used, whereas the subquery of NOT EXISTS can still use the index on its table. So whichever table is larger, NOT EXISTS is faster than NOT IN.

3. The difference between IN and =

select name from student where name in('zhang','wang','zhao');

and

select name from student where name='zhang' or name='wang' or name='zhao'

return the same result.
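A quick way to confirm the equivalence is to run both statements against the same data. The sketch below assumes a one-column student table with a handful of made-up rows (including one NULL name, which neither form matches) in an in-memory SQLite database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (name TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?)",
    [("zhang",), ("wang",), ("li",), ("zhao",), (None,)],
)

# IN with a literal list ...
in_rows = conn.execute(
    "SELECT name FROM student "
    "WHERE name IN ('zhang', 'wang', 'zhao') ORDER BY name"
).fetchall()

# ... versus the same test written as a chain of OR-ed equalities.
or_rows = conn.execute(
    "SELECT name FROM student "
    "WHERE name = 'zhang' OR name = 'wang' OR name = 'zhao' ORDER BY name"
).fetchall()

print(in_rows == or_rows)  # True
```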

Further analysis:

1. How EXISTS is executed

select * from t1 where exists ( select null from t2 where y = x )

It can be understood as:

for x in ( select * from t1 ) loop

if ( exists ( select null from t2 where y = x.x ) ) then
OUTPUT THE RECORD
end if
end loop
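The loop reading of EXISTS can be made concrete by writing the loop by hand and comparing it with the SQL form. In this sketch the function name exists_by_loop and the sample rows are hypothetical; it models only the semantics, not how any particular engine actually executes the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (x INT);
    CREATE TABLE t2 (y INT);
    INSERT INTO t1 VALUES (1), (2), (3), (4);
    INSERT INTO t2 VALUES (2), (4), (4), (6);
""")

def exists_by_loop(conn):
    """Scan t1 row by row and probe t2 once per outer row."""
    output = []
    for (x,) in conn.execute("SELECT x FROM t1 ORDER BY x").fetchall():
        # One probe per outer row; stop at the first match, as EXISTS does.
        probe = conn.execute(
            "SELECT 1 FROM t2 WHERE y = ? LIMIT 1", (x,)
        ).fetchone()
        if probe is not None:
            output.append(x)
    return output

loop_result = exists_by_loop(conn)
sql_result = [
    row[0]
    for row in conn.execute(
        "SELECT x FROM t1 "
        "WHERE EXISTS (SELECT NULL FROM t2 WHERE y = x) ORDER BY x"
    )
]
print(loop_result, sql_result)  # [2, 4] [2, 4]
```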

On the performance difference between IN and EXISTS:

If the subquery returns few rows and the table in the main query is large and indexed, use IN. Conversely, if the outer query returns few rows and the table in the subquery is large and indexed, use EXISTS.

What really distinguishes IN from EXISTS is the change in driving order (this is the key to the performance difference). With EXISTS, the outer table is the driving table and is accessed first; with IN, the subquery is executed first. So the first goal is to make the driving table return quickly, and only then to consider the relationship between the index and the result set.

In addition, IN never matches NULL.

For example: select 1 from dual where null in (0,1,2,null) returns no rows.
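The reason is SQL's three-valued logic: comparing anything with NULL yields NULL (unknown), not true. The sketch below evaluates the expressions directly in SQLite, which does not need Oracle's dual table; the extra expressions beyond the one in the text are added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The statement from the text, without Oracle's dual table.
rows = conn.execute("SELECT 1 WHERE NULL IN (0, 1, 2, NULL)").fetchall()

# The predicate itself evaluates to NULL (Python None), not to true or false.
null_in = conn.execute("SELECT NULL IN (0, 1, 2, NULL)").fetchone()[0]

# A value that is present in the list is found despite the NULL ...
two_in = conn.execute("SELECT 2 IN (0, 1, 2, NULL)").fetchone()[0]

# ... but a value that is absent gives NULL, because it might equal the NULL.
three_in = conn.execute("SELECT 3 IN (0, 1, 2, NULL)").fetchone()[0]

# IS NULL is the test that actually matches NULL.
is_null_rows = conn.execute("SELECT 1 WHERE NULL IS NULL").fetchall()

print(rows, null_in, two_in, three_in, is_null_rows)
```

The third case is what makes NOT IN return nothing when the list contains a NULL: NOT applied to an unknown result is still unknown.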

2. NOT IN and NOT EXISTS:

How NOT EXISTS is executed

select ..... from rollup R  where not exists ( select 'Found' from title T where R.source_id = T.Title_ID);

It can be understood as:

for x in ( select * from rollup ) loop 
if ( not exists ( that query ) ) then 
OUTPUT 
end if; 
end loop;

Note: NOT EXISTS and NOT IN are not fully interchangeable; it depends on the specific requirement. If the selected column is nullable, one cannot be substituted for the other.

For example, consider the following statements and how they differ:

select x,y from t;

The x and y data returned are as follows:

x      y
------ ------
1      3
3      1
1      2
1      1
3      1
5      NULL

The results of NOT IN and NOT EXISTS are as follows:

select * from t where x not in (select y from t t2 ) ;

The query returns no rows.

select * from t where not exists (select null from t t2 where t2.y=t.x ) ;

The query result is:

x      y
------ ------
5      NULL

So which one to use depends on the specific requirement.

On the performance difference between NOT IN and NOT EXISTS:

Use NOT IN only when the column after the SELECT keyword in the subquery has a NOT NULL constraint or is otherwise known to be non-null. In addition, if the table in the main query is large and the table in the subquery is small but has many rows, use NOT IN with a hash anti-join.

If the main query's table has few rows and the subquery's table has many rows and an index, NOT EXISTS can be used. NOT IN is best combined with the /*+ HASH_AJ */ hint, or rewritten as an outer join plus IS NULL.

NOT IN works better under cost-based optimization.

For example:

select ..... 
from rollup R 
where not exists ( select 'Found' from title T 
where R.source_id = T.Title_ID);

can be changed to (better):

select ...... 
from title T, rollup R 
where R.source_id = T.Title_id(+) 
and T.Title_id is null;

or (better):

select ...
from rollup R
where R.source_id NOT IN ( select /*+ HASH_AJ */ T.Title_ID  -- the anti-join hint goes in the NOT IN subquery
from title T
where T.Title_ID IS NOT NULL )
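The three formulations above can be compared side by side. The sketch below is an illustration under stated assumptions: it uses SQLite, so the Oracle (+) outer-join syntax is written as a standard LEFT JOIN and the optimizer hint is omitted; the column names and sample rows are made up; and rollup.source_id is declared NOT NULL, because a NULL source_id would be kept by the first two forms but dropped by NOT IN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE rollup (source_id INT NOT NULL);
    CREATE TABLE title (title_id INT);
    INSERT INTO rollup VALUES (1), (2), (3);
    INSERT INTO title VALUES (2), (NULL);
""")

queries = {
    # Correlated NOT EXISTS, as in the first statement.
    "not_exists": """
        SELECT R.source_id FROM rollup R
        WHERE NOT EXISTS (SELECT 'Found' FROM title T
                          WHERE R.source_id = T.title_id)
        ORDER BY R.source_id""",
    # Outer join plus IS NULL, written with LEFT JOIN instead of (+).
    "outer_join": """
        SELECT R.source_id FROM rollup R
        LEFT JOIN title T ON R.source_id = T.title_id
        WHERE T.title_id IS NULL
        ORDER BY R.source_id""",
    # NOT IN with NULLs filtered out of the subquery.
    "not_in": """
        SELECT R.source_id FROM rollup R
        WHERE R.source_id NOT IN (SELECT T.title_id FROM title T
                                  WHERE T.title_id IS NOT NULL)
        ORDER BY R.source_id""",
}

results = {
    name: [row[0] for row in conn.execute(sql)]
    for name, sql in queries.items()
}
for name, rows in results.items():
    print(name, rows)  # each form returns [1, 3]
```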

More on IN and EXISTS:

select * from t1 where x in ( select y from t2 )

In fact, it can be understood as:

select * 
from t1, ( select distinct y from t2 ) t2 
where t1.x = t2.y;

If you have some SQL tuning experience, you will naturally conclude that t2 must not be a large table, because the whole of t2 has to go through a "unique sort"; if t2 is large, the cost of that sort is intolerable. t1, however, can be very large. Why? The most common explanation is that t1.x = t2.y can use an index.

But that is not a very good explanation. Suppose both t1.x and t2.y are indexed. An index is an ordered structure, so the best plan for t1 and t2 is a merge join. Moreover, if t2.y is indexed, sorting t2 also becomes much cheaper.
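The role of DISTINCT in the join rewrite is easy to miss, so here is a small sketch that compares IN with the join both with and without it. The sample rows are made up, and the derived table is aliased d (a hypothetical name) instead of reusing t2, purely to keep the names unambiguous.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (x INT);
    CREATE TABLE t2 (y INT);
    INSERT INTO t1 VALUES (1), (2), (3);
    INSERT INTO t2 VALUES (2), (2), (3), (5);
""")

def column(sql):
    """Run a query and return its first column as a list."""
    return [row[0] for row in conn.execute(sql)]

in_rows = column(
    "SELECT x FROM t1 WHERE x IN (SELECT y FROM t2) ORDER BY x"
)

# The rewrite from the text: join against the de-duplicated subquery.
join_distinct_rows = column(
    "SELECT t1.x FROM t1, (SELECT DISTINCT y FROM t2) d "
    "WHERE t1.x = d.y ORDER BY t1.x"
)

# Without DISTINCT, the duplicate 2 in t2 duplicates the t1 row as well.
join_plain_rows = column(
    "SELECT t1.x FROM t1, t2 WHERE t1.x = t2.y ORDER BY t1.x"
)

print(in_rows)             # [2, 3]
print(join_distinct_rows)  # [2, 3]
print(join_plain_rows)     # [2, 2, 3]
```

That de-duplication step is the "unique sort" whose cost grows with the size of t2.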

select * from t1 where exists ( select null from t2 where y = x )

It can be understood as:

for x in ( select * from t1 )
loop
if ( exists ( select null from t2 where y = x.x ) )
then
OUTPUT THE RECORD!
end if
end loop

This one is easier to understand: t1 is always scanned in full! So t1 must not be a large table, whereas t2 can be very large, because y = x.x can use the index on t2.y.

Summing up the discussion of IN and EXISTS above, we can draw a general conclusion: IN suits a large outer table with a small inner table; EXISTS suits a small outer table with a large inner table.

Optimize according to the actual situation. You cannot say in absolute terms which is more efficient; everything is relative.

Original site

Copyright notice
This article was written by [Python's path to becoming a God]. Please include the original link when reposting. Thank you.
https://yzsam.com/2022/03/202203010941366120.html