Differences between in and not in, exists and not exists in SQL and performance analysis
2022-06-12 04:34:00 【Python's path to becoming a God】
1、in and exists
IN performs a hash join between the outer table and the inner table, whereas EXISTS loops over the outer table and queries the inner table on every iteration. The long-held belief that EXISTS is always more efficient than IN is therefore inaccurate.
If the two tables in the query are about the same size, IN and EXISTS perform about the same. If one table is small and the other is large, use EXISTS when the subquery table is the large one and IN when the subquery table is the small one.
For example: table A (small table), table B (large table)
select * from A where cc in(select cc from B) --> inefficient; uses the index on column cc of table A
select * from A where exists(select cc from B where cc=A.cc) --> efficient; uses the index on column cc of table B
Conversely:
select * from B where cc in(select cc from A) --> efficient; uses the index on column cc of table B
select * from B where exists(select cc from A where cc=B.cc) --> inefficient; uses the index on column cc of table A
2、not in and not exists
NOT IN is not logically equivalent to NOT EXISTS. If you misuse NOT IN, your program can end up with a serious bug. Consider the following example:
create table #t1(c1 int,c2 int);
create table #t2(c1 int,c2 int);
insert into #t1 values(1,2);
insert into #t1 values(1,3);
insert into #t2 values(1,2);
insert into #t2 values(1,null);
select * from #t1 where c2 not in(select c2 from #t2); --> result: no rows
select * from #t1 where not exists(select 1 from #t2 where #t2.c2=#t1.c2); --> result: 1 3
As you can see, NOT IN returns an unexpected result set, which is a logic error. The NULL in #t2.c2 is the cause: comparing 3 with NULL yields unknown rather than true, so the NOT IN predicate can never be satisfied. The execution plans of the two SELECT statements also differ; the latter uses hash_aj. So try to avoid NOT IN (it uses a plain subquery) and prefer NOT EXISTS (it uses a correlated subquery).
If any record returned by the subquery contains a null value, the NOT IN query returns no records. If the subquery column has a NOT NULL constraint, NOT IN can be used, and you can hint it to use a hash_aj or merge_aj join.
If the query uses NOT IN, both the inner and the outer table are fully scanned and no index is used, whereas the subquery of NOT EXISTS can still use the index on its table. So no matter which table is larger, NOT EXISTS is faster than NOT IN.
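The example above can be reproduced with a small script. This is a minimal sketch that assumes SQLite (through Python's standard sqlite3 module) as a stand-in for the article's database; the SQL Server style temp tables #t1 and #t2 are replaced by ordinary tables named t1 and t2.

```python
import sqlite3

# In-memory SQLite database; plain t1/t2 replace the article's #t1/#t2
# because '#' temp-table names are specific to SQL Server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table t1(c1 int, c2 int);
    create table t2(c1 int, c2 int);
    insert into t1 values (1, 2);
    insert into t1 values (1, 3);
    insert into t2 values (1, 2);
    insert into t2 values (1, null);
""")

# NOT IN: the NULL in t2.c2 makes the comparison unknown, so no row qualifies.
not_in_rows = conn.execute(
    "select * from t1 where c2 not in (select c2 from t2)"
).fetchall()

# NOT EXISTS: only a real match on c2 removes a row from the result.
not_exists_rows = conn.execute(
    "select * from t1 where not exists "
    "(select 1 from t2 where t2.c2 = t1.c2)"
).fetchall()

print(not_in_rows)      # []
print(not_exists_rows)  # [(1, 3)]
```

The script only checks the result sets; it says nothing about the execution plans or hints discussed above, which are specific to the database engine.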
3、The difference between in and =
select name from student where name in('zhang','wang','zhao');
and
select name from student where name='zhang' or name='wang' or name='zhao'
return the same result.
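The equivalence of the IN list and the chain of OR comparisons can be checked with a quick sketch. It assumes SQLite via Python's sqlite3 module, and the four student names inserted are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table student(name text)")
# Hypothetical sample rows; 'li' is included so that one row is filtered out.
conn.executemany(
    "insert into student values (?)",
    [("zhang",), ("wang",), ("zhao",), ("li",)],
)

in_rows = conn.execute(
    "select name from student "
    "where name in ('zhang','wang','zhao') order by name"
).fetchall()

or_rows = conn.execute(
    "select name from student "
    "where name='zhang' or name='wang' or name='zhao' order by name"
).fetchall()

print(in_rows == or_rows)  # True
```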
Other analysis :
1. How EXISTS is executed
select * from t1 where exists ( select null from t2 where y = x )
It can be understood as :
for x in ( select * from t1 ) loop
    if ( exists ( select null from t2 where y = x.x ) ) then
        OUTPUT THE RECORD
    end if
end loop
On the performance difference between IN and EXISTS:
If the subquery returns few rows and the table in the main query is large and indexed, use IN. Conversely, if the outer query returns few rows and the table in the subquery is large and indexed, use EXISTS.
In fact, the choice between IN and EXISTS mainly changes the driving order, and this is the key to the performance difference. With EXISTS, the outer table is the driving table and is accessed first; with IN, the subquery is executed first. The goal is therefore to make the driving table return quickly, and only then to consider the relationship between the index and the result set.
In addition, IN does not handle NULL.
For example: select 1 from dual where null in (0,1,2,null) returns no rows.
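The NULL behavior of IN can be observed directly. The sketch below assumes SQLite via Python's sqlite3 module; SQLite has no DUAL table, so the FROM clause is simply omitted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Same predicate as the DUAL query, without a FROM clause.
rows = conn.execute("select 1 where null in (0, 1, 2, null)").fetchall()

# The IN expression itself evaluates to NULL (unknown), not to true or false.
value = conn.execute("select null in (0, 1, 2, null)").fetchone()[0]

print(rows)   # []
print(value)  # None
```

Because the predicate is unknown rather than true, the WHERE clause rejects the row even though NULL appears in the list.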
2. NOT IN and NOT EXISTS:
How NOT EXISTS is executed
select ..... from rollup R where not exists ( select 'Found' from title T where R.source_id = T.Title_ID);
It can be understood as :
for x in ( select * from rollup ) loop
if ( not exists ( that query ) ) then
OUTPUT
end if;
end loop;
Note: NOT EXISTS and NOT IN are not fully interchangeable; it depends on the specific requirement. If the selected column is nullable, one cannot replace the other.
For example, the following statements show the difference between them:
select x,y from t;
The data in x and y is as follows:
x      y
------ ------
1      3
3      1
1      2
1      1
3      1
5      NULL
The results of the NOT IN and NOT EXISTS queries are as follows:
select * from t where x not in (select y from t t2 ) ;
The query returns no rows.
select * from t where not exists (select null from t t2 where t2.y=t.x ) ;
The query result is:
x      y
------ ------
5      NULL
So which one to use depends on the specific requirement.
On the performance difference between NOT IN and NOT EXISTS:
Use NOT IN only when the column after the SELECT keyword in the subquery has a NOT NULL constraint, or is otherwise implied to be non-null. In addition, if the table in the main query is large and the table in the subquery is small but has many records, use NOT IN together with an anti hash join.
If the main query table has few records and the subquery table has many records and an index, NOT EXISTS can be used. When NOT IN is used, it is best combined with /*+ HASH_AJ */ or rewritten as an outer join plus IS NULL.
NOT IN performs better under cost-based optimization.
For example:
select .....
from rollup R
where not exists ( select 'Found' from title T
where R.source_id = T.Title_ID);
can be rewritten as (better):
select ......
from title T, rollup R
where R.source_id = T.Title_id(+)
and T.Title_id is null;
or (better):
select /*+ HASH_AJ */ ...
from rollup R
where source_id NOT IN ( select Title_ID
from title T
where Title_ID IS NOT NULL )
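That the three formulations return the same rows can be checked on a small data set. This sketch assumes SQLite via Python's sqlite3 module: the tables are renamed rollup_r and title_t (hypothetical names), the Oracle (+) outer join is written as a LEFT JOIN, the HASH_AJ hint is dropped because SQLite has no optimizer hints, and the inserted values are sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table rollup_r(source_id int);
    create table title_t(title_id int);
    insert into rollup_r values (1), (2), (3);
    insert into title_t values (1), (null);
""")

queries = {
    "not exists": """
        select r.source_id from rollup_r r
        where not exists (select 'Found' from title_t t
                          where r.source_id = t.title_id)
        order by r.source_id""",
    "outer join + is null": """
        select r.source_id from rollup_r r
        left join title_t t on r.source_id = t.title_id
        where t.title_id is null
        order by r.source_id""",
    "not in + is not null": """
        select r.source_id from rollup_r r
        where r.source_id not in (select t.title_id from title_t t
                                  where t.title_id is not null)
        order by r.source_id""",
}

# Run each formulation and collect its rows for comparison.
results = {name: conn.execute(sql).fetchall() for name, sql in queries.items()}
for name, rows in results.items():
    print(name, rows)
```

The IS NOT NULL filter in the third query is what keeps the NULL title_id from emptying the NOT IN result. The sketch compares results only; which form is fastest depends on the database and its optimizer.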
Further discussion of IN and EXISTS:
select * from t1 where x in ( select y from t2 )
In fact, it can be understood as :
select *
from t1, ( select distinct y from t2 ) t2
where t1.x = t2.y;
With some SQL tuning experience, it follows naturally from this form that t2 must not be a large table, because a full-table "unique sort" has to be performed on t2. If t2 is large, the cost of that sort is intolerable. t1, however, can be very large. Why? The most common explanation is that t1.x = t2.y can use an index.
But that is not a very good explanation. Suppose both t1.x and t2.y are indexed. An index is an ordered structure, so the best plan for t1 and t2 is a merge join. In addition, if t2.y is indexed, sorting t2 also becomes much faster.
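The rewrite of IN as a join against a de-duplicated subquery can be illustrated with a short sketch. It assumes SQLite via Python's sqlite3 module, uses hypothetical sample values, and aliases the derived table as d (rather than t2) to keep the names distinct.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table t1(x int);
    create table t2(y int);
    insert into t1 values (1), (2), (3);
    insert into t2 values (2), (2), (3), (4);
""")

in_rows = conn.execute(
    "select * from t1 where x in (select y from t2) order by x"
).fetchall()

# Join against the distinct y values, as in the rewritten form.
join_rows = conn.execute(
    "select t1.* from t1, (select distinct y from t2) d "
    "where t1.x = d.y order by t1.x"
).fetchall()

# Without DISTINCT, the duplicate y = 2 returns the matching t1 row twice.
dup_rows = conn.execute(
    "select t1.* from t1, t2 where t1.x = t2.y order by t1.x"
).fetchall()

print(in_rows)    # [(2,), (3,)]
print(join_rows)  # [(2,), (3,)]
print(dup_rows)   # [(2,), (2,), (3,)]
```

The third query shows why the DISTINCT step, and hence the unique sort on t2, is needed for the join form to match IN.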
select * from t1 where exists ( select null from t2 where y = x )
It can be understood as :
for x in ( select * from t1 )
loop
    if ( exists ( select null from t2 where y = x.x ) )
    then
        OUTPUT THE RECORD!
    end if
end loop
This is easier to understand: t1 is always scanned in full! So t1 must not be a large table, whereas t2 can be very large, because y = x.x can use the index on t2.y.
Summing up the discussion of IN and EXISTS, we can draw a general conclusion: IN suits a large outer table with a small inner table; EXISTS suits a small outer table with a large inner table.
Optimize according to the actual situation. We cannot say in absolute terms which is more efficient; everything is relative.
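The loop reading of EXISTS can be written out as ordinary code. The following is a rough Python sketch, not how any database actually implements it: the rows are hypothetical sample data, and a Python set stands in for the index on t2.y (a real index is typically an ordered B-tree, not a hash set).

```python
# Rows of the outer table t1 and the inner table t2 (hypothetical sample data).
t1 = [{"x": 1}, {"x": 2}, {"x": 3}]
t2 = [{"y": 2}, {"y": 3}, {"y": 3}, {"y": 4}]

# Stand-in for an index on t2.y: a set gives a fast lookup by value.
t2_y_index = {row["y"] for row in t2}


def exists_semijoin(outer_rows, inner_index):
    """Scan every outer row once and probe the inner index for a match."""
    output = []
    for row in outer_rows:           # full scan of t1, as in the pseudocode
        if row["x"] in inner_index:  # exists (select null from t2 where y = x.x)
            output.append(row)       # OUTPUT THE RECORD
    return output


result = exists_semijoin(t1, t2_y_index)
print(result)  # [{'x': 2}, {'x': 3}]
```

The cost is one pass over t1 plus one index probe per row, which is why the size of t1 dominates while t2 can be large as long as the probe is cheap.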