Oracle中 in 和 not in的巨大区别

kider 发表于 2008-4-10 10:58:36

　　在很多软件系统中，系统的性能很大程度上由数据库的性能决定。以前也曾经做过很多次关于性能方面的各种测试，特别是关于oracle的，我想到也应该记录下来一部分，为大家共享。

　　事情发生在我们的系统从sqlserver移植到oracle，用户在一个查询的操作上等待的时间无法忍受了，我们关于这个查询的处理与原来的方式一下，难道sqlserver 同oracle有什么地方不一样么，让我们来看看oracle有什么地方有问题，或者是我们使用的有问题?

　　业务问题大概可以这样描述，一个父表，一个子表，查询的结果是找到子表中没有使用父表id的记录，这种情况估计很多系统都会牵涉得到。让我们来举一个例子：

　　表一：父表 parent
　　http://publish.it168.com/2005/1029/images/487072.jpg

　　表二：子表 childen
　　http://publish.it168.com/2005/1029/images/487073.jpg
　　父表存储父亲，子表存储孩子，然后通过pid和父表关联，查询需要的结果是找到尚未有孩子的父亲。
　　我们来看一下查询语句的写法：
　　select * from parent where id not in (select pid　from childen)

　　这种标准的写法在子表存在50万条的记录的时候，查询时间超过了10秒，远远大于原来的sql server服务器的一秒。我在解决的时候想到了一个方法：
　　select * from parent where id in
　　( select id from parent minus select pid　from childen )

　　正常理解下，这个语句应该更加费时，但是事实完全出乎意料，这条语句不仅仅在子表存在大量记录的情况下速度良好，在子表少量数据的情况下速度也非常的好，基本在1秒内完成。
　　这个结果可以很明显的证明oracle 在子查询的内部处理的时候，使用 in 和 not in 的巨大区别，希望用到这种方式的用户注意，也期待有人解释其中的问题。

　　附录：测试数据的语句
　　-- create parent table
　　drop table parent;
　　create table parent(id varchar(10),name varchar(100), primary key (id) );
　　-- create childen table
　　drop table childen;
　　create table childen(id varchar(10),pid varchar(10), name varchar(100), primary key (id) );
　　-- Create/Recreate primary, unique and foreign key constraints
　　alter table CHILDEN
　　add constraint fk_123 foreign key (PID)
　　references parent (ID);
　　-- add test date for parent
　　-- Created on 2004-11-29 by GUIP
　　declare
　　-- Local variables here
　　i integer;
　　begin
　　-- Test statements here
　　i := 0;
　　delete from parent;
　　loop
　　i := i + 1;
　　dbms_output.put_line(i);
　　insert into parent(id, name) values(i, 'name ' || i);
　　if (i mod 100=0) then
　　commit;
　　end if;
　　exit when i > 1000;
　　end loop;
　　commit;
　　end;
　　-- add test date for childen
　　-- Created on 2004-11-29 by GUIP
　　declare
　　-- Local variables here
　　i integer;
　　j integer;
　　begin
　　-- Test statements here
　　i := 0;
　　delete from childen ;
　　loop
　　j := 0;
　　loop
　　i := i + 1;
　　j := j + 1;
　　insert into childen(id, pid, name) values(i, j, 'name ' || j);
　　if (i mod 100=0) then
　　commit;
　　end if;
　　exit when j>= 50;
　　end loop;
　　exit when i >= 10000 * 50;
　　end loop;
　　commit;
　　end;

fetch 发表于 2008-8-4 13:58:38

用exists 存在和notexists不存在比较好...

页: [1]

MariaDB社区's Archiver

Oracle中 in 和 not in的巨大区别