Oracle的where子句性能低下

4

我有以下的Oracle表格:

create table my_table(
   start int,
   end int
);

insert into my_table values(1, 3);
insert into my_table values(5, 7);
insert into my_table values(11, 200);
insert into my_table values(311, 5000);
insert into my_table values(60004, 60024);
insert into my_table values(123213, 12312312);

这个表有1M行,存储数字范围(“开始”,“结束”),所有数字都是唯一的,它没有重复的范围,任何数字只能在此表中的一个范围内。我有以下查询,它传递一个变量my_number来确定范围的“开始”。

 execute immediate 
    'select start from my_table where :1 between start and end' using my_number

我已经在两个字段上创建了联合索引。问题是当my_number较小时,查询的性能很好,但当my_number增加时,查询时间持续增加。如果my_number更大,完成这个查询需要相当长的时间。有没有办法改进这个查询?这种方法可以包括重新设计我的表。谢谢。


你的起始和结束有双部分组合键/索引吗? - Scotch
2
我们能否请看一下“解释计划”输出? - 9000
是的,开始和结束上有一个由两部分组成的索引。解释非常简单,索引扫描,start < my_number 和 end < my_number 是过滤器。 - user2166163
5个回答

2
如果您将模式更改为以下内容:
create table my_table(
   start int,
   range_size int
);

insert into my_table values(1, 2);
insert into my_table values(5, 2);
insert into my_table values(11, 189);
insert into my_table values(311, 4689);
insert into my_table values(60004, 20);
insert into my_table values(123213, 12300001);

那么你可以只在start列上创建索引。

execute immediate 
    'select start from (select start, range_size from my_table where start < :1 order by start asc limit 1) tmp where :1 < start+range_size' using my_number

这可能会提高一些性能。

作为额外的奖励,这个解决方案将索引大小减半,因此更有可能完全适合内存。 - 9000
“start < :1” 应该改为 “start > :1”,因为 my_number 应该位于 start 和 end 之间。这个排序语句会增加额外的成本吗?如果是,我可以将表中的行顺序设置为 ASC 吗?这样在执行查询时就不需要进行排序了。 - user2166163
"'start < :1'是正确的。考虑不等式:start < my_number < end。" - Joe Frambach
1
哦,对了。Oracle 没有 LIMIT 运算符。这是什么鬼。 - Joe Frambach

0

你做完了吗?

create table my_table(
start int,
end int
constraint PK_comp primary key (start, end)
) ;

是的,我尝试过这个,甚至在我添加了PK后,它的性能也不如我仅在两个字段上添加组合索引时好。 - user2166163

0
这是一种试图让Oracle表现得像其竞争对手的情况,而我没有访问Oracle,只是在猜测。也许可以使用自连接来实现这个目的?每列都有索引。
SELECT t1.start
FROM my_table t1 JOIN my_table t2
ON t1.start=t2.start AND t2."end"=t1."end"
AND t1.start <= :1
AND t2.end >= :1

这看起来有点傻,但是Joe Frambach提出的解决方案很直接。它让Postgres(我确实有)只进行索引搜索。
顺便说一句,Postgres对于将end作为列名非常不开心。希望你的真实表格中没有使用保留字。

0

为每个列创建索引并使用以下查询:

select start_num
from my_table
where
    start_num =
    (
        --Last start <= number
        select start_num
        from
        (
            select start_num
            from my_table
            where :1 >= start_num
            order by start_num desc
        )
        where rownum = 1
    ) and
    end_num =
    (
        --First end >= number
        select end_num
        from
        (
            select end_num
            from my_table
            where :1 <= end_num
            order by end_num
        )
        where rownum = 1
    );

呃,可能有更好的写法。或者你可能想将其包装在一个函数中。

问题

测试数据(使用非保留字列名):

drop table my_table;
create table my_table(
   start_num int,
   end_num int
);
insert into my_table select level*2,level*2+1 from dual connect by level <= 1000000;
commit;
create index my_table_index on my_table(start_num, end_num);
begin
    dbms_stats.gather_table_stats(user, 'MY_TABLE', no_invalidate => false);
end;
/

小数字几乎是瞬间完成的 - 0.015 秒

select start_num from my_table where 2 between start_num and end_num;

数字越大,速度越慢 - 0.125秒

select start_num from my_table where 1000000 between start_num and end_num;

在范围扫描和全表扫描之间只有一个关键点。

explain plan for select start_num from my_table where 402741 between start_num and end_num;
select * from table(dbms_xplan.display);

Plan hash value: 3804444429

------------------------------------------------------------------------------
| Id  | Operation         | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |          |   160K|  1570K|   622   (2)| 00:00:08 |
|*  1 |  TABLE ACCESS FULL| MY_TABLE |   160K|  1570K|   622   (2)| 00:00:08 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("START_NUM"<=402742 AND "END_NUM">=402742)


explain plan for select start_num from my_table where 402742 between start_num and end_num;
select * from table(dbms_xplan.display);


Plan hash value: 3804444429

------------------------------------------------------------------------------
| Id  | Operation         | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |          |   160K|  1570K|   622   (2)| 00:00:08 |
|*  1 |  TABLE ACCESS FULL| MY_TABLE |   160K|  1570K|   622   (2)| 00:00:08 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("START_NUM"<=402742 AND "END_NUM">=402742)

但问题不在于Oracle不使用索引。以天真的方式使用索引并不能帮助解决问题。事实上,这样做甚至更慢,需要0.172秒:

select /*+ index(my_table my_table_index) */ start_num
from my_table
where 1000000 between start_num and end_num;

解决方案

创建新的索引:

drop index my_table_index;
create index my_table_index1 on my_table(start_num);
create index my_table_index2 on my_table(end_num);
begin
    dbms_stats.gather_table_stats(user, 'MY_TABLE', no_invalidate => false);
end;
/

现在,对于任何数字,结果都是即时的:

select start_num
from my_table
where
    start_num =
    (
        --Last start <= number
        select start_num
        from
        (
            select start_num
            from my_table
            where 1000000 >= start_num
            order by start_num desc
        )
        where rownum = 1
    ) and
    end_num =
    (
        --First end >= number
        select end_num
        from
        (
            select end_num
            from my_table
            where 1000000 <= end_num
            order by end_num
        )
        where rownum = 1
    );

计划看起来很棒 - 这可能是您可以获得的最佳性能。

Plan hash value: 522166032

--------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                 |     1 |    10 |    10   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS BY INDEX ROWID   | MY_TABLE        |     1 |    10 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN             | MY_TABLE_INDEX2 |     1 |       |     3   (0)| 00:00:01 |
|*  3 |    COUNT STOPKEY               |                 |       |       |            |          |
|   4 |     VIEW                       |                 |     3 |    39 |     3   (0)| 00:00:01 |
|*  5 |      INDEX RANGE SCAN          | MY_TABLE_INDEX2 |     3 |    18 |     3   (0)| 00:00:01 |
|*  6 |   COUNT STOPKEY                |                 |       |       |            |          |
|   7 |    VIEW                        |                 |     2 |    26 |     3   (0)| 00:00:01 |
|*  8 |     INDEX RANGE SCAN DESCENDING| MY_TABLE_INDEX1 |   500K|  2929K|     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("START_NUM"= (SELECT "START_NUM" FROM  (SELECT "START_NUM" "START_NUM" FROM 
              "MY_TABLE" "MY_TABLE" WHERE "START_NUM"<=1000000 ORDER BY "START_NUM" DESC) 
              "from$_subquery$_002" WHERE ROWNUM=1))
   2 - access("END_NUM"= (SELECT "END_NUM" FROM  (SELECT "END_NUM" "END_NUM" FROM 
              "MY_TABLE" "MY_TABLE" WHERE "END_NUM">=1000000 ORDER BY "END_NUM") "from$_subquery$_004" 
              WHERE ROWNUM=1))
   3 - filter(ROWNUM=1)
   5 - access("END_NUM">=1000000)
   6 - filter(ROWNUM=1)
   8 - access("START_NUM"<=1000000)

0
我认为你应该创建两个索引,一个在起始列,另一个在结束列。 然后不要使用between选项选择,而是使用大于起始值和小于结束值的条件选择。 这样每个where子句都可以使用一个索引。
希望这对性能有所帮助。

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接