优化SQL查询

3

我有一个庞大的遗留查询,它是我的程序核心, 这个查询需要太长时间,有什么方法可以让它运行得更快? 我使用的是 Oracle 11g。

 SELECT   *
     FROM     ( SELECT  COUNT(*) AS countme,
                       string_value        ,
                       name                ,
                       property_id         ,
                       category_id
              FROM    ( SELECT DISTINCT a.string_value,
                                        a.name        ,
                                        a.property_id ,
                                        b.product_id  ,
                                        a.category_id
                       FROM             filter_criterias a
                                        JOIN product_properties b
                                        ON              (
                                                                          a.property_id = b.property_id
                                                         AND
                                                                          (
                                                                                           (
                                                                                                            isnumber(b.value)        IS NOT NULL
                                                                                           AND              isnumber(a.range_bottom) IS NOT NULL
                                                                                           AND              isnumber(a.range_top)    IS NOT NULL
                                                                                           AND
                                                                                                            (
                                                                                                                             a.range_bottom >a.range_top
                                                                                                            AND              b.value       >= a.range_bottom
                                                                                                            OR               a.range_bottom<=a.range_top
                                                                                                            AND              b.value       >= a.range_bottom
                                                                                                            AND              b.value       <=a.range_top
                                                                                                            )
                                                                                           )
                                                                          )
                                                         )
                                        JOIN PRODUCT_CATEGORY prc
                                        ON               (
                                                                          prc.sku         = b.product_id
                                                         AND              prc.category_id = a.category_id
                                                         )
                                        JOIN PRODUCT pr
                                        ON               (
                                                                          b.product_id = pr.SKU
                                                         AND              pr.visible   = '1'
                                                         )
                       )
              GROUP BY (string_value, name, property_id,category_id)

              UNION

              SELECT   COUNT(*) AS countme,
                       string_value       ,
                       name               ,
                       property_id        ,
                       category_id
              FROM    ( SELECT DISTINCT a.string_value,
                                        a.name        ,
                                        a.property_id ,
                                        b.product_id  ,
                                        a.category_id
                       FROM             filter_criterias a
                                        JOIN product_properties b
                                        ON              (
                                                                          a.property_id = b.property_id
                                                         AND
                                                                          (
                                                                                           (
                                                                                                            a.name= b.value
                                                                                           )
                                                                          )
                                                         )
                                        JOIN PRODUCT_CATEGORY prc
                                        ON               (
                                                                          prc.sku         = b.product_id
                                                         AND              prc.category_id = a.category_id
                                                         )
                                        JOIN PRODUCT pr
                                        ON               (
                                                                          b.product_id = pr.SKU
                                                         AND              pr.visible   = '1'
                                                         )
                       )
              GROUP BY (string_value, name, property_id,category_id)
              )
     ORDER BY 5,4,3,2

这是执行计划。
    "Optimizer" "Cost"  "Cardinality"   "Bytes" "Partition Start"   "Partition Stop"    "Partition Id"  "ACCESS PREDICATES" "FILTER PREDICATES"
"SELECT STATEMENT"  "ALL_ROWS"      "1298"  "2"         "542"   ""  ""  ""  ""  ""
"SORT(ORDER BY)"    ""              "1298"  "2"         "542"   ""  ""  ""  ""  ""
"VIEW"              ""              "1297"  "2"         "542"   ""  ""  ""  ""  ""
"SORT(UNIQUE)"      ""              "1297"  "2"         "74"    ""  ""  ""  ""  ""
"UNION-ALL" ""      ""              ""  ""  ""          ""  ""  ""  ""
"HASH(GROUP BY)"    ""              "661"   "1"         "37"    ""  ""  ""  ""  ""
"VIEW"              ""              "659"   "1"         "37"    ""  ""  ""  ""  ""
"HASH(UNIQUE)"      ""              "659"   "1"         "95"    ""  ""  ""  ""  ""
"NESTED LOOPS"      ""              ""  ""  ""          ""  ""  ""  ""  ""
"NESTED LOOPS"      ""              "658"   "1"         "95"    ""  ""  ""  ""  ""
"HASH JOIN"         ""              "493"   "1"         "81"    ""  ""  ""  ""B"."PRODUCT_ID"=TO_NUMBER("PRC"."SKU") AND "A"."CATEGORY_ID"=SYS_OP_C2C("PRC"."CATEGORY_ID")" ""
"HASH JOIN"         ""              "369"   "2"         "128"   ""  ""  ""  ""B"."PROPERTY_ID"=TO_NUMBER("A"."PROPERTY_ID")"    ""A"."RANGE_BOTTOM">"A"."RANGE_TOP" AND "A"."RANGE_BOTTOM"<=TO_NUMBER("B"."VALUE") OR "A"."RANGE_BOTTOM"<="A"."RANGE_TOP" AND "A"."RANGE_BOTTOM"<=TO_NUMBER("B"."VALUE") AND "A"."RANGE_TOP">=TO_NUMBER("B"."VALUE")"
"TABLE ACCESS(FULL) BNET.B_FILTER_CRITERIAS"    "ANALYZED"  "36"    "28"    "1148"  ""  ""  ""  ""  ""ISNUMBER"(TO_CHAR("A"."RANGE_BOTTOM")) IS NOT NULL AND "ISNUMBER"(TO_CHAR("A"."RANGE_TOP")) IS NOT NULL"
"TABLE ACCESS(FULL) BNET.B_PRODUCT_PROPERTIES"  "ANALYZED"  "332"   "12566" "289018"    ""  ""  ""  ""  ""ISNUMBER"("B"."VALUE") IS NOT NULL"
"TABLE ACCESS(FULL) BNET.WLCS_PRODUCT_CATEGORY" "ANALYZED"  "124"   "129762"    "2205954"   ""  ""  ""  ""  ""
"INDEX(RANGE SCAN) BNET.WLCS_PROD_VISIBLE_IDX"  "ANALYZED"  "12"    "6208"  ""  ""  ""  ""  ""PR"."VISIBLE"='1'"    ""
"TABLE ACCESS(BY INDEX ROWID) BNET.WLCS_PRODUCT"    "ANALYZED"  "164"   "1" "14"    ""  ""  ""  ""  ""B"."PRODUCT_ID"=TO_NUMBER("PR"."SKU")"
"HASH(GROUP BY)"    ""              "637"   "1"         "37"    ""  ""  ""  ""  ""
"VIEW"              ""              "635"   "1"         "37"    ""  ""  ""  ""  ""
"HASH(UNIQUE)"      ""              "635"   "1"         "91"    ""  ""  ""  ""  ""
"HASH JOIN"         ""              "634"   "1"         "91"    ""  ""  ""  ""B"."PRODUCT_ID"=TO_NUMBER("PRC"."SKU") AND "A"."CATEGORY_ID"=SYS_OP_C2C("PRC"."CATEGORY_ID")" ""
"NESTED LOOPS"      ""              ""      ""  ""  ""  ""  ""  ""  ""
"NESTED LOOPS"      ""              "509"   "1"         "74"    ""  ""  ""  ""  ""
"HASH JOIN"         ""              "345"   "1"         "60"    ""  ""  ""  ""B"."PROPERTY_ID"=TO_NUMBER("A"."PROPERTY_ID") AND "A"."NAME"="B"."VALUE"" ""
"TABLE ACCESS(FULL) BNET.B_FILTER_CRITERIAS"    "ANALYZED"  "35"    "11257" "416509"    ""  ""  ""  ""  ""
"TABLE ACCESS(FULL) BNET.B_PRODUCT_PROPERTIES"  "ANALYZED"  "309"   "251319"    "5780337"   ""  ""  ""  ""  ""
"INDEX(RANGE SCAN) BNET.WLCS_PROD_VISIBLE_IDX"  "ANALYZED"  "12"    "6208"  ""  ""  ""  ""  ""PR"."VISIBLE"='1'"    ""
"TABLE ACCESS(BY INDEX ROWID) BNET.WLCS_PRODUCT"    "ANALYZED"  "164"   "1" "14"    ""  ""  ""  ""  ""B"."PRODUCT_ID"=TO_NUMBER("PR"."SKU")"
"TABLE ACCESS(FULL) BNET.WLCS_PRODUCT_CATEGORY" "ANALYZED"  "124"   "129762"    "2205954"   ""  ""  ""  ""  ""

1
很难确定,因为我们不知道有哪些索引。显示执行计划可能是一个好的开始... - Daniel Hilgarth
2
@Daniel - 即使没有计划,我也可以看到由于ISNUMBER而将数字值存储为字符串。这会导致表扫描。这就像试图通过吃豆子并在显眼的地方拿着一个打火机来充气热气球。 - MatBailie
@Lasse V. Karlsen:我认为这个问题并不太局限。尽管有很多细节对他人来说并不重要,但将数字存储为字符串的核心问题适用于广大受众。 - Jon Heller
5个回答

5
一个潜在的巨大问题是你必须使用ISNUMBER函数。如果你将数字值存储为文本,然后使用“x <= y”等操作,会产生很多负面影响:
- 必须将字符串解析成数字才能使用
- 字符串的索引可能与数字的索引没有任何相似之处
- 如果索引无用,则会得到表扫描而不是索引搜索
我强烈建议考虑将值存储为实数,而不是作为字符串。不需要使用ISNUMBER函数,也不需要转换每个值,因此可以使用索引,这样可以获得极高的性能优势。
编辑:
你刚刚添加的计划包括许多TABLE ACCESS(FULL)实例,其中几个与将数字值存储为字符串有关。

2

首先,我建议至少缩进一下,使其更易读。如果你无法阅读,就无法进行优化。你可以使用与表名匹配的别名来使其更易读,例如将filter_criterias改为fc,而不是a。在下面的查询中,我只是稍微调整了一下排版,并去掉了多余的括号。

SELECT   
  *
FROM  
( 
    SELECT  
        COUNT(*) AS countme,
        string_value,
        name,
        property_id,
        category_id
    FROM
    (
        SELECT DISTINCT 
            a.string_value,
            a.name,
            a.property_id,
            b.product_id,
            a.category_id
        FROM
            filter_criterias a
            INNER JOIN product_properties b
                ON a.property_id = b.property_id
                AND isnumber(b.value) IS NOT NULL
                AND isnumber(a.range_bottom) IS NOT NULL
                AND isnumber(a.range_top) IS NOT NULL
                AND ( 
                    a.range_bottom > a.range_top
                    AND b.value >= a.range_bottom
                    OR a.range_bottom <= a.range_top
                    AND b.value >= a.range_bottom
                    AND b.value <=a.range_top
                )
            INNER JOIN PRODUCT_CATEGORY prc
                ON prc.sku = b.product_id
                AND prc.category_id = a.category_id
            INNER JOIN PRODUCT pr
                ON b.product_id = pr.SKU
                AND pr.visible = '1'
    )
GROUP BY
    string_value, 
    name, 
    property_id,
    category_id

UNION

SELECT 
    COUNT(*) AS countme,
    string_value,
    name,
    property_id,
    category_id
FROM
    (
    SELECT DISTINCT 
        a.string_value,
        a.name     ,
        a.property_id ,
        b.product_id  ,
        a.category_id
    FROM
        filter_criterias a
        INNER JOIN product_properties b
            ON a.property_id = b.property_id
            AND a.name = b.value
        INNER JOIN PRODUCT_CATEGORY prc
            ON prc.sku = b.product_id
            AND prc.category_id = a.category_id
        INNER JOIN PRODUCT pr
            ON b.product_id = pr.SKU
            AND pr.visible = '1'
    )
GROUP BY
    string_value, 
    name, 
    property_id,
    category_id
ORDER BY 5,4,3,2

在完成上述操作后,您会发现它包含两个查询,由UNION分隔。如果这些查询各自包含不同的行,则可以使用UNION ALL。只有UNION会在结果上执行另一个DISTINCT,这会使查询变慢。
此外,这两个子查询几乎相同,除了在product_properties b的连接中有一个条件(通过将每个子查询放入WinMerge或类似工具进行检查)。因此,也许您可以完全跳过union,并将两个条件组合为OR加入到连接中,尽管您必须记住,OR会减慢连接!
当您查看查询的解释计划时,这些问题就会出现。查看解释计划以查看哪些连接会给您带来问题总是很好的。有时只是忘记了索引。但重要的是要知道一些操作会减缓查询速度,例如在连接中使用OR(您正在使用),在不需要时使用DISTINCT以及在可以使用UNION ALL时使用UNION。

1
在解释计划中,你应该首先查看基数(估计的行数)。返回1行的最佳计划通常与返回10亿行的最佳计划非常不同。如果Oracle的估计显著错误,你需要尝试弄清楚为什么它是错误的以及你可以采取什么措施。
我同意@Dems的观点,ISNUMBER很可能是你问题的原因,但原因不同。Oracle无法准确猜测使用自定义函数的谓词将过滤掉多少行。虽然你可能知道99.9%的行将通过该过滤器,但Oracle假设只有5%的行将通过。这导致基数非常低,从而导致低效的嵌套循环而不是哈希连接。
你可以通过在ISNUMBER函数上创建扩展统计信息来为优化器提供更有用的信息。这假定你正在使用Oracle 11g,并且ISNUMBER是确定性的:
select dbms_stats.create_extended_stats(null,'product_properties','(isnumber(value))') from dual;
select dbms_stats.create_extended_stats(null,'filter_criterias','(isnumber(range_bottom))') from dual;
select dbms_stats.create_extended_stats(null,'filter_criterias','(isnumber(range_top))') from dual;

--Must re-gather table stats for the extended stats to work
begin
    dbms_stats.gather_Table_stats(user, 'product_properties', no_invalidate => false);
    dbms_stats.gather_Table_stats(user, 'filter_criterias', no_invalidate => false);
end;
/

然而,你的第二个查询没有使用ISNUMBER函数,但是估计的基数仍为1。你的表和索引统计信息是否最新?可以通过执行select last_analyzed, table_name from user_tables;来检查。或者可能是Oracle永远无法正确估计。像/*+ no_use_nl(a b prc pr) */这样的提示可能会有所帮助。

此外,看起来你正在尝试在SQL中实现短路逻辑,但这并不总是有效的。Oracle不一定按照从上到下的顺序处理谓词,你可能会发现当计划更改时,某天查询失败了。


1

逐步执行计划并查看瓶颈所在。首先,您可以尝试以下几个方面:

  • 不要使用SELECT * - 选择您需要的特定列。
  • 检查连接并查看是否有任何方法可以使它们更有效率
  • 尽可能使用派生表替换嵌套查询
  • 在查询顶部使用SET NOCOUNT ON
  • 确保所有表都正确索引

但是,您绝对需要查看执行计划,并从那里开始。


1
SET NOCOUNT ON in Oracle??? - MatBailie

0
一些提示:
  1. 确保您已创建并更新了正确的索引
  2. 确保不执行任何隐式类型转换(pr.visible = '1' 看起来就是这样的情况)

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接