在SQL Server中生成序列,使用CROSS APPLY性能差

3

多年前,我从网上获取了下面的代码,它一直为我服务得很好。它只是一个生成从1到您传递进来的任意数字序列的函数。

基本上这是在SQL语句中使用for循环的一种方法。

CREATE FUNCTION [SequenceCreate]
(@MaxValue INT)
RETURNS TABLE
AS

RETURN
WITH
Num1 (n) AS (SELECT 1 UNION ALL SELECT 1),
Num2 (n) AS (SELECT 1 FROM Num1 AS X, Num1 AS Y),
Num3 (n) AS (SELECT 1 FROM Num2 AS X, Num2 AS Y),
Num4 (n) AS (SELECT 1 FROM Num3 AS X, Num3 AS Y),
Num5 (n) AS (SELECT 1 FROM Num4 AS X, Num4 AS Y),
Num6 (n) AS (SELECT 1 FROM Num5 AS X, Num5 AS Y),
Nums (n) AS
(SELECT ROW_NUMBER() OVER(ORDER BY n)
FROM Num6)
SELECT n AS [Value] FROM Nums
WHERE n BETWEEN 1 AND @MaxValue;

通常情况下,这个方法运行得非常良好且速度快,但我发现在使用 CROSS APPLY 语句时其性能非常差,例如:

DECLARE @T TABLE(StartNum INT, ItemCount INT)
INSERT INTO @T VALUES (100, 5)
INSERT INTO @T VALUES (110, 7)
INSERT INTO @T VALUES (55, 3)

SELECT Seq.Value + StartNum FROM @T
CROSS APPLY he.SequenceCreate(ItemCount) AS Seq

这在我的机器上非常缓慢。有人知道为什么执行一次时它运行良好,但通过CROSS APPLY执行三次时运行非常糟糕吗?即使@T表只包含1行,性能仍然很差。有更好的方法来编写吗?
提前感谢, Michael

当精细和缓慢时,计划是否有所不同? - Mitch Wheat
在SQL语句中使用for循环的做法很可能会导致性能非常差...也许你可以重新设计循环,采用适当的基于集合的SQL方法,甚至可以完全消除对此过程的需求... - marc_s
是的,虽然我不太理解执行计划。当它运行缓慢时,看起来像是创建了最大理论行数(2^32),然后再从中筛选。在快速情况下,它似乎通过某种方式在进行过滤时保持行数较少(计划并没有真正说明原因)。有趣的是,我有一个函数,基本上做同样的事情,但针对日期,它可以正常工作,即使它调用了上述函数。 - MikeKulls
marc_s,你是否误读了问题?这是一种消除for循环并使用基于集合的SQL方法的方式。 - MikeKulls
3个回答

3
查询优化器认为仅执行一次函数,然后使用结果进行连接以获取所需行更好。这是因为您的函数是内联表值函数。如果您将函数改为多语句值函数,则会为源表中的每一行执行一次函数。但是,我建议您按照Igor的建议创建一个数字表。
CREATE FUNCTION [SequenceCreate]
(@MaxValue INT)
RETURNS @T TABLE ([Value] INT NOT NULL PRIMARY KEY)
AS
BEGIN

  WITH
  Num1 (n) AS (SELECT 1 UNION ALL SELECT 1),
  Num2 (n) AS (SELECT 1 FROM Num1 AS X, Num1 AS Y),
  Num3 (n) AS (SELECT 1 FROM Num2 AS X, Num2 AS Y),
  Num4 (n) AS (SELECT 1 FROM Num3 AS X, Num3 AS Y),
  Num5 (n) AS (SELECT 1 FROM Num4 AS X, Num4 AS Y),
  Num6 (n) AS (SELECT 1 FROM Num5 AS X, Num5 AS Y),
  Nums (n) AS
  (SELECT ROW_NUMBER() OVER(ORDER BY n)
  FROM Num6)
  INSERT INTO @T
  SELECT n AS [Value] FROM Nums
  WHERE n BETWEEN 1 AND @MaxValue;

  RETURN
END

谢谢,问题已解决。为什么您建议使用数字表格?似乎函数是更简单的解决方案,并且不受表格行数的限制。 - MikeKulls
@MikeKulls - 我认为使用已经包含数字的表格比为每个查询生成数字要快一些。关于在数字表中保留的行数,我认为11000是一个不错的选择 :). 参考:http://www.sqlservercentral.com/articles/T-SQL/62867/ - Mikael Eriksson
从性能角度考虑,使用表格是有道理的。目前为止,我会继续使用这个函数,因为它的性能已经足够满足我的需求,并且我喜欢它没有实际上限的方式。 - MikeKulls

2
如果您查看两个查询的预估执行计划,您会发现许多输出为Constant Scan并通过Nested Loops连接的数据。
在这种情况下,
select * from dbo.SequenceCreate (100)

Constant Scan每个扫描的预计行数为1。

在这种情况下:

SELECT N.N + StartNum 
FROM @T t
LEFT JOIN Numbers AS N ON N.N <= T.ItemCount

每个常量扫描的预估行数为2,因此这是一个良好的几何级数示例。 最后一个嵌套循环返回4294970000行-36 GB。
我不能说优化器为什么选择了这个计划,但它选择了它。
您可以使用以下方法代替。
首先,创建具有连续数字的表。
CREATE TABLE Numbers(N INT PRIMARY KEY NOT NULL IDENTITY(1,1));
GO

INSERT INTO Numbers DEFAULT VALUES;
GO 1000 -- it takes about 2 minutes for 1000 but you need to execut it just once 

使用以下脚本:

DECLARE @T TABLE(StartNum INT, ItemCount INT)

INSERT INTO @T VALUES (100, 5)
INSERT INTO @T VALUES (110, 7)
INSERT INTO @T VALUES (55, 3)

SELECT N.N + StartNum 
FROM @T t
LEFT JOIN Numbers AS N ON N.N <= T.ItemCount

0

你最大的问题在于以下这行代码...

WHERE n BETWEEN 1 AND @MaxValue;

在他的第二篇文章中,Itzik Ben-Gan指出当他首次想出非常有效的级联CTE(简称cCTE)的想法时,SQL服务器有时会做一些疯狂的事情,并且可以在WHERE子句生效之前生成cCTEs能够生成的所有数字。

据我所知,以下是Itzik最新代码的副本。

----------------------------------------------------------------------
-- © Itzik Ben-Gan
-- For more, see 5-day Advanced T-SQL Course:
-- http://tsql.Lucient.com/t-sql-courses/
----------------------------------------------------------------------

IF OBJECT_ID(N'dbo.GetNums', N'IF') IS NOT NULL DROP FUNCTION dbo.GetNums;
GO
CREATE FUNCTION dbo.GetNums(@low AS BIGINT, @high AS BIGINT) RETURNS TABLE
AS
RETURN
  WITH
    L0   AS (SELECT c FROM (SELECT 1 UNION ALL SELECT 1) AS D(c)),
    L1   AS (SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B),
    L2   AS (SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B),
    L3   AS (SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B),
    L4   AS (SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B),
    L5   AS (SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B),
    Nums AS (SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS rownum
             FROM L5)
  SELECT TOP(@high - @low + 1) @low + rownum - 1 AS n
  FROM Nums
  ORDER BY rownum;
GO

该代码的源链接如下: http://tsql.lucient.com/SourceCodes/GetNums.txt

这是我所说的文章... https://www.itprotoday.com/open-source-sql/fangraphs-tags-cloud-database-keep-big-show


网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接