SQL分页比获取所有数据慢15倍至20倍,这正常吗?

3
我有一个包含大约16k行的视图,获取所有数据需要约5秒钟。
我决定在应用程序中实现“加载”,以便GUI不会冻结,用户可以在DataGridView中工作/查看提供的数据。
我注意到,如果我使用SQL分页来获取所有数据,需要约90秒(1.5分钟),因此这是适得其反的。
现在我想知道这是否正常,为什么有人要使用它?
我尝试了3种SQL分页的方法:

我正在使用160进行测试!

DECLARE @int_percentage AS INT = 1

WHILE @int_percentage <= 100
BEGIN
    SELECT O.*, P.Percentage
    FROM vAppointmentDetailsWithComments O
    LEFT JOIN (SELECT AppointmentID, NTILE(100) OVER(ORDER BY AppointmentID) Percentage
                FROM vAppointmentDetailsWithoutComments) P ON P.AppointmentID = O.AppointmentID
    WHERE P.Percentage = @int_percentage

    SET @int_percentage = @int_percentage + 1
END
---------------------------------------------------------------------------------------------------
DECLARE @int_percentage AS INT = 1, @int_appointmentID AS INT = 0

WHILE @int_percentage <= 100
BEGIN
    SELECT TOP 160 *
    FROM vAppointmentDetailsWithComments
    WHERE AppointmentID > @int_appointmentID

    SET @int_percentage = @int_percentage + 1
    SET @int_appointmentID = @int_appointmentID + 161
END
---------------------------------------------------------------------------------------------------
DECLARE @int_percentage AS INT = 1, @int_currentStartingRowIndex AS INT = 1

WHILE @int_percentage <= 100
BEGIN
    EXEC spGetRows @int_startingRowIndex = @int_currentStartingRowIndex, @int_maxRows = 160

    SET @int_percentage = @int_percentage + 1
    SET @int_currentStartingRowIndex = @int_currentStartingRowIndex + 160
END
---------------------------------------------------------------------------------------------------
SELECT *
FROM vAppointmentDetailsWithComments

步骤:

CREATE PROCEDURE [dbo].[spGetRows] 
(
    @int_startingRowIndex INT,
    @int_maxRows INT
)
AS

DECLARE @int_firstID INT

-- Getting 1'st ID
SET ROWCOUNT @int_startingRowIndex
SELECT @int_firstID = AppointmentID FROM vAppointmentDetailsWithoutComments ORDER BY AppointmentID

-- Setting ROWCOUNT to MAX
SET ROWCOUNT @int_maxRows

-- Getting all data >= @int_firstID
SELECT *
FROM vAppointmentDetailsWithComments
WHERE AppointmentID >= @int_firstID

SET ROWCOUNT 0

GO

带有结果的内容:

Results

表和视图的创建和填充数据:

"vAppointmentDetailsWithComments" 中的 FOR XML PATH 是主要的性能问题。

CREATE TABLE [dbo].[Appointment](
    [ID] [int] IDENTITY(1,1) NOT NULL,
    [Number] [int] NOT NULL,
 CONSTRAINT [PK_Appointment] PRIMARY KEY CLUSTERED 
(
    [ID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

ALTER TABLE [dbo].[Appointment] ADD  CONSTRAINT [DF_Appointment_Number]  DEFAULT ((0)) FOR [Number]
GO
---------------------------------------------------------------------------------------------------
CREATE TABLE [dbo].[Comment](
    [ID] [int] IDENTITY(1,1) NOT NULL,
    [Appointment_ID] [int] NOT NULL,
    [Text] [nvarchar](max) NOT NULL,
    [Time] [datetime] NOT NULL,
 CONSTRAINT [PK_Comment] PRIMARY KEY CLUSTERED 
(
    [ID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

ALTER TABLE [dbo].[Comment]  WITH CHECK ADD  CONSTRAINT [FK_Comment_Appointment] FOREIGN KEY([Appointment_ID])
REFERENCES [dbo].[Appointment] ([ID])
GO

ALTER TABLE [dbo].[Comment] CHECK CONSTRAINT [FK_Comment_Appointment]
GO

ALTER TABLE [dbo].[Comment] ADD  CONSTRAINT [DF_Comment_Text]  DEFAULT (N'Some random Comment for Testing purposes') FOR [Text]
GO

ALTER TABLE [dbo].[Comment] ADD  CONSTRAINT [DF_Comment_Time]  DEFAULT (getdate()) FOR [Time]
GO
---------------------------------------------------------------------------------------------------
CREATE VIEW [dbo].[vAppointmentDetailsWithComments]
AS
SELECT A.ID AppointmentID, (K.Comments + CHAR(13) + CHAR(10)) Comment
FROM Appointment A LEFT JOIN
    (SELECT A.ID,
        (SELECT STUFF
            ((SELECT REPLACE(CHAR(13) + CHAR(10) + K.Text, CHAR(7), '')
        FROM Comment K
        WHERE K.Appointment_ID = A.ID
        AND K.Text != ''
        ORDER BY K.Time FOR XML PATH, TYPE ).value('.[1]', 'NVARCHAR(MAX)'), 1, 1, '')) Comments
    FROM Appointment A) K ON K.ID = A.ID

GO
---------------------------------------------------------------------------------------------------
CREATE VIEW [dbo].[vAppointmentDetailsWithoutComments]
AS
SELECT A.ID AppointmentID
FROM Appointment A

GO
---------------------------------------------------------------------------------------------------
SET NOCOUNT ON 
BEGIN TRAN 
DECLARE @int_appointmentID AS INT = 1,
         @int_tempComment AS INT
WHILE @int_appointmentID <= 16000 
BEGIN 
    INSERT INTO Appointment VALUES (@int_appointmentID)

    SET @int_tempComment = 1

    WHILE @int_tempComment <= 5
    BEGIN
        INSERT INTO Comment (Appointment_ID) VALUES (@int_appointmentID)

        SET @int_tempComment = @int_tempComment + 1
    END

SET @int_appointmentID = @int_appointmentID + 1 
END 
COMMIT TRAN

GO

执行计划: 快速(FetchAll) 慢速(Top)


@Andomar 我不知道为什么变量名和注释很重要,但无论如何,我已经用英语替换了克罗地亚语以取悦您。 - Djiber
3
@Djiber,请提供你正在使用的SQL Server版本。SQL Server 2012引入了ORDER BY语句的OFFSET子句以便于T-SQL基础分页。但是,在执行分页操作之前,请确保视图查询已经完全优化,否则在其之上进行分页将会使情况变得更糟。请发布你的视图查询、CREATE TABLE语句和索引。 - Dan Guzman
1
请提供“快速”情况(获取全部)和至少一个“慢速”情况(TOP将是我的首选)的执行计划。http://use-the-index-luke.com/sql/explain-plan/sql-server/getting-an-execution-plan#ap-explain-mssql-tab 我猜你应该能看到其中的区别。很可能是您的“获取页面”查询诱使优化器使用实际上不好的索引。 - Markus Winand
1
@Djiber,感谢您提供的脚本。关于分页会使事情变得更糟的程度,您发布的第一种方法将要求用户在数据集中翻页时多次触及同一预约行。相比之下,单次遍历只需触及16000个预约行,而总共需要触及808000个预约行。我认为,在此情况下,渲染结果的第一页,同时在后台运行查询,比使用分页更可取。 - Dan Guzman
1
@Djiber,如果您在应用程序代码中执行连接操作,一次获取所有16K行应该可以在几秒钟内完成。这对于您的用户来说不够快吗? - Dan Guzman
显示剩余4条评论
1个回答

1

性能问题的一部分是因为评论表的Appointment_ID列没有索引。通过在Appointment_ID上创建聚集索引,并将主键索引更改为非聚集索引,从vAppointmentDetailsWithComments的选择查询的耗时从我的测试框约5秒降至约3.5秒。下面是一个脚本,用于创建聚集索引并将主键重新创建为非聚集索引。

ALTER TABLE dbo.Comment DROP CONSTRAINT FK_Comment_Appointment;

ALTER TABLE Appointment DROP CONSTRAINT PK_Appointment;

ALTER TABLE Appointment ADD CONSTRAINT PK_Appointment
    PRIMARY KEY NONCLUSTERED(ID);

ALTER TABLE dbo.Comment 
    ADD CONSTRAINT FK_Comment_Appointment FOREIGN KEY(Appointment_ID)
    REFERENCES dbo.Appointment (ID);


CREATE CLUSTERED INDEX cdx_Comment_Appointment_ID ON Comment(Appointment_ID);
GO

在T-SQL中,连接字符串的注释是一项昂贵的操作。我建议您在应用程序端执行此操作,对于16K行而言,预计处理时间不到一秒钟。这将避免在SQL端通过简单的注释连接来进行复杂的跳跃:

CREATE VIEW dbo.vAppointmentDetailsWithIndividualComments
AS
SELECT A.ID AppointmentID, K.Text, K.Time
FROM dbo.Appointment A 
LEFT JOIN dbo.Comment K
        ON K.Appointment_ID = A.ID
        AND K.Text <> '';
GO

SELECT AppointmentID, Text, Time
FROM dbo.vAppointmentDetailsWithIndividualComments
ORDER BY Time;
GO

关于您列出的分页技术,第一种由于扫描预约表,在结果集越深入时性能逐渐变差。
第二个查询缺少ORDER BY Appointment_ID。对于确定性结果,ORDER BYTOP一起使用是必需的。但是,从分页性能的角度来看,该方法确实具有优点,因为它将在预约表上执行索引查找,提供一致的性能,无论结果集中的位置如何。 SET ROWCOUNT已被弃用,但底线是它的性能与第一种查询类似(随着结果集的加深而逐渐变差)。

感谢您提供的建议和测试结果。正如我在回答@MarkusWinand时所提到的,“缺失索引”只出现在生成的(英文)版本中(尽管我不知道为什么以及如何将PK索引更改为非聚集索引,因此如果您能解释一下这部分,我会非常感激)。 - Djiber

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接