Entity Framework GroupBy to SQL generation


I ran into a performance problem when using EF.

        using (var context = new CustomDbContext())
        {
            var result = context.TransactionLines
                .Where(x => 
                    x.Transaction.TransactionTypeId == 1433 &&
                    (x.Transaction.Eob.EobBatchId == null || x.Transaction.Eob.EobBatch.Status == EobBatchStatusEnum.Completed)
                )
                .GroupBy(x => x.VisitLine.ProcedureId)
                .Select(x => new
                {
                    Id = x.Key,
                    PaidAmount = x.Sum(t => t.PaidAmount),
                    Code = context.Procedures.Where(h => h.Id == x.Key).Select(h => h.Code).FirstOrDefault()
                }).ToArray();
        }

EF generates the following SQL statement:

SELECT 
1 AS [C1], 
[Project6].[ProcedureId] AS [ProcedureId], 
[Project6].[C2] AS [C2], 
[Project6].[C1] AS [C3]
FROM ( SELECT 
    [Project5].[ProcedureId] AS [ProcedureId], 
    [Project5].[C1] AS [C1], 
    (SELECT 
        SUM([Extent7].[PaidAmount]) AS [A1]
        FROM     [dbo].[TransactionLines] AS [Extent7]
        INNER JOIN [dbo].[Transactions] AS [Extent8] ON [Extent7].[TransactionId] = [Extent8].[Id]
        LEFT OUTER JOIN [dbo].[Eobs] AS [Extent9] ON [Extent8].[EobId] = [Extent9].[Id]
        LEFT OUTER JOIN [dbo].[EobBatches] AS [Extent10] ON [Extent9].[EobBatchId] = [Extent10].[Id]
        LEFT OUTER JOIN [dbo].[VisitLines] AS [Extent11] ON [Extent7].[VisitLineId] = [Extent11].[Id]
        WHERE (([Extent9].[EobBatchId] IS NULL) OR (1 = [Extent10].[Status])) AND ([Extent8].[TransactionTypeId] = 1433) AND (([Project5].[ProcedureId] = [Extent11].[ProcedureId]) OR (([Project5].[ProcedureId] IS NULL) AND ([Extent11].[ProcedureId] IS NULL)))) AS [C2]
    FROM ( SELECT 
        [Project4].[ProcedureId] AS [ProcedureId], 
        [Project4].[C1] AS [C1]
        FROM ( SELECT 
            [Project2].[ProcedureId] AS [ProcedureId], 
            (SELECT TOP (1) 
                [Extent6].[Code] AS [Code]
                FROM [dbo].[Procedures] AS [Extent6]
                WHERE [Extent6].[Id] = [Project2].[ProcedureId]) AS [C1]
            FROM ( SELECT 
                [Distinct1].[ProcedureId] AS [ProcedureId]
                FROM ( SELECT DISTINCT 
                    [Extent5].[ProcedureId] AS [ProcedureId]
                    FROM     [dbo].[TransactionLines] AS [Extent1]
                    INNER JOIN [dbo].[Transactions] AS [Extent2] ON [Extent1].[TransactionId] = [Extent2].[Id]
                    LEFT OUTER JOIN [dbo].[Eobs] AS [Extent3] ON [Extent2].[EobId] = [Extent3].[Id]
                    LEFT OUTER JOIN [dbo].[EobBatches] AS [Extent4] ON [Extent3].[EobBatchId] = [Extent4].[Id]
                    LEFT OUTER JOIN [dbo].[VisitLines] AS [Extent5] ON [Extent1].[VisitLineId] = [Extent5].[Id]
                    WHERE (([Extent3].[EobBatchId] IS NULL) OR (1 = [Extent4].[Status])) AND ([Extent2].[TransactionTypeId] = 1433)
                )  AS [Distinct1]
            )  AS [Project2]
        )  AS [Project4]
    )  AS [Project5]
)  AS [Project6]

The query takes about 3 seconds. If I write the SQL query directly with GROUP BY, it takes 1.5 seconds and uses half the CPU:

SELECT sq.ProcedureId, SUM(sq.PaidAmount), (SELECT TOP(1) Procedures.Code From Procedures Where Procedures.Id = sq.ProcedureId) as Code
FROM(
    SELECT [Extent5].[ProcedureId] AS [ProcedureId],[Extent1].PaidAmount as [PaidAmount]
    FROM     [dbo].[TransactionLines] AS [Extent1]
    INNER JOIN [dbo].[Transactions] AS [Extent2] ON [Extent1].[TransactionId] = [Extent2].[Id]
    LEFT OUTER JOIN [dbo].[Eobs] AS [Extent3] ON [Extent2].[EobId] = [Extent3].[Id]
    LEFT OUTER JOIN [dbo].[EobBatches] AS [Extent4] ON [Extent3].[EobBatchId] = [Extent4].[Id]
    LEFT OUTER JOIN [dbo].[VisitLines] AS [Extent5] ON [Extent1].[VisitLineId] = [Extent5].[Id]
    WHERE (([Extent3].[EobBatchId] IS NULL) OR (1 = [Extent4].[Status])) AND ([Extent2].[TransactionTypeId] = 1433)
) sq
GROUP BY sq.ProcedureId

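I have tried writing the LINQ in several different ways, but I still cannot force EF to generate a GROUP BY instead of subqueries.

Ideally I don't want to use functions or hand-written SQL, because many conditions are involved in building the LINQ logic.

Is it possible to force EF to generate exactly the SQL that is written in LINQ?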


Try rewriting your LINQ query with explicit joins first. - Evk

1 Answer

Try to avoid using

context.Procedures.Where(h => h.Id == x.Key).Select(h => h.Code).FirstOrDefault()

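by including the Code in the GroupBy clause. I know it seems redundant, but EF has trouble translating grouping operations that involve anything other than key accessors and aggregates.
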
//...
.GroupBy(x => new { Id = x.VisitLine.ProcedureId, x.VisitLine.Procedure.Code })
.Select(x => new
{
    Id = x.Key.Id,
    PaidAmount = x.Sum(t => t.PaidAmount),
    Code = x.Key.Code
}).ToArray();

Update: In my test environment (latest EF6.1.3), the code above generates the following SQL:

SELECT
    1 AS [C1],
    [GroupBy1].[K1] AS [ProcedureId],
    [GroupBy1].[A1] AS [C2],
    [GroupBy1].[K2] AS [Code]
    FROM ( SELECT
        [Extent5].[ProcedureId] AS [K1],
        [Extent6].[Code] AS [K2],
        SUM([Filter1].[PaidAmount]) AS [A1]
        FROM    (SELECT [Extent1].[VisitLineId] AS [VisitLineId], [Extent1].[PaidAmount] AS [PaidAmount]
            FROM    [dbo].[TransactionLine] AS [Extent1]
            INNER JOIN [dbo].[Transaction] AS [Extent2] ON [Extent1].[TransactionId] = [Extent2].[Id]
            LEFT OUTER JOIN [dbo].[Eob] AS [Extent3] ON [Extent2].[EobId] = [Extent3].[Id]
            LEFT OUTER JOIN [dbo].[EobBatch] AS [Extent4] ON [Extent3].[EobBatchId] = [Extent4].[Id]
            WHERE (1433 = [Extent2].[TransactionTypeId]) AND ([Extent3].[EobBatchId] IS NULL OR [Extent4].[Status] = 1) ) AS [Filter1]
        LEFT OUTER JOIN [dbo].[VisitLine] AS [Extent5] ON [Filter1].[VisitLineId] = [Extent5].[Id]
        LEFT OUTER JOIN [dbo].[Procedure] AS [Extent6] ON [Extent5].[ProcedureId] = [Extent6].[Id]
        GROUP BY [Extent5].[ProcedureId], [Extent6].[Code]
    )  AS [GroupBy1]

I had expected a better result than this.

Update 2: EF is a strange beast. Using a double projection produces the desired result:

//...
.GroupBy(x => x.VisitLine.ProcedureId)
.Select(x => new
{
    Id = x.Key,
    PaidAmount = x.Sum(t => t.PaidAmount),
})
.Select(x => new
{
    x.Id,
    x.PaidAmount,
    Code = context.Procedures.Where(h => h.Id == x.Id).Select(h => h.Code).FirstOrDefault()
}).ToArray();

which produces the following SQL:

SELECT
    1 AS [C1],
    [Project2].[ProcedureId] AS [ProcedureId],
    [Project2].[C1] AS [C2],
    [Project2].[C2] AS [C3]
    FROM ( SELECT
        [GroupBy1].[A1] AS [C1],
        [GroupBy1].[K1] AS [ProcedureId],
        (SELECT TOP (1)
            [Extent6].[Code] AS [Code]
            FROM [dbo].[Procedure] AS [Extent6]
            WHERE [Extent6].[Id] = [GroupBy1].[K1]) AS [C2]
        FROM ( SELECT
            [Extent5].[ProcedureId] AS [K1],
            SUM([Filter1].[PaidAmount]) AS [A1]
            FROM   (SELECT [Extent1].[VisitLineId] AS [VisitLineId], [Extent1].[PaidAmount] AS [PaidAmount]
                FROM    [dbo].[TransactionLine] AS [Extent1]
                INNER JOIN [dbo].[Transaction] AS [Extent2] ON [Extent1].[TransactionId] = [Extent2].[Id]
                LEFT OUTER JOIN [dbo].[Eob] AS [Extent3] ON [Extent2].[EobId] = [Extent3].[Id]
                LEFT OUTER JOIN [dbo].[EobBatch] AS [Extent4] ON [Extent3].[EobBatchId] = [Extent4].[Id]
                WHERE (1433 = [Extent2].[TransactionTypeId]) AND ([Extent3].[EobBatchId] IS NULL OR [Extent4].[Status] = 1) ) AS [Filter1]
            LEFT OUTER JOIN [dbo].[VisitLine] AS [Extent5] ON [Filter1].[VisitLineId] = [Extent5].[Id]
            GROUP BY [Extent5].[ProcedureId]
        )  AS [GroupBy1]
    )  AS [Project2]
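
For readers without the model classes, the shape of the double projection can be tried against in-memory data. The following is only a minimal sketch, assuming hypothetical `Line` and `Procedure` stand-in types and a hypothetical `Summarize` helper, none of which come from the original post. It runs on LINQ to Objects, so it says nothing about the SQL that EF would generate; it only illustrates that the first `Select` touches nothing but the key and the aggregate, and that the lookup is deferred to the second `Select`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-ins for the entities in the question.
public sealed class Procedure
{
    public Procedure(int id, string code) { Id = id; Code = code; }
    public int Id { get; }
    public string Code { get; }
}

public sealed class Line
{
    public Line(int? procedureId, decimal paidAmount)
    {
        ProcedureId = procedureId;
        PaidAmount = paidAmount;
    }
    public int? ProcedureId { get; }
    public decimal PaidAmount { get; }
}

public static class DoubleProjectionSketch
{
    // Hypothetical helper: groups lines by procedure, then looks up the code.
    public static List<(int? Id, decimal PaidAmount, string Code)> Summarize(
        IEnumerable<Line> lines, IEnumerable<Procedure> procedures)
    {
        return lines
            .GroupBy(x => x.ProcedureId)
            // First projection: only the grouping key and the aggregate.
            .Select(g => new { Id = g.Key, PaidAmount = g.Sum(t => t.PaidAmount) })
            // Second projection: the per-group lookup, applied to the grouped rows.
            .Select(x => new
            {
                x.Id,
                x.PaidAmount,
                Code = procedures.Where(p => p.Id == x.Id)
                                 .Select(p => p.Code)
                                 .FirstOrDefault()
            })
            .OrderBy(x => x.Id)
            .Select(x => (x.Id, x.PaidAmount, x.Code))
            .ToList();
    }

    public static void Main()
    {
        var procedures = new[] { new Procedure(1, "A100") };
        var lines = new[]
        {
            new Line(1, 10m), new Line(1, 5m), new Line(2, 7m)
        };

        foreach (var row in Summarize(lines, procedures))
        {
            // Code is null when no matching procedure exists.
            Console.WriteLine($"{row.Id}: {row.PaidAmount} ({row.Code ?? "no code"})");
        }
    }
}
```

Whether a given EF version translates this shape into a single GROUP BY still has to be confirmed by inspecting the generated SQL, as done above.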

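P.S. In case it is not clear, the answer to your specific question

Is it possible to force EF to generate exactly the SQL that is written in LINQ?

is no. Instead, you should write your LINQ query in a way that yields the desired (or a closer) SQL query.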


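Adding the second column to the grouping does not force EF to generate GROUP BY; it only adds an extra column to the subqueries and an extra filter on it, which significantly degrades the performance of the current query. In my case, using a subquery for the procedure code is much better. - Itan
The problem is not getting the code from Procedures; it is that EF generates several subqueries with the same filter, which is far less efficient than the GROUP BY approach. - Itan
I'm not saying subqueries are inefficient - I'm trying to use LINQ constructs that keep EF from generating unnecessary subqueries. Are you sure the suggested modification does not translate to better SQL? I can't test it, though, since I don't have the model classes. - Ivan Stoev
Yes, I know that, because my first LINQ implementation was written with these two columns in the GroupBy. The SQL query was the same. - Itan
The double select produced the desired result. Strange behavior. I think it would be useful to look at the source code of the translator that converts DbExpressions to SQL. Thanks. - Itan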
