LINQ查询优化:如何针对缓慢分组进行优化?

3

我有一个LINQ查询,通过Entity Framework Code First从SQL数据库获取数据。这个方法可以工作,但速度非常慢。

以下是原始查询:

      var tmpResult = from mdv in allMetaDataValues
                  where mdv.Metadata.InputType == MetadataInputType.String && mdv.Metadata.ShowInFilter && !mdv.Metadata.IsHidden && !string.IsNullOrEmpty(mdv.ValueString)
                  group mdv by new
                  {
                    mdv.ValueString,
                    mdv.Metadata
                  } into g
                  let first = g.FirstOrDefault()
                  select new
                  {
                    MetadataTitle = g.Key.Metadata.Title,
                    MetadataID = g.Key.Metadata.ID,
                    CollectionColor = g.Key.Metadata.Collection.Color,
                    CollectionID = g.Key.Metadata.Collection.ID,

                    MetadataValueCount = 0,
                    MetadataValueTitle = g.Key.ValueString,
                    MetadataValueID = first.ID
                  };

这是原始查询生成的SQL语句:
{SELECT 
0 AS [C1], 
[Project4].[Title] AS [Title], 
[Project4].[ID] AS [ID], 
[Extent9].[Color] AS [Color], 
[Project4].[Collection_ID] AS [Collection_ID], 
[Project4].[ValueString] AS [ValueString], 
[Project4].[C1] AS [C2]
FROM   (SELECT 
    [Project2].[ValueString] AS [ValueString], 
    [Project2].[ID] AS [ID], 
    [Project2].[Title] AS [Title], 
    [Project2].[Collection_ID] AS [Collection_ID], 
    (SELECT TOP (1) 
        [Filter4].[ID1] AS [ID]
        FROM ( SELECT [Extent6].[ID] AS [ID1], [Extent6].[ValueString] AS [ValueString], [Extent7].[Collection_ID] AS [Collection_ID1], [Extent8].[ID] AS [ID2], [Extent8].[InputType] AS [InputType], [Extent8].[ShowInFilter] AS [ShowInFilter], [Extent8].[IsHidden] AS [IsHidden1]
            FROM   [dbo].[MetadataValue] AS [Extent6]
            LEFT OUTER JOIN [dbo].[Media] AS [Extent7] ON [Extent6].[Media_ID] = [Extent7].[ID]
            INNER JOIN [dbo].[Metadata] AS [Extent8] ON [Extent6].[Metadata_ID] = [Extent8].[ID]
            WHERE ( NOT (([Extent6].[ValueString] IS NULL) OR (( CAST(LEN([Extent6].[ValueString]) AS int)) = 0))) AND ([Extent7].[IsHidden] <> cast(1 as bit))
        )  AS [Filter4]
        WHERE (2 =  CAST( [Filter4].[InputType] AS int)) AND ([Filter4].[ShowInFilter] = 1) AND ([Filter4].[IsHidden1] <> cast(1 as bit)) AND ([Filter4].[Collection_ID1] = @p__linq__0) AND (([Project2].[ValueString] = [Filter4].[ValueString]) OR (([Project2].[ValueString] IS NULL) AND ([Filter4].[ValueString] IS NULL))) AND (([Project2].[ID] = [Filter4].[ID2]) OR (1 = 0))) AS [C1]
    FROM ( SELECT 
        [Distinct1].[ValueString] AS [ValueString], 
        [Distinct1].[ID] AS [ID], 
        [Distinct1].[Title] AS [Title], 
        [Distinct1].[Collection_ID] AS [Collection_ID]
        FROM ( SELECT DISTINCT 
            [Filter2].[ValueString] AS [ValueString], 
            [Filter2].[ID3] AS [ID], 
            [Filter2].[InputType1] AS [InputType], 
            [Filter2].[Title1] AS [Title], 
            [Filter2].[ShowInFilter1] AS [ShowInFilter], 
            [Filter2].[IsHidden2] AS [IsHidden], 
            [Filter2].[Collection_ID2] AS [Collection_ID]
            FROM ( SELECT [Filter1].[ValueString], [Filter1].[Collection_ID3], [Filter1].[IsHidden3], [Filter1].[ID3], [Filter1].[InputType1], [Filter1].[Title1], [Filter1].[ShowInFilter1], [Filter1].[IsHidden2], [Filter1].[Collection_ID2]
                FROM ( SELECT [Extent1].[ValueString] AS [ValueString], [Extent2].[Collection_ID] AS [Collection_ID3], [Extent4].[IsHidden] AS [IsHidden3], [Extent5].[ID] AS [ID3], [Extent5].[InputType] AS [InputType1], [Extent5].[Title] AS [Title1], [Extent5].[ShowInFilter] AS [ShowInFilter1], [Extent5].[IsHidden] AS [IsHidden2], [Extent5].[Collection_ID] AS [Collection_ID2]
                    FROM     [dbo].[MetadataValue] AS [Extent1]
                    LEFT OUTER JOIN [dbo].[Media] AS [Extent2] ON [Extent1].[Media_ID] = [Extent2].[ID]
                    INNER JOIN [dbo].[Metadata] AS [Extent3] ON [Extent1].[Metadata_ID] = [Extent3].[ID]
                    LEFT OUTER JOIN [dbo].[Metadata] AS [Extent4] ON [Extent1].[Metadata_ID] = [Extent4].[ID]
                    LEFT OUTER JOIN [dbo].[Metadata] AS [Extent5] ON [Extent1].[Metadata_ID] = [Extent5].[ID]
                    WHERE ( NOT (([Extent1].[ValueString] IS NULL) OR (( CAST(LEN([Extent1].[ValueString]) AS int)) = 0))) AND ([Extent2].[IsHidden] <> cast(1 as bit)) AND (2 =  CAST( [Extent3].[InputType] AS int)) AND ([Extent3].[ShowInFilter] = 1)
                )  AS [Filter1]
                WHERE [Filter1].[IsHidden3] <> cast(1 as bit)
            )  AS [Filter2]
            WHERE [Filter2].[Collection_ID3] = @p__linq__0
        )  AS [Distinct1]
    )  AS [Project2] ) AS [Project4]
LEFT OUTER JOIN [dbo].[Collection] AS [Extent9] ON [Project4].[Collection_ID] = [Extent9].[ID]}

如果我们删除 "let first = g.FirstOrDefault()" 并将 "MetadataValueID = first.ID" 改为 "MetadataValueID = 0",这样我们只需要一个固定的 ID = 0 用于测试目的,那么数据加载非常快,生成的查询本身大小仅为原始查询的一半。
因此,看来这部分代码使得查询变得非常缓慢:
let first = g.FirstOrDefault()
...
  MetadataValueID = first.ID
};

如何进行重写?即使我试图重写代码,仍然很慢:

MetadataValueID = g.Select(x => x.ID).FirstOrDefault()

或者

let first = g.Select(x => x.ID).FirstOrDefault()
...
  MetadataValueID = first
};

有什么建议吗?


很难在没有看到架构或生成的SQL的情况下知道,但我认为FirstOrDefault()是有风险的 - 如果没有任何内容,那么 first.ID将出错。也许EF会因此而感到困惑。我建议尝试更改为g.First()并将其移到结果内 - ... into g select new{ ..., MetadataValueID = g.First().ID } - Rhumborl
@Rhumborl 我已经添加了原始生成的 SQL。 - juFo
allMetadataValues代表什么?我期望看到像dbContext.MetadataValues这样的东西。 - Vojtěch Dohnal
它们是先前选择的结果。 - juFo
哪个 EF 版本?旧版本会生成一些可怕的低效代码……这是一个已知的问题,正在努力解决。 - TomTom
实体框架 6.1 - juFo
2个回答

1

使用EF时,我一直感觉它在有效地翻译像g.Key.Metadata.Collection这样的内容时存在问题,因此我尝试更明确地连接并仅包括结果所必需的字段。您可以使用仓储模式中的include代替连接。

然后您的查询将如下所示:

   from mdv in allMetaDataValues.Include("Metadata").Include("Metadata.Collection")
   where mdv.Metadata.InputType == MetadataInputType.String && 
         mdv.Metadata.ShowInFilter && 
         !mdv.Metadata.IsHidden && 
         !string.IsNullOrEmpty(mdv.ValueString)
   group mdv by new
   {
     MetadataID = mdv.Metadata.ID,
     CollectionID = mdv.Metadata.Collection.ID,
     mdv.Metadata.Title,
     mdv.Metadata.Collection.Color,
     mdv.ValueString
   } into g
   let first = g.FirstOrDefault().ID
   select new
   {
     MetadataTitle = g.Key.Title,
     MetadataID = g.Key.MetadataID,
     CollectionColor = g.Key.Color,
     CollectionID = g.Key.CollectionID,
     MetadataValueCount = 0,
     MetadataValueTitle = g.Key.ValueString,
     MetadataValueID = first
   }

一个很好的用于使用linq的工具是LinqPad

问题也在于:

  let first = g.FirstOrDefault().ID

无法轻松地将其翻译为SQL,请参见this answer。但是,这种重写至少简化了底层查询。对于我来说仍然不清楚,为什么需要首先从一组中获取ID而不使用 orderby

可以像这样重写:

let first =  (from f in allMetaDataValues
              where f.Metadata.ID == g.Key.MetadataID && 
                    f.ValuesString == g.Key.ValuesString select f.ID)
             .FirstOrDefault()

这样,您就不需要让 EF 为您编写查询,而是可以准确地指定如何进行选择。 为了加快查询速度,您还可以考虑根据生成的查询向数据库添加索引 - 即使用在此 let first 查询的 where 子句中使用的两个列建立索引。

我正在使用代码库,所以无法直接访问数据库进行连接操作。 - juFo
1
@juFo 编辑了答案以适用于仓储模式,并添加了 let first 子句的重写。 - Vojtěch Dohnal
似乎 g.FirstOrDefault().ID 的重写可以解决最终的问题。 - juFo

0
尝试以下解决方案。
FirstOrDefault()替换为.Take(1)FirstOrDefault()不是延迟加载。
var tmpResult = from mdv in allMetaDataValues
                  where mdv.Metadata.InputType == MetadataInputType.String && mdv.Metadata.ShowInFilter && !mdv.Metadata.IsHidden && !string.IsNullOrEmpty(mdv.ValueString)
                  group mdv by new
                  {
                    mdv.ValueString,
                    mdv.Metadata
                  } into g
                  let first = g.Take(1)
                  select new
                  {
                    MetadataTitle = g.Key.Metadata.Title,
                    MetadataID = g.Key.Metadata.ID,
                    CollectionColor = g.Key.Metadata.Collection.Color,
                    CollectionID = g.Key.Metadata.Collection.ID,

                    MetadataValueCount = 0,
                    MetadataValueTitle = g.Key.ValueString,
                    MetadataValueID = first.ID
                  };

无法使用first.ID,因为Take(1)返回一个IEnumerable<MetadataValue>。 - juFo
此外,let first = g.FirstOrDefault() 已经被翻译为 SELECT TOP (1) - Vojtěch Dohnal

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接