Count the elements in a List<List<T>>

Question


I have a List<List<T>>. What is the fastest way to count all of its (distinct) elements as if they were a single List<T>?

So far I have used

int result = listOfLists
  .SelectMany(list => list)
  .Distinct()
  .ToList().Count;

but this actually creates a list and then counts its elements, which is not very nice.

- kasperhj

The LINQ expression is from https://dev59.com/RHRB5IYBdhLWcg3w7rmV. - kasperhj

The fastest way would probably be to skip LINQ and stick with for loops. - RichK


There's no need for .ToList(); you can use the Count() extension method instead. - Homam

4 Answers


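If you need to eliminate duplicates across the lists, I'd suggest a simple nested loop with a HashSet. It folds the SelectMany and Distinct operations into the set-insertion logic and should be faster, since a HashSet has (average) O(1) lookup time. Internally, Distinct() may well use a similar algorithm, but this approach avoids constructing the single combined list entirely.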

var set = new HashSet<T>();
foreach (var list in listOfLists)
{
    foreach (var item in list)
    {
        set.Add(item);
    }
}
var result = set.Count;

- tvanfosson
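A self-contained version of the loop above, wrapped in a generic method so it compiles on its own. The class name ListOfListsCounter and the optional comparer parameter are illustrative additions, not part of the original answer. HashSet<T> has a constructor taking an IEqualityComparer<T> (null selects the default comparer), which lets the caller decide what counts as a duplicate.

```csharp
using System.Collections.Generic;

public static class ListOfListsCounter
{
    // Counts the distinct items across all inner lists using a single HashSet<T>.
    // comparer (hypothetical parameter) controls what counts as a duplicate,
    // e.g. StringComparer.OrdinalIgnoreCase; null means EqualityComparer<T>.Default.
    public static int CountDistinct<T>(IEnumerable<IEnumerable<T>> listOfLists,
                                       IEqualityComparer<T> comparer = null)
    {
        var set = new HashSet<T>(comparer);
        foreach (var list in listOfLists)
        {
            foreach (var item in list)
            {
                // Add returns false (and does not throw) when the item is already present.
                set.Add(item);
            }
        }
        return set.Count;
    }
}
```

For example, with the inner lists { "a", "B" } and { "b", "c" }, the default comparer sees four distinct strings, while StringComparer.OrdinalIgnoreCase treats "B" and "b" as the same and counts three.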

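It should be somewhat faster since some delegate calls are eliminated. Distinct already uses a HashSet with nearly the same logic. And I don't see you using SelectMany here, so your current code does something quite different. - CodesInChaos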

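@Homam -- good catch, this should be a nested loop. It's basically the same, except it doesn't need to build a combined inner list to perform the distinct operation. That saves memory and is more efficient. - tvanfosson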

You can leave out the Contains check. HashSet<T>.Add doesn't complain if the item already exists. - CodesInChaos

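@CodeInChaos - I wish Microsoft were consistent; a Dictionary throws an exception. Then again, I suppose in that case you'd be inserting a duplicate key rather than a duplicate item. - tvanfosson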
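A small sketch of the difference discussed in these comments: HashSet<T>.Add signals a duplicate through its bool return value, while Dictionary<TKey, TValue>.Add throws ArgumentException for a duplicate key. The class AddSemantics is hypothetical. The sketch assumes Dictionary<TKey, TValue>.TryAdd is available, which requires .NET Core 2.0 or later (it is not in .NET Framework).

```csharp
using System;
using System.Collections.Generic;

public static class AddSemantics
{
    // HashSet<T>.Add reports a duplicate by returning false; it never throws for one.
    public static (bool first, bool second) AddTwiceToSet(string item)
    {
        var set = new HashSet<string>();
        bool first = set.Add(item);   // true: the item was inserted
        bool second = set.Add(item);  // false: already present, nothing thrown
        return (first, second);
    }

    // Dictionary<TKey, TValue>.Add throws ArgumentException on a duplicate key,
    // whereas TryAdd returns false and leaves the existing value untouched.
    public static (bool threw, bool tryAddResult) AddTwiceToDictionary(string key)
    {
        var dict = new Dictionary<string, int>();
        dict.Add(key, 1);

        bool threw = false;
        try
        {
            dict.Add(key, 2);
        }
        catch (ArgumentException)
        {
            threw = true;
        }

        bool tryAddResult = dict.TryAdd(key, 3);
        return (threw, tryAddResult);
    }
}
```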

Maybe use HashSet<T>.UnionWith to make the code shorter (and faster?). - Shurdoof


@Shurdoof I'm pretty sure that would be slower. The point of this answer is that it's fast. For better readability, use Homam's answer. - CodesInChaos
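Shurdoof's UnionWith suggestion could look roughly like the sketch below (the class name UnionWithCounter is hypothetical). It produces the same count as the nested loop; no claim is made here about whether it is faster or slower, which would need measuring.

```csharp
using System.Collections.Generic;

public static class UnionWithCounter
{
    // UnionWith adds every element of each inner list to the set,
    // silently skipping elements that are already present.
    public static int CountDistinct<T>(IEnumerable<IEnumerable<T>> listOfLists)
    {
        var set = new HashSet<T>();
        foreach (var list in listOfLists)
        {
            set.UnionWith(list);
        }
        return set.Count;
    }
}
```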


To count the elements of all the lists in the list, you can use the aggregation operators:

int count = listOfLists.Sum(l => l.Distinct().Count());

- Hasanain


This doesn't eliminate duplicates across lists, only within each list. - tvanfosson


@lejon, why did you accept an answer that does something different from your original code? tvanfosson's and Homam's answers are the correct ones. - CodesInChaos
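To make the difference concrete, here is a sketch putting the two LINQ expressions from this thread side by side (the class and method names are illustrative). With inner lists {1, 2, 2} and {2, 3}, the per-list version counts the value 2 once in each list and returns 4, while the question's version removes duplicates across all lists and returns 3.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DistinctScopes
{
    // Hasanain's expression: duplicates are removed only inside each inner list,
    // so a value that appears in two different lists is counted twice.
    public static int SumOfPerListDistinct<T>(IEnumerable<IEnumerable<T>> listOfLists) =>
        listOfLists.Sum(l => l.Distinct().Count());

    // The question's expression: flatten first, then remove duplicates across all lists.
    public static int DistinctAcrossLists<T>(IEnumerable<IEnumerable<T>> listOfLists) =>
        listOfLists.SelectMany(l => l).Distinct().Count();
}
```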


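I'd like to take the opportunity to answer this question just to highlight when we should use LINQ and when we should use the classic for loop. Unfortunately, people today don't care much about performance, because we've become used to working on very powerful computers. Anyway, just try the code below and you'll see that LINQ is more than 100 times slower than the classic for version. You should use LINQ only when the expression you need to write is really complex and you want to make it more readable. I didn't spend time polishing the solution shown below because I wanted to focus on performance.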

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public class Program
{
    public static void Main(string[] args)
    {
        // create the list
        List<List<string>> listOfList = new List<List<string>>()
        {
            new List<string>() { "1.1", "2.2" },
            new List<string>() { "2.1", "2.2", "2.3" }
        };

        // stopwatch using LINQ
        // note: this counts all elements (no Distinct), and this first timed call
        // may also include one-time JIT compilation costs
        Stopwatch stopwatch = new Stopwatch();
        stopwatch.Start();

        int totalUsingLinq = listOfList.Sum(x => x.Count);

        stopwatch.Stop();
        Console.WriteLine("Using Linq:{0}", stopwatch.Elapsed); //00005713

        // stopwatch using classic for
        stopwatch.Reset();
        stopwatch.Start();
        int totalUsingFor = 0;
        for (int i = 0; i < listOfList.Count; i++)
        {
            var mainItem = listOfList[i];
            if (mainItem != null)
            {
                totalUsingFor += mainItem.Count;
            }
        }
        stopwatch.Stop();
        Console.WriteLine("Using for:{0}", stopwatch.Elapsed); //0000010
    }
}

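Distinct counting with for loops (just an example). Here I deliberately wrote a rather inefficient ("bottleneck") helper function for the de-duplication, and it is still faster.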

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public class Program
{
    public static void Main(string[] args)
    {
        // create the list
        List<List<string>> listOfList = new List<List<string>>()
        {
            new List<string>()
            {
                "1.1", "2.2", "1.1", "1.1", "2.2", "1.1", "1.1", "2.2", "1.1", "1.1"
            },
            new List<string>()
            {
                "2.1", "2.2", "2.3", "2.3", "1.1", "2.2", "1.1", "1.1", "2.2", "1.1",
                "1.1", "2.2", "1.1", "1.1", "2.2", "1.1", "1.1", "2.2", "1.1"
            }
        };

        // stopwatch using LINQ (distinct within each inner list, then summed)
        Stopwatch stopwatch = new Stopwatch();
        stopwatch.Start();

        int totalUsingLinq = listOfList.Sum(l => l.Distinct().Count());

        stopwatch.Stop();
        Console.WriteLine("Using Linq:{0}", stopwatch.Elapsed); //000012150

        // stopwatch using classic for
        stopwatch.Reset();
        stopwatch.Start();
        int totalUsingFor = 0;
        for (int i = 0; i < listOfList.Count; i++)
        {
            var mainItem = listOfList[i];
            if (mainItem != null)
            {
                for (int y = 0; y < mainItem.Count; y++)
                {
                    if (mainItem[y] != null)
                    {
                        totalUsingFor++;
                        NullDuplicateItems(y, mainItem);
                    }
                }
            }
        }
        stopwatch.Stop();
        Console.WriteLine("Using for:{0}", stopwatch.Elapsed); //0009440
    }

    // Overwrites every later occurrence of list[index] with null so each value is
    // counted only once. Note: this modifies the inner lists in place, and null
    // entries already present in the input are never counted.
    public static void NullDuplicateItems(int index, List<string> list)
    {
        var item = list[index];
        for (int i = index + 1; i < list.Count; i++)
        {
            if (list[i] == item)
            {
                list[i] = null;
            }
        }
    }
}

- Massimiliano Peluso

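You're missing the Distinct, so this solves a completely different problem from the OP's. For lists as small as the ones in your example, both versions probably take a few microseconds. So unless the code becomes a bottleneck, in practice this is a case for using LINQ rather than a classic loop. - CodesInChaos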

When I ran each of them 100,000 times, the factor was 40, not > 100. - CodesInChaos

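I didn't spend much time on this code; even with distinct added, it will be at least 500 times faster. I didn't create a huge list because this is just an example, but if you like, run my code again with a list of thousands of items and you'll get the same result. - Massimiliano Peluso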

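Getting a factor of 40 rather than > 100 is normal, because the machine you ran it on is multitasking: Windows may have been doing some processing that affected your test. So you should run it several times and take the average. But the classic for loop will always be faster: a factor of 40 is already a lot! - Massimiliano Peluso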


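That's why you write code that may become a performance bottleneck the classic way. But if it isn't a bottleneck, don't bother. 40 times almost zero is still almost zero. - CodesInChaos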
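A rough sketch of the repeated-run measurement described in these comments: one untimed warm-up call (so one-time costs such as JIT compilation are excluded from the average), then many timed runs averaged. RoughTimer.AveragePerRun is a hypothetical helper, not a library API, and its results will still vary between machines and runs.

```csharp
using System;
using System.Diagnostics;

public static class RoughTimer
{
    // Calls action once as an untimed warm-up, then times `iterations` more calls
    // and returns the average duration per call.
    public static TimeSpan AveragePerRun(Action action, int iterations)
    {
        if (iterations <= 0)
            throw new ArgumentOutOfRangeException(nameof(iterations));

        action(); // warm-up, not timed

        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        stopwatch.Stop();

        return TimeSpan.FromTicks(stopwatch.Elapsed.Ticks / iterations);
    }
}
```

For example, RoughTimer.AveragePerRun(() => listOfList.Sum(x => x.Count), 100000) would time the LINQ version over 100,000 runs, matching the run count mentioned above; the for-loop version can be timed the same way by wrapping it in a lambda.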


Page content provided by Stack Overflow.

- Homam · Accepted Answer

With LINQ, I think your code only needs a small change: there's no need for .ToList(); just call the Count() extension method, like this:

int result = listOfLists.SelectMany(list => list).Distinct().Count();