How to get duplicate items from a list using LINQ?

Question

183

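I have a List<string> like this:

List<String> list = new List<String>{"6","1","2","4","6","5","1"};

I need to put the duplicate items from this list into a new list. Right now I use a nested for loop to do this.

The resulting list should contain {"6", "1"}.

Is there a way to do this using LINQ or lambda expressions?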

- Thorin Oakenshield

3

If the input is "1", "1", "1", how many elements should the result list contain? - Mark Byers

1

@Mark Byers: the result list should contain "1", "1" :-) - Thorin Oakenshield

Almost identical: https://dev59.com/fU_Sa4cB1Zd3GeqP8gra - nawfal

9 Answers

184

Here is one way to do it:

List<String> duplicates = lst.GroupBy(x => x)
                             .Where(g => g.Count() > 1)
                             .Select(g => g.Key)
                             .ToList();

GroupBy groups identical elements together, and Where filters out the elements that appear only once, leaving only the duplicated ones.
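As a rough illustration of what this query returns, the sketch below runs it against the list from the question and against the "1", "1", "1" input raised in the comments. The variable names `triple` and `tripleDuplicates` are introduced here for the example only. Each duplicated value is reported once, however often it repeats.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var list = new List<string> { "6", "1", "2", "4", "6", "5", "1" };

// Groups are produced in order of first appearance, so "6" comes before "1".
List<string> duplicates = list.GroupBy(x => x)
                              .Where(g => g.Count() > 1)
                              .Select(g => g.Key)
                              .ToList();

Console.WriteLine(string.Join(", ", duplicates)); // 6, 1

// Hypothetical second input: one value repeated three times.
var triple = new List<string> { "1", "1", "1" };
List<string> tripleDuplicates = triple.GroupBy(x => x)
                                      .Where(g => g.Count() > 1)
                                      .Select(g => g.Key)
                                      .ToList();

Console.WriteLine(tripleDuplicates.Count); // 1
```

Note that for "1", "1", "1" this yields a single "1", not the "1", "1" the asker mentions in the comments.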

- Mark Byers

Although it doesn't give exactly the result the question asks for, it will be useful in most other cases. - Heiner

38

Here is another option:

var list = new List<string> { "6", "1", "2", "4", "6", "5", "1" };

var set = new HashSet<string>();
var duplicates = list.Where(x => !set.Add(x));

- LukeH

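I don't suppose the downvoter would care to explain what's wrong with this answer? - LukeH

2

Haha, +1 for innovation :) Not only that, it gives exactly what the OP wanted. The catch is that the query may give wrong answers if it is enumerated a second time (to prevent that, the set has to be cleared or a new one created before each enumeration). - nawfal

Or just add ".ToList()" to the end of the "duplicates" construction. - Miral

5

The downvote isn't mine, but I really think side effects in .Where should be avoided, so that might be the reason. - Paul Groke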
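The re-enumeration issue raised in these comments can be seen in a small sketch. It assumes the same list as the question. The names `seen`, `lazyDuplicates`, `firstPass`, `secondPass`, `freshSet` and `stable` are made up for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var list = new List<string> { "6", "1", "2", "4", "6", "5", "1" };

var seen = new HashSet<string>();
// Deferred query: the lambda mutates 'seen' every time the query is enumerated.
IEnumerable<string> lazyDuplicates = list.Where(x => !seen.Add(x));

List<string> firstPass = lazyDuplicates.ToList();
Console.WriteLine(string.Join(", ", firstPass));  // 6, 1

// The set is already full, so Add fails for every item on the second pass.
List<string> secondPass = lazyDuplicates.ToList();
Console.WriteLine(secondPass.Count);              // 7

// Materializing once with a fresh set gives a result that is safe to reuse.
var freshSet = new HashSet<string>();
List<string> stable = list.Where(x => !freshSet.Add(x)).ToList();
Console.WriteLine(string.Join(", ", stable));     // 6, 1
```

Whether the side effect in Where is acceptable is a style decision. Calling ToList at least confines it to a single enumeration.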

29

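I know this isn't an answer to the original question, but you may well land here with the same problem.

If you want all of the duplicate items in your results, the following works.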

var duplicates = list
    .GroupBy( x => x )               // group matching items
    .Where( g => g.Skip(1).Any() )   // where the group contains more than one item
    .SelectMany( g => g );           // re-expand the groups with more than one item

In my case, I needed all of the duplicates so that I could mark them as errors in the UI.

- Scott Langham

This is the correct solution for getting (all) duplicates. - TaW

19

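I wrote this extension method based on @Lee's answer to the OP. Note that it uses a default parameter, which requires C# 4.0. In C# 3.0, an overloaded method would do the same job.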

/// <summary>
/// Method that returns all the duplicates (distinct) in the collection.
/// </summary>
/// <typeparam name="T">The type of the collection.</typeparam>
/// <param name="source">The source collection to detect for duplicates</param>
/// <param name="distinct">Specify <b>true</b> to only return distinct elements.</param>
/// <returns>A distinct list of duplicates found in the source collection.</returns>
/// <remarks>This is an extension method to IEnumerable&lt;T&gt;</remarks>
public static IEnumerable<T> Duplicates<T>
         (this IEnumerable<T> source, bool distinct = true)
{
     if (source == null)
     {
        throw new ArgumentNullException("source");
     }

     // select the elements that are repeated
     IEnumerable<T> result = source.GroupBy(a => a).SelectMany(a => a.Skip(1));

     // distinct?
     if (distinct == true)
     {
        // deferred execution helps us here
        result = result.Distinct();
     }

     return result;
}

- Michael

11

List<String> list = new List<String> { "6", "1", "2", "4", "6", "5", "1" };

var q = from s in list
        group s by s into g
        where g.Count() > 1
        select g.First();

foreach (var item in q)
{
    Console.WriteLine(item);
}

- explorer

10

Hope this helps.

int[] listOfItems = new[] { 4, 2, 3, 1, 6, 4, 3 };

var duplicates = listOfItems 
    .GroupBy(i => i)
    .Where(g => g.Count() > 1)
    .Select(g => g.Key);

foreach (var d in duplicates)
    Console.WriteLine(d);

- Thakur

3

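I was trying to solve the same problem with a list of objects and ran into trouble because I was trying to repack the list of groups into the original list type. So I ended up looping through the groups and adding every item that has duplicates to a new list.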

public List<MediaFileInfo> GetDuplicatePictures()
{
    List<MediaFileInfo> dupes = new List<MediaFileInfo>();
    var grpDupes = from f in _fileRepo
                   group f by f.Length into grps
                   where grps.Count() >1
                   select grps;
    foreach (var item in grpDupes)
    {
        foreach (var thing in item)
        {
            dupes.Add(thing);
        }
    }
    return dupes;
}

- Jamie L.

0

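All the solutions mentioned so far perform a GroupBy. Even if I only need the first duplicate, every element of the collection is enumerated at least once.

The following extension function stops enumerating as soon as a duplicate has been found. It resumes when the next duplicate is requested.

As usual in LINQ, there are two versions, one that takes an IEqualityComparer and one that does not.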

public static IEnumerable<TSource> ExtractDuplicates<TSource>(this IEnumerable<TSource> source)
{
    return source.ExtractDuplicates(null);
}
public static IEnumerable<TSource> ExtractDuplicates<TSource>(this IEnumerable<TSource> source,
    IEqualityComparer<TSource> comparer)
{
    if (source == null) throw new ArgumentNullException(nameof(source));
    if (comparer == null)
        comparer = EqualityComparer<TSource>.Default;

    HashSet<TSource> foundElements = new HashSet<TSource>(comparer);
    foreach (TSource sourceItem in source)
    {
        if (!foundElements.Contains(sourceItem))
        {   // we've not seen this sourceItem before. Add to the foundElements
            foundElements.Add(sourceItem);
        }
        else
        {   // we've seen this item before. It is a duplicate!
            yield return sourceItem;
        }
    }
}

Usage:

IEnumerable<MyClass> myObjects = ...

// check if has duplicates:
bool hasDuplicates = myObjects.ExtractDuplicates().Any();

// or find the first three duplicates:
IEnumerable<MyClass> first3Duplicates = myObjects.ExtractDuplicates().Take(3);

// or find the first 5 duplicates that have a Name = "MyName"
IEnumerable<MyClass> myNameDuplicates = myObjects.ExtractDuplicates()
    .Where(duplicate => duplicate.Name == "MyName")
    .Take(5);

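For all of these LINQ statements, the collection is only enumerated until the requested items have been found. The rest of the sequence is never read.

In my opinion, that is an efficiency gain worth considering.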

- Harald Coppoolse

1

Just a small suggestion: you can reduce the HashSet<T>.Contains + Add combination to Add alone, which avoids the cost of the extra lookup. In your case: if (!foundElements.Add(sourceItem)) yield return sourceItem; is all you need. - nawfal
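The Add-only variant from this comment, together with the early-exit behaviour the answer describes, can be sketched as follows. This is an illustration under stated assumptions, not the answerer's code. It uses a local function instead of an extension method so it runs as a standalone program, and `CountingSource` and `itemsRead` are hypothetical helpers added only to show how many items get pulled.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int itemsRead = 0;

// Hypothetical source that counts how many items the consumer actually pulls.
IEnumerable<int> CountingSource()
{
    foreach (int value in new[] { 4, 2, 4, 3, 2, 1 })
    {
        itemsRead++;
        yield return value;
    }
}

// Local-function sketch of the iterator, using Add alone as suggested.
IEnumerable<T> ExtractDuplicates<T>(IEnumerable<T> source, IEqualityComparer<T> comparer = null)
{
    if (source == null) throw new ArgumentNullException(nameof(source));
    var foundElements = new HashSet<T>(comparer ?? EqualityComparer<T>.Default);
    foreach (T item in source)
    {
        // Add returns false when the item is already in the set: a duplicate.
        if (!foundElements.Add(item))
            yield return item;
    }
}

// Any() stops at the first duplicate, so only 4, 2, 4 are read.
bool hasDuplicates = ExtractDuplicates(CountingSource()).Any();
Console.WriteLine(hasDuplicates); // True
Console.WriteLine(itemsRead);     // 3
```

Because the iterator is lazy, the null check only runs once enumeration starts. That is also true of the extension method above.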


- Lee · Accepted Answer

253

var duplicates = lst.GroupBy(s => s)
    .SelectMany(grp => grp.Skip(1));

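Note that this returns all duplicates, so if you only want to know which items are duplicated in the source list, you can apply Distinct to the resulting sequence or use the solution given by Mark Byers.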

- Lee

6

If you want a case-insensitive comparison, you can do this: lst.GroupBy(s => s.ToUpper()).SelectMany(grp => grp.Skip(1)); - John

2

@JohnJB - There is an overload of GroupBy that lets you supply an IEqualityComparer, so you can compare case-insensitively without calling ToUpper. - Lee
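A minimal sketch of the comparer overload mentioned here, assuming ordinal case-insensitive matching is what is wanted. `StringComparer.OrdinalIgnoreCase` is one of several built-in comparers, and the sample `names` list is invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical input with values that differ only by case.
var names = new List<string> { "apple", "Apple", "pear", "APPLE", "Pear" };

// The comparer decides which keys are equal; the original strings are kept as-is.
List<string> duplicates = names.GroupBy(s => s, StringComparer.OrdinalIgnoreCase)
                               .SelectMany(grp => grp.Skip(1))
                               .ToList();

Console.WriteLine(string.Join(", ", duplicates)); // Apple, APPLE, Pear
```

Unlike the ToUpper version, this avoids allocating an upper-cased copy of each string to use as the key.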

Skip(1) skips the first item :( Do you know what I should do if I want all of the items? - ParPar

2

@ParPar - Does this answer meet your needs? (https://dev59.com/-m865IYBdhLWcg3wcOLG#19817834) - Lee

2

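As @ScottLangham pointed out, this does not actually return all duplicate records; it returns every occurrence in each group except the first. So if you want a list of just the distinct duplicated values, this answer combined with Distinct is the way to go, but if you want all of the duplicated rows, I found Scott's answer to be the right one. - Robert Shattock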
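The three result shapes discussed in this thread can be compared side by side. The sketch below uses the question's list with one extra "6" appended (an assumption made here so the difference is visible), and the variable names are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Question's list plus one extra "6", so "6" occurs three times.
var list = new List<string> { "6", "1", "2", "4", "6", "5", "1", "6" };

// Accepted answer: every occurrence after the first one in each group.
List<string> repeats = list.GroupBy(s => s)
                           .SelectMany(grp => grp.Skip(1))
                           .ToList();
Console.WriteLine(string.Join(", ", repeats));        // 6, 6, 1

// With Distinct: each duplicated value once.
List<string> distinctValues = repeats.Distinct().ToList();
Console.WriteLine(string.Join(", ", distinctValues)); // 6, 1

// Scott Langham's variant: every occurrence of every duplicated value.
List<string> allOccurrences = list.GroupBy(s => s)
                                  .Where(g => g.Skip(1).Any())
                                  .SelectMany(g => g)
                                  .ToList();
Console.WriteLine(string.Join(", ", allOccurrences)); // 6, 6, 6, 1, 1
```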
