Remove duplicates in a list using LINQ

Question

388

I have a class named Items with the properties (Id, Name, Code, Price).

The list of Items contains duplicate entries.

For example:

1         Item1       IT00001        $100
2         Item2       IT00002        $200
3         Item3       IT00003        $150
1         Item1       IT00001        $100
3         Item3       IT00003        $150

How can I remove the duplicates from the list using LINQ?

- Prasad

I have another class as a property in the Items class. - Prasad

You could also do var set = new HashSet<int>(); var uniques = items.Where(x => set.Add(x.Id));. It should be illegal to do so. - nawfal

I agree, ForEach should be used :) - Michael Fulton
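One plausible reading of the "should be illegal" remark (my interpretation, not something nawfal spells out) is that the HashSet one-liner hides a side effect inside a lazily evaluated query. The sketch below uses a hypothetical array of ids in place of the Items list and shows what happens when the query is enumerated twice:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical ids standing in for items.Select(x => x.Id).
int[] ids = { 1, 2, 3, 1, 3 };

var seen = new HashSet<int>();
// Where is lazy: the lambda, including its side effect on 'seen',
// runs again every time 'uniques' is enumerated.
var uniques = ids.Where(id => seen.Add(id));

int first = uniques.Count();   // 3: each id enters 'seen' exactly once
int second = uniques.Count();  // 0: 'seen' already contains every id

Console.WriteLine($"{first} {second}"); // prints "3 0"
```

Materializing the query once with ToList() avoids the surprise, since the set is then only consulted during that single pass.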

11 Answers

457

var distinctItems = items.Distinct();

To match on only some of the properties, create a custom equality comparer, e.g.:

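class DistinctItemComparer : IEqualityComparer<Item> {

    public bool Equals(Item x, Item y) {
        // Same reference (or both null) counts as equal; a single null does not.
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.Id == y.Id &&
            x.Name == y.Name &&
            x.Code == y.Code &&
            x.Price == y.Price;
    }

    public int GetHashCode(Item obj) {
        // Null-safe for the string properties, so an unset Name or Code does not throw.
        return obj.Id.GetHashCode() ^
            (obj.Name?.GetHashCode() ?? 0) ^
            (obj.Code?.GetHashCode() ?? 0) ^
            obj.Price.GetHashCode();
    }
}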

Then use it like this:

var distinctItems = items.Distinct(new DistinctItemComparer());

- Christian Hayter

3

Comparer classes are very useful; they can express logic beyond simple property-name comparisons. I wrote a new comparer class last month to do something that GroupBy could not. - Christian Hayter

When I try to use the Distinct comparer, I get the error: "LINQ to Entities does not recognize the method 'System.Linq.IQueryable`1[DataAccess.HR.Dao.CCS_LOCATION_TBL] Distinct[CCS_LOCATION_TBL](System.Linq.IQueryable`1[DataAccess.HR.Dao.CCS_LOCATION_TBL], System.Collections.Generic.IEqualityComparer`1[DataAccess.HR.Dao.CCS_LOCATION_TBL])' method, and this method cannot be translated into a store expression." - user8128167

OK, I had to add AsEnumerable; see http://stackoverflow.com/questions/19424227/error-on-iequalitycomparer - user8128167

Stuff like this makes me love C#. - Kellen Stuart

@KolobCanyon It is used for hash table lookups. There is a good write-up at https://dev59.com/pW855IYBdhLWcg3w5Igb#4096774. - Christian Hayter

48

If something is throwing off your Distinct query, you might want to look at MoreLinq and use its DistinctBy operator to select distinct objects by id.

var distinct = items.DistinctBy( i => i.Id );

- tvanfosson

1

There is no DistinctBy() method in LINQ. - fbarikzehy

14

@FereydoonBarikzehy But he isn't talking about pure LINQ. The post refers to the MoreLinq project... - Ademar

2

In .NET 6+ there is now a built-in DistinctBy() method. - HischT
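For reference, a minimal sketch of the built-in Enumerable.DistinctBy mentioned above, assuming a project targeting .NET 6 or later. The tuples are hypothetical stand-ins for the question's Item class, and the ids are sorted before printing so the output does not depend on enumeration order:

```csharp
using System;
using System.Linq;

// Hypothetical sample data mirroring the rows in the question.
var items = new[]
{
    (Id: 1, Name: "Item1", Code: "IT00001", Price: 100),
    (Id: 2, Name: "Item2", Code: "IT00002", Price: 200),
    (Id: 3, Name: "Item3", Code: "IT00003", Price: 150),
    (Id: 1, Name: "Item1", Code: "IT00001", Price: 100),
    (Id: 3, Name: "Item3", Code: "IT00003", Price: 150),
};

// DistinctBy keeps one element per key; here the key is the Id.
var distinctIds = items
    .DistinctBy(i => i.Id)
    .Select(i => i.Id)
    .OrderBy(id => id)
    .ToArray();

Console.WriteLine(string.Join(",", distinctIds)); // prints "1,2,3"
```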

34

Here is how I group with LINQ; I hope it helps.

var query = collection.GroupBy(x => x.title).Select(y => y.FirstOrDefault());

- Victor Juri

3

@nawfal, I suggest using FirstOrDefault() instead of First(). - sobelito

36

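If I'm not mistaken, FirstOrDefault offers no benefit here when the Select immediately follows the GroupBy, because an empty group cannot occur (the groups were just derived from the collection's contents). - Roy Tinker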

29

A generic extension method:

public static class EnumerableExtensions
{
    public static IEnumerable<T> DistinctBy<T, TKey>(this IEnumerable<T> enumerable, Func<T, TKey> keySelector)
    {
        return enumerable.GroupBy(keySelector).Select(grp => grp.First());
    }
}

Usage example:

var lstDst = lst.DistinctBy(item => item.Key);

- TOL

1

Very clean approach - Steven Ryssaert

2

Thanks, this is exactly what I needed and it works well. - mig_08

A perfect complement that combines nicely with @Salah Akbari's solution: result = result.DistinctBy(r => new { r.NDA, r.Event, r.MovementDate, r.EventDate }).ToList(); - Flou

20

You have three options for removing duplicate items from your list:

1. Use a custom equality comparer and then call Distinct(new DistinctItemComparer()), as @Christian Hayter mentioned.

2. Use GroupBy, but note that you should group by all of the columns, because grouping by Id alone can also discard items that are not true duplicates. Consider the following example:

List<Item> a = new List<Item>
{
    new Item {Id = 1, Name = "Item1", Code = "IT00001", Price = 100},
    new Item {Id = 2, Name = "Item2", Code = "IT00002", Price = 200},
    new Item {Id = 3, Name = "Item3", Code = "IT00003", Price = 150},
    new Item {Id = 1, Name = "Item1", Code = "IT00001", Price = 100},
    new Item {Id = 3, Name = "Item3", Code = "IT00003", Price = 150},
    new Item {Id = 3, Name = "Item3", Code = "IT00004", Price = 250}
};
var distinctItems = a.GroupBy(x => x.Id).Select(y => y.First());

The result for this grouping will be:

{Id = 1, Name = "Item1", Code = "IT00001", Price = 100}
{Id = 2, Name = "Item2", Code = "IT00002", Price = 200}
{Id = 3, Name = "Item3", Code = "IT00003", Price = 150}

This is incorrect because it treats {Id = 3, Name = "Item3", Code = "IT00004", Price = 250} as a duplicate. The correct query would be:

var distinctItems = a.GroupBy(c => new { c.Id , c.Name , c.Code , c.Price})
                     .Select(c => c.First()).ToList();

3. Override Equals and GetHashCode in the Item class:

public class Item
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Code { get; set; }
    public int Price { get; set; }

    public override bool Equals(object obj)
    {
        if (!(obj is Item))
            return false;
        Item p = (Item)obj;
        return (p.Id == Id && p.Name == Name && p.Code == Code && p.Price == Price);
    }
    public override int GetHashCode()
    {
        return String.Format("{0}|{1}|{2}|{3}", Id, Name, Code, Price).GetHashCode();
    }
}

Then you can use it like this:

var distinctItems = a.Distinct();

- Salah Akbari

17

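Use Distinct(), but keep in mind that it uses the default equality comparer to compare values, so if you need anything beyond that you have to implement your own comparer.

See http://msdn.microsoft.com/en-us/library/bb348436.aspx for an example.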

- Brian Rasmussen

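I should note that the default comparer works if the collection's member type is one of the value types. But which default equality comparer does csc select for reference types? Reference types must have their own comparer. - Nuri YILMAZ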
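On the question in the comment above: as far as I know, the parameterless Distinct() uses EqualityComparer<T>.Default, which falls back to Object.Equals and GetHashCode when the type does not implement IEquatable<T>. For a reference type that does not override them, that means reference equality. The sketch below illustrates this with built-in types only (string overrides Equals, int[] does not); the data is made up for the example:

```csharp
using System;
using System.Linq;

// Two separate string instances with identical contents.
// string overrides Equals/GetHashCode, so Distinct() treats them as one value.
string[] names = { new string('a', 3), new string('a', 3) };
int distinctNames = names.Distinct().Count();

// Two separate arrays with identical contents.
// int[] does not override Equals, so the default comparer compares references
// and both arrays survive Distinct().
int[][] rows = { new[] { 1, 2 }, new[] { 1, 2 } };
int distinctRows = rows.Distinct().Count();

Console.WriteLine($"{distinctNames} {distinctRows}"); // prints "1 2"
```

A plain class such as Item behaves like the arrays here unless it overrides Equals and GetHashCode or a custom comparer is passed to Distinct.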

6

Try this extension method; I hope it helps.

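public static class DistinctHelper
{
    public static IEnumerable<TSource> DistinctBy<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
    {
        // Iterator block: the set is created afresh on each enumeration,
        // so the result can safely be enumerated more than once.
        var identifiedKeys = new HashSet<TKey>();
        foreach (var element in source)
        {
            if (identifiedKeys.Add(keySelector(element)))
                yield return element;
        }
    }
}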

Usage:

var outputList = sourceList.DistinctBy(x => x.TargetProperty);

- Kent Aguilar

4

List<Employee> employees = new List<Employee>()
{
    new Employee{Id =1,Name="AAAAA"}
    , new Employee{Id =2,Name="BBBBB"}
    , new Employee{Id =3,Name="AAAAA"}
    , new Employee{Id =4,Name="CCCCC"}
    , new Employee{Id =5,Name="AAAAA"}
};

List<Employee> duplicateEmployees = employees.Except(employees.GroupBy(i => i.Name)
                                             .Select(ss => ss.FirstOrDefault()))
                                            .ToList();

- Arun Kumar

0

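Another workaround; not pretty, but it works.

I have an XML file with an element named "MEMDES" that has two attributes, "GRADE" and "SPD", used to record RAM module information. There are many duplicate values in "SPD".

So here is the code I use to remove the duplicates: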

        IEnumerable<XElement> MList =
            from RAMList in PREF.Descendants("MEMDES")
            where (string)RAMList.Attribute("GRADE") == "DDR4"
            select RAMList;

        List<string> sellist = new List<string>();

        foreach (var MEMList in MList)
        {
            sellist.Add((string)MEMList.Attribute("SPD").Value);
        }

        foreach (string slist in sellist.Distinct())
        {
            comboBox1.Items.Add(slist);
        }

- Rex Hsu

- Freddy · Accepted Answer

748

var distinctItems = items.GroupBy(x => x.Id).Select(y => y.First());

- Freddy

41

Thanks - I was hoping to avoid writing a comparer class, so I'm glad this works :) - Jen

12

This solution even allows for a tie-breaker: eliminate duplicates according to a criterion! - Adriano Carneiro

6

But it comes with a little extra overhead! - Amirhossein Mehrvarzi

2

However, as Victor Juri suggests below: use FirstOrDefault. Hard to believe the solution can be this simple (no custom equality comparer needed). - CyberHawk

9

You can group by multiple properties: List<XYZ> MyUniqueList = MyList.GroupBy(x => new { x.Column1, x.Column2 }).Select(g => g.First()).ToList(); - Sumit Joshi

@SumitJoshi Thanks, this solution also works for comparing two observable collections of my custom type! - Kefka
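The tie-breaker idea from Adriano Carneiro's comment could look roughly like the sketch below. The rule "keep the highest-priced entry per Id" and the sample tuples are assumptions made for illustration, not part of the accepted answer:

```csharp
using System;
using System.Linq;

// Hypothetical data: two entries share Id 3 but differ in price.
var items = new[]
{
    (Id: 1, Name: "Item1", Price: 100),
    (Id: 3, Name: "Item3", Price: 150),
    (Id: 3, Name: "Item3", Price: 250),
};

// Group by Id, then order inside each group to decide which duplicate survives.
var kept = items
    .GroupBy(x => x.Id)
    .Select(g => g.OrderByDescending(x => x.Price).First())
    .OrderBy(x => x.Id)
    .ToList();

Console.WriteLine(string.Join(";", kept.Select(x => $"{x.Id}:{x.Price}"))); // prints "1:100;3:250"
```

Sorting each group adds work on top of the grouping itself, which is one form the extra overhead mentioned in the comments can take.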