Performance of ConcurrentDictionary

Question

c# .net dictionary tuples concurrentdictionary

3

I'm having difficulty solving this problem and would appreciate any help.

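I'm working on an existing project. I added logic that counts combinations of values, to make sure we don't exceed some limit. For example, given these data table columns:
Name|Age|description
the code makes sure we have no more than K combinations of (Name, Age). My data contains millions of such pairs. In some cases, though, the program crashes or gets stuck, even though I don't see any memory or CPU issues. I implemented this limit using a ConcurrentDictionary with the tuple (Name, Age) as the key, and I'm using C# on .NET 6.
I can see that the time it takes to try adding an element to the data structure becomes very long.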

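Edit: adding some code snippets. There is a lot of internal implementation, but I believe these are the main parts needed to understand the problem: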

This is the component responsible for limiting the keys:

    protected override Result Process(Row row)
    {
        var valueToLimit = GetValueToLimit(row);
        var result = _values.TryAdd(valueToLimit);
        // some logic related to the case of crossing the limit
        return Result.Success;
    }

    protected abstract T GetValueToLimit(Row row);
}

For my case, the GetValueToLimit function is implemented as:

protected override string[] GetValueToLimit(Row row)
{ // takes the relevant values from an input record, according to the requested columns. 
    return _columnIndices.Select(x => row.GetValue(x)).ToArray();
}
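
A note on the key type: GetValueToLimit returns a string[], and arrays compare by reference by default, so the set can only detect repeated combinations if the IEqualityComparer<K> passed to the ConcurrentHashSet constructor compares the arrays element by element. The question does not show that comparer. The following is only a minimal sketch of what one could look like; the name StringArrayComparer is hypothetical and not taken from the original project.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer (not from the original project): two string arrays
// are equal when they have the same elements in the same order.
public sealed class StringArrayComparer : IEqualityComparer<string[]>
{
    public bool Equals(string[]? x, string[]? y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.SequenceEqual(y); // element-wise comparison
    }

    public int GetHashCode(string[] obj)
    {
        // Combine the hash codes of all elements, in order.
        var hash = new HashCode();
        foreach (var item in obj) hash.Add(item);
        return hash.ToHashCode();
    }
}
```

With such a comparer, new[] { "Ann", "30" } built from two different rows would count as a single combination.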

Finally, here are some parts of the concurrent HashSet implementation:

public class BoundedConcurrentHashSet<K> : ConcurrentHashSet<K>
{
 ..
    public override Result TryAdd(K element)
    {
        if (Dictionary.Count() < _maxCapacity)
        {
            return base.TryAdd(element);
        }
        else
        {
            return Contains(element) ? Result.AlreadyInHash : Result.ExceedsCapacity;
        }
    }

The ConcurrentHashSet, implemented on top of the C# ConcurrentDictionary:

public class ConcurrentHashSet<K>
{
    public ConcurrentHashSet(IEqualityComparer<K> equalityComparer)
    {
        Dictionary = new ConcurrentDictionary<K, object>(equalityComparer);
    }

    protected ConcurrentDictionary<K, object> Dictionary { get; }

    public int Count => Dictionary.Count;

    public IEnumerable<K> Elements => Dictionary.Keys;

    public virtual Result TryAdd(K element)
    {
        return Dictionary.TryAdd(element, null) ? Result.Added : Result.AlreadyInHash;
    }

    public bool Contains(K element)
    {
        return Dictionary.ContainsKey(element);
    }

Please share any ideas that might help.

Thanks

- Nika

2

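What are the values in this dictionary? Can you share some of the code that interacts with it? - jalepi

@jalepi The values are tuples of strings. I added the code. - Nika

Somewhat related: How to improve performance of ConcurrentDictionary.Count in C# - Theodor Zoulias

This question is also related to: Thread-safe collection with upper bound. - Theodor Zoulias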

2 Answers

1

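I've found that using a normal collection and taking a lock while iterating and adding elements is much faster than using a concurrent collection. This becomes more and more noticeable as the number of elements in the collection grows.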

- Charles Owen

1

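Enumerating a non-thread-safe collection under a lock can be tricky, because enumerating a large collection can take quite a long time, and during that time every other thread that wants to interact with the collection is blocked. Hopefully you don't need to enumerate it often! - Theodor Zoulias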

- Theodor Zoulias · Accepted Answer

Here is your problem:

public override ConcurrentHashSetAddResult TryAdd(K element)
{
    if (Dictionary.Count() < _maxCapacity)
    {
        return base.TryAdd(element);
    }
    //...

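...where Dictionary is the underlying ConcurrentDictionary<K, object> object.

Count() is a LINQ method that either enumerates the enumerable sequence from start to end, or returns the Count property in case the sequence implements the ICollection<TSource> interface. ConcurrentDictionary<K, V> implements this interface, so the Count property is indeed used. Here is what the documentation of this property says:

This property has snapshot semantics and represents the number of items in the ConcurrentDictionary<TKey,TValue> at the moment when the property was accessed.

The "snapshot semantics" is the important part. It means that in order to obtain the Count, the dictionary has to be completely locked, temporarily. While one thread is reading the Count, all other threads have to wait. There is no concurrency at all.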
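
To illustrate that the LINQ Count() call ends up on the Count property, here is a small sketch. It uses Enumerable.TryGetNonEnumeratedCount (available since .NET 6), which reports whether LINQ can obtain the count without enumerating the sequence. The CountDemo class and its Run method are hypothetical names used only for this illustration.

```csharp
using System.Collections.Concurrent;
using System.Linq;

public static class CountDemo
{
    public static (bool NonEnumerated, int LinqCount, int PropertyCount) Run()
    {
        var dictionary = new ConcurrentDictionary<string, int>();
        dictionary.TryAdd("a", 1);
        dictionary.TryAdd("b", 2);

        // True when LINQ can get the count from the collection itself,
        // which is the case for ConcurrentDictionary (it is an ICollection<T>).
        bool nonEnumerated = dictionary.TryGetNonEnumeratedCount(out _);

        int linqCount = dictionary.Count();   // LINQ extension method
        int propertyCount = dictionary.Count; // property with snapshot semantics

        return (nonEnumerated, linqCount, propertyCount);
    }
}
```

Both calls return the same number, and both pay the cost of the Count property, so switching from Count() to Count alone would not be expected to remove the contention.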

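An ApproximateCount property was proposed on GitHub at some point, but it didn't get enough traction and the proposal is now closed. That property would have allowed you to implement the BoundedConcurrentHashSet functionality with much less overhead, but also with less accurate behavior: it would be possible to exceed the _maxCapacity configuration.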
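
Since that property does not exist, one way to avoid reading Count on every call is to keep a separate counter next to the dictionary. The following is only a sketch of that different technique, not the proposed API: CountedConcurrentHashSet and AddResult are hypothetical names, and AddResult stands in for the Result type from the question. It reserves a slot with Interlocked.Increment before adding, so the limit is never exceeded, at the price that a thread may occasionally be told ExceedsCapacity while another thread is still holding a slot it is about to release.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

public enum AddResult { Added, AlreadyInHash, ExceedsCapacity }

// Hypothetical sketch: the size is tracked in a separate counter,
// so ConcurrentDictionary.Count is never read on the hot path.
public class CountedConcurrentHashSet<K> where K : notnull
{
    private readonly ConcurrentDictionary<K, byte> _dictionary;
    private readonly int _maxCapacity;
    private int _count; // slots that are filled or currently reserved

    public CountedConcurrentHashSet(int maxCapacity, IEqualityComparer<K>? comparer = null)
    {
        _maxCapacity = maxCapacity;
        _dictionary = new ConcurrentDictionary<K, byte>(comparer ?? EqualityComparer<K>.Default);
    }

    public int Count => Volatile.Read(ref _count);

    public AddResult TryAdd(K element)
    {
        // Lock-free lookup: a known element never needs a slot.
        if (_dictionary.ContainsKey(element)) return AddResult.AlreadyInHash;

        // Reserve a slot first; give it back if the limit was crossed.
        if (Interlocked.Increment(ref _count) > _maxCapacity)
        {
            Interlocked.Decrement(ref _count);
            return _dictionary.ContainsKey(element)
                ? AddResult.AlreadyInHash
                : AddResult.ExceedsCapacity;
        }

        if (_dictionary.TryAdd(element, 0)) return AddResult.Added;

        // Another thread added the same element first; release the slot.
        Interlocked.Decrement(ref _count);
        return AddResult.AlreadyInHash;
    }
}
```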

My suggestion is to ditch the ConcurrentDictionary<K, object>, and use a HashSet<T> protected with a lock as the underlying storage.
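
A minimal sketch of that suggestion could look like the following. The names LockedBoundedHashSet and BoundedAddResult are hypothetical (BoundedAddResult stands in for the Result type from the question), and the code assumes that every access to the set goes through this class.

```csharp
using System.Collections.Generic;

public enum BoundedAddResult { Added, AlreadyInHash, ExceedsCapacity }

// Hypothetical sketch: a bounded set backed by a plain HashSet<K>,
// with every operation serialized by a single lock.
public class LockedBoundedHashSet<K>
{
    private readonly HashSet<K> _set;
    private readonly object _gate = new object();
    private readonly int _maxCapacity;

    public LockedBoundedHashSet(int maxCapacity, IEqualityComparer<K>? comparer = null)
    {
        _maxCapacity = maxCapacity;
        _set = new HashSet<K>(comparer); // a null comparer means the default one
    }

    public BoundedAddResult TryAdd(K element)
    {
        lock (_gate)
        {
            // The membership check, the capacity check and the insertion
            // happen atomically, so the limit cannot be exceeded.
            if (_set.Contains(element)) return BoundedAddResult.AlreadyInHash;
            if (_set.Count >= _maxCapacity) return BoundedAddResult.ExceedsCapacity;
            _set.Add(element);
            return BoundedAddResult.Added;
        }
    }

    public bool Contains(K element)
    {
        lock (_gate) return _set.Contains(element);
    }

    public int Count
    {
        get { lock (_gate) return _set.Count; }
    }
}
```

HashSet<K>.Count is a plain field read, so the time spent inside the lock stays short regardless of how many elements the set holds.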