Difference between Task.Factory.StartNew and Parallel.Invoke

Question


c# parallel-processing

38

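In my application I execute anywhere from a few dozen to a few hundred actions in parallel (with no return value).

Which approach would be the most optimal: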

Using Task.Factory.StartNew in a foreach loop that iterates over an Action array (Action[]):

Task.Factory.StartNew(() => someAction());

Using the Parallel class, where actions is an Action array (Action[]):

Parallel.Invoke(actions);

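Are these two approaches equivalent? Are there any performance implications?

Edit

I ran some performance tests, and on my machine (2 CPUs with 2 cores each) the results seem very similar. I'm not sure how it would look on other machines, such as one with a single CPU. I'm also not sure about memory usage, since I don't know how to measure it precisely.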

- Alexandar

3

I think they're roughly equivalent. What does your profiler tell you? - Robert Harvey

Have you considered benchmarking? - Mitch Wheat

1

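Take a look at this nice article about the two approaches: http://blackrabbitcoder.net/archive/2012/12/20/c.net-little-wonders-the-parallel.invoke-method.aspx. I don't think there is any performance difference between them; Parallel.Invoke is just easier. - Leon Cullens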

1

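For the scenario you describe, I agree with the other commenters that the two are basically the same; spinning up your own tasks, though, gives you finer-grained control over things like individual continuations and cancellation, right? - JerKimball

3 Answers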

14

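I used StriplingWarrior's test to find out where the difference comes from. I did this because, when I looked at the code with Reflector, the Parallel class appeared to do nothing different from creating a bunch of tasks and letting them run.

In theory, both approaches should be equivalent in terms of run time. But the (not very realistic) test with empty actions shows that the Parallel class is much faster.

The Task version spends almost all of its time creating new tasks, which leads to many garbage collections. The speed difference you see is purely due to the fact that you create many tasks which quickly become garbage.

The Parallel class, by contrast, creates its own Task-derived class which runs concurrently on all CPUs. There is only one physical task running across all cores. The synchronization now happens inside the task delegate, which explains why the Parallel class is faster.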

ParallelForReplicatingTask task2 = new ParallelForReplicatingTask(parallelOptions, delegate {
        for (int k = Interlocked.Increment(ref actionIndex); k <= actionsCopy.Length; k = Interlocked.Increment(ref actionIndex))
        {
            actionsCopy[k - 1]();
        }
    }, TaskCreationOptions.None, InternalTaskOptions.SelfReplicating);
task2.RunSynchronously(parallelOptions.EffectiveTaskScheduler);
task2.Wait();
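
To make the idea behind that decompiled snippet more concrete, here is a minimal sketch of the same "shared index" pattern built only from public APIs. It is not how Parallel.Invoke is actually implemented; the class name SharedIndexDemo, the method InvokeAll, and the workerCount parameter are made up for illustration. A small, fixed number of worker tasks each claim the next action with Interlocked.Increment, so no Task object is allocated per action.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, for illustration only (not part of the TPL).
public static class SharedIndexDemo
{
    // Runs every action using a fixed number of worker tasks.
    // Each worker claims the next index atomically, so the only
    // synchronization is the shared counter inside the delegate.
    public static void InvokeAll(Action[] actions, int workerCount)
    {
        int nextIndex = 0; // shared counter; claimed values are 1-based
        var workers = new Task[workerCount];
        for (int w = 0; w < workerCount; w++)
        {
            workers[w] = Task.Run(() =>
            {
                for (int k = Interlocked.Increment(ref nextIndex);
                     k <= actions.Length;
                     k = Interlocked.Increment(ref nextIndex))
                {
                    actions[k - 1]();
                }
            });
        }
        // Block until every worker has run out of actions to claim.
        Task.WaitAll(workers);
    }

    public static void Main()
    {
        int executed = 0;
        var actions = new Action[1000];
        for (int i = 0; i < actions.Length; i++)
        {
            actions[i] = () => Interlocked.Increment(ref executed);
        }
        InvokeAll(actions, Environment.ProcessorCount);
        Console.WriteLine("Executed {0} of {1} actions", executed, actions.Length);
    }
}
```

With this shape the number of Task allocations depends on workerCount rather than on the number of actions, which is presumably why the garbage collection pressure described above goes away.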

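So what is better, then? The best task is the task that never runs. If you need to create so many tasks that they become a burden on the garbage collector, you should stay away from the Task APIs and stick with the Parallel class, which gives you direct parallel execution on all cores without creating new tasks.

If you need to become even faster, creating threads by hand and using hand-optimized data structures tuned for your access pattern may be the most performant solution. But it is unlikely that you will succeed, because the TPL and the Parallel APIs are already heavily tuned. Usually you just need to use one of the many overloads to configure your running tasks or the Parallel class to achieve the same result with much less code.

But if you have a non-standard threading pattern, you may be better off without the TPL in order to get the most out of your cores. Even Stephen Toub has mentioned that the TPL APIs were not designed for ultra-high performance; the main goal was to make threading easier for the "average" programmer. To beat the TPL in specific cases you need to be very skilled: you need to know quite a lot about CPU cache lines, thread scheduling, memory models, JIT code generation and so on to come up with something better for your particular scenario.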
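
As an example of what "one of the many overloads" can look like, here is a small sketch that passes a ParallelOptions instance to Parallel.Invoke to cap the degree of parallelism and attach a cancellation token. The class name ParallelOptionsDemo, the method RunLimited, and the limit of 2 are assumptions chosen for illustration.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical wrapper, for illustration only.
public static class ParallelOptionsDemo
{
    // Runs the actions with at most maxParallelism of them executing
    // concurrently; Parallel.Invoke throws OperationCanceledException
    // if the token is already cancelled or becomes cancelled.
    public static void RunLimited(Action[] actions, int maxParallelism, CancellationToken token)
    {
        var options = new ParallelOptions
        {
            MaxDegreeOfParallelism = maxParallelism,
            CancellationToken = token
        };
        Parallel.Invoke(options, actions);
    }

    public static void Main()
    {
        int executed = 0;
        var actions = new Action[100];
        for (int i = 0; i < actions.Length; i++)
        {
            actions[i] = () => Interlocked.Increment(ref executed);
        }
        RunLimited(actions, 2, CancellationToken.None);
        Console.WriteLine("Executed {0} actions", executed);
    }
}
```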

- Alois Kraus

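I don't quite understand this statement: "The Parallel class creates its own Task-derived class which runs concurrently on all CPUs. There is only one physical task running across all cores. The synchronization now happens inside the task delegate, which explains why the Parallel class is faster." - JohnB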

14

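In the grand scheme of things, the performance difference between the two approaches is negligible when you are dealing with a large number of tasks.

Parallel.Invoke basically performs the Task.Factory.StartNew() for you. So I'd say readability is more important here.

Also, as StriplingWarrior mentions, Parallel.Invoke performs a WaitAll for you (blocking the code until all the tasks have completed), so you don't have to do that yourself either. If you want the tasks to run in the background without caring when they complete, then you'll want Task.Factory.StartNew().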
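
The blocking difference can be sketched roughly as follows. The class and method names (WaitSemanticsDemo, InvokeBlocksUntilDone, StartWithoutWaiting) and the 50 ms sleeps are assumptions for illustration, not anything from the question.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical demo class, for illustration only.
public static class WaitSemanticsDemo
{
    // Parallel.Invoke returns only after both actions have finished,
    // so the counter is already 2 on the line after the call.
    public static bool InvokeBlocksUntilDone()
    {
        int done = 0;
        Parallel.Invoke(
            () => { Thread.Sleep(50); Interlocked.Increment(ref done); },
            () => { Thread.Sleep(50); Interlocked.Increment(ref done); });
        return done == 2;
    }

    // StartNew returns immediately; the caller gets the tasks back and
    // decides whether to wait, attach continuations, or ignore them.
    public static Task[] StartWithoutWaiting(Action[] actions)
    {
        var tasks = new Task[actions.Length];
        for (int i = 0; i < actions.Length; i++)
        {
            tasks[i] = Task.Factory.StartNew(actions[i]);
        }
        return tasks;
    }

    public static void Main()
    {
        Console.WriteLine(InvokeBlocksUntilDone());

        int finished = 0;
        Action work = () => { Thread.Sleep(50); Interlocked.Increment(ref finished); };
        Task[] tasks = StartWithoutWaiting(new[] { work, work });
        // At this point the tasks may or may not have finished.
        Task.WaitAll(tasks); // opt in to the blocking behavior explicitly
        Console.WriteLine(finished);
    }
}
```

If the tasks really are fire-and-forget, keep in mind that nobody observes their exceptions unless you keep the Task references and check them later.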

- Colin Mackay


- StriplingWarrior · Accepted Answer

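The most important difference between the two is that Parallel.Invoke waits for all of the actions to complete before continuing with the code, whereas StartNew moves on to the next line of code and lets the tasks complete in their own time. This semantic difference should be your first (and probably only) consideration. But for informational purposes, here is a benchmark: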

/* This is a benchmarking template I use in LINQPad when I want to do a
 * quick performance test. Just give it a couple of actions to test and
 * it will give you a pretty good idea of how long they take compared
 * to one another. It's not perfect: You can expect a 3% error margin
 * under ideal circumstances. But if you're not going to improve
 * performance by more than 3%, you probably don't care anyway.*/
void Main()
{
    // Enter setup code here
    var actions2 =
    (from i in Enumerable.Range(1, 10000)
    select (Action)(() => {})).ToArray();

    var awaitList = new Task[actions2.Length];
    var actions = new[]
    {
        new TimedAction("Task.Factory.StartNew", () =>
        {
            // Enter code to test here
            int j = 0;
            foreach(var action in actions2)
            {
                awaitList[j++] = Task.Factory.StartNew(action);
            }
            Task.WaitAll(awaitList);
        }),
        new TimedAction("Parallel.Invoke", () =>
        {
            // Enter code to test here
            Parallel.Invoke(actions2);
        }),
    };
    const int TimesToRun = 100; // Tweak this as necessary
    TimeActions(TimesToRun, actions);
}


#region timer helper methods
// Define other methods and classes here
public void TimeActions(int iterations, params TimedAction[] actions)
{
    Stopwatch s = new Stopwatch();
    int length = actions.Length;
    var results = new ActionResult[actions.Length];
    // Perform the actions in their initial order.
    for(int i = 0; i < length; i++)
    {
        var action = actions[i];
        var result = results[i] = new ActionResult{Message = action.Message};
        // Do a dry run to get things ramped up/cached
        result.DryRun1 = s.Time(action.Action, 10);
        result.FullRun1 = s.Time(action.Action, iterations);
    }
    // Perform the actions in reverse order.
    for(int i = length - 1; i >= 0; i--)
    {
        var action = actions[i];
        var result = results[i];
        // Do a dry run to get things ramped up/cached
        result.DryRun2 = s.Time(action.Action, 10);
        result.FullRun2 = s.Time(action.Action, iterations);
    }
    results.Dump();
}

public class ActionResult
{
    public string Message {get;set;}
    public double DryRun1 {get;set;}
    public double DryRun2 {get;set;}
    public double FullRun1 {get;set;}
    public double FullRun2 {get;set;}
}

public class TimedAction
{
    public TimedAction(string message, Action action)
    {
        Message = message;
        Action = action;
    }
    public string Message {get;private set;}
    public Action Action {get;private set;}
}

public static class StopwatchExtensions
{
    public static double Time(this Stopwatch sw, Action action, int iterations)
    {
        sw.Restart();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        sw.Stop();

        return sw.Elapsed.TotalMilliseconds;
    }
}
#endregion

Results:

Message               | DryRun1 | DryRun2 | FullRun1 | FullRun2
----------------------------------------------------------------
Task.Factory.StartNew | 43.0592 | 50.847  | 452.2637 | 463.2310
Parallel.Invoke       | 10.5717 |  9.948  | 102.7767 | 101.1158

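As you can see, using Parallel.Invoke can be roughly 4.5 times faster than waiting for a bunch of newly created tasks to complete. Of course, that is when your actions do absolutely nothing. The more work each action does, the less of a difference you will notice.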