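I need to run a C# calculation over millions of rows of data and save the results in another table. I haven't worked with threading in C# for a few years. I'm using .NET v4.5 and EF v5.
The original code looks roughly like this: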
public static void Main()
{
    Stopwatch sw = new Stopwatch();
    sw.Start();
    Entities db = new Entities();
    DoCalc(db.Clients.ToList());
    sw.Stop();
    Console.WriteLine(sw.Elapsed);
}
private static void DoCalc(List<Client> clients)
{
    Entities db = new Entities();
    foreach(var c in clients)
    {
        var transactions = db.GetTransactions(c);
        var result = calulate(transactions); //the actual calc
        db.Results.Add(result);
        db.SaveChanges();
    }
}
Here is my attempt at making it multithreaded:
private static int numberOfThreads = 15;
public static void Main()
{
    Stopwatch sw = new Stopwatch();
    sw.Start();
    Entities db = new Entities();
    var splitUpClients = SplitUpClients(db.Clients.ToList());
    // There can be fewer groups than numberOfThreads, so size the array by the actual group count
    Task[] allTasks = new Task[splitUpClients.Count];
    for (int i = 0; i < splitUpClients.Count; i++)
    {
        // Copy the loop variable so each lambda captures its own index
        int index = i;
        allTasks[i] = Task.Factory.StartNew(() => DoCalc(splitUpClients[index]));
    }
    Task.WaitAll(allTasks);
    sw.Stop();
    Console.WriteLine(sw.Elapsed);
}
private static void DoCalc(List<Client> clients)
{
    Entities db = new Entities();
    foreach(var c in clients)
    {
        var transactions = db.GetTransactions(c);
        var result = calulate(transactions);
        db.Results.Add(result);
        db.SaveChanges();
    }
}
//splits the list of clients into n subgroups
private static List<List<Client>> SplitUpClients(List<Client> clients)
{
    int maxPerGroup = (int)Math.Ceiling((double)clients.Count() / numberOfThreads);
    return clients.Select((s, i) => new { Str = s, Index = i }).
        GroupBy(o => o.Index / maxPerGroup, o => o.Str).
        Select(coll => coll.ToList()).
        ToList();
}
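My questions are:
Is this safe and correct, and are there any obvious flaws (particularly with regard to EF)?
Also, how do I find the optimal number of threads? Is it a case of the more the better?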
Use using, e.g. using (Entities db = new Entities()) { ... }. - H H
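To make the comment above concrete, here is a minimal sketch of what one context per task wrapped in using might look like. It is not the asker's real code: the Entities class below is a hypothetical stand-in (the real EF context already implements IDisposable), clients are reduced to plain int ids, and the body of the loop is left as a placeholder.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the EF context from the question.
// It only counts Dispose calls so the effect of using is observable.
class Entities : IDisposable
{
    public static int DisposedCount;
    public void Dispose() { Interlocked.Increment(ref DisposedCount); }
}

static class Program
{
    // Each task creates its own context; contexts are not shared across threads.
    static void DoCalc(List<int> clientIds)
    {
        using (Entities db = new Entities())
        {
            foreach (int id in clientIds)
            {
                // Placeholder: load transactions, calculate, add the result, save.
            }
        } // db.Dispose() runs here, even if the loop throws
    }

    static void Main()
    {
        // Assumed sample data: two groups of client ids.
        var groups = new List<List<int>>
        {
            new List<int> { 1, 2 },
            new List<int> { 3 }
        };

        var tasks = new Task[groups.Count];
        for (int i = 0; i < groups.Count; i++)
        {
            int index = i; // copy so each lambda captures its own value
            tasks[index] = Task.Factory.StartNew(() => DoCalc(groups[index]));
        }
        Task.WaitAll(tasks);

        // Prints how many contexts were disposed, one per task.
        Console.WriteLine(Entities.DisposedCount);
    }
}
```

The point of the using block is only that every context is disposed when its task finishes or fails; it says nothing about the best number of threads, which still has to be measured.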