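I've been using .NET since its inception and was doing parallel programming long before that... and yet I am at a loss to explain this phenomenon. This code runs in a production system and has mostly been doing its job; I am just looking for a better understanding.
I am passing 10 URLs into the following code for concurrent processing: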
    public static void ProcessInParellel(IEnumerable<ArchivedStatus> statuses,
                                         StatusRepository statusRepository,
                                         WaitCallback callback,
                                         TimeSpan timeout)
    {
        // One event per work item; each callback signals its own event when done.
        List<ManualResetEventSlim> manualEvents = new List<ManualResetEventSlim>(statuses.Count());
        try
        {
            foreach (ArchivedStatus status in statuses)
            {
                manualEvents.Add(new ManualResetEventSlim(false));
                ThreadPool.QueueUserWorkItem(callback,
                    new State(status, manualEvents[manualEvents.Count - 1], statusRepository));
            }
            // Block until every callback has signalled, or until the timeout elapses.
            if (!(WaitHandle.WaitAll((from m in manualEvents select m.WaitHandle).ToArray(), timeout, false)))
                throw ThreadPoolTimeoutException(timeout);
        }
        finally
        {
            Dispose(manualEvents);
        }
    }
The callback looks something like this:
    public static void ProcessEntry(object state)
    {
        State stateInfo = state as State;
        try
        {
            // LogTimer logs a warning if the enclosed block takes longer than 6 seconds.
            using (new LogTimer(new TimeSpan(0, 0, 6)))
            {
                GetFinalDestinationForUrl(<someUrl>);
            }
        }
        catch (System.IO.IOException) { }
        catch (Exception ex)
        {
        }
        finally
        {
            // Always signal completion, even when the work failed.
            if (stateInfo.ManualEvent != null)
                stateInfo.ManualEvent.Set();
        }
    }
Each callback looks at a URL and follows a chain of redirects (AllowAutoRedirect is intentionally set to false so that cookies can be handled):
    public static string GetFinalDestinationForUrl(string url, string cookie)
    {
        if (!urlsToIgnore.IsMatch(url))
        {
            HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create(url);
            request.AllowAutoRedirect = false;
            request.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
            request.Method = "GET";
            request.KeepAlive = false;
            request.Pipelined = false;
            request.Timeout = 5000;
            if (!string.IsNullOrEmpty(cookie))
                request.Headers.Add("cookie", cookie);
            try
            {
                string html = null, location = null, setCookie = null;
                using (WebResponse response = request.GetResponse())
                using (Stream stream = response.GetResponseStream())
                using (StreamReader reader = new StreamReader(stream))
                {
                    html = reader.ReadToEnd();
                    location = response.Headers["Location"];
                    setCookie = response.Headers[System.Net.HttpResponseHeader.SetCookie];
                }
                // A Location header means a redirect: follow it, carrying the cookies along.
                if (null != location)
                    return GetFinalDestinationForUrl(GetAbsoluteUrlFromLocationHeader(url, location),
                        (!string.IsNullOrEmpty(cookie) ? cookie + ";" : string.Empty) + setCookie);
                return CleanUrl(url);
            }
            catch (Exception ex)
            {
                if (AttemptRetry(ex, url))
                    throw;
            }
        }
        return ProcessedEntryFlag;
    }
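I have a high-precision StopWatch around the recursive call to GetFinalDestinationForUrl with a threshold of 6 seconds, and the callbacks that complete generally do so within that time. However, the WaitAll, with a generous timeout of (0,0,60) for the 10 threads, still times out regularly. The exception prints something like this:

    System.Exception: Not all threads returned in 60 seconds: Max Worker:32767, Max I/O:1000, Available Worker:32764, Available I/O:1000
       at Work.Threading.ProcessInParellel(IEnumerable`1 statuses, StatusRepository statusRepository, WaitCallback callback, TimeSpan timeout)
       at Work.UrlExpanderWorker.SyncAllUsers()

This is running on .NET 4 with maxConnections set to 100 for all URLs. My only theory is that a synchronous HttpWebRequest call can block for longer than the specified timeout; that is the only reasonable explanation I can come up with. The question is how best to enforce a real timeout on that operation.

Yes, I am aware that the recursive call specifies a 5-second timeout on each call, and that processing a given URL may take several calls. But I barely ever see the StopWatch warnings. For every 20-30 WaitAll timeout errors, I might see one message indicating that a particular thread took longer than 6 seconds. If the real problem were that the 10 threads cumulatively need more than 60 seconds, I should see at least a 1:1 correlation (if not higher) between the two kinds of message.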
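To make concrete what a "real" timeout on a blocking operation could look like, here is a minimal sketch that enforces an overall deadline from outside the call. It assumes .NET 4's Task Parallel Library; `DeadlineSketch`, `TryRunWithDeadline`, and the `onTimeout` hook are hypothetical names introduced for illustration and are not part of the code above.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class DeadlineSketch
{
    // Hypothetical helper: runs 'work' on a thread-pool task and waits at most
    // 'deadline' for it. Returns true if the work completed in time.
    // If the deadline passes, 'onTimeout' is invoked so the caller can unblock
    // the work (for an HttpWebRequest this could be request.Abort).
    // If 'work' throws within the deadline, Task.Wait rethrows the failure
    // wrapped in an AggregateException.
    public static bool TryRunWithDeadline(Action work, Action onTimeout, TimeSpan deadline)
    {
        Task task = Task.Factory.StartNew(work);
        bool finished = task.Wait(deadline);
        if (!finished)
        {
            if (onTimeout != null)
                onTimeout();
            // Touch the exception of a late failure so it is not left unobserved.
            task.ContinueWith(t => { var ignored = t.Exception; },
                              TaskContinuationOptions.OnlyOnFaulted);
        }
        return finished;
    }
}
```

In the setting of this question, `work` would wrap the GetResponse/ReadToEnd sequence and `onTimeout` would call `request.Abort()`, which should make the blocked call fail with a WebException instead of hanging. Note that this ties up one extra thread-pool thread per request while waiting, so it is a diagnostic aid more than a tuned design.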
    Uri uri = new Uri(url);
    HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create(uri);
    request.AllowAutoRedirect = false;
    request.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
    request.Method = "GET";
    request.KeepAlive = false;
    request.Pipelined = false;
    request.Timeout = 7000;
    request.CookieContainer = cookies;
    try
    {
        string html = null, location = null;
        using (new LogTimer("GetFinalDestinationForUrl", url, new TimeSpan(0, 0, 10)))
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (Stream stream = response.GetResponseStream())
        using (StreamReader reader = new StreamReader(stream))
        {
            html = reader.ReadToEnd();
            location = response.Headers["Location"];
            cookies = Combine(cookies, response.Cookies);
            if (response.ContentLength > 150000 && !response.ContentType.ContainsIgnoreCase("text/html"))
                log.Warn(string.Format("Large request ({0} bytes, {1}) detected at {2} on level {3}.", response.ContentLength, response.ContentType, url, level));
        }
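This code routinely logs entries that took 5-6 minutes to complete and were no larger than 150000 bytes. And I am not talking about one isolated server here or there; these are random (high-profile) media sites.
What exactly is going on here, and how do we ensure that the code exits in a reasonable time?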
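One thing worth checking, offered as a possibility and not a confirmed diagnosis: HttpWebRequest.Timeout is documented to cover GetResponse and GetRequestStream, while reads from the response stream (such as ReadToEnd) are governed by the separate ReadWriteTimeout property, whose default is 300,000 ms (5 minutes). That default is close to the 5-6 minute durations in the log. The sketch below sets both values; `RequestSketch` and `CreateBoundedRequest` are hypothetical names used only for illustration.

```csharp
using System;
using System.Net;

public static class RequestSketch
{
    // Hypothetical helper: builds a request whose connect/response wait and
    // whose individual stream reads are both bounded by 'timeoutMs'.
    public static HttpWebRequest CreateBoundedRequest(string url, int timeoutMs)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(new Uri(url));
        request.AllowAutoRedirect = false;
        request.KeepAlive = false;
        // Bounds the synchronous GetResponse / GetRequestStream calls.
        request.Timeout = timeoutMs;
        // Bounds each read or write on the request/response streams.
        request.ReadWriteTimeout = timeoutMs;
        return request;
    }
}
```

Because ReadWriteTimeout applies to each read and not to the download as a whole, a server that trickles data slowly could still keep ReadToEnd busy for longer than `timeoutMs`; an overall bound would still need an external deadline of some kind.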
In ProcessInParellel you wait on m.WaitHandle. In ProcessEntry, you use stateInfo.ManualEvent. Is that a typo? - Chris Shain