WebRequest not properly downloading large files (~1 GB)

Question

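I'm trying to download a large file from a public URL. It seemed to work fine at first, but 1 in 10 computers time out. My initial attempt used WebClient.DownloadFileAsync, but because it would never complete, I fell back to using WebRequest.Create and reading the response stream directly.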

My first version using WebRequest.Create hit the same problem as WebClient.DownloadFileAsync: the operation times out and the file never completes.

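My next version added retries for when the download times out. This is where it gets weird. The download does eventually finish, with a single retry needed to fetch the last 7092 bytes. So the file is exactly the right size, but it is corrupt and differs from the source. I would expect the corruption to be in the last 7092 bytes, but that is not the case.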

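Using BeyondCompare, I found that the corrupt file is missing 2 blocks of bytes totaling 7092 bytes! The missing bytes are at offsets 0x1CA49FF0 and 0x1E31F380 (480,550,896 and 506,590,080 in decimal), well before the download timed out and was restarted.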
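
A comparison like the BeyondCompare one can also be scripted, which helps when checking downloads from several machines. The sketch below (a hypothetical `StreamDiff` class, not part of any library) reports the offset of the first byte at which two streams differ. When bytes are missing rather than altered, everything after the first gap is shifted, so a plain byte-by-byte comparison like this one only locates the first gap reliably.

```csharp
using System;
using System.IO;

public static class StreamDiff
{
    // Returns the zero-based offset of the first byte at which the two streams
    // differ, or -1 if they are identical. If one stream is a prefix of the
    // other, the result is the length of the shorter stream.
    public static long FirstDifference(Stream a, Stream b, int bufferSize = 81920)
    {
        var bufA = new byte[bufferSize];
        var bufB = new byte[bufferSize];
        long offset = 0;

        while (true)
        {
            int readA = ReadFull(a, bufA);
            int readB = ReadFull(b, bufB);
            int common = Math.Min(readA, readB);

            for (int i = 0; i < common; i++)
            {
                if (bufA[i] != bufB[i])
                    return offset + i;
            }

            if (readA != readB)
                return offset + common; // one stream ended before the other
            if (readA == 0)
                return -1;              // both ended together with no difference

            offset += readA;
        }
    }

    // Stream.Read may return fewer bytes than requested, so keep reading until
    // the buffer is full or the stream ends; this keeps both buffers aligned.
    private static int ReadFull(Stream s, byte[] buffer)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            int n = s.Read(buffer, total, buffer.Length - total);
            if (n == 0) break;
            total += n;
        }
        return total;
    }
}
```

Usage would look like `StreamDiff.FirstDifference(File.OpenRead(goodPath), File.OpenRead(badPath))`, where `goodPath` and `badPath` are placeholders for a known-good copy and the corrupt one.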

What could be going wrong here? Any tips on how to track this down further?

Here is the relevant code.

public void DownloadFile(string sourceUri, string destinationPath)
{
    //roughly based on: https://dev59.com/OXE95IYBdhLWcg3wi-Yc
    //not using WebClient.DownloadFileAsync as it seems to stall out on large files rarely for unknown reasons.

    using (var fileStream = File.Open(destinationPath, FileMode.Create, FileAccess.Write, FileShare.Read))
    {
        long totalBytesToReceive = 0;
        long totalBytesReceived = 0;
        int attemptCount = 0;
        bool isFinished = false;

        while (!isFinished)
        {
            attemptCount += 1;

            if (attemptCount > 10)
            {
                throw new InvalidOperationException("Too many attempts to download. Aborting.");
            }

            try
            {
                var request = (HttpWebRequest)WebRequest.Create(sourceUri);

                request.Proxy = null;//https://dev59.com/J3RA5IYBdhLWcg3w_DLF#935728
                _log.AddInformation("Request #{0}.", attemptCount);

                //continue downloading from last attempt.
                if (totalBytesReceived != 0)
                {
                    _log.AddInformation("Request resuming with range: {0} , {1}", totalBytesReceived, totalBytesToReceive);
                    request.AddRange(totalBytesReceived, totalBytesToReceive);
                }

                using (var response = request.GetResponse())
                {
                    _log.AddInformation("Received response. ContentLength={0} , ContentType={1}", response.ContentLength, response.ContentType);

                    if (totalBytesToReceive == 0)
                    {
                        totalBytesToReceive = response.ContentLength;
                    }

                    using (var responseStream = response.GetResponseStream())
                    {
                        _log.AddInformation("Beginning read of response stream.");
                        var buffer = new byte[4096];
                        int bytesRead = responseStream.Read(buffer, 0, buffer.Length);
                        while (bytesRead > 0)
                        {
                            fileStream.Write(buffer, 0, bytesRead);
                            totalBytesReceived += bytesRead;
                            bytesRead = responseStream.Read(buffer, 0, buffer.Length);
                        }

                        _log.AddInformation("Finished read of response stream.");
                    }
                }

                _log.AddInformation("Finished downloading file.");
                isFinished = true;
            }
            catch (Exception ex)
            {
                _log.AddInformation("Response raised exception ({0}). {1}", ex.GetType(), ex.Message);
            }
        }
    }
}

Here is the log output from a corrupted download:

Request #1.
Received response. ContentLength=939302925 , ContentType=application/zip
Beginning read of response stream.
Response raised exception (System.Net.WebException). The operation has timed out.
Request #2.
Request resuming with range: 939295833 , 939302925
Received response. ContentLength=7092 , ContentType=application/zip
Beginning read of response stream.
Finished read of response stream.
Finished downloading file.
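
One detail visible in this log: HTTP byte ranges are inclusive at both ends, and `HttpWebRequest.AddRange(from, to)` sends `Range: bytes=from-to`. For a 939302925-byte file the last valid offset is 939302924, so the retry asked for one byte past the end. The server clamped the range and returned 939302925 − 939295833 = 7092 bytes, so this does not explain the mid-file gaps, but it is worth getting right. The single-argument overload `AddRange(from)` requests everything from `from` to the end. Checking the `Content-Range` header of the response is also a cheap way to confirm the server sent the slice you asked for. The helper names below are hypothetical, and the parser is a minimal sketch rather than a complete implementation of the HTTP grammar.

```csharp
using System;
using System.Globalization;

public static class RangeHelper
{
    // HTTP byte ranges are inclusive at both ends, so for a resource of
    // totalLength bytes the last valid offset is totalLength - 1.
    public static (long From, long To) ResumeRange(long bytesReceived, long totalLength)
    {
        if (bytesReceived < 0 || bytesReceived >= totalLength)
            throw new ArgumentOutOfRangeException(nameof(bytesReceived));
        return (bytesReceived, totalLength - 1);
    }

    // Parses a Content-Range value such as "bytes 939295833-939302924/939302925".
    // Returns false for anything else, including the unsatisfied form "bytes */N"
    // and an unknown total length ("bytes a-b/*").
    public static bool TryParseContentRange(string value, out long start, out long end, out long total)
    {
        start = end = total = -1;
        const string prefix = "bytes ";
        if (value == null || !value.StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
            return false;

        string rest = value.Substring(prefix.Length);
        int dash = rest.IndexOf('-');
        int slash = rest.IndexOf('/');
        if (dash <= 0 || slash <= dash + 1)
            return false;

        // NumberStyles.None accepts digits only: no sign, no whitespace.
        return long.TryParse(rest.Substring(0, dash), NumberStyles.None, CultureInfo.InvariantCulture, out start)
            && long.TryParse(rest.Substring(dash + 1, slash - dash - 1), NumberStyles.None, CultureInfo.InvariantCulture, out end)
            && long.TryParse(rest.Substring(slash + 1), NumberStyles.None, CultureInfo.InvariantCulture, out total)
            && start <= end && end < total;
    }
}
```

In the question's loop, such a check would sit right after `GetResponse()`: cast to `HttpWebResponse`, require `StatusCode == HttpStatusCode.PartialContent` on a resumed request (a 200 means the server ignored the range and is resending the whole file), and compare the parsed `start` with `totalBytesReceived` (via `response.Headers["Content-Range"]`) before appending.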

- Spish

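I can think of two things. a) If possible, increase the timeout for large files; b) could your data be getting corrupted during encoding and decoding? I ran into this once on another project. Try encoding with UTF-8. - Steven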

It shouldn't be an encoding issue; it's a binary blob (a zip file). - Spish
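
To make the point concrete: encoding only comes into play if the bytes are converted to a `string`. The question's loop writes the raw `byte[]` straight to the `FileStream`, so no encoding is involved. Round-tripping arbitrary binary data through UTF-8 text, by contrast, is lossy, because invalid UTF-8 sequences are replaced with U+FFFD when decoding. A small sketch (the `EncodingDemo` class is hypothetical, for illustration only):

```csharp
using System.Linq;
using System.Text;

public static class EncodingDemo
{
    // Decodes bytes as UTF-8 text and encodes them back. Encoding.UTF8 replaces
    // invalid byte sequences with U+FFFD while decoding, so the result can differ.
    public static byte[] RoundTripThroughUtf8(byte[] data) =>
        Encoding.UTF8.GetBytes(Encoding.UTF8.GetString(data));

    // True if the bytes come back unchanged after the text round trip.
    public static bool SurvivesRoundTrip(byte[] data) =>
        RoundTripThroughUtf8(data).SequenceEqual(data);
}
```

The zip signature bytes `50 4B 03 04` are plain ASCII and survive the round trip, but a single `0xFF` byte comes back as the three bytes `EF BF BD`.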

Sounds like you're trying to debug a server problem from the wrong end. - Hans Passant

Hans, it looks like you're right. We were able to reproduce the problem (a corrupted file with missing bytes) by downloading with Chrome. - Spish
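
Reproducing the problem in Chrome points at the server. A complementary check is to hash each downloaded copy (from the C# code, from Chrome, from a machine where the download succeeds) and compare the results, or compare against a checksum published alongside the file if one exists (an assumption; the question does not mention one). A minimal SHA-256 sketch using `System.Security.Cryptography` (the `Checksum` class name is hypothetical):

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class Checksum
{
    // Computes the SHA-256 hash of a stream as a lowercase hex string.
    // ComputeHash reads the stream to the end in chunks, so a large file
    // is not loaded into memory at once.
    public static string Sha256Hex(Stream stream)
    {
        using (var sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(stream);
            return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
        }
    }

    public static string Sha256HexOfFile(string path)
    {
        using (var fs = File.OpenRead(path))
        {
            return Sha256Hex(fs);
        }
    }
}
```

If every client produces the same wrong hash, the bad data is most likely coming from the server (or something between it and the clients) rather than from the download code.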

+1 for posting the complete DownloadFile method; it solved a completely unrelated problem I was having. - cod3monk3y

Why retry at all? Just disable the timeout. A timeout makes no sense for a long-running download; you don't need one. - usr

4 Answers

Your approach to reading the file with a buffer looks odd to me. Maybe the problem is your

while(bytesRead > 0)

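If for some reason the stream returns no bytes at some point while the download is still unfinished, it will exit the loop and never come back. You should get the Content-Length and increment a variable totalBytesReceived by bytesRead. Finally, change the loop to: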

while(totalBytesReceived < ContentLength)

- johmarjac
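
A note on this suggestion: per the documented `Stream.Read` contract, `Read` blocks until at least one byte is available and returns 0 only at the end of the stream, so a momentary lull does not end the loop. Replacing the condition with `totalBytesReceived < ContentLength` alone would also spin forever if the stream does end early, since every later `Read` returns 0. The useful part of the idea is comparing the byte count with `Content-Length`: if the connection closes before all bytes arrive and the stack reports that as a normal end of stream rather than an exception, the loop ends quietly with a short file. A sketch that keeps the `> 0` loop and checks the total afterwards (a hypothetical `CheckedCopy` helper; pass -1 when the length is unknown, which is what `HttpWebResponse.ContentLength` reports when the header is missing):

```csharp
using System.IO;

public static class CheckedCopy
{
    // Copies source to destination and returns the number of bytes copied.
    // If expectedLength is known (>= 0) and the copied count differs, throws,
    // which catches a stream that ended early without raising an error.
    public static long CopyExactly(Stream source, Stream destination, long expectedLength)
    {
        var buffer = new byte[81920];
        long total = 0;
        int bytesRead;

        // Read returns 0 only at the end of the stream.
        while ((bytesRead = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            destination.Write(buffer, 0, bytesRead);
            total += bytesRead;
        }

        if (expectedLength >= 0 && total != expectedLength)
        {
            throw new IOException(
                $"Stream ended after {total} bytes; expected {expectedLength}.");
        }

        return total;
    }
}
```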

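You should change the timeout settings. There seem to be two possible timeout problems:

Client-side timeout - try changing the timeout in WebClient. I've found this is sometimes necessary for large file downloads.
Server-side timeout - try changing the timeout on the server. You can verify that this is the problem by using another client (e.g. Postman).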

- A X
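
On the client side, `HttpWebRequest` has two separate timeouts: `Timeout` (default 100,000 ms) covers `GetResponse` and `GetRequestStream`, while `ReadWriteTimeout` (default 300,000 ms) covers reads from the response stream. The question's log shows the exception after "Beginning read of response stream.", which points at `ReadWriteTimeout`. `WebClient` does not expose either property directly; a common approach is to subclass it and override its protected `GetWebRequest` method. A minimal sketch (the `TimeoutConfig` name and the values are illustrative; `Timeout.Infinite` disables a timeout, as usr's comment suggests, at the cost of possibly waiting forever on a stalled connection):

```csharp
using System.Net;

public static class TimeoutConfig
{
    // Creates a request with explicit timeouts in milliseconds.
    // responseTimeoutMs -> HttpWebRequest.Timeout (GetResponse/GetRequestStream).
    // readWriteTimeoutMs -> HttpWebRequest.ReadWriteTimeout (stream reads/writes).
    // System.Threading.Timeout.Infinite (-1) disables either one.
    // Creating the request does not open a connection.
    public static HttpWebRequest CreateRequest(string uri, int responseTimeoutMs, int readWriteTimeoutMs)
    {
        var request = (HttpWebRequest)WebRequest.Create(uri);
        request.Timeout = responseTimeoutMs;
        request.ReadWriteTimeout = readWriteTimeoutMs;
        return request;
    }
}
```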

Allocate a buffer larger than the expected file size.

byte[] byteBuffer = new byte[65536];

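So if the file is 1 GiB in size, you allocate a 1 GiB buffer and then try to fill the entire buffer in one call. That call may return fewer bytes, but you have still allocated the whole buffer. Note that the maximum length of a single array in .NET is a 32-bit number, a limit that applies even if you recompile your program as 64-bit and actually have enough memory available.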

- Mrunalini
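
The 64 KiB buffer in the line above works for files of any size, because the same buffer is reused for every read, so memory use stays constant no matter how large the download is. `Stream.CopyTo(Stream, int bufferSize)` runs that read/write loop internally until `Read` returns 0. A minimal sketch (a hypothetical `FixedBufferCopy` helper):

```csharp
using System.IO;

public static class FixedBufferCopy
{
    // Copies source into a file using a fixed 64 KiB buffer. Memory use does
    // not depend on the file size because the buffer is reused for every read;
    // CopyTo keeps reading until the source reports end of stream.
    public static void SaveToFile(Stream source, string destinationPath)
    {
        using (var file = new FileStream(destinationPath, FileMode.Create, FileAccess.Write))
        {
            source.CopyTo(file, 65536);
        }
    }
}
```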

- Sizons · Accepted Answer

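This is the method I normally use for the same kind of downloads you need, and so far it hasn't let me down. Try changing your code to match mine and see if it helps.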

using System.IO;
using System.Net;

public static class Downloader
{
    public static void DownloadFile(string uri, string filename, string localFolder)
    {
        // CreateDirectory does nothing if the folder already exists.
        Directory.CreateDirectory(localFolder);

        // Path.Combine is meant for file system paths (it inserts '\' on Windows),
        // so the URL parts are joined with '/' explicitly.
        string requestUri = uri.TrimEnd('/') + "/" + filename;

        HttpWebRequest httpRequest = (HttpWebRequest)WebRequest.Create(requestUri);
        httpRequest.Method = "GET";

        // If the URI doesn't exist, a WebException is thrown here.
        using (HttpWebResponse httpResponse = (HttpWebResponse)httpRequest.GetResponse())
        using (Stream responseStream = httpResponse.GetResponseStream())
        using (FileStream localFileStream =
            new FileStream(Path.Combine(localFolder, filename), FileMode.Create))
        {
            var buffer = new byte[4096];
            long totalBytesRead = 0;
            int bytesRead;

            // Read returns 0 once the response stream has ended.
            while ((bytesRead = responseStream.Read(buffer, 0, buffer.Length)) > 0)
            {
                totalBytesRead += bytesRead;
                localFileStream.Write(buffer, 0, bytesRead);
            }
        }
    }
}