C#: How can I check whether a URL exists/is valid?

Question

c# .net url-validation

129

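I'm writing a simple program in Visual C# 2005 that looks up a stock symbol on Yahoo! Finance, downloads the historical data, and plots the price history for the specified ticker symbol.

I know the exact URL I need to fetch the data, and the program works fine if the user enters an existing ticker symbol (or at least one with data on Yahoo! Finance). However, if the user makes up a ticker symbol, the program fails with a run-time error, because it tries to pull data from a nonexistent web page.

I'm using the WebClient class and its DownloadString function. I looked through all the other member functions of the WebClient class, but didn't see anything I could use to test a URL.

How can I do this?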

- Daniel Waltrip

1

Updated to show C# 2.0 (VS2005) usage - Marc Gravell

14 Answers

121

You can test a URL with a "HEAD" request rather than a "GET" request, which avoids the cost of downloading the content:

// using MyClient from linked post
using(var client = new MyClient()) {
    client.HeadOnly = true;
    // fine, no content downloaded
    string s1 = client.DownloadString("http://google.com");
    // throws 404
    string s2 = client.DownloadString("http://google.com/silly");
}

Wrap the DownloadString call in a try/catch to check for errors; no error? Then it exists...

With C# 2.0 (VS2005):

private bool headOnly;
public bool HeadOnly {
    get {return headOnly;}
    set {headOnly = value;}
}

and

using(WebClient client = new MyClient())
{
    // code as before
}

- Marc Gravell
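
The MyClient class itself is not shown in this thread. As a rough sketch only, and assuming it is a WebClient subclass that switches the HTTP method by overriding GetWebRequest (the class body below is a reconstruction, not code taken from this page), it might look like this in C# 2.0 syntax:

```csharp
using System;
using System.Net;

// Hypothetical reconstruction of MyClient; the real one lives in the linked post.
class MyClient : WebClient
{
    private bool headOnly;
    public bool HeadOnly
    {
        get { return headOnly; }
        set { headOnly = value; }
    }

    // WebClient calls this for every request, so it is a convenient place
    // to turn the default GET into a HEAD.
    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        if (headOnly && request.Method == "GET")
        {
            request.Method = "HEAD";
        }
        return request;
    }
}
```

With HeadOnly set, DownloadString returns an empty string for an existing URL and throws a WebException for a missing one.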

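FWIW - Not sure this really solves the problem (other than perhaps different behavior client side), since you are simply changing the HTTP method. The server's response will depend heavily on how the logic is coded and may not work well for a dynamic service like stock prices. For static resources (e.g. images, files), HEAD usually works as advertised because it is built into the server. Many programmers do not explicitly handle HEAD requests, since the focus is normally on POST and GET. YMMV - David Taylor

Sorry for taking so long to pick an answer... I got sidetracked by school and work and more or less forgot about this post. As a side note, I couldn't get your solution to work because I'm using Visual Studio 2005, which doesn't have the "var" type. I haven't worked on this project in months, but is there a simple fix for that? Also, when I tried to implement your solution, I remember it getting mad at me for trying to define the HeadOnly property with no code in the "get" and "set" definitions. Or maybe I was just doing something wrong. Thanks for your help though! - Daniel Waltrip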

2

What is MyClient? - Kiquenet

@Kiquenet there is a link in the body of the answer, pointing here: https://dev59.com/JnVC5IYBdhLWcg3w4Vf6 - Marc Gravell

41

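These solutions are pretty good, but they overlook that there may be status codes other than 200 OK. This is a solution I have used in production environments for status monitoring and the like.

If there is a URL redirect or some other condition on the target page, this method returns true. Note also that GetResponse() throws an exception for error responses, so you will not get a StatusCode from it directly. You need to catch the exception and check whether it is a ProtocolError.

Any 400 or 500 status code returns false. All other status codes return true. The code is easy to modify to suit your needs for specific status codes.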

/// <summary>
/// This method will check a url to see that it does not return server or protocol errors
/// </summary>
/// <param name="url">The path to check</param>
/// <returns></returns>
public bool UrlIsValid(string url)
{
    try
    {
        HttpWebRequest request = HttpWebRequest.Create(url) as HttpWebRequest;
        request.Timeout = 5000; //set the timeout to 5 seconds to keep the user from waiting too long for the page to load
        request.Method = "HEAD"; //Get only the header information -- no need to download any content

        using (HttpWebResponse response = request.GetResponse() as HttpWebResponse)
        {
            int statusCode = (int)response.StatusCode;
            if (statusCode >= 100 && statusCode < 400) //Good requests
            {
                return true;
            }
            else if (statusCode >= 500 && statusCode <= 510) //Server Errors
            {
                //log.Warn(String.Format("The remote server has thrown an internal error. Url is not valid: {0}", url));
                Debug.WriteLine(String.Format("The remote server has thrown an internal error. Url is not valid: {0}", url));
                return false;
            }
        }
    }
    catch (WebException ex)
    {
        if (ex.Status == WebExceptionStatus.ProtocolError) //400 errors
        {
            return false;
        }
        else
        {
            Debug.WriteLine(String.Format("Unhandled status [{0}] returned for url: {1}", ex.Status, url));
        }
    }
    catch (Exception ex)
    {
        Debug.WriteLine(String.Format("Could not test url {0}: {1}", url, ex.Message));
    }
    return false;
}

- jsmith

1

I'd add that some status codes in the 3xx range actually cause an error to be thrown, e.g. 304 Not Modified, in which case you should handle it in your catch block. - RobV

3

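Just ran into a hair-pulling problem with this approach: HttpWebRequest misbehaves if you do not .Close() the response object before trying to download anything else. Took hours to find that one! - jbeldock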

4

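Since the HttpWebResponse object implements IDisposable, it should be enclosed in a using block, which ensures the connection is closed. Not doing so can cause problems like the one @jbeldock ran into. - Habib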

2

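It's throwing 404 Not Found on URLs that work fine in the browser...? - MikeT

@MichaelTranchida Web servers are known to respond with 404 when you use a method they do not support. In your case, HEAD may not be supported on that resource even though GET is. It should have thrown a 405 instead. - Sriram Sakthivel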
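
One way to cope with servers that reject HEAD is to retry with GET when the HEAD response is 405 or 404. The following is only a sketch under that assumption; the UrlChecker class and its method names are made up for illustration, and it uses HttpClient rather than the HttpWebRequest code above:

```csharp
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper, not part of any answer in this thread.
static class UrlChecker
{
    private static readonly HttpClient Client = new HttpClient();

    // Status codes that may only mean "HEAD is not supported here".
    public static bool ShouldRetryWithGet(HttpStatusCode status)
    {
        return status == HttpStatusCode.MethodNotAllowed
            || status == HttpStatusCode.NotFound;
    }

    public static async Task<bool> ExistsAsync(string url)
    {
        try
        {
            using (var head = new HttpRequestMessage(HttpMethod.Head, url))
            using (HttpResponseMessage response = await Client.SendAsync(head))
            {
                if (response.IsSuccessStatusCode) return true;
                if (!ShouldRetryWithGet(response.StatusCode)) return false;
            }

            // Fall back to GET, but stop after the headers so the body is not downloaded.
            using (var get = new HttpRequestMessage(HttpMethod.Get, url))
            using (HttpResponseMessage response =
                await Client.SendAsync(get, HttpCompletionOption.ResponseHeadersRead))
            {
                return response.IsSuccessStatusCode;
            }
        }
        catch (HttpRequestException) { return false; }   // DNS, connection, or TLS failure
        catch (TaskCanceledException) { return false; }  // request timed out
    }
}
```

The retry costs a second round trip for every URL that really is missing, so whether it is worth it depends on how often the target server mishandles HEAD.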

9

If I understand your question correctly, you could use a small method like this to get the result of your URL test:

WebRequest webRequest = WebRequest.Create(url);  
WebResponse webResponse;
try 
{
  webResponse = webRequest.GetResponse();
}
catch //If exception thrown then couldn't get response from address
{
  return 0;
} 
return 1;

You could wrap the above code in a method and use it to perform the validation. I hope this answers your question.

- Calendar Software

1

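Yes, perhaps you could refine the solution by differentiating between the different cases (TCP connection failure - host refuses the connection, 5xx - something fatal happened, 404 - resource not found, etc.). Have a look at the Status property of WebException ;) - David Taylor

Very good point, David! That would give us more detailed feedback so we could handle the error more astutely. - Calendar Software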

1

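Thanks. My point is that there are several layers to this onion, each of which can throw a wrench into the works (.Net Framework, DNS resolution, TCP connectivity, target web server, target application, etc.). IMHO a good design should be able to discriminate between the different failure conditions to provide informative feedback and usable diagnostics. Let's also not forget that HTTP has status codes for a reason ;) - David Taylor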
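
To make the suggestion about WebException.Status concrete, here is a minimal sketch that maps the failure layers mentioned above to short diagnostic strings. The UrlProbe class, its method names, and the message texts are assumptions chosen for illustration:

```csharp
using System;
using System.Net;

// Hypothetical helper illustrating the idea from the comments above.
static class UrlProbe
{
    // Translates a WebException into a message that names the layer that failed.
    public static string Describe(WebException ex)
    {
        switch (ex.Status)
        {
            case WebExceptionStatus.NameResolutionFailure:
                return "DNS lookup failed";
            case WebExceptionStatus.ConnectFailure:
                return "TCP connection failed";
            case WebExceptionStatus.Timeout:
                return "Request timed out";
            case WebExceptionStatus.ProtocolError:
                // The server answered, but with an error status such as 404 or 500.
                HttpWebResponse response = ex.Response as HttpWebResponse;
                return response == null
                    ? "Protocol error"
                    : "HTTP " + (int)response.StatusCode + " " + response.StatusDescription;
            default:
                return "Other failure: " + ex.Status;
        }
    }

    // Sends a HEAD request and reports either the status code or the failure reason.
    public static string Check(string url)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "HEAD";
        try
        {
            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            {
                return "HTTP " + (int)response.StatusCode;
            }
        }
        catch (WebException ex)
        {
            return Describe(ex);
        }
    }
}
```

A caller can then decide per case whether to tell the user the symbol does not exist (404) or that the service is unreachable (DNS or connection failure).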

8

A lot of the answers are older than HttpClient (I think it was introduced in Visual Studio 2013) or do not use async/await, so I decided to post my own solution:

private static async Task<bool> DoesUrlExists(String url)
{
    try
    {
        using (HttpClient client = new HttpClient())
        {
            // Send only a HEAD request to avoid downloading the full file
            var response = await client.SendAsync(new HttpRequestMessage(HttpMethod.Head, url));

            if (response.IsSuccessStatusCode) {
                // The URL is available if we get a success status code
                return true;
            }
            return false;
        }
    } catch {
        return false;
    }
}

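I use HttpClient.SendAsync together with HttpMethod.Head to make only a HEAD request rather than downloading the whole file. As David and Marc already said, there is not only HTTP 200 for OK, so I use IsSuccessStatusCode to allow all success status codes.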

- Daniel W.

7

I have always found exceptions to be slow to handle.

Perhaps there is a simpler way that gives better, faster results?

public bool IsValidUri(Uri uri)
{
    using (HttpClient Client = new HttpClient())
    {
        HttpResponseMessage result = Client.GetAsync(uri).Result;
        HttpStatusCode StatusCode = result.StatusCode;

        switch (StatusCode)
        {
            case HttpStatusCode.Accepted:
                return true;
            case HttpStatusCode.OK:
                return true;
            default:
                return false;
        }
    }
}

Then simply use:

IsValidUri(new Uri("http://www.google.com/censorship_algorithm"));

- Rusty Nail

Why not use result.IsSuccessStatusCode instead of the switch? - Dan Diplo

6

Try this (make sure you are using System.Net):

public bool checkWebsite(string URL) {
   try {
      WebClient wc = new WebClient();
      string HTMLSource = wc.DownloadString(URL);
      return true;
   }
   catch (Exception) {
      return false;
   }
}

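When the checkWebsite() function is called, it tries to get the source code of the URL passed to it. If it gets the source code, it returns true; otherwise it returns false.

Code example: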

//The checkWebsite command will return true:
bool websiteExists = this.checkWebsite("https://www.google.com");

//The checkWebsite command will return false:
bool fakePageExists = this.checkWebsite("https://www.thisisnotarealwebsite.com/fakepage.html");

- user6909992

5

WebRequest request = WebRequest.Create("http://www.google.com");
try
{
     request.GetResponse();
}
catch //If exception thrown then couldn't get response from address
{
     MessageBox.Show("The URL is incorrect");
}

- Praveen Dasare

1

Please add some explanation to your answer. Code-only answers tend to confuse future readers, do not help them, and may attract downvotes as a result. - Jesse

4

This solution seems easy to follow:

public static bool isValidURL(string url) {
    WebRequest webRequest = WebRequest.Create(url);
    WebResponse webResponse;
    try
    {
        webResponse = webRequest.GetResponse();
    }
    catch //If exception thrown then couldn't get response from address
    {
        return false ;
    }
    return true ;
}

- abobjects.com

1

Don't forget to close webResponse, otherwise the response time will increase every time the method is called. - Madagaga

3

Here is another option.

public static bool UrlIsValid(string url)
{
    bool br = false;
    try {
        IPHostEntry ipHost = Dns.Resolve(url);
        br = true;
    }
    catch (SocketException se) {
        br = false;
    }
    return br;
}

- Zain Ali

3

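This might be useful for checking whether a host exists. The question is obviously not concerned with whether the host exists, though; it is about handling a bad HTTP path given that the host is known to exist and be fine. - binki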
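
If a host-only check is what you want, note that DNS lookups take a host name rather than a full URL, and that Dns.Resolve is marked obsolete in favor of Dns.GetHostEntry. A minimal sketch, assuming the input is an absolute URL (the HostCheck class and HostResolves method are names invented for this example):

```csharp
using System;
using System.Net;
using System.Net.Sockets;

// Hypothetical helper; it only verifies that the host name resolves, not that the page exists.
static class HostCheck
{
    public static bool HostResolves(string url)
    {
        Uri uri;
        // Reject anything that is not an absolute URL before touching the network.
        if (!Uri.TryCreate(url, UriKind.Absolute, out uri))
        {
            return false;
        }

        try
        {
            // Look up only the host part, e.g. "www.google.com".
            Dns.GetHostEntry(uri.Host);
            return true;
        }
        catch (SocketException)
        {
            return false;
        }
    }
}
```

A true result says nothing about the path, so this would not catch a made-up ticker symbol on a host that exists.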

- BigJoe714 · Accepted Answer

Here is another implementation of this solution:

using System.Net;

/// <summary>
/// Checks whether the remote file exists.
/// </summary>
/// <param name="url">The URL of the remote file.</param>
/// <returns>True if the file exists, false if it does not.</returns>
private bool RemoteFileExists(string url)
{
    try
    {
        //Creating the HttpWebRequest
        HttpWebRequest request = WebRequest.Create(url) as HttpWebRequest;
        //Setting the request method to HEAD; you can also use GET.
        request.Method = "HEAD";
        //Getting the web response; the using block closes it when done.
        using (HttpWebResponse response = request.GetResponse() as HttpWebResponse)
        {
            //Returns true if the status code == 200
            return (response.StatusCode == HttpStatusCode.OK);
        }
    }
    catch
    {
        //Any exception returns false.
        return false;
    }
}

Source: http://www.dotnetthoughts.net/2009/10/14/how-to-check-remote-file-exists-using-c/