Trying to get authentication cookie(s) using HttpWebRequest

4

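I need to scrape a table from a secured site, but I'm having trouble with the login page: I can't get the authentication token or any of the other related cookies. What is wrong here?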

public NameValueCollection LoginToDatrose()
{
    var loginUriBuilder = new UriBuilder();
    loginUriBuilder.Host = DatroseHostName;
    loginUriBuilder.Path = BuildURIPath(DatroseBasePath, LOGIN_PAGE);
    loginUriBuilder.Scheme = "https";

    var boundary = Guid.NewGuid().ToString();
    var postData = new NameValueCollection();
    postData.Add("LoginName", DatroseUserName);
    postData.Add("Password", DatrosePassword);

    var data = Encoding.ASCII.GetBytes(postData.ToQueryString(false));
    var request = WebRequest.Create(loginUriBuilder.Uri) as HttpWebRequest;
    request.Method = "POST";
    request.ContentType = "application/x-www-form-urlencoded";
    request.ContentLength = data.Length;
    using (var d = request.GetRequestStream())
    {
        d.Write(data, 0, data.Length);
    }

    var response = request.GetResponse() as HttpWebResponse;
    var responseCookies = new NameValueCollection();
    foreach (var nvp in response.Cookies.OfType<Cookie>())
    {
        responseCookies.Add(nvp.Name, nvp.Value);
    }

    //using (var responseData = response.GetResponseStream())
    //using (var responseReader = new StreamReader(responseData))
    //{
    //    var theResponse = responseReader.ReadToEnd();
    //    Debug.WriteLine(theResponse);
    //}

    return responseCookies;

}

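The returned object contains no values. The call does not fail. The value of theResponse (when that block is uncommented) appears to be the HTML of the login page.

Any help would be appreciated.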

1 Answer

10

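OK, the problem here appears to be related to the 302 redirect that happens after the credentials are posted. HttpWebRequest follows the 302 automatically, so the response that comes back is for the redirect target rather than for the login POST itself.

In the end I took a slightly different approach. First, I subclassed the WebClient class as follows: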

public class CookiesAwareWebClient : WebClient
{
    private CookieContainer outboundCookies = new CookieContainer();
    private CookieCollection inboundCookies = new CookieCollection();

    public CookieContainer OutboundCookies
    {
        get
        {
            return outboundCookies;
        }
    }
    public CookieCollection InboundCookies
    {
        get
        { 
            return inboundCookies; 
        }
    }

    public bool IgnoreRedirects { get; set; }

    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        if (request is HttpWebRequest)
        {
            (request as HttpWebRequest).CookieContainer = outboundCookies;
            (request as HttpWebRequest).AllowAutoRedirect = !IgnoreRedirects;
        }
        return request;
    }

    protected override WebResponse GetWebResponse(WebRequest request)
    {
        WebResponse response = base.GetWebResponse(request);
        if (response is HttpWebResponse)
        {
            inboundCookies = (response as HttpWebResponse).Cookies ?? inboundCookies;
        }
        return response;
    }
}

This gives me a WebClient that supports cookies and lets me control whether redirects are followed. I then rewrote my login code as follows:

public NameValueCollection LoginToDatrose()
{
    var loginUriBuilder = new UriBuilder();
    loginUriBuilder.Host = DatroseHostName;
    loginUriBuilder.Path = BuildURIPath(DatroseBasePath, LOGIN_PAGE);
    loginUriBuilder.Scheme = "https";

    var postData = new NameValueCollection();
    postData.Add("LoginName", DatroseUserName);
    postData.Add("Password", DatrosePassword);

    var responseCookies = new NameValueCollection();

    using (var client = new CookiesAwareWebClient())
    {
        client.IgnoreRedirects = true;
        var clientResponse = client.UploadValues(loginUriBuilder.Uri, "POST", postData);
        foreach (var nvp in client.InboundCookies.OfType<Cookie>())
        {
            responseCookies.Add(nvp.Name, nvp.Value);
        }
    }

    return responseCookies;
}

...and then everything worked beautifully.
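
The name/value pairs returned by the login still have to be sent with the later requests that fetch the protected pages. A minimal sketch of one way to do that, assuming the cookies apply to every path on the host; `ToCookieContainer` is a hypothetical helper and not part of the original answer:

```csharp
using System;
using System.Collections.Specialized;
using System.Net;

public static class CookieReuseSketch
{
    // Hypothetical helper: copies name/value pairs into a CookieContainer
    // scoped to the given site so they are sent on later requests.
    public static CookieContainer ToCookieContainer(NameValueCollection cookies, Uri siteUri)
    {
        var container = new CookieContainer();
        foreach (string name in cookies.AllKeys)
        {
            // Assumption: each cookie is valid for every path on the host.
            container.Add(siteUri, new Cookie(name, cookies[name], "/"));
        }
        return container;
    }
}
```

The resulting container can then be assigned to `HttpWebRequest.CookieContainer` on the requests that download the tables.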


3
You could have stayed with HttpWebRequest: it has an AllowAutoRedirect property that you can set to false. - kuhajeyan
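
A minimal sketch of what this comment suggests, assuming the same two form fields as the question. The names `CreateLoginRequest` and `BuildFormBody` are hypothetical. The sketch also assigns a `CookieContainer`, because `HttpWebResponse.Cookies` is only populated when the request has one.

```csharp
using System;
using System.Net;
using System.Text;

public static class LoginRequestSketch
{
    // Hypothetical helper: builds (but does not send) a login POST that
    // keeps cookies and does not follow the 302 issued after login.
    public static HttpWebRequest CreateLoginRequest(Uri loginUri, CookieContainer cookies)
    {
        var request = (HttpWebRequest)WebRequest.Create(loginUri);
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        request.CookieContainer = cookies;   // required for response.Cookies to be filled
        request.AllowAutoRedirect = false;   // stop at the 302 instead of following it
        return request;
    }

    // URL-encodes the two form fields used in the question.
    public static byte[] BuildFormBody(string userName, string password)
    {
        var body = "LoginName=" + WebUtility.UrlEncode(userName)
                 + "&Password=" + WebUtility.UrlEncode(password);
        return Encoding.UTF8.GetBytes(body);
    }
}
```

To use it, write the bytes from `BuildFormBody` to `request.GetRequestStream()` and call `GetResponse()`. With auto-redirect off, the response should be the 302 itself, and its `Cookies` collection should hold whatever the login set.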
