Logging in to a website programmatically with C#

17

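So, I have been scouring the web trying to learn how to log in to websites programmatically with C#. I don't want to use a web client. I want to use something like HttpWebRequest and HttpWebResponse, but I have no idea how these classes work.

I guess I'm looking for someone to explain how they work and the steps required to successfully log in to, say, WordPress, an email account, or any site that requires you to fill in a form with a username and password.

Here is one of my attempts: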

// Declare variables
        string url = textBoxGetSource.Text;
        string username = textBoxUsername.Text;
        string password = PasswordBoxPassword.Password;

        // Values for site login fields - username and password html ID's
        string loginUsernameID = textBoxUsernameID.Text;
        string loginPasswordID = textBoxPasswordID.Text;
        string loginSubmitID = textBoxSubmitID.Text;

        // Connection parameters
        string method = "POST";
        string contentType = @"application/x-www-form-urlencoded";
        string loginString = loginUsernameID + "=" + username + "&" + loginPasswordID + "=" + password + "&" + loginSubmitID;
        CookieContainer cookieJar = new CookieContainer();
        HttpWebRequest request;

        request = (HttpWebRequest)WebRequest.Create(url);
        request.CookieContainer = cookieJar;
        request.Method = method;
        request.ContentType = contentType;
        request.KeepAlive = true;
        using (Stream requestStream = request.GetRequestStream())
        using (StreamWriter writer = new StreamWriter(requestStream))
        {
            writer.Write(loginString, username, password);
        }

        using (var responseStream = request.GetResponse().GetResponseStream())
        using (var reader = new StreamReader(responseStream))
        {
            var result = reader.ReadToEnd();
            Console.WriteLine(result);
            richTextBoxSource.AppendText(result);
        }

        MessageBox.Show("Successfully logged in.");

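I don't know whether I'm doing this right; I always get sent back to the login screen of whatever site I try. I have downloaded Fiddler and was able to glean a little information about what gets sent to the server, but I feel completely lost. If anyone could shed some light here, I would greatly appreciate it.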


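Possibly a duplicate of the question on logging in to a website programmatically with C#. - RyBolt
See the answer at the following link: https://stackoverflow.com/a/66477695/3298930 - Jose Manuel Abarca Rodríguez
2 Answers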

37

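Logging in to websites programmatically is difficult and tightly coupled to how the site implements its login procedure. Your code isn't working because your requests and responses don't deal with any of that.

Let's take fif.com as an example. When you type in a username and password, the following POST request is sent: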

POST https://fif.com/login?task=user.login HTTP/1.1
Host: fif.com
Connection: keep-alive
Content-Length: 114
Cache-Control: max-age=0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Origin: https://fif.com
User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.103 Safari/537.36
Content-Type: application/x-www-form-urlencoded
Referer: https://fif.com/login?return=...==
Accept-Encoding: gzip,deflate
Accept-Language: en-US,en;q=0.8
Cookie: 34f8f7f621b2b411508c0fd39b2adbb2=gnsbq7hcm3c02aa4sb11h5c87f171mh3; __utma=175527093.69718440.1410315941.1410315941.1410315941.1; __utmb=175527093.12.10.1410315941; __utmc=175527093; __utmz=175527093.1410315941.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); __utmv=175527093.|1=RegisteredUsers=Yes=1

username=...&password=...&return=aHR0cHM6Ly9maWYuY29tLw%3D%3D&9a9bd5b68a7a9e5c3b06ccd9b946ebf9=1

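Notice the cookies (especially the first one, which is your session token). Notice the cryptic, URL-encoded return value being sent (here it is the Base64 encoding of https://fif.com/). If the server notices that either of these is missing, it won't let you log in.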

HTTP/1.1 400 Bad Request

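Or worse, you get a 200 response containing the login page, with an error message buried somewhere inside.

But let's pretend you were able to collect all of those magic values and pass them in an HttpWebRequest object. The site wouldn't know the difference, and it might respond with something like this: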

HTTP/1.1 303 See other
Server: nginx
Date: Wed, 10 Sep 2014 02:29:09 GMT
Content-Type: text/html; charset=utf-8
Transfer-Encoding: chunked
Connection: keep-alive
Location: https://fif.com/
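
Hopefully that is what you were expecting. If you have made it this far, you can now fire off requests to the server programmatically with your validated session token and get the expected HTML back.
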
GET https://fif.com/ HTTP/1.1
Host: fif.com
Connection: keep-alive
Cache-Control: max-age=0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.103 Safari/537.36
Referer: https://fif.com/login?return=aHR0cHM6Ly9maWYuY29tLw==
Accept-Encoding: gzip,deflate
Accept-Language: en-US,en;q=0.8
Cookie: 34f8f7f621b2b411508c0fd39b2adbb2=gnsbq7hcm3c02aa4sb11h5c87f171mh3; __utma=175527093.69718440.1410315941.1410315941.1410315941.1; __utmb=175527093.12.10.1410315941; __utmc=175527093; __utmz=175527093.1410315941.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); __utmv=175527093.|1=RegisteredUsers=Yes=1
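
The exchange above can be roughed out with HttpWebRequest and a shared CookieContainer. This is only a sketch under assumptions, not fif.com's actual procedure: the field names `username` and `password` are taken from the POST body shown earlier, but the idea that the other values (the `return` value and the random-looking token) are delivered as hidden inputs on the login page, the regular expression used to scrape them, and the names `LoginSketch`, `ExtractHiddenFields` and `Login` are all hypothetical. Check the real page in Fiddler before relying on any of it.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

public static class LoginSketch
{
    // Assumption: hidden fields look like <input type="hidden" name="..." value="...">
    // with the attributes in that order. Real pages vary; an HTML parser is more robust.
    public static Dictionary<string, string> ExtractHiddenFields(string html)
    {
        var fields = new Dictionary<string, string>();
        var pattern = new Regex(
            "<input[^>]*type=\"hidden\"[^>]*name=\"([^\"]*)\"[^>]*value=\"([^\"]*)\"",
            RegexOptions.IgnoreCase);
        foreach (Match m in pattern.Matches(html))
        {
            fields[m.Groups[1].Value] = WebUtility.HtmlDecode(m.Groups[2].Value);
        }
        return fields;
    }

    public static string Login(string loginPageUrl, string postUrl, string user, string pass)
    {
        // One container for every request, so the session cookie set by the
        // first response is sent back automatically on the later ones.
        var jar = new CookieContainer();

        // 1. GET the login page to receive the session cookie and hidden fields.
        var get = (HttpWebRequest)WebRequest.Create(loginPageUrl);
        get.CookieContainer = jar;
        string html;
        using (var resp = get.GetResponse())
        using (var reader = new StreamReader(resp.GetResponseStream()))
        {
            html = reader.ReadToEnd();
        }

        // 2. Combine the scraped hidden values with the credentials, URL-encoded.
        var form = ExtractHiddenFields(html);
        form["username"] = user;
        form["password"] = pass;
        string body = string.Join("&", form.Select(kv =>
            Uri.EscapeDataString(kv.Key) + "=" + Uri.EscapeDataString(kv.Value)));
        byte[] bytes = Encoding.UTF8.GetBytes(body);

        // 3. POST the form with the same cookie container.
        var post = (HttpWebRequest)WebRequest.Create(postUrl);
        post.CookieContainer = jar;
        post.Method = "POST";
        post.ContentType = "application/x-www-form-urlencoded";
        post.Referer = loginPageUrl;
        post.ContentLength = bytes.Length;
        using (var stream = post.GetRequestStream())
        {
            stream.Write(bytes, 0, bytes.Length);
        }

        // 4. HttpWebRequest follows redirects by default, so after a 303 the
        //    body read here is the page the server redirected to.
        using (var resp = post.GetResponse())
        using (var reader = new StreamReader(resp.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Because a failed login may still come back as a 200 with the login page in it, the returned HTML has to be inspected (for example, for the presence of the login form) before assuming the session is authenticated.
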

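All of this is specific to fif.com; the juggling of cookies, tokens and redirects will be completely different for another site. In my experience (with that site in particular), you have three options for getting through the login wall:

  1. Write an incredibly complicated and fragile script that dances around the site's procedures,
  2. Log in to the site manually with your browser, grab the magic values, and plug them into your request objects, or
  3. Create a script that automates Selenium to do this on your behalf.

Selenium can handle all of the juggling, and at the end you can pull out the cookies and fire off your requests normally. Here is an example for fif: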

// Requires the Selenium.WebDriver NuGet package and, at the top of the file:
// using System.IO; using System.Net; using OpenQA.Selenium; using OpenQA.Selenium.Chrome;

// Run Selenium: open the login page, fill in the form and submit it
ChromeDriver cd = new ChromeDriver(@"chromedriver_win32");
cd.Navigate().GoToUrl("https://fif.com/login");
IWebElement e = cd.FindElement(By.Id("username"));
e.SendKeys("...");
e = cd.FindElement(By.Id("password"));
e.SendKeys("...");
e = cd.FindElement(By.XPath(@"//*[@id=""main""]/div/div/div[2]/table/tbody/tr/td[1]/div/form/fieldset/table/tbody/tr[6]/td/button"));
e.Click();

CookieContainer cc = new CookieContainer();

// Copy the browser's cookies into a CookieContainer
foreach (OpenQA.Selenium.Cookie c in cd.Manage().Cookies.AllCookies)
{
    cc.Add(new System.Net.Cookie(c.Name, c.Value, c.Path, c.Domain));
}

// Fire off the request with the authenticated cookies
HttpWebRequest hwr = (HttpWebRequest)WebRequest.Create("https://fif.com/components/com_fif/tools/capacity/values/");
hwr.CookieContainer = cc;
hwr.Method = "POST";
hwr.ContentType = "application/x-www-form-urlencoded";
using (StreamWriter swr = new StreamWriter(hwr.GetRequestStream()))
{
    swr.Write("feeds=35");
}

// Read the response body
string s;
using (WebResponse wr = hwr.GetResponse())
using (StreamReader sr = new StreamReader(wr.GetResponseStream()))
{
    s = sr.ReadToEnd();
}
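
One thing the Selenium example leaves implicit is timing: the cookies are read immediately after the click, and the login round trip may not have finished by then. A possible way to handle this, sketched here as an assumption rather than as part of the original answer, is to wait until the browser has left the login page before copying the cookies. `WebDriverWait` comes from the separate Selenium.Support package; the helper name `WaitUntilLeftLoginPage` and the "URL no longer starts with the login URL" condition are hypothetical and may need replacing with a check that suits the site.

```csharp
using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Support.UI;

public static class LoginWait
{
    // Blocks until the driver's current URL no longer starts with loginUrl,
    // or throws WebDriverTimeoutException once the timeout has elapsed.
    public static void WaitUntilLeftLoginPage(IWebDriver driver, string loginUrl, TimeSpan timeout)
    {
        var wait = new WebDriverWait(driver, timeout);
        wait.Until(d => !d.Url.StartsWith(loginUrl, StringComparison.OrdinalIgnoreCase));
    }
}
```

It would be called between the click and the cookie loop, for example `LoginWait.WaitUntilLeftLoginPage(cd, "https://fif.com/login", TimeSpan.FromSeconds(10));`.
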

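OK, I see what you mean. This exercise was my first foray into web programming. I'm more familiar with connecting to databases, and this is nothing like that. It looks like more trouble than it's worth. - DGarrett01
3
Selenium was exactly what I needed; it solved my problem with ease. - minnow
10
This worked great for me for logging in to Azure to get credits. But CookieContainer cc = new CookieContainer(); was missing. - MrBeanzy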

3

Check out this post. It is another way of doing this, and you don't need to install any packages, although it may be easier with Selenium.

"You can continue using WebClient to POST (instead of GET, which is the HTTP verb you're currently using with DownloadString), but I think you'll find it easier to work with the (slightly) lower-level classes WebRequest and WebResponse.

There are two parts to this - the first is to post the login form, the second is recovering the "Set-cookie" header and sending that back to the server as "Cookie" along with your GET request. The server will use this cookie to identify you from now on (assuming it's using cookie-based authentication which I'm fairly confident it is as that page returns a Set-cookie header which includes "PHPSESSID").


POSTing to the login form

Form posts are easy to simulate, it's just a case of formatting your post data as follows:

field1=value1&field2=value2

Using WebRequest and code I adapted from Scott Hanselman, here's how you'd POST form data to your login form:

string formUrl = "http://www.mmoinn.com/index.do?PageModule=UsersAction&Action=UsersLogin";

NOTE: This is the URL the form POSTs to, not the URL of the form (you can find this in the "action" attribute of the HTML's form tag)

string formParams = string.Format("email_address={0}&password={1}", "your email", "your password");
string cookieHeader;
WebRequest req = WebRequest.Create(formUrl);
req.ContentType = "application/x-www-form-urlencoded";
req.Method = "POST";
byte[] bytes = Encoding.ASCII.GetBytes(formParams);
req.ContentLength = bytes.Length;
using (Stream os = req.GetRequestStream())
{
    os.Write(bytes, 0, bytes.Length);
}
WebResponse resp = req.GetResponse();
cookieHeader = resp.Headers["Set-cookie"];

Here's an example of what you should see in the Set-cookie header for your login form:

PHPSESSID=c4812cffcf2c45e0357a5a93c137642e; path=/; domain=.mmoinn.com,wowmine_referer=directenter; path=/; domain=.mmoinn.com,lang=en; path=/;domain=.mmoinn.com,adt_usertype=other,adt_host=-


GETting the page behind the login form

Now you can perform your GET request to a page that you need to be logged in for.

string pageSource;
string getUrl = "the url of the page behind the login";
WebRequest getRequest = WebRequest.Create(getUrl);
getRequest.Headers.Add("Cookie", cookieHeader);
WebResponse getResponse = getRequest.GetResponse();
using (StreamReader sr = new StreamReader(getResponse.GetResponseStream()))
{
    pageSource = sr.ReadToEnd();
}

EDIT:

If you need to view the results of the first POST, you can recover the HTML it returned with:

using (StreamReader sr = new StreamReader(resp.GetResponseStream()))
{
    pageSource = sr.ReadToEnd();
}

Place this directly below cookieHeader = resp.Headers["Set-cookie"]; and then inspect the string held in pageSource."
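
Two details of the quoted code may be worth tightening. First, `string.Format` puts the credentials into the body unescaped, so a password containing `&` or `=` would corrupt the form data. Second, the raw Set-cookie value carries attributes such as `path` and `domain` that do not belong in a Cookie request header. The following is a minimal sketch of one way to handle both, not part of the quoted answer; the class and method names are hypothetical. It relies on `Uri.EscapeDataString` for the encoding and on `CookieContainer.SetCookies` / `GetCookieHeader` to reduce the header to `name=value` pairs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;

public static class FormPostHelpers
{
    // Percent-encodes each name and value so characters such as '&', '=' or '@'
    // in a password or e-mail address cannot corrupt the form body.
    public static string BuildFormBody(IEnumerable<KeyValuePair<string, string>> fields)
    {
        return string.Join("&", fields.Select(f =>
            Uri.EscapeDataString(f.Key) + "=" + Uri.EscapeDataString(f.Value)));
    }

    // Parses a raw Set-Cookie header value as if it had been received from `site`
    // and returns the Cookie header value that should be sent back to that site.
    public static string ToCookieHeader(Uri site, string setCookieHeader)
    {
        var jar = new CookieContainer();
        jar.SetCookies(site, setCookieHeader);
        return jar.GetCookieHeader(site);
    }
}
```

For example, passing the pairs `email_address` = `a@b.com` and `password` = `p&ss=1` to `BuildFormBody` yields `email_address=a%40b.com&password=p%26ss%3D1`. Cookies whose attributes themselves contain commas (an Expires date, for instance) may not parse cleanly this way; in that case it is probably simpler to cast both requests to HttpWebRequest and assign the same CookieContainer to each, so the framework tracks the cookies itself.
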


1
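Simply copying and pasting someone else's answer is not appropriate (even if you include a link). - nkjt
1
@nkjt I can't explain it any better than he did, but I still wanted to help people who land on this page... - DFSFOT
1
Please use a quote block (select the text and use the " button while editing) to show readers that everything you posted was written by someone else. - Edward
This is what I called the "fragile script dance": essentially a choreography of HTTP requests. The script breaks as soon as the site changes its login form or flow. HTTPS also complicates matters, and the post does not address it. My advice is (still) to use Selenium to "program" at the application level rather than programming at the network level to work around an application-level problem. - xavier
@xavier Selenium is a bit annoying, because people need to install the browser your code was written for, and the browser actually pops up and performs the actions, which takes more time... - DFSFOT
A good explanation of some of the mechanics of WebRequest, but this will not work for any properly implemented login form. It assumes the site has no protection against cross-site request forgery. For example, an ASP.NET Web Forms page will require a valid view state token, and MVC will require a matching cookie and form verification token. Sites that let you POST arbitrarily are insecure. - Matthew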
