Getting ReadyState from the WebBrowser control without DoEvents

4

This has been answered many times here and on other sites, and those answers work, but I would like to learn about other approaches:

How can I get ReadyState = Complete after a navigation or a post without using DoEvents, given its many drawbacks?

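I should also point out that using the DocumentCompleted event does not help in this case, because I will not be navigating to just one page but to one page after another, like this: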

wb.Navigate("www.microsoft.com");
// don't use a DoEvents loop here
wb.Document.Body.SetAttribute(textbox1, "login");
// don't use a DoEvents loop here
if (wb.DocumentText.Contains("text"))
    // do something

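Today this is done with DoEvents. I would like to know whether there is a better way to wait for the browser's asynchronous calls to finish before continuing with the rest of the logic. That is all.
Thanks in advance.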

1
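You have to use the DocumentCompleted event. All you need to do is keep track of what has completed. The event already tells you: you get the e.Url property. If you need more than that, just use a variable to track state. A simple integer or enum gets the job done. - Hans Passant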
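The state-tracking idea in the comment above could look roughly like the following. This is only a minimal sketch: the `Step` enum, the `NavigationTracker` class, and the step names are hypothetical and chosen to mirror the three-step sequence in the question. The tracker is kept free of WinForms types so the logic can be checked on its own; the wiring to `DocumentCompleted` is shown in a comment and assumes a `WebBrowser` field named `wb`.

```csharp
using System;

// Hypothetical step names mirroring the sequence in the question.
public enum Step { LoadLoginPage, SubmitLogin, CheckResult, Done }

// Remembers which step of the sequence the next completed navigation belongs to.
public sealed class NavigationTracker
{
    public Step Current { get; private set; } = Step.LoadLoginPage;

    // Returns the step that has just finished and moves on to the next one.
    // Once Done is reached, the tracker stays at Done.
    public Step Advance()
    {
        Step finished = Current;
        if (Current != Step.Done)
            Current = Current + 1;
        return finished;
    }
}

// Possible wiring inside a Form (assumes a WebBrowser field named wb):
//
// wb.DocumentCompleted += (s, e) =>
// {
//     if (e.Url != wb.Url) return;            // skip completions raised for frames
//     switch (tracker.Advance())
//     {
//         case Step.LoadLoginPage: /* fill in the form, submit it */ break;
//         case Step.SubmitLogin:   /* inspect wb.DocumentText */     break;
//     }
// };
```

Each completed top-level navigation advances the tracker by one step, so a single event handler can drive the whole sequence without a waiting loop.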
3 Answers

2
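Below is the code of a basic WinForms application that shows how to use async/await to wait asynchronously for the DocumentCompleted event and navigate to several pages one after another. Everything runs on the main UI thread.
Instead of calling this.webBrowser.Navigate(url), the startNavigation callback could simulate a form button click to trigger a POST-style navigation.
The webBrowser.IsBusy async loop is optional. Its purpose is to account (non-deterministically) for the page's dynamic AJAX code, which may run after the window.onload event.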
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;
using System.Windows.Forms;

namespace WebBrowserApp
{
    public partial class MainForm : Form
    {
        WebBrowser webBrowser;

        public MainForm()
        {
            InitializeComponent();

            // create a WebBrowser
            this.webBrowser = new WebBrowser();
            this.webBrowser.Dock = DockStyle.Fill;
            this.Controls.Add(this.webBrowser);

            this.Load += MainForm_Load;
        }

        // Form Load event handler
        async void MainForm_Load(object sender, EventArgs e)
        {
            // cancel the whole operation in 30 sec
            var cts = new CancellationTokenSource(30000);

            var urls = new String[] { 
                    "http://www.example.com", 
                    "http://www.gnu.org", 
                    "http://www.debian.org" };

            await NavigateInLoopAsync(urls, cts.Token);
        }

        // navigate to each URL in a loop
        async Task NavigateInLoopAsync(string[] urls, CancellationToken ct)
        {
            foreach (var url in urls)
            {
                ct.ThrowIfCancellationRequested();
                var html = await NavigateAsync(ct, () => 
                    this.webBrowser.Navigate(url));
                Debug.Print("url: {0}, html: \n{1}", url, html);
            }
        }

        // asynchronous navigation
        async Task<string> NavigateAsync(CancellationToken ct, Action startNavigation)
        {
            var onloadTcs = new TaskCompletionSource<bool>();
            EventHandler onloadEventHandler = null;

            WebBrowserDocumentCompletedEventHandler documentCompletedHandler = delegate
            {
                // DocumentCompleted may be called several times for the same page,
                // if the page has frames
                if (onloadEventHandler != null)
                    return;

                // so, observe DOM onload event to make sure the document is fully loaded
                onloadEventHandler = (s, e) =>
                    onloadTcs.TrySetResult(true);
                this.webBrowser.Document.Window.AttachEventHandler("onload", onloadEventHandler);
            };

            this.webBrowser.DocumentCompleted += documentCompletedHandler;
            try
            {
                using (ct.Register(() => onloadTcs.TrySetCanceled(), useSynchronizationContext: true))
                {
                    startNavigation();
                    // wait for DOM onload event, throw if cancelled
                    await onloadTcs.Task;
                }
            }
            finally
            {
                this.webBrowser.DocumentCompleted -= documentCompletedHandler;
                if (onloadEventHandler != null)
                    this.webBrowser.Document.Window.DetachEventHandler("onload", onloadEventHandler);
            }

            // the page has fully loaded by now

            // optional: let the page run its dynamic AJAX code,
            // we might add another timeout for this loop
            do { await Task.Delay(500, ct); }
            while (this.webBrowser.IsBusy);

            // return the page's HTML content
            return this.webBrowser.Document.GetElementsByTagName("html")[0].OuterHtml;
        }
    }
}
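The core of NavigateAsync above is turning a one-shot event into an awaitable task with TaskCompletionSource, plus a cancellation registration. The following is a stripped-down sketch of just that pattern, without any WinForms dependency. `FakeLoader`, `EventAwaiter`, and `WaitForLoadedAsync` are hypothetical names introduced for illustration; `FakeLoader` merely stands in for the browser control and is not part of the answer's code.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a control that raises an event when loading is done.
public sealed class FakeLoader
{
    public event EventHandler Loaded;
    public void RaiseLoaded() => Loaded?.Invoke(this, EventArgs.Empty);
}

public static class EventAwaiter
{
    // Starts an operation and completes when Loaded fires;
    // the returned task is cancelled if the token is cancelled first.
    public static async Task WaitForLoadedAsync(
        FakeLoader loader, Action start, CancellationToken ct)
    {
        var tcs = new TaskCompletionSource<bool>();
        EventHandler handler = (s, e) => tcs.TrySetResult(true);

        loader.Loaded += handler;
        try
        {
            // Cancelling the token moves the task to the cancelled state.
            using (ct.Register(() => tcs.TrySetCanceled()))
            {
                start();          // kick off the operation
                await tcs.Task;   // resume once the event has fired
            }
        }
        finally
        {
            // Always unsubscribe so handlers do not pile up across calls.
            loader.Loaded -= handler;
        }
    }
}
```

Subscribing before calling `start()` and unsubscribing in `finally` is what keeps the handler from missing the event or leaking across repeated navigations.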

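If you want to do something similar from a console application, there is an example of that as well (linked from the original answer).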

Why are you writing Visual Basic .NET in C syntax? - Md Ashraful Islam

2
The solution is simple:
    // MAKE SURE ReadyState = Complete
            while (WebBrowser1.ReadyState != WebBrowserReadyState.Complete) {
                Application.DoEvents();         
            }

// Continue with your subsequent code...


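Simple and fast... I am a VBA programmer and this logic has worked for a long time, but I spent several days failing to find a similar solution in C# and finally worked it out myself.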

Here is my complete function, whose purpose is to extract a piece of information from a web page:

private int maxReloadAttempt = 3;
    private int currentAttempt = 1;

    private string GetCarrier(string webAddress)
    {
        WebBrowser WebBrowser_4MobileCarrier = new WebBrowser();
        string innerHtml;
        string strStartSearchFor = "subtitle block pull-left\">";
        string strEndSearchFor = "<";

        try
        {
            WebBrowser_4MobileCarrier.ScriptErrorsSuppressed = true;
            WebBrowser_4MobileCarrier.Navigate(webAddress); 

            // MAKE SURE ReadyState = Complete
            while (WebBrowser_4MobileCarrier.ReadyState != WebBrowserReadyState.Complete) {
                Application.DoEvents();         
            }

            // LOAD HTML
            innerHtml = WebBrowser_4MobileCarrier.Document.Body.InnerHtml;  

            // ATTEMPT (x3) TO EXTRACT CARRIER STRING
            while (currentAttempt <=  maxReloadAttempt) {
                if (innerHtml.IndexOf(strStartSearchFor) >= 0)
                {
                    currentAttempt = 1; // Reset attempt counter
                    return Sub_String(innerHtml, strStartSearchFor, strEndSearchFor, "0"); // Method: "Sub_String" is my custom function
                }
                else
                {
                    currentAttempt += 1;    // Increment attempt counter
                    return GetCarrier(webAddress); // Recursive call: reload the page and return its result
                } // End if
            } // End while
        }   // End Try

        catch //(Exception ex)
        {
        }
        return "Unavailable";
    }

1
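This is a "quick and dirty" solution. It is not 100% reliable, but it does not block the UI thread and should be satisfactory for prototyping WebBrowser control automation.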
    private async void testButton_Click(object sender, EventArgs e)
    {
        await Task.Factory.StartNew(
            () =>
            {
                stepTheWeb(() => wb.Navigate("www.yahoo.com"));
                stepTheWeb(() => wb.Navigate("www.microsoft.com"));
                stepTheWeb(() => wb.Navigate("asp.net"));
                stepTheWeb(() => wb.Document.InvokeScript("eval", new[] { "$('p').css('background-color','yellow')" }));
                bool testFlag = false;
                stepTheWeb(() => testFlag = wb.DocumentText.Contains("Get Started"));
                if (testFlag) {    /* TODO */ }
                // ... 
            }
        );
    }

    private void stepTheWeb(Action task)
    {
        this.Invoke(new Action(task));

        WebBrowserReadyState rs = WebBrowserReadyState.Interactive;
        while (rs != WebBrowserReadyState.Complete)
        {
            this.Invoke(new Action(() => rs = wb.ReadyState));
            System.Threading.Thread.Sleep(300);
        }
   }

Here is a more generic version of the testButton_Click method:

    private async void testButton_Click(object sender, EventArgs e)
    {
        var actions = new List<Action>()
            {
                () => wb.Navigate("www.yahoo.com"),
                () => wb.Navigate("www.microsoft.com"),
                () => wb.Navigate("asp.net"),
                () => wb.Document.InvokeScript("eval", new[] { "$('p').css('background-color','yellow')" }),
                () => {
                         bool testFlag = false;
                         testFlag  = wb.DocumentText.Contains("Get Started"); 
                         if (testFlag)  {   /*  TODO */  }
                       }
                //... 
            };

        await Task.Factory.StartNew(() => actions.ForEach((x)=> stepTheWeb (x)));  
    }

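[UPDATE]

I have improved my "quick and dirty" sample by borrowing and slightly refactoring @Noseratio's NavigateAsync method from this thread. The new version runs, asynchronously and in the UI thread context, not only navigation actions but also JavaScript/AJAX calls: any "lambda", that is, any single automation step, can be passed in.

All code reviews and comments are very welcome, especially from @Noseratio. Together we will make this world a better place ;)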

    public enum ActionTypeEnumeration
    {
        Navigation = 1,
        Javascript = 2,
        UIThreadDependent = 3,
        UNDEFINED = 99
    }

    public class ActionDescriptor
    {
        public Action Action { get; set; }
        public ActionTypeEnumeration ActionType { get; set; }
    }

    /// <summary>
    /// Executes a set of WebBrowser control's Automation actions
    /// </summary>
    /// <remarks>
    ///  Test form should have the following controls:
    ///    webBrowser1 - WebBrowser,
    ///    testbutton - Button,
    ///    testCheckBox - CheckBox,
    ///    totalHtmlLengthTextBox - TextBox
    /// </remarks> 
    private async void testButton_Click(object sender, EventArgs e)
    {
        try
        {
            var cts = new CancellationTokenSource(60000);

            var actions = new List<ActionDescriptor>()
            {
                new ActionDescriptor() { Action = ()=>  wb.Navigate("www.yahoo.com"), ActionType = ActionTypeEnumeration.Navigation}  ,
                new ActionDescriptor() { Action = () => wb.Navigate("www.microsoft.com"), ActionType = ActionTypeEnumeration.Navigation}  ,
                new ActionDescriptor() { Action = () => wb.Navigate("asp.net"), ActionType = ActionTypeEnumeration.Navigation}  ,
                new ActionDescriptor() { Action = () => wb.Document.InvokeScript("eval", new[] { "$('p').css('background-color','yellow')" }), ActionType = ActionTypeEnumeration.Javascript}, 
                new ActionDescriptor() { Action =
                () => {
                         testCheckBox.Checked = wb.DocumentText.Contains("Get Started"); 
                       },
                       ActionType = ActionTypeEnumeration.UIThreadDependent} 
                //... 
            };

            foreach (var action in actions)
            {
               string html = await ExecuteWebBrowserAutomationAction(cts.Token, action.Action, action.ActionType);
               // count HTML web page stats - just for fun
               int totalLength = 0;
               Int32.TryParse(totalHtmlLengthTextBox.Text, out totalLength);
               totalLength += !string.IsNullOrWhiteSpace(html) ? html.Length : 0;
               totalHtmlLengthTextBox.Text = totalLength.ToString();   
            }
        }
        catch (Exception ex)
        {
            MessageBox.Show(ex.Message, "Error");   
        }
    }

    // asynchronous WebBrowser control Automation
    async Task<string> ExecuteWebBrowserAutomationAction(
                            CancellationToken ct, 
                            Action runWebBrowserAutomationAction, 
                            ActionTypeEnumeration actionType = ActionTypeEnumeration.UNDEFINED)
    {
        var onloadTcs = new TaskCompletionSource<bool>();
        EventHandler onloadEventHandler = null;

        WebBrowserDocumentCompletedEventHandler documentCompletedHandler = delegate
        {
            // DocumentCompleted may be called several times for the same page,
            // if the page has frames
            if (onloadEventHandler != null)
                return;

            // so, observe DOM onload event to make sure the document is fully loaded
            onloadEventHandler = (s, e) =>
                onloadTcs.TrySetResult(true);
            this.wb.Document.Window.AttachEventHandler("onload", onloadEventHandler);
        };


        this.wb.DocumentCompleted += documentCompletedHandler;
        try
        {
            using (ct.Register(() => onloadTcs.TrySetCanceled(), useSynchronizationContext: true))
            {
                runWebBrowserAutomationAction();

                if (actionType == ActionTypeEnumeration.Navigation)
                {
                    // wait for DOM onload event, throw if cancelled
                    await onloadTcs.Task;
                }
            }
        }
        finally
        {
            this.wb.DocumentCompleted -= documentCompletedHandler;
            if (onloadEventHandler != null)
                this.wb.Document.Window.DetachEventHandler("onload", onloadEventHandler);
        }

        // the page has fully loaded by now

        // optional: let the page run its dynamic AJAX code,
        // we might add another timeout for this loop
        do { await Task.Delay(500, ct); }
        while (this.wb.IsBusy);

        // return the page's HTML content
        return this.wb.Document.GetElementsByTagName("html")[0].OuterHtml;
    }

1
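No offense, but this is a bad design. It uses a background thread only to manipulate the WebBrowser object on the UI thread via Control.Invoke. This task does not need an extra thread. Also, a Thread.Sleep(300) loop... there is the DocumentCompleted event for that. - noseratio - open to work
@Noseratio, I have just posted a new code version here that borrows and adapts part of your code sample from this thread. Is that OK with you? - ShamilS
I do not see any difference between your ExecuteWebBrowserAutomationAction and my NavigateAsync. The code itself is nothing special, but borrowing the key part from a neighboring answer is not something I recall seeing often on SO. - noseratio - open to work
@Noseratio, the original question in this thread is not only about automating URL navigation in the WebBrowser control; it also asks to execute the line wb.Document.Body.SetAttribute(textbox1, "login"). Your current version of NavigateAsync throws a runtime error for that line. I made some changes so that it works, to show the difference. Feel free to "borrow back" the corrected code to make yours more stable. If you have any issue with this, let me know and I will delete the part of my answer that contains your NavigateAsync. - ShamilS
@Noseratio, AFAIS, on SO "user contributions [are] licensed under cc by-sa 3.0". I borrowed your code, corrected it, and reposted it here in corrected form with the source of all original code credited, so I do not think this violates any SO copyright rule. Thank you. - ShamilS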

Content provided by Stack Overflow.