HtmlUnit的CSS未正确应用

9

我尝试使用HtmlUnit保存谷歌页面。但是我无法获得适当的用户界面。当我检查保存的页面代码时,样式标签为空。

我的代码在这里。

public static void main(String[] args) throws IOException {

    FileUtils.cleanDirectory(new File("/home/user1/Documents/Aaa")); 
    WebClient webClient = new WebClient(BrowserVersion.CHROME);
    webClient.getOptions().setCssEnabled(true);
    webClient.getOptions().setJavaScriptEnabled(true);
    webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
    webClient.getOptions().setThrowExceptionOnScriptError(false);
    webClient.waitForBackgroundJavaScriptStartingBefore(1000);
    webClient.waitForBackgroundJavaScript(1000);
    webClient.getOptions().setTimeout(5000);
    System.out.println("******************loaded**********************************");
    try {
        HtmlPage page = webClient.getPage("https://www.google.com");
        page.save(new File("/home/user1/Documents/Aaa/index.html"));
    } catch (Exception e) {
        System.out.println("******************catch***********************************");
        e.printStackTrace();
    }
    webClient.close();
    System.out.println("******************finished********************************");
}

我的页面外观如下:

在此输入图片描述

控制台日志

Dec 10, 2016 3:47:45 PM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'text/javascript'.
Dec 10, 2016 3:47:46 PM com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter runtimeError
SEVERE: runtimeError: message=[TypeError: object is not iterable] sourceName=[https://www.google.co.in/xjs/_/js/k=xjs.s.en.igGBAtxEWN0.O/m=sx,c,sb,cdos,cr,elog,hsm,jsa,r,qsm,j,p,d,csi/am=AAiUPF6wAOL_ISBuIRxBasDAoA/rt=j/d=1/t=zcms/rs=ACT90oGjQTdwqicso-l4vNE-7GeAqTtjtw] line=[10] lineSource=[null] lineOffset=[0]
Dec 10, 2016 3:47:46 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
WARNING: CSS error: 'https://www.google.co.in/?gfe_rd=cr&ei=RtZLWL3cDsmL8QeW4Yb4AQ' [1:14018] Error in expression. (Invalid token " ". Was expecting one of: <NUMBER>, "inherit", <IDENT>, <STRING>, <HASH>, <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <ANGLE_DEG>, <ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <RESOLUTION_DPI>, <RESOLUTION_DPCM>, <PERCENTAGE>, <DIMENSION>, <URI>, <FUNCTION>, "progid:".)
Dec 10, 2016 3:47:46 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
WARNING: CSS error: 'https://www.google.co.in/?gfe_rd=cr&ei=RtZLWL3cDsmL8QeW4Yb4AQ' [1:14042] Error in expression. (Invalid token " ". Was expecting one of: <NUMBER>, "inherit", <IDENT>, <STRING>, <HASH>, <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <ANGLE_DEG>, <ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <RESOLUTION_DPI>, <RESOLUTION_DPCM>, <PERCENTAGE>, <DIMENSION>, <URI>, <FUNCTION>, "progid:".)
Dec 10, 2016 3:47:46 PM com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine handleJavaScriptException
INFO: Caught script exception
======= EXCEPTION START ========
EcmaError: lineNumber=[551] column=[0] lineSource=[null] name=[TypeError] sourceName=[https://www.google.co.in/xjs/_/js/k=xjs.s.en.igGBAtxEWN0.O/m=sx,c,sb,cdos,cr,elog,hsm,jsa,r,qsm,j,p,d,csi/am=AAiUPF6wAOL_ISBuIRxBasDAoA/rt=j/d=1/t=zcms/rs=ACT90oGjQTdwqicso-l4vNE-7GeAqTtjtw] message=[TypeError: Cannot read property "Kf" from undefined (https://www.google.co.in/xjs/_/js/k=xjs.s.en.igGBAtxEWN0.O/m=sx,c,sb,cdos,cr,elog,hsm,jsa,r,qsm,j,p,d,csi/am=AAiUPF6wAOL_ISBuIRxBasDAoA/rt=j/d=1/t=zcms/rs=ACT90oGjQTdwqicso-l4vNE-7GeAqTtjtw#551)]
com.gargoylesoftware.htmlunit.ScriptException: TypeError: Cannot read property "Kf" from undefined (https://www.google.co.in/xjs/_/js/k=xjs.s.en.igGBAtxEWN0.O/m=sx,c,sb,cdos,cr,elog,hsm,jsa,r,qsm,j,p,d,csi/am=AAiUPF6wAOL_ISBuIRxBasDAoA/rt=j/d=1/t=zcms/rs=ACT90oGjQTdwqicso-l4vNE-7GeAqTtjtw#551)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:921)
    at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:628)
    at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:515)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.callFunction(JavaScriptEngine.java:852)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.callFunction(JavaScriptEngine.java:824)
    at com.gargoylesoftware.htmlunit.InteractivePage.executeJavaScriptFunctionIfPossible(InteractivePage.java:216)
    at com.gargoylesoftware.htmlunit.javascript.host.event.EventListenersContainer.executeEventListeners(EventListenersContainer.java:258)
    at com.gargoylesoftware.htmlunit.javascript.host.event.EventListenersContainer.executeBubblingListeners(EventListenersContainer.java:322)
    at com.gargoylesoftware.htmlunit.javascript.host.event.EventTarget.fireEvent(EventTarget.java:206)
    at com.gargoylesoftware.htmlunit.javascript.host.Window.dispatchEvent(Window.java:2033)
    at com.gargoylesoftware.htmlunit.javascript.host.Window$2$1.run(Window.java:2119)
    at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:628)
    at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:515)
    at com.gargoylesoftware.htmlunit.javascript.host.Window$2.execute(Window.java:2124)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.doProcessPostponedActions(JavaScriptEngine.java:966)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.access$500(JavaScriptEngine.java:101)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:916)
    at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:628)
    at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:515)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.callFunction(JavaScriptEngine.java:852)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.callFunction(JavaScriptEngine.java:824)
    at com.gargoylesoftware.htmlunit.InteractivePage.executeJavaScriptFunctionIfPossible(InteractivePage.java:216)
    at com.gargoylesoftware.htmlunit.javascript.host.event.EventListenersContainer.executeEventListeners(EventListenersContainer.java:258)
    at com.gargoylesoftware.htmlunit.javascript.host.event.EventListenersContainer.executeBubblingListeners(EventListenersContainer.java:322)
    at com.gargoylesoftware.htmlunit.javascript.host.event.EventTarget.fireEvent(EventTarget.java:191)
    at com.gargoylesoftware.htmlunit.html.DomElement$2.run(DomElement.java:1190)
    at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:628)
    at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:515)
    at com.gargoylesoftware.htmlunit.html.DomElement.fireEvent(DomElement.java:1195)
    at com.gargoylesoftware.htmlunit.html.DomElement.fireEvent(DomElement.java:1163)
    at com.gargoylesoftware.htmlunit.InteractivePage.setFocusedElement(InteractivePage.java:100)
    at com.gargoylesoftware.htmlunit.InteractivePage.setFocusedElement(InteractivePage.java:66)
    at com.gargoylesoftware.htmlunit.html.HtmlElement.detach(HtmlElement.java:1254)
    at com.gargoylesoftware.htmlunit.html.DomNode.remove(DomNode.java:1132)
    at com.gargoylesoftware.htmlunit.html.DomNode.replace(DomNode.java:1210)
    at com.gargoylesoftware.htmlunit.javascript.host.dom.Node.replaceChild(Node.java:444)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at net.sourceforge.htmlunit.corejs.javascript.MemberBox.invoke(MemberBox.java:153)
    at net.sourceforge.htmlunit.corejs.javascript.FunctionObject.call(FunctionObject.java:448)
    at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpretLoop(Interpreter.java:1540)
    at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpret(Interpreter.java:800)
    at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.call(InterpretedFunction.java:105)
    at net.sourceforge.htmlunit.corejs.javascript.NativeArray.iterativeMethod(NativeArray.java:1694)
    at net.sourceforge.htmlunit.corejs.javascript.NativeArray.execIdCall(NativeArray.java:405)
    at net.sourceforge.htmlunit.corejs.javascript.IdFunctionObject.call(IdFunctionObject.java:94)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.applyOrCall(ScriptRuntime.java:2575)
    at net.sourceforge.htmlunit.corejs.javascript.BaseFunction.execIdCall(BaseFunction.java:321)
    at net.sourceforge.htmlunit.corejs.javascript.IdFunctionObject.call(IdFunctionObject.java:94)
    at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpretLoop(Interpreter.java:1540)
    at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpret(Interpreter.java:800)
    at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.call(InterpretedFunction.java:105)
    at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.doTopCall(ContextFactory.java:413)
    at com.gargoylesoftware.htmlunit.javascript.HtmlUnitContextFactory.doTopCall(HtmlUnitContextFactory.java:252)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.doTopCall(ScriptRuntime.java:3264)
    at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.exec(InterpretedFunction.java:115)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$3.doRun(JavaScriptEngine.java:794)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:906)
    at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:628)
    at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:515)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:803)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:779)
    at com.gargoylesoftware.htmlunit.html.HtmlPage.loadExternalJavaScriptFile(HtmlPage.java:975)
    at com.gargoylesoftware.htmlunit.html.HtmlScript.executeScriptIfNeeded(HtmlScript.java:352)
    at com.gargoylesoftware.htmlunit.html.HtmlScript$2.execute(HtmlScript.java:238)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.doProcessPostponedActions(JavaScriptEngine.java:966)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.access$500(JavaScriptEngine.java:101)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:916)
    at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:628)
    at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:515)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.callFunction(JavaScriptEngine.java:852)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.callFunction(JavaScriptEngine.java:824)
    at com.gargoylesoftware.htmlunit.InteractivePage.executeJavaScriptFunctionIfPossible(InteractivePage.java:216)
    at com.gargoylesoftware.htmlunit.javascript.background.JavaScriptFunctionJob.runJavaScript(JavaScriptFunctionJob.java:52)
    at com.gargoylesoftware.htmlunit.javascript.background.JavaScriptExecutionJob.run(JavaScriptExecutionJob.java:102)
    at com.gargoylesoftware.htmlunit.javascript.background.JavaScriptJobManagerImpl.runSingleJob(JavaScriptJobManagerImpl.java:426)
    at com.gargoylesoftware.htmlunit.javascript.background.DefaultJavaScriptExecutor.run(DefaultJavaScriptExecutor.java:157)
    at java.lang.Thread.run(Thread.java:745)
Caused by: net.sourceforge.htmlunit.corejs.javascript.EcmaError: TypeError: Cannot read property "Kf" from undefined (https://www.google.co.in/xjs/_/js/k=xjs.s.en.igGBAtxEWN0.O/m=sx,c,sb,cdos,cr,elog,hsm,jsa,r,qsm,j,p,d,csi/am=AAiUPF6wAOL_ISBuIRxBasDAoA/rt=j/d=1/t=zcms/rs=ACT90oGjQTdwqicso-l4vNE-7GeAqTtjtw#551)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.constructError(ScriptRuntime.java:3915)

这个案例有什么想法吗?

编辑:我认为这个错误是在解析CSS3过滤属性时出现的。在谷歌页面中,代码如下。

filter:progid:DXImageTransform.Microsoft.gradient(startColorstr=#4387fd,endColorstr=#4683ea,GradientType=1)

这个过滤器中有一个冒号,progrid中也有一个冒号。所以这导致了CSS解析错误。有什么解决方法吗?


我认为您想要显示index.html而不是webClient...并尝试注释掉webClient.close();... - Donald Wu
当注释掉这一行 webClient.close() 时,结果也相同。 - Anbu
您遇到了“调用构造函数时出现异常”的问题... - Donald Wu
从布局上看,浏览器版本似乎没有应用。尝试在构造函数之后设置 webClient.getBrowserVersion().setUserAgent("Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36");,它是否改变了输出? - Fedor Losev
我更新了用户代理。但输出仍然是相同的视图。再试一次,还是得到相同的错误。 - Anbu
显示剩余2条评论
1个回答

3

首先:

WARNING: CSS error: 'https://www.google.co.in/?gfe_rd=cr&ei=hyxFWPuPJMyL8QfigajQCw'

这是一个警告,意味着CSS解析器不喜欢其中一个给定的CSS规则。这并没有实际影响,因为如果页面上的js代码要求HTML元素的样式,则可能会导致一些缺少格式信息。

第二段

webClient.waitForBackgroundJavaScriptStartingBefore(1000);
webClient.waitForBackgroundJavaScript(1000);

这两个调用没有配置选项 - 在给定位置进行这些调用是毫无意义的。
但这两个调用只是小问题,你真正面临的唯一问题是


com.gargoylesoftware.htmlunit.ScriptException: Exception invoking constructor

异常。这通常会停止页面上的js处理(但在您的情况下不会,因为您已将ThrowExceptionOnScriptError设置为false)。

要分析问题,我们至少需要构造函数抛出的异常的堆栈跟踪。通常,此堆栈跟踪也是日志的一部分。

但也许有一个更简单的解决方案。您没有提供您正在使用的HtmlUnit版本的任何信息。我建议使用最新可用的SNAPSHOT版本构建(您可以在http://htmlunit.sourceforge.net/gettingLatestCode.html找到一些获取信息)。我使用最新的htmlunit代码进行了快速测试,并成功获取了页面而没有任何异常。

    public static void main(String[] args) throws IOException {

    FileUtils.cleanDirectory(new File("c://temp/htmlunit/aa")); 
    WebClient webClient = new WebClient(BrowserVersion.CHROME);
    webClient.getOptions().setCssEnabled(true);
    webClient.getOptions().setJavaScriptEnabled(true);
    webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
    webClient.getOptions().setThrowExceptionOnScriptError(false);
    webClient.getOptions().setTimeout(5000);
    System.out.println("******************loaded**********************************");
    try {
        HtmlPage page = webClient.getPage("https://www.google.com");
        webClient.waitForBackgroundJavaScriptStartingBefore(1000);
        webClient.waitForBackgroundJavaScript(1000);
        page.save(new File("c://temp/htmlunit/aa/index.html"));
    } catch (Exception e) {
        System.out.println("******************catch***********************************");
        e.printStackTrace();
    }
    webClient.close();
    System.out.println("******************finished********************************");
}

嘿,我将HtmlUnit更新到了最新的版本2.23。但是我仍然遇到了progrid异常,并且CSS也没有应用。我为更新版本更新了控制台日志。 - Anbu
(1) 我的建议是更新到最新的快照版本...... (2) 再次提醒,ccs 的问题只是一个警告...... (3) 最后,com.gargoylesoftware.htmlunit.ScriptException: TypeError: Cannot read property 是一个问题。这意味着页面上的脚本(至少部分)无法正常工作。如果您在使用最新的 HtmlUnit 快照版本仍然遇到此问题,则必须打开一个问题。但首先请查看 http://htmlunit.sourceforge.net/submittingJSBugs.html。 - RBRi
另一个可能的问题可能是转储页面。该页面仍然包含一些链接到您的本地文件系统和一些链接到互联网。也许其中一个链接已经损坏了。 - RBRi

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接