Strange Spark error on AWS EMR


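I have a very simple PySpark script that creates a DataFrame from some Parquet data on S3, then calls count() and prints the number of records.

I am running the script on an AWS EMR cluster and see the following strange WARN messages: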

17/12/04 14:20:26 WARN ServletHandler: 
javax.servlet.ServletException: java.util.NoSuchElementException: None.get
    at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:489)
    at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:427)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:388)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:341)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:228)
    at org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:845)
    at org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1689)
    at org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.doFilter(AmIpFilter.java:164)
    at org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1676)
    at org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)
    at org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
    at org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)
    at org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
    at org.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.spark_project.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:461)
    at org.spark_project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
    at org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
    at org.spark_project.jetty.server.Server.handle(Server.java:524)
    at org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:319)
    at org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:253)
    at org.spark_project.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
    at org.spark_project.jetty.io.FillInterest.fillable(FillInterest.java:95)
    at org.spark_project.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
    at org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
    at org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
    at org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
    at org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
    at org.spark_project.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.NoSuchElementException: None.get
    at scala.None$.get(Option.scala:347)
    at scala.None$.get(Option.scala:345)
    at org.apache.spark.status.api.v1.MetricHelper.submetricQuantiles(AllStagesResource.scala:313)
    at org.apache.spark.status.api.v1.AllStagesResource$$anon$1.build(AllStagesResource.scala:178)
    at org.apache.spark.status.api.v1.AllStagesResource$.taskMetricDistributions(AllStagesResource.scala:181)
    at org.apache.spark.status.api.v1.OneStageResource$$anonfun$taskSummary$1.apply(OneStageResource.scala:71)
    at org.apache.spark.status.api.v1.OneStageResource$$anonfun$taskSummary$1.apply(OneStageResource.scala:62)
    at org.apache.spark.status.api.v1.OneStageResource$$anonfun$withStageAttempt$1.apply(OneStageResource.scala:130)
    at org.apache.spark.status.api.v1.OneStageResource$$anonfun$withStageAttempt$1.apply(OneStageResource.scala:126)
    at org.apache.spark.status.api.v1.OneStageResource.withStage(OneStageResource.scala:97)
    at org.apache.spark.status.api.v1.OneStageResource.withStageAttempt(OneStageResource.scala:126)
    at org.apache.spark.status.api.v1.OneStageResource.taskSummary(OneStageResource.scala:62)
    at sun.reflect.GeneratedMethodAccessor153.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$1.invoke(ResourceMethodInvocationHandlerFactory.java:81)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:144)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:161)
    at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$TypeOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:205)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:99)
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:389)
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:347)
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:102)
    at org.glassfish.jersey.server.ServerRuntime$2.run(ServerRuntime.java:326)
    at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271)
    at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:315)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:297)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:267)
    at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:317)
    at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:305)
    at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1154)
    at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:473)
    ... 28 more
17/12/04 14:20:26 WARN HttpChannel: //ip-172-31-81-10.ec2.internal:4040/api/v1/applications/application_1512395256824_0002/stages/3/0/taskSummary?proxyapproved=true
javax.servlet.ServletException: java.util.NoSuchElementException: None.get
    at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:489)
    at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:427)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:388)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:341)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:228)
    at org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:845)
    at org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1689)
    at org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.doFilter(AmIpFilter.java:164)
    at org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1676)
    at org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)
    at org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
    at org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)
    at org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
    at org.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.spark_project.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:461)
    at org.spark_project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
    at org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
    at org.spark_project.jetty.server.Server.handle(Server.java:524)
    at org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:319)
    at org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:253)
    at org.spark_project.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
    at org.spark_project.jetty.io.FillInterest.fillable(FillInterest.java:95)
    at org.spark_project.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
    at org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
    at org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
    at org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
    at org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
    at org.spark_project.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.NoSuchElementException: None.get
    at scala.None$.get(Option.scala:347)
    at scala.None$.get(Option.scala:345)
    at org.apache.spark.status.api.v1.MetricHelper.submetricQuantiles(AllStagesResource.scala:313)
    at org.apache.spark.status.api.v1.AllStagesResource$$anon$1.build(AllStagesResource.scala:178)
    at org.apache.spark.status.api.v1.AllStagesResource$.taskMetricDistributions(AllStagesResource.scala:181)
    at org.apache.spark.status.api.v1.OneStageResource$$anonfun$taskSummary$1.apply(OneStageResource.scala:71)
    at org.apache.spark.status.api.v1.OneStageResource$$anonfun$taskSummary$1.apply(OneStageResource.scala:62)
    at org.apache.spark.status.api.v1.OneStageResource$$anonfun$withStageAttempt$1.apply(OneStageResource.scala:130)
    at org.apache.spark.status.api.v1.OneStageResource$$anonfun$withStageAttempt$1.apply(OneStageResource.scala:126)
    at org.apache.spark.status.api.v1.OneStageResource.withStage(OneStageResource.scala:97)
    at org.apache.spark.status.api.v1.OneStageResource.withStageAttempt(OneStageResource.scala:126)
    at org.apache.spark.status.api.v1.OneStageResource.taskSummary(OneStageResource.scala:62)
    at sun.reflect.GeneratedMethodAccessor153.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$1.invoke(ResourceMethodInvocationHandlerFactory.java:81)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:144)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:161)
    at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$TypeOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:205)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:99)
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:389)
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:347)
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:102)
    at org.glassfish.jersey.server.ServerRuntime$2.run(ServerRuntime.java:326)
    at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271)
    at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:315)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:297)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:267)
    at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:317)
    at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:305)
    at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1154)
    at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:473)

The job itself appears to run without problems, and I get the count successfully.

Does anyone know why this happens and how to get rid of it?

Thanks


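No, I only have one SparkContext. - seiya
I am hitting this error too. If you find a solution, I'd love to know. Thanks! - Wolf Rendall
I ran into this right after upgrading to EMR 5.10.0. The exact same code runs without the error on 5.9.0. In my case I am using Scala and really do call .get on an Option, so the error makes somewhat more sense, but I always make that call safely and should not be getting this error. Also, my code still seems to work fine. - Tim Gautier
I also hit this error when running an EMR cluster in a VPC connected to our corporate data center over a VPN. When I run the same code in the default VPC, the error does not appear. I suspect some DNS configuration issue. - Marius Ion
@Traveler, I still see this error on EMR 5.11.0. - seiya
2 Answers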


To suppress those warning messages, add the following lines to /etc/spark/conf/log4j.properties:

log4j.logger.org.spark_project.jetty.server.HttpChannel=ERROR

log4j.logger.org.spark_project.jetty.servlet.ServletHandler=ERROR

I have not seen any impact on job performance or stability. My logs are much more readable now :)
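
If you would rather script the change than edit the file by hand, here is a minimal Python sketch. It only rewrites or appends the two logger entries from this answer and leaves every other line alone. The function name `ensure_logger_levels` and the constant `JETTY_OVERRIDES` are made up for illustration; only the file path and the two logger names come from the answer.

```python
from pathlib import Path

# Logger names and levels taken from the answer above.
JETTY_OVERRIDES = {
    "log4j.logger.org.spark_project.jetty.server.HttpChannel": "ERROR",
    "log4j.logger.org.spark_project.jetty.servlet.ServletHandler": "ERROR",
}


def ensure_logger_levels(path, overrides):
    """Set each `key=level` entry in a log4j properties file.

    Existing entries for the same key are rewritten in place; missing keys
    are appended at the end. Returns True if the file was changed.
    (Hypothetical helper, not part of Spark or EMR.)
    """
    path = Path(path)
    original = path.read_text() if path.exists() else ""
    seen = set()
    updated = []
    for line in original.splitlines():
        # The text before the first "=" is the property key. Commented-out
        # lines start with "#", so they never match an override key.
        key = line.split("=", 1)[0].strip()
        if key in overrides:
            updated.append(f"{key}={overrides[key]}")
            seen.add(key)
        else:
            updated.append(line)
    # Append any override that was not already present in the file.
    for key, level in overrides.items():
        if key not in seen:
            updated.append(f"{key}={level}")
    new_text = "\n".join(updated) + "\n"
    if new_text == original:
        return False
    path.write_text(new_text)
    return True


if __name__ == "__main__":
    # Path as given in the answer; writing to it likely needs elevated privileges.
    ensure_logger_levels("/etc/spark/conf/log4j.properties", JETTY_OVERRIDES)
```

Running it a second time is a no-op, so it should be safe to call from a setup script. I would expect the new levels to apply only to Spark applications started after the change.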


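@Guy Cohen I still get these warning messages on EMR 5.12.0 (Spark 2.3.0) even after adding the lines above. I also found that the following lines were already present in the file: log4j.logger.org.spark_project.jetty=WARN and log4j.logger.org.spark_project.jetty.util.component.AbstractLifeCycle=ERROR. Should I remove them? - y2k-shubham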

If you run Spark jobs on AWS EMR, you may see this same error message on the following releases:
- emr-5.10.0
- emr-5.11.0
- emr-5.11.1
- emr-5.12.0
Downgrading to emr-5.9.0 resolves the problem. Hope this helps.
