Application report for application_ (state: ACCEPTED) never ends for Spark Submit (with Spark 1.2.0 on YARN)


I am running a Kinesis plus Spark application: https://spark.apache.org/docs/1.2.0/streaming-kinesis-integration.html

Below is the command I run on an EC2 instance:

 ./spark/bin/spark-submit --class org.apache.spark.examples.streaming.myclassname --master yarn-cluster --num-executors 2 --driver-memory 1g --executor-memory 1g --executor-cores 1  /home/hadoop/test.jar 

I have installed Spark on EMR.

EMR details:

    Master instance group - 1  Running  MASTER  m1.medium
    Core instance group   - 2  Running  CORE    m1.medium

I get the following INFO output, and it never ends:

15/06/14 11:33:23 INFO yarn.Client: Requesting a new application from cluster with 2 NodeManagers
15/06/14 11:33:23 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (2048 MB per container)
15/06/14 11:33:23 INFO yarn.Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
15/06/14 11:33:23 INFO yarn.Client: Setting up container launch context for our AM
15/06/14 11:33:23 INFO yarn.Client: Preparing resources for our AM container
15/06/14 11:33:24 INFO yarn.Client: Uploading resource file:/home/hadoop/.versions/spark-1.3.1.e/lib/spark-assembly-1.3.1-hadoop2.4.0.jar -> hdfs://172.31.13.68:9000/user/hadoop/.sparkStaging/application_1434263747091_0023/spark-assembly-1.3.1-hadoop2.4.0.jar
15/06/14 11:33:29 INFO yarn.Client: Uploading resource file:/home/hadoop/test.jar -> hdfs://172.31.13.68:9000/user/hadoop/.sparkStaging/application_1434263747091_0023/test.jar
15/06/14 11:33:31 INFO yarn.Client: Setting up the launch environment for our AM container
15/06/14 11:33:31 INFO spark.SecurityManager: Changing view acls to: hadoop
15/06/14 11:33:31 INFO spark.SecurityManager: Changing modify acls to: hadoop
15/06/14 11:33:31 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
15/06/14 11:33:31 INFO yarn.Client: Submitting application 23 to ResourceManager
15/06/14 11:33:31 INFO impl.YarnClientImpl: Submitted application application_1434263747091_0023
15/06/14 11:33:32 INFO yarn.Client: Application report for application_1434263747091_0023 (state: ACCEPTED)
15/06/14 11:33:32 INFO yarn.Client:
         client token: N/A
         diagnostics: N/A
         ApplicationMaster host: N/A
         ApplicationMaster RPC port: -1
         queue: default
         start time: 1434281611893
         final status: UNDEFINED
         tracking URL: http://172.31.13.68:9046/proxy/application_1434263747091_0023/
         user: hadoop
15/06/14 11:33:33 INFO yarn.Client: Application report for application_1434263747091_0023 (state: ACCEPTED)
15/06/14 11:33:34 INFO yarn.Client: Application report for application_1434263747091_0023 (state: ACCEPTED)
15/06/14 11:33:35 INFO yarn.Client: Application report for application_1434263747091_0023 (state: ACCEPTED)
15/06/14 11:33:36 INFO yarn.Client: Application report for application_1434263747091_0023 (state: ACCEPTED)
15/06/14 11:33:37 INFO yarn.Client: Application report for application_1434263747091_0023 (state: ACCEPTED)
15/06/14 11:33:38 INFO yarn.Client: Application report for application_1434263747091_0023 (state: ACCEPTED)
15/06/14 11:33:39 INFO yarn.Client: Application report for application_1434263747091_0023 (state: ACCEPTED)
15/06/14 11:33:40 INFO yarn.Client: Application report for application_1434263747091_0023 (state: ACCEPTED)
15/06/14 11:33:41 INFO yarn.Client: Application report for application_1434263747091_0023 (state: ACCEPTED)

Can somebody please tell me why it is not working?


Maybe remove setMaster("local[*]")? - Hille
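For context on that comment: a master set directly on the SparkConf takes precedence over the --master flag passed to spark-submit, so a leftover setMaster("local[*]") in the driver can keep the application from ever running on YARN. A minimal sketch of a driver that leaves the master to spark-submit (the object name and app name below are made up for illustration):

    import org.apache.spark.{SparkConf, SparkContext}

    object MyClassName {
      def main(args: Array[String]): Unit = {
        // Deliberately no .setMaster("local[*]") here: the master is taken
        // from the spark-submit command line (--master yarn-cluster).
        val conf = new SparkConf().setAppName("kinesis-spark-example")
        val sc = new SparkContext(conf)
        // ... streaming / processing logic ...
        sc.stop()
      }
    }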
13 Answers

In one instance, I had this problem because I was asking for too many resources. This was on a small standalone cluster. The original command was:

spark-submit --driver-memory 4G --executor-memory 7G --class "my.class" --master yarn --deploy-mode cluster --conf spark.yarn.executor.memoryOverhead my.jar

I got it past ACCEPTED and into RUNNING by changing the command to:

spark-submit --driver-memory 1G --executor-memory 3G --class "my.class" --master yarn --deploy-mode cluster --conf spark.yarn.executor.memoryOverhead my.jar

In other cases, I had this problem because of the way the code was written. We instantiated the Spark context inside the class that used it, and it never got closed. We fixed the problem by instantiating the context first, passing it to the class where the data gets parallelized etc., and then closing the context (sc.close()) in the calling class. A sketch of that restructuring follows below.
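A minimal sketch of that pattern, with invented class and method names (in the Scala API the shutdown call is sc.stop()):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.rdd.RDD

    // Hypothetical worker class: it receives an existing context
    // instead of creating (and leaking) its own.
    class Processor(sc: SparkContext) {
      def run(data: Seq[Int]): RDD[Int] = sc.parallelize(data).map(_ * 2)
    }

    object Main {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("example"))
        try {
          println(new Processor(sc).run(1 to 100).count())
        } finally {
          sc.stop() // frees the YARN containers so later jobs are not starved
        }
      }
    }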

--conf spark.yarn.executor.memoryOverhead. Without a value? - tokland
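For what it's worth, that property does expect a value (a size in megabytes in Spark 1.x), so a complete form of the flag would look something like the following, with 512 chosen purely for illustration:

    spark-submit --driver-memory 1G --executor-memory 3G --class "my.class" \
      --master yarn --deploy-mode cluster \
      --conf spark.yarn.executor.memoryOverhead=512 \
      my.jar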


Ran into a similar issue.

As other answers here have pointed out, this is really a resource availability problem.

In my case, I was running an ETL process where the old data was deleted on each run. However, the freshly deleted data was being stored in the controlling user's /user/myuser/.Trash folder. Looking at the Ambari dashboard, I could see that the overall HDFS disk usage was near capacity, which was causing the resource problem.
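If you want to confirm this from a shell rather than Ambari, the standard HDFS tooling can show both the overall usage and the size of the trash folder (the trash path below is the one from this answer):

    # Overall capacity / usage summary for the cluster
    hdfs dfsadmin -report | head -n 20

    # How much space the user's trash is holding
    hadoop fs -du -s -h /user/myuser/.Trash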

So in this case, use the -skipTrash option with hadoop fs -rm ... on the old data files (otherwise the trash will hold space roughly equivalent to all the data stored in the ETL storage directory, effectively doubling the total space used by the application and causing the resource problem). A sketch of the command follows below.
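The cleanup command as described, with a placeholder path standing in for the old ETL output:

    # Delete old data permanently, bypassing /user/<user>/.Trash
    hadoop fs -rm -r -skipTrash /path/to/old/etl/output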



I hit the same problem when I tried to launch the pyspark shell in the Cloudera quickstart VM. When I looked at the job logs in the Resource Manager, I saw

17/02/18 22:20:53 ERROR yarn.ApplicationMaster: Failed to connect to driver at RM IP. 

It means the job is not able to connect to the RM (Resource Manager), because by default pyspark tries to launch in yarn mode in the Cloudera VM.
pyspark --master local 

worked for me. Even starting the RM solved the issue.

Thanks

