Error: SparkContext fails to add file in Apache Spark 2.1.1


I have been using Apache Spark for a while, but now, when running the following example (I just upgraded to Spark 2.1.1), I get an error that had never occurred before:

./opt/sparkFiles/spark-2.1.1-bin-hadoop2.7/bin/run-example SparkPi

Here is the actual stack trace:

    17/07/05 10:50:54 ERROR SparkContext: Failed to add file:/opt/sparkFiles/spark-2.1.1-bin-hadoop2.7/examples/jars/spark-warehouse/ to Spark environment
java.lang.IllegalArgumentException: Directory /opt/sparkFiles/spark-2.1.1-bin-hadoop2.7/examples/jars/spark-warehouse is not allowed for addJar
        at org.apache.spark.SparkContext.liftedTree1$1(SparkContext.scala:1735)
        at org.apache.spark.SparkContext.addJar(SparkContext.scala:1729)
        at org.apache.spark.SparkContext$$anonfun$11.apply(SparkContext.scala:466)
        at org.apache.spark.SparkContext$$anonfun$11.apply(SparkContext.scala:466)
        at scala.collection.immutable.List.foreach(List.scala:381)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:466)
        at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2320)
        at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:868)
        at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:860)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:860)
        at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:31)
        at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:743)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Pi is roughly 3.1433757168785843

I am not sure whether this is a bug or whether I am missing something, because the example still runs; you can see the "Pi is roughly..." result at the end.

Here are the configuration lines from spark-env.sh:

export SPARK_MASTER_IP=X.X.X.X
export SPARK_MASTER_WEBUI_PORT=YYYY
export SPARK_WORKER_CORES=4
export SPARK_WORKER_MEMORY=7g

And here are the configuration lines from spark-defaults.conf:

spark.master local[*]
spark.driver.cores 4
spark.driver.memory 2g
spark.executor.cores 4
spark.executor.memory 4g
spark.ui.showConsoleProgress false
spark.driver.extraClassPath /opt/sparkFiles/spark-2.1.1-bin-hadoop2.7/lib/postgresql-9.4.1207.jar
spark.eventLog.enabled true
spark.eventLog.dir file:///opt/sparkFiles/spark-2.1.1-bin-hadoop2.7/logs
spark.history.fs.logDirectory file:///opt/sparkFiles/spark-2.1.1-bin-hadoop2.7/logs
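
For reference, any of these spark-defaults.conf properties can also be passed per run on the spark-submit command line with --conf, which is a quick way to check whether one of the defaults is involved. A minimal sketch (the class and jar names below are placeholders, not from this setup):

    # Sketch: overriding spark-defaults.conf values for a single submission.
    # com.example.MyApp and my-app.jar are hypothetical placeholders.
    ./bin/spark-submit \
      --conf spark.driver.memory=2g \
      --conf spark.eventLog.enabled=false \
      --class com.example.MyApp \
      /path/to/my-app.jar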

Apache Spark version: 2.1.1

Java version: 1.8.0_91

Python version: 2.7.5

I tried this configuration, but it did not work:

spark.sql.warehouse.dir file:///c:/tmp/spark-warehouse
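
Note that file:///c:/tmp/spark-warehouse is a Windows-style path, while this install lives under /opt on a Linux box, so that particular value is unlikely to help either way. If the intent is just to relocate the warehouse, the same property can also be passed for a single run, for example (a sketch: the warehouse path is arbitrary, and the examples jar name is assumed from the standard 2.1.1/Hadoop 2.7 layout):

    # Sketch: run SparkPi once with the SQL warehouse pointed at a local Linux path.
    # The warehouse location is arbitrary; the examples jar name is assumed, not verified here.
    ./bin/spark-submit \
      --conf spark.sql.warehouse.dir=file:///tmp/spark-warehouse \
      --class org.apache.spark.examples.SparkPi \
      examples/jars/spark-examples_2.11-2.1.1.jar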

It is strange, because when I compile my script and launch it with spark-submit I do not get this error. I have not found any JIRA ticket or anything similar about it.

2 Answers


I ran into a similar problem with my Java Spark code. Although your issue is with Python/Spark, this approach might still help you or others.

I needed to pass some dependency jars to Spark with the --jars option. At first I gave it a directory path (i.e. --jars <path-to-dependency>/) containing all the dependency jars, and got the error described above.

The --jars option of spark-submit appears to accept only paths to actual jar files (<path-to-directory>/<name>.jar), not bare directory paths (<path-to-directory>/), as illustrated below.
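
For illustration (the paths, class name, and jar names here are hypothetical), the difference looks like this:

    # Rejected: a bare directory passed to --jars (hypothetical layout)
    #   spark-submit --jars /opt/myapp/libs/ ...

    # Accepted: explicit .jar files, comma-separated
    spark-submit --class com.example.Main \
      --jars /opt/myapp/libs/dep1.jar,/opt/myapp/libs/dep2.jar \
      --master local \
      my-app.jar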

The problem was resolved once I packaged all the dependencies into a single dependency jar and passed it to --jars as follows:

bash ~/spark/bin/spark-submit --class "<class-name>" --jars '<path-to-dependency-jars>/<dependency-jar>.jar' --master local <dependency-jar>.jar <input-val1> <input-val2>


