Spark 2.0 - java.io.IOException: Cannot run program "jupyter": error=2, No such file or directory


I am experimenting with Spark from a Jupyter notebook.

Once inside my notebook, I tried KMeans:

from pyspark.ml.clustering import KMeans
from pyspark.sql            import SparkSession
from sklearn                import datasets
import pandas as pd

spark = SparkSession\
        .builder\
        .appName("PythonKMeansExample")\
        .getOrCreate()

iris       = datasets.load_iris()
pd_df      = pd.DataFrame(iris['data'])
spark_df   = spark.createDataFrame(pd_df, ["features"])
estimator  = KMeans(k=3, seed=1)

Everything goes smoothly, then I fit the model:

estimator.fit(spark_df)

and I get an error:
16/08/16 22:39:58 ERROR Executor: Exception in task 0.2 in stage 0.0 (TID 24)
java.io.IOException: Cannot run program "jupyter": error=2, No such file or directory

Caused by: java.io.IOException: error=2, No such file or directory

Where does Spark look for Jupyter? Why can't it find Jupyter if I am able to work in a Jupyter notebook? What should I do?

1 Answer


According to the code at https://github.com/apache/spark/blob/master/python/pyspark/context.py#L180

self.pythonExec = os.environ.get("PYSPARK_PYTHON", 'python')

I think this error is caused by the environment variable PYSPARK_PYTHON, which indicates the location of the Python executable on each Spark node. When pyspark is launched, PYSPARK_PYTHON from the system environment is injected into all the Spark nodes, so that they all start the same Python interpreter.

  1. It can be solved by

    export PYSPARK_PYTHON=/usr/bin/python
    

     which must point to the same Python version on the different nodes, and then starting:

    pyspark
    
     (an in-notebook alternative is sketched after this list).

  2. If the Python versions differ between the local machine and the nodes of the cluster, a version-conflict error will occur instead.

  3. The interactive Python you work in should be the same version as the Python on the other nodes of the cluster.
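If you would rather stay inside the notebook, a minimal sketch of the same fix, assuming /usr/bin/python is a valid interpreter path on your machine, is to set the variable before the SparkSession is created:

    import os

    # Assumed path; it must be the same Python version on every node.
    os.environ["PYSPARK_PYTHON"] = "/usr/bin/python"

    from pyspark.sql import SparkSession

    # Executors launched after this point inherit PYSPARK_PYTHON
    # instead of falling back to the "jupyter" executable.
    spark = SparkSession.builder.appName("PythonKMeansExample").getOrCreate()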


I ran into the same error when running Spark on YARN. In my case export PYSPARK_PYTHON had no effect. Is there any solution? - rosefun
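For the YARN case in the comment above, exporting the variable on the driver alone is often not enough, because the executors run on other machines. A minimal sketch, again assuming /usr/bin/python exists on every node, is to pass the interpreter through Spark's standard spark.yarn.appMasterEnv.* and spark.executorEnv.* properties instead:

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("PythonKMeansExample")
             # Interpreter for the YARN application master (assumed path).
             .config("spark.yarn.appMasterEnv.PYSPARK_PYTHON", "/usr/bin/python")
             # Interpreter for the executor processes on the worker nodes.
             .config("spark.executorEnv.PYSPARK_PYTHON", "/usr/bin/python")
             .getOrCreate())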
