Spark 2.0 - java.io.IOException: Cannot run program "jupyter": error=2, No such file or directory


I am experimenting with Spark from a Jupyter notebook.

Once inside my notebook, I tried KMeans:

from pyspark.ml.clustering import KMeans
from pyspark.sql            import SparkSession
from sklearn                import datasets
import pandas as pd

spark = SparkSession\
        .builder\
        .appName("PythonKMeansExample")\
        .getOrCreate()

iris       = datasets.load_iris()
pd_df      = pd.DataFrame(iris['data'])
spark_df   = spark.createDataFrame(pd_df, ["features"])
estimator  = KMeans(k=3, seed=1)

Everything goes smoothly, then I fit the model:

estimator.fit(spark_df)

and I get an error:
16/08/16 22:39:58 ERROR Executor: Exception in task 0.2 in stage 0.0 (TID 24)
java.io.IOException: Cannot run program "jupyter": error=2, No such file or directory

Caused by: java.io.IOException: error=2, No such file or directory

Where does Spark look for Jupyter? Why can't it find Jupyter if I am able to work in a Jupyter notebook? What should I do?

1 Answer


According to the code at https://github.com/apache/spark/blob/master/python/pyspark/context.py#L180

self.pythonExec = os.environ.get("PYSPARK_PYTHON", 'python')

I think this error is caused by the environment variable PYSPARK_PYTHON, which indicates the location of the Python executable on each Spark node. When pyspark is launched, PYSPARK_PYTHON from the system environment is injected into all the Spark nodes, so that they all start the same Python interpreter.

  1. It can be solved by

    export PYSPARK_PYTHON=/usr/bin/python
    

     which must point to the same Python version on the different nodes, and then starting:

    pyspark
    
     (an in-notebook alternative is sketched after this list).

  2. If the Python versions differ between the local machine and the nodes of the cluster, a version-conflict error will occur instead.

  3. The interactive Python you work in should be the same version as the Python on the other nodes of the cluster.
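If you would rather stay inside the notebook, a minimal sketch of the same fix, assuming /usr/bin/python is a valid interpreter path on your machine, is to set the variable before the SparkSession is created:

    import os

    # Assumed path; it must be the same Python version on every node.
    os.environ["PYSPARK_PYTHON"] = "/usr/bin/python"

    from pyspark.sql import SparkSession

    # Executors launched after this point inherit PYSPARK_PYTHON
    # instead of falling back to the "jupyter" executable.
    spark = SparkSession.builder.appName("PythonKMeansExample").getOrCreate()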


I ran into the same error when running Spark on YARN. In my case export PYSPARK_PYTHON had no effect. Is there any solution? - rosefun
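For the YARN case in the comment above, exporting the variable on the driver alone is often not enough, because the executors run on other machines. A minimal sketch, again assuming /usr/bin/python exists on every node, is to pass the interpreter through Spark's standard spark.yarn.appMasterEnv.* and spark.executorEnv.* properties instead:

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("PythonKMeansExample")
             # Interpreter for the YARN application master (assumed path).
             .config("spark.yarn.appMasterEnv.PYSPARK_PYTHON", "/usr/bin/python")
             # Interpreter for the executor processes on the worker nodes.
             .config("spark.executorEnv.PYSPARK_PYTHON", "/usr/bin/python")
             .getOrCreate())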
