Spark: How to run a Spark file from the Spark shell

Question

scala apache-spark cloudera-cdh cloudera-manager

74

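I am using CDH 5.2, and I can run commands interactively with spark-shell.

How can I run a file (file.spark) that contains Spark commands?
Is there a way to compile and run Scala programs on CDH 5.2 without sbt?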

- Ramakrishna

6 Answers

116

To load an external file from within spark-shell, simply run:

:load PATH_TO_FILE

This executes everything in your file.
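As a rough illustration, a file suitable for :load might look like the sketch below. The file name file.spark is taken from the question; the contents are purely hypothetical. It assumes the `sc` SparkContext that spark-shell creates at startup, so it will not compile as a standalone Scala program.

```scala
// file.spark: hypothetical script, meant to be loaded from inside spark-shell.
// `sc` is the SparkContext that spark-shell creates at startup; it is not defined here.

// Distribute the numbers 1 to 10 as an RDD.
val numbers = sc.parallelize(1 to 10)

// Keep the even numbers and square them.
val evenSquares = numbers.filter(_ % 2 == 0).map(n => n * n)

// Bring the results back to the driver and print them.
val result = evenSquares.collect()
println(result.mkString(", "))
```

From the scala> prompt, `:load file.spark` (or an absolute path to the file) would then evaluate these statements in order, and `result` stays available in the session afterwards.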

As for your sbt question, I don't have a solution, sorry :-)

- Steve

2

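Hi, this command works if the file is on my local machine, but is it possible to reference the location as an HDFS path? For example: :load hdfs://localhost:9000/file. - ǨÅVËĔŊ RĀǞĴĄŅ

It does not work for me. I am using the CDH 5.7 QuickStart VM. - Alex Raj Kaliamoorthy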

12

You can compile Spark programs with either sbt or Maven; just add Spark as a dependency.

Apache Spark artifacts are published to Maven Central, so no extra repository entry is needed. The dependency (the _2.10 Scala suffix is an assumption; match it to your Scala version):

<dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>1.2.0</version>
</dependency>
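For the sbt route, a minimal build.sbt might look like the sketch below. The project name and the Scala version are assumptions chosen to match Spark 1.2.0; adjust them to whatever your cluster actually runs.

```scala
// build.sbt: minimal sketch; the project name and versions are assumptions.
name := "spark-example"

version := "0.1"

// Spark 1.2.0 was built against Scala 2.10 by default.
scalaVersion := "2.10.4"

// %% appends the Scala binary version, resolving to spark-core_2.10.
// "provided" keeps Spark out of the packaged jar, since the cluster supplies it.
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.2.0" % "provided"
```

Running `sbt package` would then produce a jar that can be handed to spark-submit.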

To run a file that uses Spark commands, you can simply do the following:

echo '
   import org.apache.spark.sql._
   val ssc = new SQLContext(sc)
   ssc.sql("select * from mytable").collect
' > spark.input

Now run the command script:

cat spark.input | spark-shell

- WestCoastProjects

2

Anyone downvoting an obviously useful answer should at least explain their concern. - WestCoastProjects

9

To understand the answers better:

spark-shell is a Scala REPL.

You can type :help to see the list of operations available in the Scala shell.

scala> :help
All commands can be abbreviated, e.g., :he instead of :help.
:edit <id>|<line>        edit history
:help [command]          print this summary or command-specific help
:history [num]           show the history (optional num is commands to show)
:h? <string>             search the history
:imports [name name ...] show import history, identifying sources of names
:implicits [-v]          show the implicits in scope
:javap <path|class>      disassemble a file or class name
:line <id>|<line>        place line(s) at the end of history
:load <path>             interpret lines in a file
:paste [-raw] [path]     enter paste mode or paste a file
:power                   enable power user mode
:quit                    exit the interpreter
:replay [options]        reset the repl and replay all previous commands
:require <path>          add a jar to the classpath
:reset [options]         reset the repl to its initial state, forgetting all session entries
:save <path>             save replayable session to a file
:sh <command line>       run a shell command (result is implicitly => List[String])
:settings <options>      update compiler options, if possible; see reset
:silent                  disable/enable automatic printing of results
:type [-v] <expr>        display the type of an expression without evaluating it
:kind [-v] <expr>        display the kind of expression's type
:warnings                show the suppressed warnings from the most recent line which had any

:load interprets the lines in a file.

- loneStar

8

Tested on spark-shell version 1.6.3 and spark2-shell version 2.3.0.2.6.5.179-4: you can pipe the content directly to the shell as standard input, for example:

spark-shell <<< "1+1"

or, in your use case,

spark-shell < file.spark

- Phu Ngo

It works, but the standard output is basically a replay of everything you would see if you started spark-shell and typed in every line of the file. - Merlin

0

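You can run it the same way you run a shell script. Here is an example from the command line:

./bin/spark-shell is the path to spark-shell under the bin directory.
/home/fold1/spark_program.py is the path to your Python program.

So:

./bin/spark-shell /home/fold1/spark_program.py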

- amarnath pimple

- Ziyao Li · Accepted Answer

164

From the command line, you can use

spark-shell -i file.scala

to run the code written in file.scala.
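As a sketch of what such a file could contain, the example below wraps the logic in an object, calls its main explicitly, and ends with System.exit(0), because spark-shell otherwise stays at the interactive prompt after the file has run. The object name and the sample data are hypothetical, and `sc` is assumed to be the SparkContext that spark-shell provides.

```scala
// file.scala: hypothetical script for `spark-shell -i file.scala`.
// `sc` is the SparkContext provided by spark-shell; it is not defined here.

object WordLengths {
  // Sum the lengths of the given words using an RDD.
  def totalLength(words: Seq[String]): Int =
    sc.parallelize(words).map(_.length).reduce(_ + _)

  def main(args: Array[String]): Unit = {
    val total = totalLength(Seq("spark", "shell", "scala"))
    println(s"Total characters: $total")
  }
}

// Defining the object does not run it; main has to be called explicitly.
WordLengths.main(Array.empty[String])

// Exit so that spark-shell does not remain at the interactive prompt.
System.exit(0)
```

Leaving out the final System.exit(0) is convenient while debugging, since the session then stays open with WordLengths still in scope.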

- Ziyao Li

9

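Thanks, because this is not in spark-shell -h. - hbogert

8

I tried this command, but it did not run the code in the file; it just started the Scala shell. - Alex Raj Kaliamoorthy

7

@AlexRajKaliamoorthy I may be a little late, but to help with your comment/question: it does execute the file, but you need to put System.exit(0) at the end of the script so that spark-shell exits. - letsBeePolite

2

In the Scala file, if you define an object SparkTest {...}, you need to call its main with SparkTest.main(args = Array()) and use System.exit(0) as mentioned above. - Invincible

@Ziyao Li, when I type spark-shell --help, it does not show the -i option. Why? - loneStar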
