How can I pretty-print a DataFrame in Zeppelin/Spark/Scala?

Question

25

I am using Spark 2 and Scala 2.11 in a Zeppelin 0.7 notebook. I have a DataFrame that I can print like this:

dfLemma.select("text", "lemma").show(20,false)

The output looks like this:

+---------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|text                                                                                                                       |lemma                                                                                                                                                                  |
+---------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|RT @Dope_Promo: When you and your crew beat your high scores on FUGLY FROG  https://time.com/Sxp3Onz1w8                    |[rt, @dope_promo, :, when, you, and, you, crew, beat, you, high, score, on, FUGLY, FROG, https://time.com/sxp3onz1w8]                                                      |
|RT @axolROSE: Did yall just call Kermit the frog a lizard?  https://time.com/wDAEAEr1Ay                                        |[rt, @axolrose, :, do, yall, just, call, Kermit, the, frog, a, lizard, ?, https://time.com/wdaeaer1ay]                                                                     |

I am trying to get nicer output in Zeppelin with:

val printcols= dfLemma.select("text", "lemma")
println("%table " + printcols)

This produces the following output:

printcols: org.apache.spark.sql.DataFrame = [text: string, lemma: array<string>]

and a new, empty Zeppelin paragraph titled

[text: string, lemma: array]

Is there a way to display the DataFrame as a nicely formatted table? TIA!

- schoon

3 Answers

1

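I know this is an old thread, but just in case it helps...

The following is the only way I was able to display part of the df. Trying to add a second argument to .show(), as suggested in the comments, resulted in an error.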

z.show(df.limit(10))

- Kilian O Carroll

0

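Adding the following line to your notebook gives the output of .show() a horizontal scroll bar. It is similar to the styling trick used for Jupyter notebooks.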

%sh echo "%html <style>.text.plainTextContent {white-space: pre;}</style>"

- Samuel

- Daniel de Paula · Accepted Answer

81

In Zeppelin, you can use z.show(df) to display a nicely formatted table. Here is an example:

val df = Seq(
  (1,1,1), (2,2,2), (3,3,3)
).toDF("first_column", "second_column", "third_column")

z.show(df)

- Daniel de Paula
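For context on why the attempt in the question printed only the schema: concatenating a DataFrame to a string calls its toString, which yields the schema summary rather than the rows. Zeppelin's %table display expects the rows themselves, with columns separated by tabs and rows separated by newlines. The following is a minimal sketch of building that text by hand. TableText and toTable are hypothetical names chosen for illustration, not part of Zeppelin or Spark, and z.show(df) remains the simpler route.

```scala
// Hypothetical helper (not a Zeppelin or Spark API): builds text for Zeppelin's %table display.
object TableText {
  // Tabs and line breaks inside a cell would break the column/row layout,
  // so they are replaced with spaces. Null cells are rendered as "null".
  private def clean(cell: Any): String =
    Option(cell).fold("null")(_.toString).replaceAll("[\\t\\n\\r]", " ")

  // First line is the header; each following line is one row.
  def toTable(header: Seq[String], rows: Seq[Seq[Any]]): String = {
    val lines = header.map(clean).mkString("\t") +: rows.map(_.map(clean).mkString("\t"))
    "%table " + lines.mkString("\n")
  }
}

// Possible use in a Zeppelin paragraph, assuming a DataFrame `df` and a row
// count small enough to collect to the driver:
//   println(TableText.toTable(df.columns.toSeq, df.limit(20).collect().toSeq.map(_.toSeq)))
```

Collecting rows to the driver is only reasonable for small previews, which is why the usage comment applies limit first.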

Nice. I didn't know about this, so I wrote my own pretty-print function for pyspark (using "%table"). However, I can't find it in the documentation... - akoeltringer

1

@TwUxTLi51Nus This part of the documentation is indeed not very good. You can find some information about ZeppelinContext here (https://zeppelin.apache.org/docs/latest/interpreter/spark.html#zeppelincontext), and in the code (https://github.com/apache/zeppelin/blob/branch-0.7/spark/src/main/java/org/apache/zeppelin/spark/ZeppelinContext.java) you can see all the available functions. Also, in the notebook you can use ctrl + space to inspect the z variable. - Daniel de Paula

3

@schoon No problem! You can use a second argument to limit the number of rows: z.show(df, 10) - Daniel de Paula

1

I always find the more complicated way. I did this: z.show(dfLemma.select("racist", "lemma").limit(20)). I will try your way. - schoon

Asked a follow-up question here about how to pretty-print a WrappedArray in Zeppelin Spark Scala. - schoon
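On the WrappedArray point, one possible approach (a sketch, not taken from the follow-up thread) is to join the array<string> column into a single string before displaying it. It uses Spark's concat_ws function, which accepts array<string> columns. The example assumes it runs in a Zeppelin Spark paragraph where `spark` and `z` are provided by the interpreter; the sample data and the name `sample` are made up for illustration.

```scala
import org.apache.spark.sql.functions.{col, concat_ws}
import spark.implicits._ // assumes the `spark` session that Zeppelin provides

// Made-up sample with the same column types as in the question:
// text: string, lemma: array<string>
val sample = Seq(
  ("Did yall just call Kermit the frog a lizard?",
   Seq("do", "yall", "just", "call", "Kermit", "the", "frog", "a", "lizard", "?"))
).toDF("text", "lemma")

// Join the array elements with a space so the cell shows plain text
// instead of a WrappedArray(...) rendering.
val printable = sample.select(col("text"), concat_ws(" ", col("lemma")).as("lemma"))

// `z` is the ZeppelinContext available in Zeppelin paragraphs.
z.show(printable.limit(20))
```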
