I am working with a PySpark DataFrame. I have a column named words (array<string>) that looks like this:
+---+--------------------------------------------------------------------------------+
| id| words|
+---+--------------------------------------------------------------------------------+
|012|[content, type, multipart, alternative, boundaries, nextpart, nextpart, drama,..|
|013|[received, from, am5eur02ht120, eop, eur02, prod, protection, outlook by, pro...|
|014|[data, care, much, important, information, summer, care, send, faraway, forget..|
I also have a list of words:
list = ["protection", "content", "received"]
I want to keep only the rows whose words array contains any of the values in the list.
Expected output:
+---+--------------------------------------------------------------------------------+
| id| words|
+---+--------------------------------------------------------------------------------+
|012|[content, type, multipart, alternative, boundaries, nextpart, nextpart, drama,..|
|013|[received, from, am5eur02ht120, eop, eur02, prod, protection, outlook by, pro...|
Py4JJavaError: An error occurred while calling z:org.apache.spark.sql.functions.lit.: java.lang.RuntimeException: Unsupported literal type class java.util.ArrayList. - Samiksha
list_col = F.array(*[F.lit(cl) for cl in list])
What is going on? - Samiksha