I am trying to convert a dictionary:
data_dict = {'t1': '1', 't2': '2', 't3': '3'}
into a DataFrame like this:

key | value
-----------
t1    1
t2    2
t3    3
To achieve this, I tried the following:
schema = StructType([StructField("key", StringType(), True), StructField("value", StringType(), True)])
ddf = spark.createDataFrame(data_dict, schema)
But I got the following error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/Cellar/apache-spark/2.4.5/libexec/python/pyspark/sql/session.py", line 748, in createDataFrame
rdd, schema = self._createFromLocal(map(prepare, data), schema)
File "/usr/local/Cellar/apache-spark/2.4.5/libexec/python/pyspark/sql/session.py", line 413, in _createFromLocal
data = list(data)
File "/usr/local/Cellar/apache-spark/2.4.5/libexec/python/pyspark/sql/session.py", line 730, in prepare
verify_func(obj)
File "/usr/local/Cellar/apache-spark/2.4.5/libexec/python/pyspark/sql/types.py", line 1389, in verify
verify_value(obj)
File "/usr/local/Cellar/apache-spark/2.4.5/libexec/python/pyspark/sql/types.py", line 1377, in verify_struct
% (obj, type(obj))))
TypeError: StructType can not accept object 't1' in type <class 'str'>
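As far as I can tell, the error happens because iterating over a Python dict yields only its keys, so `createDataFrame` sees each "row" as a bare string like `'t1'`, which a two-field `StructType` cannot accept. A minimal pure-Python sketch of that iteration behavior:

```python
data_dict = {'t1': '1', 't2': '2', 't3': '3'}

# Iterating a dict yields its keys only, not (key, value) pairs,
# so createDataFrame receives bare strings instead of two-field rows.
rows_seen = list(data_dict)
print(rows_seen)  # ['t1', 't2', 't3']
```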
I also tried not specifying a schema and passing only a column data type instead:
ddf = spark.createDataFrame(data_dict, StringType())
ddf = spark.createDataFrame(data_dict, StringType(), StringType())
But both approaches produce a DataFrame with a single column containing only the dictionary keys:
+-----+
|value|
+-----+
|t1 |
|t2 |
|t3 |
+-----+
Can someone tell me how to convert a dictionary into a Spark DataFrame in PySpark?
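For reference, the direction I expect should work (a sketch, not verified on every Spark version) is to turn the dict into a list of (key, value) tuples with `items()` before calling `createDataFrame`, so each tuple maps onto the two fields of the schema:

```python
data_dict = {'t1': '1', 't2': '2', 't3': '3'}

# Each (key, value) tuple becomes one row matching the two-field schema.
rows = list(data_dict.items())
print(rows)  # [('t1', '1'), ('t2', '2'), ('t3', '3')]

# Then, with a running SparkSession (sketched, not executed here):
# ddf = spark.createDataFrame(rows, schema)
```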