How to turn off scientific notation in pyspark?

Question

14

As a result of some aggregations, I end up with the following Spark DataFrame:

+------------+-----------------+-----------------+
|sale_user_id|gross_profit     |total_sale_volume|
+------------+-----------------+-----------------+
|       20569|       -3322960.0|     2.12569482E8|
|       24269|       -1876253.0|      8.6424626E7|
|        9583|              0.0|       1.282272E7|
|       11722|          18229.0|        5653149.0|
|       37982|           6077.0|        1181243.0|
|       20428|           1665.0|        7011588.0|
|       41157|          73227.0|        1.18631E7|
|        9993|              0.0|        1481437.0|
|        9030|           8865.0|      4.4133791E7|
|         829|              0.0|          11355.0|
+------------+-----------------+-----------------+
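
The values that switch to E notation in this table are exactly the ones whose magnitude is at least 10^7 (compare 1.282272E7 with 5653149.0). That pattern matches Java's Double.toString rule, which prints a plain decimal only for magnitudes in [10^-3, 10^7). That Spark renders doubles this way is an assumption inferred from the output above. The pure-Python sketch below encodes the rule so you can check which values will be affected; the function name `java_uses_scientific` is hypothetical.

```python
import math


def java_uses_scientific(x: float) -> bool:
    """Return True if Java's Double.toString would print x in E notation.

    Java prints a plain decimal only when 1e-3 <= |x| < 1e7; zero, NaN and
    the infinities are printed without an exponent.
    """
    if x == 0.0 or math.isnan(x) or math.isinf(x):
        return False
    return not (1e-3 <= abs(x) < 1e7)


# Values from the total_sale_volume column above.
for value in [212569482.0, 86424626.0, 12822720.0, 5653149.0, 11355.0]:
    print(f"{value:.1f}", java_uses_scientific(value))
```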

The schema of the DataFrame is:

root
 |-- sale_user_id: string (nullable = true)
 |-- tapp_gross_profit: double (nullable = true)
 |-- total_sale_volume: double (nullable = true)

How can I disable scientific notation in the gross_profit and total_sale_volume columns?

- chessosapiens

2 Answers

-4

DecimalType is deprecated in Spark 3.0+.

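If it is a string type, cast it to Double first and then to BigInt. You don't need to set a precision:

from pyspark.sql.types import DoubleType, LongType  # LongType is Spark SQL's BIGINT
df.withColumn('total_sale_volume', df.total_sale_volume.cast(DoubleType()).cast(LongType()))

Or, alternatively, without any imports:

df.withColumn('total_sale_volume', df.total_sale_volume.cast('double').cast('bigint'))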
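The string-to-double-to-bigint chain can be mimicked in plain Python to see what it does to individual values. This is a stand-in, not Spark code: it assumes Spark's (non-ANSI) double-to-bigint cast truncates toward zero like Java's `(long)` conversion, which means any fractional part is silently dropped. The helper name `to_bigint` is hypothetical.

```python
def to_bigint(text: str) -> int:
    """Mimic cast('double').cast('bigint') on a single string value."""
    as_double = float(text)   # string -> double; accepts "2.12569482E8"
    return int(as_double)     # double -> integer, truncating toward zero


for raw in ["2.12569482E8", "-1876253.0", "18229.75"]:
    print(raw, "->", to_bigint(raw))
```

The last input shows the trade-off of this approach: 18229.75 becomes 18229, so it only suits columns that hold whole numbers.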

- navin.senguttuvan

DecimalType is not deprecated in Spark 3.0+. - Tim Gautier

For DecimalType in Spark 3.0+, see https://spark.apache.org/docs/3.1.1/api/python/reference/api/pyspark.sql.types.DecimalType.html. - samkart

- Mariusz · Accepted Answer

24

The easiest way is to cast the double column to decimal, giving it an appropriate precision and scale:

from pyspark.sql.types import DecimalType
df.withColumn('total_sale_volume', df.total_sale_volume.cast(DecimalType(18, 2)))
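DecimalType(18, 2) allows at most 18 digits in total, 2 of them after the decimal point, so the largest magnitude it can hold is 10^16 - 0.01. The sketch below uses Python's decimal module as a stand-in to show how a double from the table would look at that precision and scale. Rounding with ROUND_HALF_UP and raising ValueError on overflow are assumptions of this sketch (Spark's own overflow handling depends on whether ANSI mode is enabled); `to_decimal_string` is a hypothetical name.

```python
from decimal import Decimal, ROUND_HALF_UP


def to_decimal_string(value: float, precision: int = 18, scale: int = 2) -> str:
    """Show how a double would look as a DECIMAL(precision, scale) value."""
    d = Decimal(repr(value))  # repr gives the shortest round-tripping text
    if not d.is_finite():
        raise ValueError(f"{value!r} is not a finite number")
    # At most precision - scale digits may sit left of the decimal point.
    if d.adjusted() >= precision - scale:
        raise ValueError(f"{value!r} does not fit in DECIMAL({precision}, {scale})")
    d = d.quantize(Decimal(1).scaleb(-scale), rounding=ROUND_HALF_UP)
    # Rounding can carry into an extra digit, e.g. 99.995 -> 100.00.
    if len(d.as_tuple().digits) > precision:
        raise ValueError(f"{value!r} does not fit in DECIMAL({precision}, {scale})")
    return format(d, "f")


print(to_decimal_string(212569482.0))   # total_sale_volume, first row
print(to_decimal_string(-3322960.0))    # gross_profit, first row
```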

- Mariusz

Is there a way to do this without specifying the number of decimal places (the exponent)? I mean, so that it can be inferred? - Bruno Ambrozio

@BrunoAmbrozio You can always use .collect() to turn the DataFrame into plain Python objects, which gives you much more flexible control over how they are printed (https://dev59.com/MXRB5IYBdhLWcg3wUV1d). - Mariusz
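A minimal sketch of the .collect() approach: collected pyspark Row objects are tuples, so their fields can be formatted with ordinary Python format specs, and the "f" presentation type never switches to E notation. Here a plain list of tuples stands in for the result of `df.collect()`; the helper `format_rows` and its column widths are hypothetical.

```python
def format_rows(rows, decimals: int = 1):
    """Format (sale_user_id, gross_profit, total_sale_volume) tuples without E notation."""
    lines = []
    for user_id, profit, volume in rows:
        # Fixed-point "f" formatting always prints every integer digit.
        lines.append(f"{user_id:>12} {profit:>17.{decimals}f} {volume:>17.{decimals}f}")
    return lines


# Stand-in for df.collect(); Row objects unpack the same way as these tuples.
rows = [("20569", -3322960.0, 212569482.0), ("9030", 8865.0, 44133791.0)]
for line in format_rows(rows):
    print(line)
```

This only changes how the values are printed on the driver; the DataFrame column itself stays a double.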

1

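I now need to do basically the same thing, but persist the values to a file, and I can't set the precision. If anyone has a solution, I'd really appreciate it. Here is the link to the new question: https://stackoverflow.com/questions/64772851/how-to-load-big-double-numbers-in-a-pyspark-dataframe-and-persist-it-back-withou/64773207#64773207 - Bruno Ambrozio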

1

DecimalType can also use scientific notation, depending on its precision and scale. - sabacherli
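To see when this happens: assuming Spark renders DecimalType values with Java's BigDecimal.toString (an assumption here, not confirmed above), a value is printed in E notation when its adjusted exponent falls below -6. With a non-negative scale that requires a scale of at least 7, so DecimalType(18, 2) is unaffected while something like scale 18 turns zero into "0E-18". Python's decimal module follows the same to-scientific-string rule, so the sketch below uses it to compare the default string with a forced plain rendering; `decimal_repr` is a hypothetical helper.

```python
from decimal import Decimal


def decimal_repr(value: str, scale: int) -> tuple[str, str]:
    """Return (default string, plain string) for a value stored at the given scale."""
    d = Decimal(value).quantize(Decimal(1).scaleb(-scale))
    # str() may switch to E notation; format(..., "f") always prints plain digits.
    return str(d), format(d, "f")


for value, scale in [("0", 2), ("0", 18), ("0.00000001", 8), ("212569482", 2)]:
    print(value, scale, decimal_repr(value, scale))
```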