Spark Scala: converting Unix time to timestamp fails


I'm having trouble converting UNIX time to a timestamp.

I have a DataFrame with a PosTime column that I want to convert to a Timestamp, but the conversion only half works. Can you help?

scala> adsb.printSchema()
root
 |-- Icao: string (nullable = true)
 |-- Alt: long (nullable = true)
 |-- Lat: double (nullable = true)
 |-- Long: double (nullable = true)
 |-- PosTime: long (nullable = true)
 |-- Spd: double (nullable = true)
 |-- Trak: double (nullable = true)
 |-- Type: string (nullable = true)
 |-- Op: string (nullable = true)
 |-- Cou: string (nullable = true)

scala> adsb.show(50)
+------+------+---------+----------+-------------+-----+-----+----+--------------------+--------------------+
|  Icao|   Alt|      Lat|      Long|      PosTime|  Spd| Trak|Type|                  Op|                 Cou|
+------+------+---------+----------+-------------+-----+-----+----+--------------------+--------------------+
|ABECE7|  4825|40.814442| -111.9776|1506875131778|197.0|356.0|B739|     Delta Air Lines|       United States|
|4787B0| 38000|     null|      null|         null| null| null|B738|           Norwegian|              Norway|
|D3B18A|  4222|     null|      null|         null| null| null|null|                null|Unknown or unassi...|
|3C3F78|118400|     null|      null|         null| null| null|null|                null|             Germany|
|AA1C45|   -75|40.695969|-74.166321|1506875131747|157.4| 25.6|null|                null|       United States|
scala> val adsb1 = adsb.withColumn("PosTime", $"PosTime".cast(TimestampType))

scala> adsb_sort.show(100)
+------+-------+---------+---------+--------------------+-------+-------+----+----+--------------------+
|  Icao|    Alt|      Lat|     Long|             PosTime|    Spd|   Trak|Type|  Op|                 Cou|
+------+-------+---------+---------+--------------------+-------+-------+----+----+--------------------+
|FFFFFF|   null|     null|     null|                null|   null|   null|null|null|Unknown or unassi...|
|FFFFFF|1049093|      0.0|      0.0|49800-05-04 14:39...|28672.0| 1768.7|null|null|Unknown or unassi...|
|FFFFFF|  12458|      0.0|      0.0|49800-12-11 06:39...|    0.0| 2334.4|null|null|Unknown or unassi...|
1 Answer


Spark interprets the Long as a timestamp in seconds, but your data looks like it is in milliseconds:

scala> spark.sql("SELECT CAST(1506875131778 / 1000 AS timestamp)").show
+-------------------------------------------------------------------------+
|CAST((CAST(1506875131778 AS DOUBLE) / CAST(1000 AS DOUBLE)) AS TIMESTAMP)|
+-------------------------------------------------------------------------+
|                                                     2017-10-01 18:25:...|
+-------------------------------------------------------------------------+
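A quick way to confirm the magnitude outside Spark is java.time (just a sanity check on the sample value from the question):

// a 13-digit value is an epoch in milliseconds: roughly 2017-10-01T16:25:31.778Z
java.time.Instant.ofEpochMilli(1506875131778L)

// read as seconds, the same value lands tens of thousands of years in the future,
// the same order of magnitude as the year-49800 dates in the question's output
java.time.Instant.ofEpochSecond(1506875131778L)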

If that's right, just divide by 1000 before casting:

adsb.withColumn("PosTime", ($"PosTime" / 1000).cast(TimestampType))
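
A minimal end-to-end sketch, assuming spark-shell (so a SparkSession named spark and its implicits are already in scope); the Icao/PosTime sample row is taken from the question:

import org.apache.spark.sql.types.TimestampType
// outside the shell you would also need: import spark.implicits._

// one-row frame with a millisecond epoch value from the question
val sample = Seq(("ABECE7", 1506875131778L)).toDF("Icao", "PosTime")

// divide by 1000 so the value is in seconds before the cast to timestamp
val converted = sample.withColumn("PosTime", ($"PosTime" / 1000).cast(TimestampType))

converted.show(false)
// PosTime now prints as a 2017-10-01 timestamp in the session time zone,
// instead of a date around the year 49800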
