Median of a pandas datetime64 column

Question

12

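Is there a way to compute the median of a datetime column and get the result back in datetime format? I want to calculate the median of a column with dtype datetime64[ns] in Python. Here is a sample of the column: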

df['date'].head()

0   2017-05-08 13:25:13.342
1   2017-05-08 16:37:45.545
2   2017-01-12 11:08:04.021
3   2016-12-01 09:06:29.912
4   2016-06-08 03:16:40.422

Name: recency, dtype: datetime64[ns]

My goal is for the median to have the same datetime format as the date column above.

I tried converting the column to an np.array:

median_ = np.median(np.array(df['date']))

But that raises an error:

TypeError: ufunc add cannot use operands with types dtype('<M8[ns]') and dtype('<M8[ns]')

Converting the dtype to int64, computing the median, and then trying to convert the result back to datetime does not work either:

df['date'].astype('int64').median().astype('datetime64[ns]')

- T-Jay

3 Answers

6

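How about just taking the middle value?

# sort by date, then take the element at the middle position
dates = list(df.sort_values('date')['date'])
print(dates[len(dates)//2])

If the table is already sorted, you can even skip the sort.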

- kabanus
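
A minimal sketch of the sort-and-index idea on a current pandas version, using sample values copied from the question (the variable names are only illustrative). Note that for an even number of rows this picks the upper of the two middle values rather than averaging them, so it matches the usual median only when the row count is odd.

```python
import pandas as pd

# Sample data mirroring the column shown in the question.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2017-05-08 13:25:13.342",
        "2017-05-08 16:37:45.545",
        "2017-01-12 11:08:04.021",
        "2016-12-01 09:06:29.912",
        "2016-06-08 03:16:40.422",
    ])
})

# Sort the column and reset the index so positions follow sorted order.
dates = df["date"].sort_values().reset_index(drop=True)

# Take the element at the middle position of the sorted column.
middle = dates.iloc[len(dates) // 2]
print(middle)
```

The result is a pandas Timestamp, so it keeps the same datetime format as the column itself.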

Thanks @kabanus. This works well. I hadn't thought of sorting and using the length of the column. - T-Jay

4

You're close. median() returns a float, so convert it to an int first:

import math

median = math.floor(df['date'].astype('int64').median())

Then convert the int that represents the date to a datetime64:

result = np.datetime64(median, "ns") #unit: nanosecond

- SalaryNotFound
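
For reference, a self-contained sketch of this int64 round trip, assuming the column holds nanosecond-resolution datetimes (the sample values are taken from the question; the variable names are only illustrative). Because median() goes through float64, which cannot represent every nanosecond count of this magnitude exactly, the result may be off by a small number of nanoseconds, well under a microsecond.

```python
import numpy as np
import pandas as pd

# Sample values from the question, forced to nanosecond resolution.
dates = pd.Series(pd.to_datetime([
    "2017-05-08 13:25:13.342",
    "2017-05-08 16:37:45.545",
    "2017-01-12 11:08:04.021",
    "2016-12-01 09:06:29.912",
    "2016-06-08 03:16:40.422",
])).astype("datetime64[ns]")

# View each timestamp as integer nanoseconds since the Unix epoch.
as_ns = dates.astype("int64")

# median() returns a float, so truncate it to a plain Python int.
median_ns = int(as_ns.median())

# Interpret the integer as nanoseconds again to get a datetime64 back.
result = np.datetime64(median_ns, "ns")
print(pd.Timestamp(result))
```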

- user394430 · Accepted Answer

You can also try quantile(0.5):

df['date'].astype('datetime64[ns]').quantile(0.5, interpolation="midpoint")
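
A small sketch of how quantile behaves with an even number of rows, using made-up dates. With interpolation="midpoint" the result falls halfway between the two middle values. The sketch also assumes a reasonably recent pandas, where Series.median() accepts datetime columns directly and should give the same answer.

```python
import pandas as pd

# Made-up dates with an even row count, so the median lies between two values.
dates = pd.Series(pd.to_datetime([
    "2020-01-01",
    "2020-01-03",
    "2020-01-05",
    "2020-01-07",
]))

# The 0.5 quantile with midpoint interpolation: halfway between
# the two middle values (2020-01-03 and 2020-01-05).
mid = dates.quantile(0.5, interpolation="midpoint")

# Assumption: a recent pandas version, where median() works on datetimes.
med = dates.median()

print(mid)
print(med)
```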