Using pandas apply with dates and their shifted dates

Question

I have a DataFrame containing dates:

            Daycount
Date
2020-05-01         0
2020-06-01         0
2020-07-01         0
2020-08-01         0
2020-09-01         0

I am trying to compute the day count from one date to the next with the following function:

import calendar

def days360(start_date, end_date, method_eu=False):
    # 30/360 day count: US method by default, European method if method_eu=True
    start_day = start_date.day
    start_month = start_date.month
    start_year = start_date.year
    end_day = end_date.day
    end_month = end_date.month
    end_year = end_date.year

    # A start on the 31st, or (US method) on the last day of February, counts as day 30
    if start_day == 31 or (method_eu is False and start_month == 2 and (start_day == 29 or (start_day == 28 and calendar.isleap(start_year) is False))):
        start_day = 30

    if end_day == 31:
        if method_eu is False and start_day != 30:
            # US method: an end on the 31st rolls forward to the 1st of the next month
            end_day = 1

            if end_month == 12:
                end_year += 1
                end_month = 1
            else:
                end_month += 1
        else:
            end_day = 30

    return end_day + end_month * 30 + end_year * 360 - start_day - start_month * 30 - start_year * 360
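The return line of days360 is the 30/360 formula. Writing Y, M, D for the year, month and (already adjusted) day of the start date (subscript 1) and end date (subscript 2), it reduces to the expression below; the worked line uses the first pair of dates from the question.

```latex
\text{days} = 360\,(Y_2 - Y_1) + 30\,(M_2 - M_1) + (D_2 - D_1)

% 2020-05-01 -> 2020-06-01:
360\,(2020 - 2020) + 30\,(6 - 5) + (1 - 1) = 30
```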

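However, when I try to use apply like this:

hypo["Daycount"] = hypo.apply(lambda x: days360(x.index,x.index.shift(-1)))

I get the following error:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

It works when I pass a single pair of values from the DataFrame, so the formula itself should be correct. Creating another column with the shifted dates and then applying the formula also works, but I am looking for a more concise approach; I am just not sure how to do it with apply. I should get 30 for every day count.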
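A minimal sketch of where the error comes from, assuming the index is a month-start DatetimeIndex like the one shown above. With the default axis=0, apply calls the lambda once per column, so x.index is the whole DatetimeIndex; inside days360, start_date.day is then an array of days, and the first `if` evaluates the truth value of an element-wise comparison.

```python
import pandas as pd

# Recreate the question's frame: a month-start DatetimeIndex named "Date"
idx = pd.date_range("2020-05-01", periods=5, freq="MS", name="Date")
hypo = pd.DataFrame({"Daycount": 0}, index=idx)

# Default axis=0: the lambda runs once per *column*, and x.index is the
# full 5-element index rather than a single date
index_sizes = hypo.apply(lambda x: len(x.index))

# Inside days360, start_date.day is therefore an array of days...
mask = idx.day == 31  # element-wise comparison -> array of booleans

# ...and an `if` on that array raises the error from the question
try:
    if mask:
        pass
    raised = False
except ValueError as exc:
    raised = True
    message = str(exc)
```

Any scalar-only logic (`if`, `or`, `and`) hit with array input fails the same way, which is why the function works on one pair of dates but not on the whole index.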

The desired output is the following table:

        Date  Daycount
0 2020-05-01      30.0
1 2020-06-01      30.0
2 2020-07-01      30.0
3 2020-08-01      30.0
4 2020-09-01      30.0

- a.hilary

Added the target output. - a.hilary

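Posting the error message would help. I think the problem is the way you use index.shift. Check the signature of df.index.shift: it takes two parameters, periods and freq (default None, which uses the index's freq). If your index has no freq attribute set, you will get an error, so you may need to pass freq='D' and check whether your DatetimeIndex has its freq attribute set. - predmod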

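My frequency is "MS", and passing it to the shift function produces the same error posted above: "ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()". - a.hilary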

What is your logic for counting the days? I would guess the day counts should be [0, 31.0, 30.0, 31.0, 31.0]. - Shubham Sharma

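@ShubhamSharma My day count is based on a 360-day year. There is indeed a mistake, though: the first value should be 0, because I shifted by -1 instead of 1, although that should not be the cause of the problem. In the end I need every day count to be 30, or -30 if I shift the way you pointed out. - a.hilary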
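The shift behaviour discussed in these comments can be checked directly. This sketch, assuming dates like the question's, shows that DatetimeIndex.shift works when the index carries freq="MS", raises pandas.errors.NullFrequencyError (a ValueError subclass) when the index has no freq, and accepts an explicit freq argument. Because shifting an "MS" index succeeds, the ambiguous-truth-value error points at days360 receiving arrays rather than at shift itself.

```python
import pandas as pd
from pandas.errors import NullFrequencyError

# Index created with an explicit month-start frequency
idx = pd.date_range("2020-05-01", periods=5, freq="MS")
back = idx.shift(-1)  # each date moved one month earlier
fwd = idx.shift(1)    # each date moved one month later

# The same kind of dates, but with no freq attribute
plain = pd.DatetimeIndex(["2020-05-01", "2020-06-01", "2020-07-01"])
try:
    plain.shift(1)
    shift_failed = False
except NullFrequencyError:
    shift_failed = True

# Passing freq explicitly works even when the index has none
explicit = plain.shift(1, freq="MS")
```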

2 Answers

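If you want to use .apply, you need to modify your function (or add another one based on the function you already have) so that it operates on Series objects rather than on their elements. See the pandas DataFrame.apply docstring: "Objects passed to the function are Series objects whose index is either ...".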
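A minimal sketch of the two ways apply passes Series objects, assuming the question's month-start index. With axis=0 each column arrives as a Series; with axis=1 each row arrives as a Series whose row.name is that row's Timestamp, so a scalar function can be called once per date. The helper next_month_start is hypothetical, and actual calendar days are computed here only to keep the block self-contained; with the question's function, `df.apply(lambda row: days360(row.name, next_month_start(row.name)), axis=1)` would follow the same pattern.

```python
import pandas as pd

idx = pd.date_range("2020-05-01", periods=5, freq="MS", name="Date")
df = pd.DataFrame({"Daycount": 0}, index=idx)

# axis=0 (default): each column is passed as a Series indexed by the dates
col_kinds = df.apply(lambda col: type(col).__name__)


def next_month_start(ts):
    # Hypothetical helper: the first day of the following month
    return ts + pd.offsets.MonthBegin(1)


# axis=1: each row is passed as a Series; row.name is its Timestamp label
actual_days = df.apply(
    lambda row: (next_month_start(row.name) - row.name).days, axis=1
)
```

Row-wise apply runs a Python-level loop, so it is concise rather than fast; for large frames a vectorized formula is usually preferable.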

You can avoid .apply and the lambda altogether by using a list comprehension:

df['derived'] = [days360(a, b) for a, b in zip(df.index, df.index.shift(-1))]
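A self-contained version of the list-comprehension approach, assuming the question's month-start index with freq="MS" and reusing the question's days360 logic (condensed, with `not method_eu` in place of `method_eu is False`). It shifts by +1 so each date is paired with the following month start, which gives the 30 the question asks for.

```python
import calendar

import pandas as pd


def days360(start_date, end_date, method_eu=False):
    # 30/360 day count (US method unless method_eu=True), as in the question
    start_day, start_month, start_year = start_date.day, start_date.month, start_date.year
    end_day, end_month, end_year = end_date.day, end_date.month, end_date.year

    # A start on the 31st, or (US method) on the last day of February, counts as day 30
    if start_day == 31 or (
        not method_eu
        and start_month == 2
        and (start_day == 29 or (start_day == 28 and not calendar.isleap(start_year)))
    ):
        start_day = 30

    if end_day == 31:
        if not method_eu and start_day != 30:
            # US method: roll an end on the 31st forward to the 1st of the next month
            end_day = 1
            if end_month == 12:
                end_year += 1
                end_month = 1
            else:
                end_month += 1
        else:
            end_day = 30

    return (end_year - start_year) * 360 + (end_month - start_month) * 30 + (end_day - start_day)


idx = pd.date_range("2020-05-01", periods=5, freq="MS", name="Date")
df = pd.DataFrame({"Daycount": 0}, index=idx)

# zip yields scalar Timestamps, so days360 never sees an array
df["Daycount"] = [days360(a, b) for a, b in zip(df.index, df.index.shift(1))]
```

With shift(-1), as in the one-liner, every pair runs backwards in time and each count comes out as -30.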

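I am sure there is a way to vectorize your function, but at least this gets your code working. At one point, key figures in the Python community strongly opposed lambda expressions and argued for removing them, since there is always another way to achieve the same thing.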

- predmod

- Shubham Sharma · Accepted Answer

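Use pd.to_datetime to convert the column to a datetime-like Series, use the Series.dt accessor to get its datetime properties, and then apply Series.diff to the year, month and day components to get the desired result: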

df = df.reset_index()
dates = pd.to_datetime(df['Date'])
df['Daycount'] = (
    (dates.dt.year.diff() * 360 + dates.dt.month.diff() * 30 + dates.dt.day.diff()).fillna(0)
)

# print(df)
         Date  Daycount
0  2020-05-01       0.0
1  2020-06-01      30.0
2  2020-07-01      30.0
3  2020-08-01      30.0
4  2020-09-01      30.0

Consider another example with a more complex DataFrame:

# Given dataframe
# print(df)
            Daycount
Date                
2020-05-01         0
2020-06-03         0
2020-07-01         0
2021-07-02         0
2022-08-03         0

# Desired result
# print(df)
         Date  Daycount
0  2020-05-01       0.0
1  2020-06-03      32.0
2  2020-07-01      28.0
3  2021-07-02     361.0
4  2022-08-03     391.0
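The diff-based formula above applies none of days360's end-of-month rules, so the two can disagree when dates fall on the 31st or at the end of February (for 2020-01-31 to 2020-02-29, the diff formula gives 28 while days360 gives 29). A minimal sketch of a vectorized port that keeps those rules, assuming two aligned datetime64 Series; days360_vec is a hypothetical name, and rows where either date is NaT come out as NaN.

```python
import numpy as np
import pandas as pd


def days360_vec(start, end, method_eu=False):
    # Hypothetical vectorized port of days360 for two aligned datetime Series.
    # Floats are used throughout so NaT rows propagate as NaN.
    sd = start.dt.day.to_numpy(dtype=float)
    sm = start.dt.month.to_numpy(dtype=float)
    sy = start.dt.year.to_numpy(dtype=float)
    ed = end.dt.day.to_numpy(dtype=float)
    em = end.dt.month.to_numpy(dtype=float)
    ey = end.dt.year.to_numpy(dtype=float)

    # Start on the 31st, or (US method) on the last day of February -> day 30
    start_to_30 = sd == 31
    if not method_eu:
        last_feb = (sm == 2) & (sd == start.dt.days_in_month.to_numpy(dtype=float))
        start_to_30 |= last_feb
    sd = np.where(start_to_30, 30, sd)

    end31 = ed == 31
    if method_eu:
        ed = np.where(end31, 30, ed)
    else:
        # US method: roll to the 1st of the next month unless the start is day 30
        roll = end31 & (sd != 30)
        ed = np.where(roll, 1, np.where(end31, 30, ed))
        wrap = roll & (em == 12)
        ey = np.where(wrap, ey + 1, ey)
        em = np.where(wrap, 1, np.where(roll, em + 1, em))

    return (ey - sy) * 360 + (em - sm) * 30 + (ed - sd)


dates = pd.to_datetime(
    pd.Series(["2020-05-01", "2020-06-03", "2020-07-01", "2021-07-02", "2022-08-03"])
)
# Count from the previous row's date; the first row has no predecessor -> 0
daycount = pd.Series(days360_vec(dates.shift(1), dates), index=dates.index).fillna(0)

# Month-end cases where the 30/360 adjustments matter
edge = days360_vec(
    pd.Series(pd.to_datetime(["2020-01-31", "2019-02-28", "2020-01-31"])),
    pd.Series(pd.to_datetime(["2020-03-31", "2019-03-31", "2020-02-29"])),
)
```

For the example frame above this reproduces the accepted answer's result; the two approaches only diverge in the month-end cases.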