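I have written the code below, and it works well and produces the expected result: wherever c is not given, it is filled with the calculation <previous c * b>. The problem is that I have to apply this to a much larger dataset, len(df.index) = ca. 10,000, so my current approach is not suitable, because I would have to write the following line several thousand times:
df['c'] = df.apply(func, axis =1)
A while loop is not an option in pandas for a dataset of this size. Any suggestions?

import pandas as pd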
import numpy as np

rng = pd.date_range('1/1/2011', periods=10, freq='D')
df = pd.DataFrame({'a': [None] * 10, 'b': [2, 3, 10, 3, 5, 8, 4, 1, 2, 6]}, index=rng)
df["c"] = np.nan
# Seed the two known values of c (label-based assignment avoids chained indexing).
df.loc[rng[0], "c"] = 1
df.loc[rng[2], "c"] = 3

def func(x):
    # Keep c where it is given; otherwise use <previous row's c> * <this row's b>.
    if pd.notnull(x['c']):
        return x['c']
    else:
        return df.iloc[df.index.get_loc(x.name) - 1]['c'] * x['b']

# Each pass fills only the first missing row after a known value,
# so the longest gap here (7 rows) needs 7 passes.
df['c'] = df.apply(func, axis =1)
df['c'] = df.apply(func, axis =1)
df['c'] = df.apply(func, axis =1)
df['c'] = df.apply(func, axis =1)
df['c'] = df.apply(func, axis =1)
df['c'] = df.apply(func, axis =1)
df['c'] = df.apply(func, axis =1)
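
For reference, one possible way to get the same fill in a single pass is sketched below. It is not taken from the code above: the helper name fill_forward_product is made up for illustration, and the sketch assumes that every run of missing c values should be the last known c multiplied by the running product of b over that run, which is what repeating the apply line converges to.

```python
import numpy as np
import pandas as pd


def fill_forward_product(c: pd.Series, b: pd.Series) -> pd.Series:
    """Fill missing c with <previous c> * <current b> (hypothetical helper)."""
    known = c.notna()
    # Every known value of c starts a new group; the missing rows that
    # follow it belong to the same group.
    groups = known.cumsum()
    # Known rows contribute a factor of 1, missing rows contribute their b.
    # Floats are used so that long runs do not overflow integer arithmetic.
    factors = b.astype(float).where(~known, 1.0)
    # Last known c times the running product of b since that row.
    return c.ffill() * factors.groupby(groups).cumprod()


rng = pd.date_range("1/1/2011", periods=10, freq="D")
df = pd.DataFrame({"b": [2, 3, 10, 3, 5, 8, 4, 1, 2, 6]}, index=rng)
df["c"] = np.nan
df.loc[rng[0], "c"] = 1
df.loc[rng[2], "c"] = 3

df["c"] = fill_forward_product(df["c"], df["b"])
print(df)
```

Rows that come before the first known c stay NaN, because there is no earlier value to multiply from. For very long gaps the running product can grow large enough to lose floating-point precision, so it may be worth spot-checking a few rows against the apply version.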