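Note: this answer does not propose a new method; it compares the execution time each method needs.
All the proposals in the other answers are pretty "magic" and get the job done in a single line of pandas/numpy. Getting the job done is good, but getting it done quickly is better, so I wanted to compare the execution time of each approach.
Here is my program. In each loop I modify the dataframe twice, so that it is left unchanged from one iteration to the next (I'm not a Python programmer, so apologies in advance if something is done badly):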
import pandas as pd
import numpy as np
import time

# Test frame: IDs 1..999, Area alternates P/Q, Stage alternates X/Y in pairs
df = pd.DataFrame({'ID': [i for i in range(1, 1000)],
                   'Area': ['P' if (i & 1) else 'Q' for i in range(1, 1000)],
                   'Stage': ['X' if (i & 2) else 'Y' for i in range(1, 1000)]})

# Each loop applies the change, then reverts it, 99 times

t0 = time.process_time()
for i in range(1, 100):
    df.loc[df['Stage'] == 'X', 'Area'] = df['Area'].replace('Q', 'q')
    df.loc[df['Stage'] == 'X', 'Area'] = df['Area'].replace('q', 'Q')
print("Quang Hoang", '%.2f' % (time.process_time() - t0))

t0 = time.process_time()
for i in range(1, 100):
    df.loc[df['Stage'] == 'X', 'Area'] = 'q'
    df.loc[df['Stage'] == 'X', 'Area'] = 'Q'
print("Joe Ferndz", '%.2f' % (time.process_time() - t0))

t0 = time.process_time()
for i in range(1, 100):
    df.loc[df['Area'].eq("Q") & df['Stage'].eq('X'), 'Area'] = 'q'
    df.loc[df['Area'].eq("q") & df['Stage'].eq('X'), 'Area'] = 'Q'
print("anky 1", '%.2f' % (time.process_time() - t0))

t0 = time.process_time()
for i in range(1, 100):
    df['Area'] = np.where(df['Area'].eq("Q") & df['Stage'].eq('X'), 'q', df['Area'])
    df['Area'] = np.where(df['Area'].eq("q") & df['Stage'].eq('X'), 'Q', df['Area'])
print("anky 2", '%.2f' % (time.process_time() - t0))

t0 = time.process_time()
for i in range(1, 100):
    df['Area'] = np.where(df['Stage'] == 'X', 'q', df['Area'])
    df['Area'] = np.where(df['Stage'] == 'X', 'Q', df['Area'])
print("RavinderSingh13", '%.2f' % (time.process_time() - t0))
On my Raspberry Pi 4, the results (CPU seconds, as reported by time.process_time()) are:
Quang Hoang 1.60
Joe Ferndz 1.12
anky 1 1.55
anky 2 0.86
RavinderSingh13 0.38
If I use a dataframe with 100,000 rows instead of 1000, the results are:
Quang Hoang 10.79
Joe Ferndz 6.61
anky 1 10.91
anky 2 9.64
RavinderSingh13 4.75
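If you want to repeat this kind of comparison on your own machine, a variant along the following lines might be a bit more robust. It is only a sketch, not the program that produced the numbers above: it uses the standard `timeit` module, rebuilds the frame before every measurement so that one method cannot change the input seen by the next, and reports the best of a few runs. The helper names (`make_frame`, `loc_mask`, `np_where`, `bench`) are my own and purely illustrative, and only two of the five approaches are shown.

```python
import timeit

import numpy as np
import pandas as pd


def make_frame(n):
    """Build a test frame with n rows, same layout as the one above."""
    ids = range(1, n + 1)
    return pd.DataFrame({
        'ID': list(ids),
        'Area': ['P' if (i & 1) else 'Q' for i in ids],
        'Stage': ['X' if (i & 2) else 'Y' for i in ids],
    })


def loc_mask(df):
    # Boolean-mask assignment with .loc (the "anky 1" shape)
    df.loc[df['Area'].eq('Q') & df['Stage'].eq('X'), 'Area'] = 'q'


def np_where(df):
    # Whole-column rebuild with np.where (the "anky 2" shape)
    df['Area'] = np.where(df['Area'].eq('Q') & df['Stage'].eq('X'),
                          'q', df['Area'])


def bench(func, n, repeat=5):
    """Return the best wall-clock time of one call on a fresh n-row frame."""
    state = {}

    def setup():
        # Runs before each timed call, so every call sees untouched data
        state['df'] = make_frame(n)

    def stmt():
        func(state['df'])

    return min(timeit.repeat(stmt, setup=setup, repeat=repeat, number=1))


if __name__ == "__main__":
    for n in (1_000, 100_000):
        for func in (loc_mask, np_where):
            print(n, func.__name__, '%.4f' % bench(func, n))
```

Note that `timeit` measures wall-clock time by default, whereas the program above uses `time.process_time()`, so the absolute values will not be directly comparable with the figures in this answer.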
Note that the proposals from Joe Ferndz and RavinderSingh13 assume that Area is only ever "P" or "Q".
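To see concretely which rows each kind of one-liner touches, here is a small sketch on a toy frame. The data is made up for illustration (including an extra Area value 'R' that does not occur in the benchmark frame), and the two expressions only mirror the shape of the conditions used above.

```python
import numpy as np
import pandas as pd

# Toy data, invented for this illustration
toy = pd.DataFrame({'Area': ['P', 'Q', 'R', 'Q'],
                    'Stage': ['X', 'X', 'X', 'Y']})

# Condition on Stage only (shape of the Joe Ferndz / RavinderSingh13 lines)
stage_only = np.where(toy['Stage'] == 'X', 'q', toy['Area'])

# Condition on Stage and Area (shape of the anky lines)
stage_and_area = np.where(toy['Area'].eq('Q') & toy['Stage'].eq('X'),
                          'q', toy['Area'])

print(stage_only.tolist())      # ['q', 'q', 'q', 'Q']
print(stage_and_area.tolist())  # ['P', 'q', 'R', 'Q']
```

The Stage-only form rewrites every row whose Stage is 'X' whatever its Area, while the combined mask changes only the rows whose Area is 'Q'. That extra comparison is also part of what the slower variants are paying for in the timings.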