Given this DataFrame:
df = pd.DataFrame([['August', 2], ['July', 3], ['Sept', 6]], columns=['A', 'B'])
I want to sort column A in this order: July, August, Sept. Is there a way to use a sorting function like "sort_values" but with a predefined sort order?
Use Categorical:
df.A=pd.Categorical(df.A,categories=['July', 'August', 'Sept'])
df=df.sort_values('A')
df
Out[310]:
        A  B
1    July  3
0  August  2
2    Sept  6
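One detail worth knowing about the Categorical approach: any value that is not listed in `categories` becomes NaN, and `sort_values` places NaN last by default. The following is a minimal sketch of that behaviour; the extra `'Oct'` row is a hypothetical addition for illustration and is not part of the original data.

```python
import pandas as pd

# 'Oct' is a hypothetical extra row that is NOT in the category list below
df = pd.DataFrame(
    [['August', 2], ['July', 3], ['Sept', 6], ['Oct', 1]],
    columns=['A', 'B'],
)

order = ['July', 'August', 'Sept']

# Values missing from `order` are converted to NaN
df['A'] = pd.Categorical(df['A'], categories=order, ordered=True)

# Rows sort by category position; NaN rows end up last by default
result = df.sort_values('A')
print(result)
```

If silently losing unlisted values is a concern, checking `df['A'].isna()` after the conversion is one way to catch misspelled or unexpected labels.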
Define the order in a dictionary and sort by it.
sort_dict = {'July': 0, 'August': 1, 'Sept': 2}
df.loc[df['A'].map(sort_dict).sort_values().index]
Output:
        A  B
1    July  3
0  August  2
2    Sept  6
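The index produced by sorting the mapped Series holds labels, not positions, so label-based selection is the safer way to reorder the frame. Here is a small sketch of the same dictionary idea on a frame with a non-default index; the index labels `'x'`, `'y'`, `'z'` are made up for illustration.

```python
import pandas as pd

# Hypothetical string index to show that the reordering is label-based
df = pd.DataFrame(
    [['August', 2], ['July', 3], ['Sept', 6]],
    columns=['A', 'B'],
    index=['x', 'y', 'z'],
)

sort_dict = {'July': 0, 'August': 1, 'Sept': 2}

# Map each month to its rank, sort the ranks, and keep the resulting labels
ordered_labels = df['A'].map(sort_dict).sort_values().index

# .loc selects rows by label, so this works for any index type
result = df.loc[ordered_labels]
print(result)
```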
df = df.sort_values('A', key=lambda s: s.apply(['July', 'August', 'Sept'].index), ignore_index=True)
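For readers unfamiliar with the `key` argument of `sort_values` (available in pandas 1.1 and later): the callable receives the whole column as a Series and must return a Series of the same length, whose values are then used for sorting. A possible vectorized variant, assuming the same three-month order, replaces `s.apply(list.index)` with a dictionary lookup via `map`; the names `order` and `rank` are illustrative.

```python
import pandas as pd

df = pd.DataFrame([['August', 2], ['July', 3], ['Sept', 6]], columns=['A', 'B'])

order = ['July', 'August', 'Sept']
# Build a month -> position lookup once instead of calling list.index per row
rank = {month: i for i, month in enumerate(order)}

# key gets the Series for column A and returns the ranks to sort by
result = df.sort_values('A', key=lambda s: s.map(rank), ignore_index=True)
print(result)
```

With `ignore_index=True` the result is renumbered 0..n-1, which matches the one-liner above.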
[...] the s.apply construct, and the documentation does not provide an example either! - Praveen
df = pd.DataFrame([['August', 2], ['July', 3], ['Sept', 6]], columns=['A', 'B'])
df
import calendar
df = df.replace({'Sept':'September'})
calendar.month_name[1:]
Output:
['January',
'February',
'March',
'April',
'May',
'June',
'July',
'August',
'September',
'October',
'November',
'December']
df['A'] = pd.Categorical(df.A, categories=calendar.month_name[1:], ordered=True)
df.sort_values('A')
Output:
           A  B
1       July  3
0     August  2
2  September  6
Alternatively, use calendar.month_abbr:
calendar.month_abbr[1:]
Output:
['Jan',
'Feb',
'Mar',
'Apr',
'May',
'Jun',
'Jul',
'Aug',
'Sep',
'Oct',
'Nov',
'Dec']
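One way the abbreviations could be applied to the original data, sketched here under the assumption that every label starts with the standard three-letter English abbreviation (so `'Sept'` is matched through `'Sep'`) and that the active locale yields English month names. The `abbr_order` name is illustrative.

```python
import calendar

import pandas as pd

df = pd.DataFrame([['August', 2], ['July', 3], ['Sept', 6]], columns=['A', 'B'])

# Map 'Jan' -> 1, ..., 'Dec' -> 12 (month_abbr[0] is an empty string, so skip it)
abbr_order = {abbr: i for i, abbr in enumerate(calendar.month_abbr[1:], start=1)}

# Sort by the month number of the first three letters; column A itself is unchanged
result = df.sort_values('A', key=lambda s: s.str[:3].map(abbr_order))
print(result)
```

Unlike the `replace` step above, this leaves the original spelling `'Sept'` in the column.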
df = pd.DataFrame([['August', 2], ['July', 3], ['Sept', 6]], columns=['A', 'B'])
value_map = {'August': 1, 'July': 0, 'Sept': 2}
def sort_by_key(df, col, value_map):
    df = df.assign(sort=lambda df: df[col].map(value_map))
    return df.sort_values('sort') \
             .drop('sort', axis='columns')
sort_by_key(df, 'A', value_map)
        A  B
1    July  3
0  August  2
2    Sept  6
Temporarily convert the month strings to datetime and sort:
df = pd.DataFrame([['August', 2], ['July', 3], ['Sept', 6]], columns=['A', 'B'])
df['tmp'] = pd.to_datetime(df['A'].str[:3], format='%b').dt.month
df.sort_values(by=['tmp']).drop(columns='tmp')
        A  B
1    July  3
0  August  2
2    Sept  6
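The temporary column can probably be avoided altogether by passing the same conversion as a `key` to `sort_values` (pandas 1.1 or later). This is a sketch of that variant, not the answer's original code, and it assumes an English locale so that `'%b'` parses `'Jul'`, `'Aug'`, `'Sep'`.

```python
import pandas as pd

df = pd.DataFrame([['August', 2], ['July', 3], ['Sept', 6]], columns=['A', 'B'])

def month_number(s: pd.Series) -> pd.Series:
    """Parse the first three letters as an abbreviated month and return 1..12."""
    return pd.to_datetime(s.str[:3], format='%b').dt.month

# The key is only used for ordering; no helper column is added to df
result = df.sort_values('A', key=month_number)
print(result)
```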
import pandas as pd
df = pd.DataFrame([['August', 2], ['July', 3], ['September', 6]], columns=['A', 'B'])
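# freq='MS' yields one timestamp per month start; ending at 2018-12-01 lists each month name exactly once
full_month_list = pd.date_range('2018-01-01', '2018-12-01', freq='MS').strftime("%B").tolist()
partial_month_list = [x for x in full_month_list if x in df['A'].values]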
df['A'] = pd.Categorical(df['A'], partial_month_list)
df.sort_values('A')
Result:
           A  B
1       July  3
0     August  2
2  September  6
df.A=df.A.astype('category')
@sparrow pd.DataFrame(data=df.A, dtype='category')
- BENY