How to sort a pandas DataFrame by one column

Question

645

I have a DataFrame like this:

print(df)

        0          1     2
0   354.7      April   4.0
1    55.4     August   8.0
2   176.5   December  12.0
3    95.5   February   2.0
4    85.6    January   1.0
5     152       July   7.0
6   238.7       June   6.0
7   104.8      March   3.0
8   283.5        May   5.0
9   278.8   November  11.0
10  249.6    October  10.0
11  212.7  September   9.0

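As you can see, the months are not in calendar order. So I created a second column holding the month number (1-12) for each month. From there, how can I sort this DataFrame by calendar month order?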

- Sachila Ranawaka

14 Answers

287

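I tried the solutions above but did not get the result I wanted, so I found a different solution that worked for me. ascending=False sorts the DataFrame in descending order; the default is True. I am using Python 3.6.6 and pandas 0.23.4.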

final_df = df.sort_values(by=['2'], ascending=False)

You can find more details in the pandas documentation.
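As a minimal sketch of the call above, the snippet below rebuilds a few rows of the question's table and sorts them in descending order. It assumes the column labels are the strings '0', '1' and '2', which the question does not state explicitly.

```python
import pandas as pd

# Assumption: column labels are strings; only a subset of the question's rows.
df = pd.DataFrame({
    "0": [354.7, 55.4, 176.5, 95.5],
    "1": ["April", "August", "December", "February"],
    "2": [4.0, 8.0, 12.0, 2.0],
})

# sort_values returns a new DataFrame; df itself keeps its original order.
final_df = df.sort_values(by=["2"], ascending=False)
print(final_df)
```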

- Joel Carneiro

75

Using the column name worked for me.

sorted_df = df.sort_values(by=['Column_name'], ascending=True)

- Niraj

40

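pandas' sort_values does the job. There are various parameters one can pass, such as ascending (bool or list of bool):

Sort ascending vs. descending. Specify list for multiple sort orders. If this is a list of bools, must match the length of the by.

Since the default is ascending and the OP's goal is ascending order, there is no need to specify that parameter (see the last note below for descending order), so one of the following approaches can be used: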

Performing the operation in-place, and keeping the same variable name. This requires one to pass inplace=True as follows:

df.sort_values(by=['2'], inplace=True)

# or

df.sort_values(by = '2', inplace = True)

# or

df.sort_values('2', inplace = True)

If doing the operation in-place is not a requirement, one can assign the change (sort) to a variable:
- With the same name of the original dataframe, df as
```
df = df.sort_values(by=['2'])
```
- With a different name, such as df_new, as
```
df_new = df.sort_values(by=['2'])
```

All of the operations above produce the following output.

        0          1     2
4    85.6    January   1.0
3    95.5   February   2.0
7   104.8      March   3.0
0   354.7      April   4.0
8   283.5        May   5.0
6   238.7       June   6.0
5     152       July   7.0
1    55.4     August   8.0
11  212.7  September   9.0
10  249.6    October  10.0
9   278.8   November  11.0
2   176.5   December  12.0
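To make the difference between the assignment form and the in-place form concrete, here is a small sketch on a three-row stand-in for the question's data (the values and the string column labels are assumptions for illustration only).

```python
import pandas as pd

df = pd.DataFrame({"1": ["March", "January", "February"], "2": [3.0, 1.0, 2.0]})

# Without inplace=True, sort_values returns a new DataFrame and leaves df alone.
result = df.sort_values(by="2")
unchanged = df["2"].tolist()  # still in the original order

# With inplace=True, df itself is reordered and the call returns None.
ret = df.sort_values(by="2", inplace=True)
print(ret)
print(df)
```

Calling df.sort_values(by="2") on its own line, without assigning the result or passing inplace=True, therefore has no visible effect on df.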

Finally, the index can be reset with pandas.DataFrame.reset_index, which gives the following:

df.reset_index(drop = True, inplace = True)

# or

df = df.reset_index(drop = True)

[Out]:

        0          1     2
0    85.6    January   1.0
1    95.5   February   2.0
2   104.8      March   3.0
3   354.7      April   4.0
4   283.5        May   5.0
5   238.7       June   6.0
6     152       July   7.0
7    55.4     August   8.0
8   212.7  September   9.0
9   249.6    October  10.0
10  278.8   November  11.0
11  176.5   December  12.0

A one-liner that sorts in ascending order and resets the index:

df = df.sort_values(by=['2']).reset_index(drop = True)

[Out]:

        0          1     2
0    85.6    January   1.0
1    95.5   February   2.0
2   104.8      March   3.0
3   354.7      April   4.0
4   283.5        May   5.0
5   238.7       June   6.0
6     152       July   7.0
7    55.4     August   8.0
8   212.7  September   9.0
9   249.6    October  10.0
10  278.8   November  11.0
11  176.5   December  12.0
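As a possible alternative to chaining reset_index, sort_values also accepts ignore_index=True (assuming pandas 1.0 or later). A short sketch on made-up rows, comparing both forms:

```python
import pandas as pd

# Illustrative data; string column labels are an assumption.
df = pd.DataFrame({"1": ["March", "January", "February"], "2": [3.0, 1.0, 2.0]})

chained = df.sort_values(by="2").reset_index(drop=True)
single_call = df.sort_values(by="2", ignore_index=True)  # index becomes 0..n-1

print(single_call)
```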

Notes:

If the operation is not done in-place, forgetting to assign the result (as described above) means the DataFrame will not show the expected order.
There are strong opinions about using inplace; it is worth reading up on them before relying on it.
This assumes that column 2 is not stored as strings. If it is, it has to be converted first:
- Using pandas.to_numeric
```
 df['2'] = pd.to_numeric(df['2'])
```
- Using pandas.Series.astype
```
 df['2'] = df['2'].astype(float)
```

If one wants descending order, one needs to pass ascending=False:

 df = df.sort_values(by=['2'], ascending=False)

 # or

 df.sort_values(by = '2', ascending=False, inplace=True)

 [Out]:

        0          1     2
2   176.5   December  12.0
9   278.8   November  11.0
10  249.6    October  10.0
11  212.7  September   9.0
1    55.4     August   8.0
5     152       July   7.0
6   238.7       June   6.0
8   283.5        May   5.0
0   354.7      April   4.0
7   104.8      March   3.0
3    95.5   February   2.0
4    85.6    January   1.0
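The note about column 2 being stored as strings matters because strings sort lexicographically. A small sketch of the effect, using three made-up rows in which the month numbers are deliberately stored as text:

```python
import pandas as pd

# Assumption for illustration: month numbers arrived as strings.
df = pd.DataFrame({
    "1": ["February", "October", "January"],
    "2": ["2.0", "10.0", "1.0"],
})

# Lexicographic order: "1.0" < "10.0" < "2.0", so October lands before February.
as_text = df.sort_values("2")["1"].tolist()

# After conversion the sort is numeric.
df["2"] = pd.to_numeric(df["2"])
as_number = df.sort_values("2")["1"].tolist()

print(as_text)
print(as_number)
```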

- Gonçalo Peres

28

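Just as another solution:

Instead of creating a second column, you can turn the string data (the month names) into an ordered categorical and sort on it like this: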

df.rename(columns={1: 'month'}, inplace=True)
df['month'] = pd.Categorical(
    df['month'],
    categories=['December', 'November', 'October', 'September', 'August', 'July',
                'June', 'May', 'April', 'March', 'February', 'January'],
    ordered=True,
)
df = df.sort_values('month', ascending=False)

When the Categorical object is created, the data is ordered according to the month-name order you specify.
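A variant of the same idea, sketched under the assumption that the month column is already named 'month': listing the categories in calendar order lets a plain ascending sort produce January first, without ascending=False.

```python
import pandas as pd

# Illustrative subset of the question's data; column names are assumptions.
df = pd.DataFrame({
    "month": ["April", "August", "December", "February", "January"],
    "value": [354.7, 55.4, 176.5, 95.5, 85.6],
})

calendar_order = ["January", "February", "March", "April", "May", "June", "July",
                  "August", "September", "October", "November", "December"]

# An ordered categorical sorts by category position, not alphabetically.
df["month"] = pd.Categorical(df["month"], categories=calendar_order, ordered=True)
df = df.sort_values("month")
print(df)
```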

- alireza yazdandoost

12

Just a few extra operations on the data. Suppose we have a DataFrame df; a few chained operations give the desired output.

ID         cost      tax    label
1       216590      1600    test      
2       523213      1800    test 
3          250      1500    experiment

(df['label'].value_counts().to_frame().reset_index()).sort_values('label', ascending=False)

gives a DataFrame with the labels in sorted order:

    index   label
0   test        2
1   experiment  1
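The column names produced by value_counts().reset_index() appear to differ between pandas versions (as far as I know, pandas 2.0 and later name them 'label' and 'count' rather than 'index' and 'label'). A sketch that sets the names explicitly, so that the sort key does not depend on the version; the name 'count' is my own choice here:

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2, 3],
    "cost": [216590, 523213, 250],
    "tax": [1600, 1800, 1500],
    "label": ["test", "test", "experiment"],
})

counts = (
    df["label"].value_counts()
    .rename_axis("label")          # name the index explicitly
    .reset_index(name="count")     # name the counts column explicitly
    .sort_values("count", ascending=False)
)
print(counts)
```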

- Hari_pb

11

This worked for me:

df.sort_values(by='Column_name', inplace=True, ascending=False)

- suzanne chen

9

You may need to reset the index after sorting:

df = df.sort_values('2')
df = df.reset_index(drop=True)

- mojtaba rezaei

1

df.sort_values('2', inplace=True) would do the job.

8

Here is the signature of sort_values from the pandas documentation.

DataFrame.sort_values(by, axis=0,
                      ascending=True,
                      inplace=False,
                      kind='quicksort',
                      na_position='last',
                      ignore_index=False, key=None)

In this case, the code looks like this:

df.sort_values(by=['2'])

API reference: pandas.DataFrame.sort_values
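The key parameter in that signature (assuming pandas 1.1 or later) offers another route for this question: sorting directly on the month names without a helper column. A sketch, where the month_number mapping is a helper I introduce for illustration:

```python
import pandas as pd

# Illustrative subset; string column labels are an assumption.
df = pd.DataFrame({
    "0": [354.7, 55.4, 85.6],
    "1": ["April", "August", "January"],
})

months = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
month_number = {name: i for i, name in enumerate(months, start=1)}

# key receives the whole column as a Series and must return a Series of equal length.
sorted_df = df.sort_values(by="1", key=lambda s: s.map(month_number))
print(sorted_df)
```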

- Nafees Ahmad

7

Adding a few more insights:

df = raw_df['2'].sort_values()  # returns only column '2', sorted, as a Series

However,

df = raw_df.sort_values(by=["2"], ascending=False)  # sorts the whole DataFrame in descending order by column "2"

- Prateek Mohapatra

If ['2'] works, then the column label 2 is a string; if [2] works, then it is an integer. That is the only difference. - Prateek Mohapatra
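A quick way to see that difference, sketched with two hypothetical one-column frames: the label passed to sort_values has to match the type of the actual column label, and a mismatch raises KeyError.

```python
import pandas as pd

int_labelled = pd.DataFrame({2: [3.0, 1.0, 2.0]})     # column label is the int 2
str_labelled = pd.DataFrame({"2": [3.0, 1.0, 2.0]})   # column label is the str '2'

by_int = int_labelled.sort_values(by=2)
by_str = str_labelled.sort_values(by="2")

# Passing the wrong type of label fails with KeyError.
try:
    int_labelled.sort_values(by="2")
    mismatch_raises = False
except KeyError:
    mismatch_raises = True

# Inspecting the labels shows which form to use.
print(int_labelled.columns.tolist(), str_labelled.columns.tolist())
```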

Content provided by Stack Overflow.

- EdChum · Accepted Answer

Use sort_values to sort the DataFrame by the values of a specific column:

In [18]:
df.sort_values('2')

Out[18]:
        0          1     2
4    85.6    January   1.0
3    95.5   February   2.0
7   104.8      March   3.0
0   354.7      April   4.0
8   283.5        May   5.0
6   238.7       June   6.0
5   152.0       July   7.0
1    55.4     August   8.0
11  212.7  September   9.0
10  249.6    October  10.0
9   278.8   November  11.0
2   176.5   December  12.0

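If you want to sort by two columns, pass a list of column labels to sort_values, ordered by sort priority. With df.sort_values(['2', '0']), the result is sorted by column 2 and then by column 0. Admittedly, this makes little sense for this example, because every value in df['2'] is unique.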
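Since the question's data has no ties in column 2, here is a sketch with made-up values that do contain ties. It also passes a list to ascending, which must have the same length as the list of columns.

```python
import pandas as pd

# Made-up values with duplicates in column '2' so the second key matters.
df = pd.DataFrame({
    "0": [1.0, 3.0, 2.0, 0.5],
    "2": [1.0, 2.0, 1.0, 2.0],
})

# Column '2' ascending first; ties are broken by column '0' descending.
out = df.sort_values(["2", "0"], ascending=[True, False])
print(out)
```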