Sort a subset of pandas DataFrame columns alphabetically by column name

3

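I have a fairly simple problem, but I can't find a solution.

I want to sort some of the columns of a pandas DataFrame alphabetically. The DataFrame has more than 100 columns (too many to list by hand).

Example DataFrame: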

import pandas as pd

subject = [1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,4,4,4,4,4,4]
timepoint = [1,2,3,4,5,6,1,2,3,4,5,6,1,2,4,1,2,3,4,5,6]
c = [2,3,4,5,6,7,3,4,1,2,3,4,5,4,5,8,4,5,6,2,3]
d = [2,3,4,5,6,7,3,4,1,2,3,4,5,4,5,8,4,5,6,2,3]
a = [2,3,4,5,6,7,3,4,1,2,3,4,5,4,5,8,4,5,6,2,3]
b = [2,3,4,5,6,7,3,4,1,2,3,4,5,4,5,8,4,5,6,2,3]

df = pd.DataFrame({'subject':subject,
                   'timepoint':timepoint,
                   'c':c,
                   'd':d,
                   'a':a,
                   'b':b})

df.head()

   subject  timepoint  c  d  a  b
0        1          1  2  2  2  2
1        1          2  3  3  3  3
2        1          3  4  4  4  4
3        1          4  5  5  5  5
4        1          5  6  6  6  6


How can I reorder the columns so that df.head() produces output like the following:
   subject  timepoint  a  b  c  d
0        1          1  2  2  2  2
1        1          2  3  3  3  3
2        1          3  4  4  4  4
3        1          4  5  5  5  5
4        1          5  6  6  6  6

That is, keep the first two columns in place and sort the remaining columns alphabetically. Thanks.

3 Answers

6

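You can split the DataFrame by column name with the plain indexing operator [], sort the remaining columns alphabetically with sort_index(axis=1), and then join the two pieces back together with concat: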

>>> pd.concat([df[['subject', 'timepoint']],
...            df[df.columns.difference(['subject', 'timepoint'])]
...                .sort_index(axis=1)],
...           axis=1)

    subject  timepoint  a  b  c  d
0         1          1  2  2  2  2
1         1          2  3  3  3  3
2         1          3  4  4  4  4
3         1          4  5  5  5  5
4         1          5  6  6  6  6
5         1          6  7  7  7  7
6         2          1  3  3  3  3
7         2          2  4  4  4  4
8         2          3  1  1  1  1
9         2          4  2  2  2  2
10        2          5  3  3  3  3
11        2          6  4  4  4  4
12        3          1  5  5  5  5
13        3          2  4  4  4  4
14        3          4  5  5  5  5
15        4          1  8  8  8  8
16        4          2  4  4  4  4
17        4          3  5  5  5  5
18        4          4  6  6  6  6
19        4          5  2  2  2  2
20        4          6  3  3  3  3
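
As a side note, Index.difference with its default sort=None already returns the remaining labels in sorted order when they are comparable, so the same result can probably be reached without concat or sort_index. The following is a minimal sketch of that idea rather than part of the original answer; the small frame and the names keep, rest and result are made up for illustration:

```python
import pandas as pd

# Small stand-in frame with the same column layout as the question's example.
df = pd.DataFrame({'subject': [1, 1], 'timepoint': [1, 2],
                   'c': [2, 3], 'd': [2, 3], 'a': [2, 3], 'b': [2, 3]})

keep = ['subject', 'timepoint']  # columns that stay at the front

# Index.difference drops the listed labels; with the default sort=None
# it also sorts the labels that remain.
rest = df.columns.difference(keep)

# Select the fixed columns first, then the sorted remainder.
result = df[keep + rest.tolist()]
print(result.columns.tolist())
```

This avoids building two intermediate frames, which may matter little for small data but keeps the expression shorter.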

1

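Specify the first two columns you want to keep in place (or determine them from the data), then sort all the other columns. Passing the combined list to the .loc indexer then reorders the DataFrame's columns.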

import numpy as np

first_cols = ['subject', 'timepoint']
#first_cols = df.columns[0:2].tolist()  # OR determine first two

other_cols = np.sort(df.columns.difference(first_cols)).tolist()

df = df.loc[:, first_cols+other_cols]

print(df.head())
   subject  timepoint  a  b  c  d
0        1          1  2  2  2  2
1        1          2  3  3  3  3
2        1          3  4  4  4  4
3        1          4  5  5  5  5
4        1          5  6  6  6  6

1

You can get the DataFrame's columns as a list, rearrange that list, and assign the result back with df = df[cols].

import pandas as pd

subject = [1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,4,4,4,4,4,4]
timepoint = [1,2,3,4,5,6,1,2,3,4,5,6,1,2,4,1,2,3,4,5,6]
c = [2,3,4,5,6,7,3,4,1,2,3,4,5,4,5,8,4,5,6,2,3]
d = [2,3,4,5,6,7,3,4,1,2,3,4,5,4,5,8,4,5,6,2,3]
a = [2,3,4,5,6,7,3,4,1,2,3,4,5,4,5,8,4,5,6,2,3]
b = [2,3,4,5,6,7,3,4,1,2,3,4,5,4,5,8,4,5,6,2,3]

df = pd.DataFrame({'subject':subject,
                   'timepoint':timepoint,
                   'c':c,
                   'd':d,
                   'a':a,
                   'b':b})

cols = df.columns.tolist()
cols = cols[:2] + sorted(cols[2:])
df = df[cols]
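
If this is needed in more than one place, the list-slicing approach can be wrapped in a small helper. The sketch below is only an illustration: the function name sort_trailing_columns and its n_fixed and key parameters are hypothetical, not pandas API. The optional key is there because plain sorted() compares strings case-sensitively, so uppercase names sort before lowercase ones.

```python
import pandas as pd

def sort_trailing_columns(frame, n_fixed=2, key=None):
    """Return frame with its first n_fixed columns kept in place and
    the remaining columns sorted by name (hypothetical helper)."""
    cols = frame.columns.tolist()
    return frame[cols[:n_fixed] + sorted(cols[n_fixed:], key=key)]

# Made-up frame with mixed-case column names.
example = pd.DataFrame({'subject': [1], 'timepoint': [1],
                        'c': [2], 'B': [2], 'a': [2]})

default_order = sort_trailing_columns(example).columns.tolist()
caseless_order = sort_trailing_columns(example, key=str.lower).columns.tolist()
print(default_order)
print(caseless_order)
```

Passing key=str.lower gives a case-insensitive ordering; leaving key as None reproduces the behavior of the snippet above.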

