How to sort a pandas column by letters and numbers?

Question



I have a pandas DataFrame that lists the wells of a 96-well or 384-well plate, and I want to sort them. The wells are labeled like this:

A1, A2, A3, ..., A10, A11, A12, B1, B2, B3,...

When I sort my pandas DataFrame by the well column, I get:

A1, A10, A11, A12, A2, A3, ...

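However, I want the first ordering shown above.

Is there a smarter or more concise alternative to splitting the column into a letter column and a number column and sorting by both?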

- ericmjl
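For reference, the split-and-sort approach the question wants to avoid can at least be kept out of the DataFrame: extract the letter and number as temporary keys, sort those, and reorder the original rows by the resulting index. A minimal sketch, assuming single uppercase row letters followed by digits (as on 96- and 384-well plates); the column name `well` and the sample data are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"well": ["A1", "A10", "A11", "A12", "A2", "B1", "A3"]})

# Split "A10" into a row letter and an integer column number (temporary keys only)
keys = df["well"].str.extract(r"^(?P<row>[A-Z])(?P<col>\d+)$")
keys["col"] = keys["col"].astype(int)

# Sort the keys by row, then column, and reorder the original frame with that index
order = keys.sort_values(["row", "col"]).index
sorted_df = df.loc[order]
print(sorted_df["well"].tolist())  # ['A1', 'A2', 'A3', 'A10', 'A11', 'A12', 'B1']
```

The original DataFrame never gains the helper columns, which is the main cost of the two-column approach.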

I need to sort a pandas DataFrame, not a list. The suggestion is helpful, but I wonder whether a function could be added to pyjanitor to make natsort work with DataFrames. - ericmjl
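One way to get plate order inside a DataFrame without natsort is to pass a `key` to `sort_values` (available since pandas 1.1) that zero-pads the column number, so plain string comparison does the rest. This is a sketch, not the pyjanitor function the comment asks about; it assumes one row letter and at most two column digits, which covers 96- and 384-well plates. The helper name `well_key` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"well": ["A1", "A10", "B2", "A2", "B1", "A11"], "value": range(6)})

def well_key(s: pd.Series) -> pd.Series:
    # "A1" -> "A01", "A10" -> "A10": padded labels sort lexicographically in plate order
    return s.str[0] + s.str[1:].str.zfill(2)

sorted_df = df.sort_values("well", key=well_key)
print(sorted_df["well"].tolist())  # ['A1', 'A2', 'A10', 'A11', 'B1', 'B2']
```

The key only affects the comparison; the `well` column itself keeps its original labels.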


A small tip: if you need a list of well IDs in the default order, you can simply use [letter+str(num) for letter in 'ABCDEFGH' for num in range(1, 13)]. - jfaccioni
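The comprehension from this comment can also drive the sort directly: convert the well column to an ordered `pd.Categorical` whose categories are the canonical plate order, and `sort_values` then follows category position instead of string order. A sketch assuming a 96-well plate; for a 384-well plate the rows would be 'ABCDEFGHIJKLMNOP' and the columns `range(1, 25)`. Labels missing from the category list become NaN, so the list must cover every well that appears.

```python
import pandas as pd

# Canonical 96-well order from the comment: A1..A12, B1..B12, ..., H12
plate_order = [letter + str(num) for letter in "ABCDEFGH" for num in range(1, 13)]

df = pd.DataFrame({"well": ["B1", "A10", "A2", "A1", "H12"], "value": [5, 4, 3, 2, 1]})

# An ordered categorical sorts by position in plate_order, not by string value
df["well"] = pd.Categorical(df["well"], categories=plate_order, ordered=True)
sorted_df = df.sort_values("well")
print(sorted_df["well"].tolist())  # ['A1', 'A2', 'A10', 'B1', 'H12']
```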

3 answers


Solution:

For a standard Python list, you can do this:

>>> my_list = ['R10', 'R2', 'R3', 'R4', 'R5', 'R6', 'R7', 'R8', 'R9', 'R1']
>>> my_list.sort(key=lambda x: (x[0], int(x[1:])))
>>> my_list
['R1', 'R2', 'R3', 'R4', 'R5', 'R6', 'R7', 'R8', 'R9', 'R10']

For a pandas DataFrame, you can pass a key function to sort_values (available since pandas 1.1):

>>> import pandas as pd
>>> df = pd.DataFrame({
...     "regions": ['R10', 'R2', 'R3', 'R4', 'R5', 'R6', 'R7', 'R8', 'R9', 'R1'],
...     "value": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
... })
>>> df.sort_values(
...     by='regions',
...     inplace=True,
...     key=lambda x: x.str.extract(r'(\d+)', expand=False).astype(int)
... )
>>> df
   regions  value
9       R1    100
1       R2     20
2       R3     30
3       R4     40
4       R5     50
5       R6     60
6       R7     70
7       R8     80
8       R9     90
0      R10     10

You can also take a look at this link.

- Aliakbar Hosseinzadeh
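The key in this answer compares only the digits, which works here because every label starts with R; with well labels it would put B1 before A2. A sketch of a key that encodes both parts as one integer (row index × 100 + column number), assuming single uppercase row letters and fewer than 100 columns. The helper name `plate_position` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"well": ["B1", "A10", "A2", "B12", "A1"], "value": [1, 2, 3, 4, 5]})

def plate_position(s: pd.Series) -> pd.Series:
    # Split "A10" into row letter "A" (column 0) and column number "10" (column 1)
    parts = s.str.extract(r"^([A-Z])(\d+)$")
    row = parts[0].map(ord) - ord("A")  # A -> 0, B -> 1, ...
    col = parts[1].astype(int)
    # The row dominates the ordering as long as column numbers stay below 100
    return row * 100 + col

sorted_df = df.sort_values("well", key=plate_position)
print(sorted_df["well"].tolist())  # ['A1', 'A2', 'A10', 'B1', 'B12']
```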


A pandas-only solution that also handles text prefixes of varying length, such as Sales1, Region1, Product1, and so on. It reorders the DataFrame's columns:

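import pandas as pd

# df is your existing DataFrame whose column labels look like Sales1, Region1, ...
# Extract the column labels into a separate Series and sort that Series
s = df.columns.to_series()
parts = s.str.extract(r'(?P<prefix>\D+)(?P<num>\d+)').assign(num=lambda x: x['num'].astype(int))
s.index = pd.MultiIndex.from_frame(parts)
s.sort_index(inplace=True)

# Access the columns in sorted order. Note that you are not changing
# the DataFrame at all
df[s]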

- Code Different
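If the goal is only to reorder columns, the same prefix-then-number idea also fits in plain Python: sort the labels with a key that splits off the trailing number, then `reindex` the columns. A sketch assuming every label is non-digit text followed by digits (`re.fullmatch` returns None otherwise, and the key would raise); `label_key` and the sample labels are illustrative.

```python
import re

import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4]], columns=["Sales10", "Sales2", "Region1", "Sales1"])

def label_key(label: str):
    # "Sales10" -> ("Sales", 10): compare the prefix first, then the number as an int
    prefix, num = re.fullmatch(r"(\D+)(\d+)", label).groups()
    return prefix, int(num)

reordered = df.reindex(columns=sorted(df.columns, key=label_key))
print(list(reordered.columns))  # ['Region1', 'Sales1', 'Sales2', 'Sales10']
```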


- anky · Accepted Answer

If I understand correctly, you can try:

l = ['A1', 'A10', 'A3', 'A2', 'A11', 'A12', 'B1', 'B2', 'B3']
sorted(l, key=lambda x: (x[0], int(x[1:])))

['A1', 'A2', 'A3', 'A10', 'A11', 'A12', 'B1', 'B2', 'B3']

Or use natsort:

import natsort as ns
ns.natsorted(l)

['A1', 'A2', 'A3', 'A10', 'A11', 'A12', 'B1', 'B2', 'B3']
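Since the question is about a DataFrame rather than a list, the key from this answer can also reorder rows: sort the row positions by the key of each label and pass them to `iloc`. A sketch that reuses the answer's key unchanged, so it keeps its assumption of a single-letter prefix; `well_key` is a hypothetical name.

```python
import pandas as pd

df = pd.DataFrame({"well": ["A1", "A10", "A3", "A2", "B1"], "value": [1, 2, 3, 4, 5]})

def well_key(label: str):
    # Same key as in the answer: (row letter, column number as int)
    return label[0], int(label[1:])

# Sort row positions by the key of each label, then reorder the rows with iloc
positions = sorted(range(len(df)), key=lambda i: well_key(df["well"].iat[i]))
sorted_df = df.iloc[positions]
print(sorted_df["well"].tolist())  # ['A1', 'A2', 'A3', 'A10', 'B1']
```

Working with positions rather than labels means duplicate well IDs or a non-default index do not cause problems.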