Changing category names in a pandas DataFrame

Question


7

I would like to know whether there is a way to change the category names in a pandas DataFrame. I tried labels.rename_categories({'zero': '0', 'one': '1', 'two': '2', 'three': '3', 'four': '4', 'five': '5', 'six': '6', 'seven': '7', 'eight': '8', 'nine': '9'}), but unfortunately it did not work.

Here is what the pandas DataFrame currently looks like:

                              File  Label
20936  eight/b63fea9e_nohash_1.wav  eight
21016  eight/f44f440f_nohash_2.wav  eight
7423   three/d8ed3745_nohash_0.wav  three
1103    zero/ad63d93c_nohash_4.wav   zero
13399   five/5b09db89_nohash_0.wav   five
...                            ...    ...
13142   five/1a892463_nohash_0.wav   five
21176  eight/810c99be_nohash_0.wav  eight
16908  seven/6d818f6c_nohash_0.wav  seven
15308    six/2bfe70ef_nohash_1.wav    six
646     zero/24632875_nohash_0.wav   zero

[23666 rows x 2 columns]

- Loai Alnouri

4 Answers

2

Try this:


label_dict = {'zero': 0,
        'one' : 1,
        'two': 2,
        'three' : 3,
        'four': 4,
        'five': 5,
        'six' : 6,
        'seven' : 7,
        'eight' : 8,
        'nine' : 9,
        }
df['Label'] = df['Label'].apply( lambda x : label_dict[x])

- Nk03

1

You can use .replace() and pass your dictionary as the to_replace argument.

Here is the documentation:

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.replace.html
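
As a minimal sketch of this suggestion: the small DataFrame below is a made-up stand-in for the asker's data (the file names and the three-entry dictionary are assumptions for illustration only). It passes a dictionary as to_replace and assigns the result back to the column, since replace returns a new object by default.

```python
import pandas as pd

# Hypothetical miniature of the asker's DataFrame (file names are invented).
df = pd.DataFrame({
    "File": ["eight/a.wav", "three/b.wav", "zero/c.wav"],
    "Label": ["eight", "three", "zero"],
})

# Keys are the existing values, values are their replacements.
digit_names = {"zero": "0", "three": "3", "eight": "8"}

# replace() returns a new Series by default, so assign it back to the column.
df["Label"] = df["Label"].replace(digit_names)

print(df["Label"].tolist())
```

Assigning to df["Label"] rather than to df keeps the File column intact.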

- acrobat

1

By changing the category names, you mean replacing the values using a dictionary. Do I understand correctly?

Try something like this:

df['Label'] = df['Label'].replace({
    'zero': '0', 
    'one': '1', 
    'two': '2', 
    'three': '3', 
    'four': '4', 
    'five': '5', 
    'six': '6', 
    'seven': '7', 
    'eight': '8', 
    'nine': '9'
  }
)

- dpkandy

Thanks, sorted! I used

df['Labels'] = df['Labels'].replace({'zero': '0', 'one': '1', 'two': '2', 'three': '3', 'four': '4', 'five': '5', 'six': '6', 'seven': '7', 'eight': '8', 'nine': '9'})

I assigned to df['Labels'] so that the renamed categories would not overwrite the rest of the dataframe. - Loai Alnouri


- tdy · Accepted Answer

TL;DR

For categorical variables, use Series.cat.rename_categories.
For non-categorical variables, use Series.map.
If you need regular expressions, use Series.replace.

1. `Series.cat.rename_categories`

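This option is the fastest, but it requires the Categorical dtype. If you are analyzing categorical variables, it is strongly recommended for its speed, memory, and semantic benefits.

First convert the column to Categorical (if it is not already):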

df['Label'] = df['Label'].astype('category')

Then rename via Series.cat.rename_categories:

df['Label'] = df['Label'].cat.rename_categories({'zero': 0, 'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5, 'six': 6, 'seven': 7, 'eight': 8, 'nine': 9})

#                               File Label
# 20936  eight/b63fea9e_nohash_1.wav     8
# 21016  eight/f44f440f_nohash_2.wav     8
# 7423   three/d8ed3745_nohash_0.wav     3
# ...                            ...   ...
# 646     zero/24632875_nohash_0.wav     0
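
The two steps above can be tried end to end on a small sample. This is only a sketch: the four-element Series is invented for illustration. It also shows one plausible reason the call in the question failed (an assumption, since the question does not include the error): rename_categories is reached through the .cat accessor of a categorical Series, not directly on a plain Series, and it returns a new Series instead of modifying the original.

```python
import pandas as pd

# Hypothetical sample labels (not the asker's real data).
labels = pd.Series(["eight", "three", "zero", "eight"])

# A plain Series has no rename_categories method of its own.
print(hasattr(labels, "rename_categories"))

# Convert to the category dtype so that the .cat accessor becomes available.
cat_labels = labels.astype("category")

# rename_categories returns a new Series; keep the result.
renamed = cat_labels.cat.rename_categories({"zero": 0, "three": 3, "eight": 8})

print(renamed.tolist())
print(renamed.dtype)
```

The result keeps the category dtype, so only the category names change, not the underlying codes.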

2. `Series.map`

If you can't (or don't want to) use the Categorical dtype, Series.map is the next fastest option:

df['Label'] = df['Label'].map({'zero': 0, 'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5, 'six': 6, 'seven': 7, 'eight': 8, 'nine': 9})

#                               File Label
# 20936  eight/b63fea9e_nohash_1.wav     8
# 21016  eight/f44f440f_nohash_2.wav     8
# 7423   three/d8ed3745_nohash_0.wav     3
# ...                            ...   ...
# 646     zero/24632875_nohash_0.wav     0
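
One behavior worth knowing when mapping with a dictionary: values that are missing from the dictionary become NaN rather than raising an error. The sketch below uses an invented Series with a deliberately unmapped label ("ten"), and shows one possible way to keep unmapped values unchanged; the fallback lambda is an illustration, not part of the original answer.

```python
import pandas as pd

mapping = {"zero": 0, "one": 1}

# Hypothetical sample; "ten" is deliberately absent from the mapping.
labels = pd.Series(["zero", "one", "ten"])

# Dictionary lookup: values without a key in the mapping become NaN.
mapped = labels.map(mapping)
print(mapped.isna().tolist())

# Possible fallback: keep the original value when there is no mapping for it.
kept = labels.map(lambda value: mapping.get(value, value))
print(kept.tolist())
```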

3. `Series.replace`

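This option is slower, but it offers regex and fill functionality through the regex and method parameters (the latter is deprecated as of pandas 2.1).

As a hypothetical example, suppose we want less granular labels: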

mapping = {
    r'zero|one': '0,1',
    r'two|three': '2,3',
    r'four|five': '4,5',
    r'six|seven': '6,7',
    r'eight|nine': '8,9',
}

Then we can use Series.replace with regex=True:

df['Label'] = df['Label'].replace(mapping, regex=True)

#                               File Label
# 20936  eight/b63fea9e_nohash_1.wav   8,9
# 7423   three/d8ed3745_nohash_0.wav   2,3
# 1103    zero/ad63d93c_nohash_4.wav   0,1
# ...                            ...   ...
# 646     zero/24632875_nohash_0.wav   0,1
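
For a version that runs without the original data, here is a sketch of the same regex grouping on a small invented Series. The sample values are assumptions; the mapping keys are the same patterns as above, trimmed to the ones the sample needs.

```python
import pandas as pd

# Hypothetical sample labels (not the asker's real data).
labels = pd.Series(["eight", "three", "zero", "seven"])

# Each key is a regular expression; each value is the replacement text.
mapping = {
    r"zero|one": "0,1",
    r"two|three": "2,3",
    r"six|seven": "6,7",
    r"eight|nine": "8,9",
}

# regex=True makes replace() treat the dictionary keys as patterns.
grouped = labels.replace(mapping, regex=True)

print(grouped.tolist())
```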
