Pandas: How to replace values in a range with a string?

Question

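I want to replace values that fall within certain ranges with other values.

I have a dictionary with characters as keys and the upper bound of each range as values, like this: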

replace_dict = {
        'A': 10, 
        'B': 21, 
        'C': 34, 
        'D': 49, 
        'E': 66, 
        'F': 85, 
        'G': 107, 
        'H': 132, 
        'I': 160, 
        'J': 192, 
        'K': 229, 
        'L': 271, 
        'M': 319, 
        'N': 395, 
        'O': 495, 
        'P': 595, 
        'Q': 795, 
        'R': 1100
}

I need to replace each value with the key of the range it falls into.

For example:

Values in the range of 1-10 will be replaced by 'A',
Values in the range of 11-21 will be replaced by 'B'
Values in the range of 22-34 will be replaced by 'C'
Values in the range of 35-49 will be replaced by 'D'
Values in the range of 50-66 will be replaced by 'E'

I have written the following code:

k=1
for i, j in replace_dict.items():
    data.loc[data['my_col'].between(k,j)] = i
    k=j+1

This code raises an error: TypeError: '>=' not supported between instances of 'str' and 'int'.

However, the single line data.loc[data['my_col'].between(1,10)] = 'A' runs without problems.

Is there a good solution?

- Abdullah Al Imran

Just swap i and j. - bonnal-enzo

Is the dtype of data['my_col'] str? Try data['my_col'].astype('int32').between(k,j). - wwii
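
A likely cause of the TypeError, offered as a sketch rather than a confirmed diagnosis: data.loc[mask] = i with only a row indexer assigns the label to every column of the matching rows, including my_col itself. After the first iteration my_col contains strings, so the next between() call compares str with int. That would also explain why the single line works on its own. The version below writes the labels into a separate column instead; the column name label, the shortened dictionary, and the sample values are assumptions made for illustration.

```python
import pandas as pd

# Shortened version of replace_dict, for illustration only
replace_dict = {'A': 10, 'B': 21, 'C': 34}

# Hypothetical sample data
data = pd.DataFrame({'my_col': [5, 10, 11, 30, 34]})

# Write labels to a new column so that my_col stays numeric
data['label'] = None

lower = 1
for label, upper in replace_dict.items():
    mask = data['my_col'].between(lower, upper)  # inclusive on both ends
    data.loc[mask, 'label'] = label              # row mask AND column name
    lower = upper + 1

print(data)
```

Because the mask is always computed from the untouched numeric column, every iteration compares int with int.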

2 Answers

You can build a separate DataFrame holding the desired ranges and map through an IntervalIndex. Setup:

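import numpy as np
import pandas as pd
ranges = pd.DataFrame(replace_dict, index=['STOP']).T.reset_index()
# Each range starts one past the previous upper bound; the first range starts at 1
ranges['START'] = (ranges.STOP.shift(1)+1).fillna(1)
ranges.index = pd.IntervalIndex.from_arrays(ranges.START, ranges.STOP, closed='both')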

                index  STOP  START
[1.0, 10.0]         A    10    1.0
[11.0, 21.0]        B    21   11.0
[22.0, 34.0]        C    34   22.0
[35.0, 49.0]        D    49   35.0
[50.0, 66.0]        E    66   50.0
etc...

Then map through your IntervalIndex:

df = pd.DataFrame({'nums': np.random.randint(1, 1000, 10)})
   nums
0   699
1   133
2   829
3   299
4   306
5   691
6   172
7   225
8   522
9   671

df.nums.map(ranges['index'])

0    Q
1    I
2    R
3    M
4    M
5    Q
6    J
7    K
8    P
9    Q
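
To make the boundary behaviour of the interval approach easier to check, here is a small self-contained sketch. It uses a shortened dictionary and hand-picked values (both are assumptions for illustration), and it calls IntervalIndex.get_indexer directly so that values outside every range are visibly turned into NaN.

```python
import pandas as pd

# Shortened version of replace_dict, for illustration only
replace_dict = {'A': 10, 'B': 21, 'C': 34, 'D': 49, 'E': 66}

stops = list(replace_dict.values())
starts = [1] + [s + 1 for s in stops[:-1]]  # 1, 11, 22, 35, 50
intervals = pd.IntervalIndex.from_arrays(starts, stops, closed='both')
labels = pd.Series(list(replace_dict.keys()), index=intervals)

# Hand-picked values sitting on the range boundaries, plus one out of range
nums = pd.Series([1, 10, 11, 35, 50, 66, 70])

# get_indexer returns the position of the containing interval, or -1 if none
idx = intervals.get_indexer(nums)
mapped = pd.Series(labels.to_numpy()[idx], index=nums.index).where(idx >= 0)

print(mapped)
```

Both ends of every interval are inclusive here, so 10 still maps to 'A' and 11 to 'B', while 70 falls outside all ranges and becomes NaN.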

- user3483203

- jpp · Accepted Answer

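You can use pandas.cut. A few points to note:

We rely on the dictionary's keys and values being iterated in a consistent order.
We supply bins and labels explicitly; note that labels must have one fewer item than bins.
You may want to add an extra bin for values greater than 1100.

Here is a minimal example.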

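df = pd.DataFrame({'col': [500, 123, 56, 12, 1000, 2, 456]})

# include_lowest=True closes the first bin on the left, so a value of 1 also maps to 'A'
df['mapped'] = pd.cut(df['col'],
                      bins=[1]+list(replace_dict.values()),
                      labels=list(replace_dict.keys()),
                      include_lowest=True)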

print(df)

    col mapped
0   500      P
1   123      H
2    56      E
3    12      B
4  1000      R
5     2      A
6   456      O
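
The note above about values greater than 1100 could be handled roughly as follows. This is only a sketch: the catch-all label 'OVER' and the sample values are made up for illustration, and the lowest bin edge is set to 0 so that a value of 1 falls into the first bin (pandas.cut bins are open on the left by default).

```python
import numpy as np
import pandas as pd

replace_dict = {
    'A': 10, 'B': 21, 'C': 34, 'D': 49, 'E': 66, 'F': 85,
    'G': 107, 'H': 132, 'I': 160, 'J': 192, 'K': 229, 'L': 271,
    'M': 319, 'N': 395, 'O': 495, 'P': 595, 'Q': 795, 'R': 1100,
}

# One extra edge (infinity) and one extra label for everything above 1100
bins = [0] + list(replace_dict.values()) + [np.inf]
labels = list(replace_dict.keys()) + ['OVER']  # 'OVER' is a hypothetical label

# Hypothetical sample values around the boundaries
s = pd.Series([1, 10, 11, 1100, 1101, 5000])
mapped = pd.cut(s, bins=bins, labels=labels)

print(mapped)
```

Without the extra bin, values above the last edge would come back as NaN.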