Modifying string values in a pandas DataFrame column

Question

Given the DataFrame

import pandas as pd

df = pd.DataFrame({'c1': ['c10:b', 'c11', 'c12:k'], 'c2': ['c20', 'c21', 'c22']})

      c1   c2
0  c10:b  c20
1    c11  c21
2  c12:k  c22

I would like to modify the strings in column c1 so that the colon and everything after it are removed, giving:

    c1   c2
0  c10  c20
1  c11  c21
2  c12  c22

I tried slicing

df['c1'].str[:df['c1'].str.find(':')]

but it doesn't work. How can I accomplish this?

- leofer

1 Answer

- user3483203 · Accepted Answer

Use replace with regex=True:

df.replace(r'\:.*', '', regex=True)

    c1   c2
0  c10  c20
1  c11  c21
2  c12  c22
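
Note that df.replace applies the pattern to every column of the frame. The following is a minimal sketch, assuming a hypothetical variant of the question's data in which c2 also contains a colon, of how the replacement might be restricted to c1 by passing per-column dicts:

```python
import pandas as pd

# Hypothetical variant of the question's data: c2 also contains a colon.
df = pd.DataFrame({'c1': ['c10:b', 'c11', 'c12:k'],
                   'c2': ['c20:x', 'c21', 'c22']})

# Whole-frame replace: the pattern is applied to every column.
all_cols = df.replace(r':.*', '', regex=True)

# Per-column dicts: only c1 is searched, c2 is left untouched.
only_c1 = df.replace({'c1': r':.*'}, {'c1': ''}, regex=True)

print(all_cols['c2'].tolist())  # ['c20', 'c21', 'c22']
print(only_c1['c2'].tolist())   # ['c20:x', 'c21', 'c22']
```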

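If you only want to replace this pattern in a single column, use the str accessor (pass regex=True explicitly, since pandas 2.0 and later treat the pattern as a literal string by default):

df.c1.str.replace(r'\:.*', '', regex=True)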
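
The str accessor returns a new Series and leaves df unchanged, so the result has to be assigned back to the column. A short sketch of that step follows; the str.split variant at the end is an alternative added here for illustration and is not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({'c1': ['c10:b', 'c11', 'c12:k'], 'c2': ['c20', 'c21', 'c22']})

# Strip the colon and everything after it, then store the result in c1.
df['c1'] = df['c1'].str.replace(r':.*', '', regex=True)
print(df['c1'].tolist())  # ['c10', 'c11', 'c12']

# Alternative without a regex: split on the first colon and keep the head.
head = pd.Series(['c10:b', 'c11', 'c12:k']).str.split(':', n=1).str[0]
print(head.tolist())  # ['c10', 'c11', 'c12']
```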

If performance is a concern, use a list comprehension with partition instead of the pandas string methods:

[i.partition(':')[0] for i in df.c1]
# ['c10', 'c11', 'c12']
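
For context, str.partition splits at the first occurrence of the separator and returns a 3-tuple. When the separator is absent it returns the whole string followed by two empty strings, which is why rows without a colon pass through unchanged. Below is a sketch of writing the result back to the column. The missing value in the data and the isinstance guard are additions for illustration, since the plain comprehension would raise on a non-string entry.

```python
import pandas as pd

print('c10:b'.partition(':'))  # ('c10', ':', 'b')
print('c11'.partition(':'))    # ('c11', '', '')

# Hypothetical data with a missing value added for illustration.
df = pd.DataFrame({'c1': ['c10:b', 'c11', None, 'c12:k']})

# Keep the part before the first colon; pass non-strings through as-is.
df['c1'] = [s.partition(':')[0] if isinstance(s, str) else s for s in df['c1']]
print(df['c1'].tolist())
```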

Timings

df = pd.concat([df]*10000)

%timeit df.replace(r'\:.*', '', regex=True)
30.8 ms ± 340 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit df.c1.str.replace(r'\:.*', '', regex=True)
31.2 ms ± 449 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit df['c1'].str.partition(':')[0]
56.7 ms ± 269 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit [i.partition(':')[0] for i in df.c1]
4.2 ms ± 22.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
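
%timeit is an IPython magic. Outside IPython, a comparable measurement could be sketched with the standard timeit module, as below. The helper function names are made up for illustration, and absolute numbers will vary with hardware and pandas version, so no timings are shown.

```python
import timeit

import pandas as pd

base = pd.DataFrame({'c1': ['c10:b', 'c11', 'c12:k'], 'c2': ['c20', 'c21', 'c22']})
df = pd.concat([base] * 10000, ignore_index=True)


def with_str_replace():
    return df['c1'].str.replace(r':.*', '', regex=True).tolist()


def with_str_partition():
    return df['c1'].str.partition(':')[0].tolist()


def with_list_comprehension():
    return [s.partition(':')[0] for s in df['c1']]


candidates = (with_str_replace, with_str_partition, with_list_comprehension)

# All candidates must agree before their timings are worth comparing.
expected = with_list_comprehension()
assert all(f() == expected for f in candidates)

for f in candidates:
    seconds = timeit.timeit(f, number=10) / 10
    print(f'{f.__name__}: {seconds * 1e3:.1f} ms per call')
```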