How to use a regular expression to extract the first two characters from a string

Question


python regex pandas


Reference: Pandas DataFrame: remove unwanted parts from strings in a column

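I'm working from the answers at the link above. I've studied some regular expressions and plan to learn them properly, but in the meantime I need help.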

My DataFrame looks like this:

df：

  c_contofficeID
0           0109
1           0109
2           3434
3         123434  
4         1255N9
5           0109
6         123434
7           55N9
8           5599
9           0109
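To reproduce the question, the frame can be rebuilt as below. This is a sketch that assumes c_contofficeID holds strings: the leading zeros in 0109 and the letters in 1255N9 suggest a string (object) column rather than integers.

```python
import pandas as pd

# Rebuild the sample frame; dtype=str keeps leading zeros such as "0109".
df = pd.DataFrame(
    {"c_contofficeID": ["0109", "0109", "3434", "123434", "1255N9",
                        "0109", "123434", "55N9", "5599", "0109"]},
    dtype=str,
)
print(df)
```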

Pseudocode

If the first two characters are 12, remove them. Alternatively, add 12 to the values whose first two characters are not 12.
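The two rules in the pseudocode can be written as plain Python string functions, before bringing regex or pandas into it. The names drop_leading_12 and add_leading_12 are hypothetical, chosen only for this sketch; the expected result below applies only the first rule.

```python
def drop_leading_12(s: str) -> str:
    """Rule 1 (hypothetical helper): if the first two characters are "12", remove them."""
    return s[2:] if s.startswith("12") else s


def add_leading_12(s: str) -> str:
    """Rule 2 (hypothetical helper): prefix "12" when the string does not already start with it."""
    return s if s.startswith("12") else "12" + s


print(drop_leading_12("123434"))  # 3434
print(drop_leading_12("0109"))    # 0109
print(add_leading_12("0109"))     # 120109
```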

The result should look like this:

  c_contofficeID
0           0109
1           0109
2           3434
3           3434  
4           55N9
5           0109
6           3434
7           55N9
8           5599
9           0109

I'm using the answer from the link above as a starting point:

df['contofficeID'].replace(regex=True,inplace=True,to_replace=r'\D',value=r'')

I've tried the following:

Attempt 1)

df['contofficeID'].replace(regex=True,inplace=True,to_replace=r'[1][2]',value=r'')

Attempt 2)

df['contofficeID'].replace(regex=True,inplace=True,to_replace=r'$[1][2]',value=r'')

Attempt 3)

df['contofficeID'].replace(regex=True,inplace=True,to_replace=r'?[1]?[2]',value=r'')

- Dave
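A quick check with Python's standard re module shows what each attempted pattern does on its own, outside pandas: [1][2] is just 12 and matches anywhere in the string, $[1][2] can never match because $ is the end-of-string anchor, and ?[1]?[2] is not a valid pattern because the leading ? has nothing to repeat. The sample strings are illustrative.

```python
import re

# Attempt 1: "[1][2]" is just "12" and matches anywhere, not only at the start.
print(re.sub(r"[1][2]", "", "123434"))  # 3434
print(re.sub(r"[1][2]", "", "0112"))    # 01  (a trailing "12" is removed too)

# Attempt 2: "$" anchors to the END of the string, so "$12" cannot match anything.
print(re.sub(r"$[1][2]", "", "123434"))  # 123434 (unchanged)

# Attempt 3: a leading "?" has nothing to repeat, so compiling the pattern fails.
try:
    re.compile(r"?[1]?[2]")
except re.error as exc:
    print("invalid pattern:", exc)
```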

"^12" is the regex for "starts with 12". - OneCricketeer
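The effect of the ^ anchor can be seen with re.sub from the standard library: without the re.MULTILINE flag, ^ matches only at position 0, so the pattern removes a 12 only when the string starts with it. The value 0112 is an illustrative input, not part of the question's data.

```python
import re

for s in ["123434", "1255N9", "0112", "0109"]:
    # "^12" only matches when the string starts with "12".
    print(s, "->", re.sub(r"^12", "", s))

# 123434 -> 3434
# 1255N9 -> 55N9
# 0112 -> 0112
# 0109 -> 0109
```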


If you have a string "1234", should the "12" be kept or removed in that case? - Nathan Davis

1 answer


- piRSquared · Accepted Answer

New answer
Based on a comment from @Addison

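# '^12(?=.{4}$)' matches a leading 12 only when it is followed by exactly 4 more characters
# regex=True is required in pandas >= 2.0, where str.replace defaults to literal matching
df.c_contofficeID.str.replace('^12(?=.{4}$)', '', regex=True)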
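Applied to the sample data, the lookahead version can be sketched as below. It assumes pandas 2.x, where Series.str.replace treats the pattern as a literal string unless regex=True is passed, and it assigns the result back because str.replace returns a new Series rather than modifying the frame.

```python
import pandas as pd

df = pd.DataFrame({"c_contofficeID": ["0109", "0109", "3434", "123434", "1255N9",
                                      "0109", "123434", "55N9", "5599", "0109"]})

# Remove a leading "12" only when exactly four characters follow it.
df["c_contofficeID"] = df["c_contofficeID"].str.replace(r"^12(?=.{4}$)", "", regex=True)
print(df["c_contofficeID"].tolist())
# ['0109', '0109', '3434', '3434', '55N9', '0109', '3434', '55N9', '5599', '0109']
```

Because the lookahead requires exactly four characters after the 12, a value such as 1234 (Nathan Davis's example) is left unchanged: only two characters follow the 12.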

If the IDs are always meant to be four characters long, it is simpler to just keep the last four characters:

df.c_contofficeID.str[-4:]
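A sketch of the slicing approach: .str[-4:] keeps the last four characters of every value, so it gives the expected result only under the assumption that every valid ID is exactly four characters long. Values shorter than four characters come back unchanged. The last two inputs below (109 and 12345) are hypothetical edge cases, not part of the question's data.

```python
import pandas as pd

s = pd.Series(["0109", "123434", "1255N9", "109", "12345"])

# Keep only the last four characters of each string.
print(s.str[-4:].tolist())
# ['0109', '3434', '55N9', '109', '2345']
```

Note that 12345 becomes 2345, which is probably not wanted; the regex version would leave it alone.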

Old answer
Use str.replace:

df.c_contofficeID.str.replace('^12', '', regex=True).to_frame()
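The old answer's pattern removes any leading 12, which is exactly the case Nathan Davis asked about: 1234 becomes 34. A small side-by-side comparison of the old and new patterns, assuming pandas 2.x (hence regex=True); 1234 is a hypothetical value that does not appear in the question's data.

```python
import pandas as pd

s = pd.Series(["123434", "1234", "0109"], name="c_contofficeID")

old = s.str.replace(r"^12", "", regex=True)           # old answer: any leading 12
new = s.str.replace(r"^12(?=.{4}$)", "", regex=True)  # new answer: 12 plus exactly 4 chars
print(pd.DataFrame({"old": old, "new": new}))
```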