Python pandas does not recognize special characters

Question

python pandas special-characters contains

4

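I am trying to use df['column_name'].str.count("+") in Python pandas, but I get the error:

"error: nothing to repeat"

The method works with ordinary characters; for example, df['column_name'].str.count("a") works fine.

The "^" character also causes trouble. If I use df['column_name'].str.contains("^"), the result is wrong: it looks as if "^" were interpreted as " " (a space).

Surprisingly, if I use .count("+") and .contains("^") on regular non-pandas strings, they work perfectly.

Here is a simple working example: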

import pandas as pd
df = pd.DataFrame({'column1': ['Nighthawks+', 'Dragoons'], 'column2': ['1st', '2nd']}, columns = ['column1', 'column2'])

Applying df["column1"].str.contains("^") gives "True, True", but it should be "False, False".

And applying df["column1"].str.count("+") raises the error:

"error: nothing to repeat"

Outside pandas, however, "bla++".count("+") correctly returns "2".

Is there a workaround? Thanks.

- NeStack

2 Answers

6

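In str.count(), the pattern is treated as a regular expression, so special characters must be escaped with a backslash (see @EdChum's answer for details).

In str.contains(), on the other hand, you do not need to escape the pattern. Just pass regex=False, for example df['a'].str.contains("+", regex=False), to search for strings that contain special characters.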
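As a minimal sketch of the regex=False approach, here it is applied to the DataFrame from the question. The variable names has_plus and has_caret are made up for illustration.

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({'column1': ['Nighthawks+', 'Dragoons'],
                   'column2': ['1st', '2nd']})

# regex=False makes pandas treat the pattern as a literal substring,
# so "+" and "^" need no escaping.
has_plus = df['column1'].str.contains('+', regex=False)
has_caret = df['column1'].str.contains('^', regex=False)

print(has_plus.tolist())   # [True, False]
print(has_caret.tolist())  # [False, False]
```

Note that this option applies to str.contains(); for str.count() the pattern still has to be escaped.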

- msklc

- EdChum · Accepted Answer

You need to escape the plus sign:

In[10]:
df = pd.DataFrame({'a':['dsa^', '^++', '+++','asdasads']})
df

Out[10]: 
          a
0      dsa^
1       ^++
2       +++
3  asdasads

In[11]:
df['a'].str.count(r"\+")

Out[11]: 
0    0
1    2
2    3
3    0
Name: a, dtype: int64

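When you run df['a'].str.count('^'), every row returns just 1, because an unescaped ^ is the start-of-string anchor and matches once per string: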

In[12]:
df['a'].str.count('^')

Out[12]: 
0    1
1    1
2    1
3    1
Name: a, dtype: int64

Again, the pattern needs to be escaped:

In[16]:
df['a'].str.count(r'\^')

Out[16]: 
0    1
1    1
2    0
3    0
Name: a, dtype: int64
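Both symptoms can be reproduced with Python's built-in re module alone. This is only a sketch, and it assumes pandas hands the pattern to re for this Series, which the wording of the error message suggests. The names anchor_matches, literal_matches and message are illustrative.

```python
import re

# An unescaped '^' is an anchor: it matches the empty position at the
# start of the string, so there is exactly one match whatever the content.
anchor_matches = re.findall('^', 'asdasads')
print(anchor_matches)  # ['']

# Escaped, it matches the literal caret character instead.
literal_matches = re.findall(r'\^', '^++')
print(len(literal_matches))  # 1

# A bare '+' is a quantifier with nothing in front of it to repeat,
# so the pattern cannot even be compiled.
message = None
try:
    re.compile('+')
except re.error as exc:
    message = str(exc)
print(message)
```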

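Edit

On the semantic difference between count on a plain string and on a Series: count on a Python str simply counts occurrences of a literal substring, whereas Series.str.count takes a regular expression pattern. To search for special characters such as ^ and +, you need to escape them with a backslash.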
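If you would rather not write the backslashes by hand, one option is to let re.escape from the standard library build the escaped pattern. This is a sketch, not part of the original answer, and the names s, plus_counts and caret_counts are made up for illustration.

```python
import re
import pandas as pd

s = pd.Series(['dsa^', '^++', '+++', 'asdasads'])

# re.escape adds the backslashes, so the text is matched literally.
plus_counts = s.str.count(re.escape('+'))
caret_counts = s.str.count(re.escape('^'))

print(plus_counts.tolist())   # [0, 2, 3, 0]
print(caret_counts.tolist())  # [1, 1, 0, 0]

# Plain Python str.count is always literal, so no escaping is involved.
print('bla++'.count('+'))  # 2
```

This is handy when the text to count comes from user input or another column and may contain any regex metacharacter.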