How to remove square brackets from a pandas DataFrame

Question

31

After applying str.findall() to a column of a pandas DataFrame, I get list-like values wrapped in square brackets. How can I remove the brackets?

print(df)

id     value                 
1      [63]        
2      [65]       
3      [64]        
4      [53]       
5      [13]      
6      [34]

- DougKruger

2

What does that column contain? Is it the string '[63]' or the list [63]? - EdChum
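One way to answer this comment is to inspect the Python type of the stored elements. The sketch below rebuilds a similar column under assumptions: the `text` column and the `\d+` pattern are hypothetical stand-ins, since the question does not show the original data or regex.

```python
import pandas as pd

# Hypothetical source data; the question does not show the original column.
df = pd.DataFrame({'id': [1, 2, 3], 'text': ['a63', 'b65', 'c64']})

# str.findall returns a list of matches for every row, even for a single match.
df['value'] = df['text'].str.findall(r'\d+')

# Type of one element, and the distinct element types across the column.
element_type = type(df['value'].iloc[0])
type_names = df['value'].map(lambda v: type(v).__name__).unique().tolist()

print(element_type)
print(type_names)
```

If this reports `list`, the brackets are only how pandas displays the lists; if it reports `str`, the brackets are literal characters in the text.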

5 Answers

8

If the values are strings, we can also use Python's str.replace method.

import pandas as pd

df = pd.DataFrame({'value': ['[63]', '[65]', '[64]']})

print(df)
  value
0  [63]
1  [65]
2  [64]

df['value'] = df['value'].apply(lambda x: x.replace('[', '').replace(']', ''))

# convert the string column to int
df['value'] = df['value'].astype(int)

# output
print(df)

   value
0     63
1     65
2     64

print(df.dtypes)
value    int32
dtype: object

- qaiser

0

A general solution for removing the [ and ] characters from a string column of a DataFrame is:

df['value'] = df['value'].str.replace(r'[][]', '', regex=True)  # one by one
df['value'] = df['value'].str.replace(r'[][]+', '', regex=True) # by chunks of one or more [ or ] chars

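[][] is a regex character class that matches either a ] or a [ character. The + quantifier makes the regex engine match one or more of these characters in a row.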
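To see the character class on its own, here is a minimal sketch using Python's `re` module directly instead of pandas. The input strings are made up for illustration.

```python
import re

# A ']' placed first inside a character class is treated as a literal,
# so [][] means "either ']' or '['".
one_by_one = re.sub(r'[][]', '', '[63]')

# With +, a run of consecutive brackets is removed in a single match.
chunked = re.sub(r'[][]+', '', '[[63]]')

print(one_by_one)
print(chunked)
```

Both calls remove every bracket; the `+` variant simply needs fewer matches when brackets appear next to each other.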

See the regex demo.

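However, in this case the square brackets mark the list-of-strings result of Series.str.findall. Apparently, you want to extract only the first match from each column value.

Use Series.str.extract when you need the first match
Use Series.str.findall when you need all matches

So, in this case, to save yourself the trouble, you can use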

df['value'] = df['source_column'].str.extract(r'my regex with one set of (parentheses)')

Note that str.extract requires at least one capturing group in the pattern to work and return a value (str.findall works even without a capturing group).
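The difference between the two methods can be sketched on a small made-up Series. The data and the `\d+` pattern are assumptions for illustration; `expand=False` is used so that a single capturing group yields a Series rather than a one-column DataFrame.

```python
import pandas as pd

# Hypothetical data: two rows with numbers, one row without any match.
s = pd.Series(['id 63 and 12', 'id 65', 'none'])

# extract: first match of the capturing group per row, NaN when nothing matches.
first = s.str.extract(r'(\d+)', expand=False)

# findall: a list of all matches per row, an empty list when nothing matches.
all_matches = s.str.findall(r'\d+')

print(first)
print(all_matches)
```

Because `extract` returns scalars instead of lists, no brackets appear in its output.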

Note that if findall returns multiple matches and you want a single string as output, you can join the matches with str.join:

df['value'] = df['source_column'].str.findall(pattern).str.join(', ')
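A concrete version of the line above, as a sketch: `source_column` keeps the placeholder name from the answer, while the sample rows and the `\d+` pattern are assumptions.

```python
import pandas as pd

# Hypothetical rows, each containing at least one number.
df = pd.DataFrame({'source_column': ['a 70 b 63', 'c 12 d 65', 'e 64']})

# findall yields a list per row; str.join collapses each list into one string.
df['value'] = df['source_column'].str.findall(r'\d+').str.join(', ')

print(df['value'])
```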

- Wiktor Stribiżew

0

The answer given by jezrael does not work when a list has more than one member. In that case, you can use the replace method.

df['column_name'] =  df['column_name'].apply(lambda x: x.replace('[','').replace(']',''))

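The result is still a string column. If each value holds a single number, you can convert the column to integers (this raises an error for values such as '70, 63'):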

df['column_name'] = df['column_name'].astype(int)

- Sumit Pokhrel

0

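The "values in column value have type list" solution proposed by jezrael does not work if a list has more than one member. You can use the lambda solution proposed by qaiser and Sumit, but convert the column to str before applying it. The full code is: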

import pandas as pd
df = pd.DataFrame({'value':[[70,63],[12,65],[64,39]]}).astype(str) #list converted into string, so we can use str.replace
df=df['value'].apply(lambda x: x.replace("[","").replace("]",""))

Output:

0    70, 63
1    12, 65
2    64, 39
Name: value, dtype: object
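As an alternative sketch that is not part of the answer above, the list elements can be joined directly, which avoids converting the whole DataFrame to strings first. The name `joined` is introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'value': [[70, 63], [12, 65], [64, 39]]})

# Convert each element to str and join them, leaving the original lists untouched.
joined = df['value'].apply(lambda items: ', '.join(str(item) for item in items))

print(joined)
```

This yields the same strings as the replace-based approach for the sample data.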

- fidibsp

- jezrael · Accepted Answer

If the values in the value column are lists, use:

df['value'] = df['value'].str[0]

Or:

df['value'] = df['value'].str.get(0)

Docs.

Sample:

df = pd.DataFrame({'value':[[63],[65],[64]]})
print (df)
  value
0  [63]
1  [65]
2  [64]

#check type if index 0 exist
print (type(df.loc[0, 'value']))
<class 'list'>

#check type generally, index can be `DatetimeIndex`, `FloatIndex`...
print (type(df.loc[df.index[0], 'value']))
<class 'list'>

df['value'] = df['value'].str.get(0)
print (df)
   value
0     63
1     65
2     64

If the values are strings, use str.strip, then convert to a numeric type with astype:

df['value'] = df['value'].str.strip('[]').astype(int)

Sample:

df = pd.DataFrame({'value':['[63]','[65]','[64]']})
print (df)
  value
0  [63]
1  [65]
2  [64]

#check type if index 0 exist
print (type(df.loc[0, 'value']))
<class 'str'>

#check type generally, index can be `DatetimeIndex`, `FloatIndex`...
print (type(df.loc[df.index[0], 'value']))
<class 'str'>


df['value'] = df['value'].str.strip('[]').astype(int)
print (df)
  value
0    63
1    65
2    64
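The two cases can be combined in one small helper, shown here only as a sketch. The function name `unwrap_brackets` is hypothetical, and it assumes a non-empty column whose elements all share the type of the first one, with exactly one value per row.

```python
import pandas as pd

def unwrap_brackets(series):
    """Hypothetical helper: choose the approach from the first element's type."""
    first = series.iloc[0]  # assumes a non-empty column with uniform element types
    if isinstance(first, list):
        # List values: take the first (and assumed only) element.
        return series.str[0]
    # String values: strip the literal brackets and convert to int.
    return series.str.strip('[]').astype(int)

from_lists = unwrap_brackets(pd.Series([[63], [65], [64]]))
from_strings = unwrap_brackets(pd.Series(['[63]', '[65]', '[64]']))

print(from_lists)
print(from_strings)
```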