How to remove special characters from the rows of a Pandas DataFrame

Question

I have a column in a Pandas DataFrame like the one shown below:

LGA

Alpine (S)
Ararat (RC)
Ballarat (C)
Banyule (C)
Bass Coast (S)
Baw Baw (S)
Bayside (C)
Benalla (RC)
Boroondara (C)

What I want to do is remove the special characters at the end of each row, such as (S) and (RC).

The desired output is:

LGA

Alpine
Ararat
Ballarat
Banyule
Bass Coast
Baw Baw
Bayside
Benalla
Boroondara

I'm not sure how to get the desired output above.

Any help would be appreciated.

Thanks.

- adey27

3 Answers

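import pandas as pd
df = pd.DataFrame({'LGA': ['Alpine (S)', 'Ararat (RC)', 'Bass Coast (S)']})
# Split once on the first '(': the left part is the name, the right part the suffix
df[['LGA', 'throw away']] = df['LGA'].str.split('(', n=1, expand=True)
# The name still ends with the space that preceded '(', so strip it
df['LGA'] = df['LGA'].str.strip()
df = df.drop(columns='throw away')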

- Gerrit

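If the suffix is always inside parentheses, split the column using ( as the delimiter. - Gerrit

Regarding Sibtain Reza's code: you only need to change x.split() to x.split('('). - Gerrit

That doesn't work for "Bass Coast". Change the code to df['LGA'] = df['LGA'].apply(lambda x : x.split('(')[0]). - Gerrit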
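To illustrate the comments above: str.split() with no argument splits on whitespace, so taking element [0] keeps only the first word of a multi-word name such as "Bass Coast", while splitting on '(' keeps everything before the suffix. The sketch below assumes every suffix starts with '('; it also strips the trailing space that split('(')[0] leaves behind, and shows a vectorized equivalent using the .str accessor instead of apply.

```python
import pandas as pd

df = pd.DataFrame({'LGA': ['Alpine (S)', 'Ararat (RC)', 'Bass Coast (S)']})

# Whitespace split keeps only the first word: 'Bass Coast (S)' -> 'Bass'
first_word = df['LGA'].apply(lambda x: x.split()[0])

# Splitting on '(' keeps the whole name; strip() removes the space before '('
before_paren = df['LGA'].apply(lambda x: x.split('(')[0].strip())

# Vectorized equivalent: split once on '(', take the first piece, trim whitespace
vectorized = df['LGA'].str.split('(', n=1).str[0].str.strip()

print(first_word.tolist())    # ['Alpine', 'Ararat', 'Bass']
print(before_paren.tolist())  # ['Alpine', 'Ararat', 'Bass Coast']
print(vectorized.tolist())    # ['Alpine', 'Ararat', 'Bass Coast']
```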

You can use pandas' Series.str.replace with a regular expression to remove the parenthesized part.


…
dataf['LGA'] = dataf['LGA'].str.replace(r"\([^()]*\)", "", regex=True)

Demo:


import pandas as pd

dataf = pd.DataFrame({
"LGA":\
"""Alpine (S)
Ararat (RC)
Ballarat (C)
Banyule (C)
Bass Coast (S)
Baw Baw (S)
Bayside (C)
Benalla (RC)
Boroondara (C)""".split("\n")
})

output = dataf['LGA'].str.replace(r"\([^()]*\)", "", regex=True)

print(output)

0        Alpine 
1        Ararat 
2      Ballarat 
3       Banyule 
4    Bass Coast 
5       Baw Baw 
6       Bayside 
7       Benalla 
8    Boroondara 
Name: LGA, dtype: object
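The output above still ends each name with the space that preceded the parenthesis, and the pattern removes parenthesized text anywhere in the string. If only a trailing suffix should be removed (an assumption about the data), one option is to anchor the pattern at the end with $ and absorb the preceding whitespace with \s*. Values without a suffix are left unchanged.

```python
import pandas as pd

dataf = pd.DataFrame({'LGA': ['Alpine (S)', 'Ararat (RC)', 'Bass Coast (S)', 'Boroondara (C)']})

# \s*        : whitespace before the suffix
# \([^()]*\) : one parenthesized group with no nested parentheses
# \s*$       : optional trailing whitespace, then end of string
output = dataf['LGA'].str.replace(r"\s*\([^()]*\)\s*$", "", regex=True)

print(output.tolist())  # ['Alpine', 'Ararat', 'Bass Coast', 'Boroondara']
```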

- Prayson W. Daniel

Web content provided by Stack Overflow.

- Gedas Miksenas · Accepted Answer

I have a different approach using regular expressions. It removes anything inside brackets:

import re
import pandas as pd

df = pd.DataFrame({'LGA': ['Alpine (S)', 'Ararat (RC)', 'Bass Coast (S)']})
# Delete anything between round or square brackets; the raw string avoids invalid-escape warnings
df['LGA'] = [re.sub(r"[\(\[].*?[\)\]]", "", x).strip() for x in df['LGA']]
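The list comprehension above calls re.sub once per row. A sketch of the same pattern applied through pandas' vectorized str.replace follows; it uses the same regex, so it still removes bracketed text anywhere in the string, not only at the end. Note that the character classes do not require the brackets to pair up, so a fragment like "(x]" would also be removed. The value 'Sample Shire [X]' is hypothetical, added only to show square brackets.

```python
import pandas as pd

# 'Sample Shire [X]' is a hypothetical value added to show square brackets
df = pd.DataFrame({'LGA': ['Alpine (S)', 'Ararat (RC)', 'Bass Coast (S)', 'Sample Shire [X]']})

# Non-greedy match from an opening ( or [ to the nearest closing ) or ]
df['LGA'] = (
    df['LGA']
    .str.replace(r"[\(\[].*?[\)\]]", "", regex=True)
    .str.strip()  # drop the space that preceded the bracket
)

print(df['LGA'].tolist())  # ['Alpine', 'Ararat', 'Bass Coast', 'Sample Shire']
```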