For a single string, the code below removes the Unicode (non-ASCII) characters and the newline/carriage-return characters:
t = "We've\xe5\xcabeen invited to attend TEDxTeen, an independently organized TED event focused on encouraging youth to find \x89\xdb\xcfsimply irresistible\x89\xdb\x9d solutions to the complex issues we face every day.,"
t2 = t.decode('unicode_escape').encode('ascii', 'ignore').strip()
import sys
sys.stdout.write(t2.strip('\n\r'))
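But when I try to write a function in pandas that applies this to every cell of a column, it either fails with an AttributeError, or I get a warning that a value is trying to be set on a copy of a slice from a DataFrame.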
def clean_text(row):
    row = row["text"].decode('unicode_escape').encode('ascii', 'ignore')#.strip()
    import sys
    sys.stdout.write(row.strip('\n\r'))
    return row
Applied to my DataFrame:
df["text"] = df.apply(clean_text, axis=1)
How can I apply this code to each element of a Series?
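For reference, here is a minimal sketch of what a per-element version might look like. It is not a confirmed fix. It assumes Python 3, where `str` has no `.decode` method, which is one possible source of the AttributeError. The function takes one cell value instead of a whole row. The `samples` list is made up for illustration and stands in for the column.

```python
def clean_text(value):
    """Drop non-ASCII characters from one string and strip surrounding whitespace."""
    # Assumption: non-string cells (None, NaN, numbers) should pass through unchanged.
    if not isinstance(value, str):
        return value
    # Encoding to ASCII with errors="ignore" discards every non-ASCII character;
    # decoding again turns the resulting bytes back into a str.
    ascii_only = value.encode("ascii", "ignore").decode("ascii")
    # strip() removes leading/trailing whitespace, including \n and \r.
    return ascii_only.strip()


# Hypothetical sample values standing in for the "text" column.
samples = [
    "We've\xe5\xcabeen invited\r\n",
    "\x89\xdb\xcfsimply irresistible\x89\xdb\x9d solutions\n",
    None,
]
cleaned = [clean_text(s) for s in samples]
print(cleaned)
```

With pandas, the same function could presumably be passed to the column itself, e.g. `df["text"] = df["text"].apply(clean_text)`, so that it receives each cell and not a row. If `df` was itself sliced out of another DataFrame, taking an explicit `.copy()` first may be what silences the copy-of-a-slice warning.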