How to remove carriage return in a dataframe

Question

python pandas replace carriage-return data-cleaning

12

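I have a dataframe with columns named id, country_name, location and total_deaths. While cleaning the data, I came across a value in one row that has '\r' attached. Once cleaning is complete, I store the resulting dataframe in a destination.csv file. Because that particular row has \r attached, it always creates a new row.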

id                               29
location            Uttar Pradesh\r
country_name                  India
total_deaths                     20

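I want to remove the \r. I tried df.replace({'\r': ''}, regex=True), but it isn't working for me.

Is there any other solution? Can somebody help?

Edit:

In the above process, I am iterating over df to check whether \r is present. If it is, it needs to be replaced. Here row.replace() and row.str.strip() do not seem to work, or I may be using them the wrong way.

I don't want to specify a column name or row number when using replace(), because I can't be sure that only the "location" column will contain \r. Please find the code below.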

count = 0
for row_index, row in df.iterrows():
    if re.search(r"\\r", str(row)):
        print(type(row))              # Return type is pandas.Series
        row.replace({r'\\r': ''} , regex=True)
        print(row)
        count += 1

- Saranya

1

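And df.replace({r'\\r': ''}, regex=True) doesn't work either? Why use iterrows()? I think it is unnecessary, because iterating is very slow. - jezrael

I don't have any other way to iterate over df. df.replace({r'\\r': ''}, regex=True) is not working. - Saranya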
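
For reference, the count kept by the loop in the question can be obtained without iterrows(). This is only a minimal sketch, assuming the cells hold actual carriage-return characters; the sample frame and the names rows_with_cr and cleaned are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data shaped like the frame in the question.
df = pd.DataFrame({
    "id": [28, 29],
    "country_name": ["India", "India"],
    "location": ["Orissa", "Uttar Pradesh\r"],
    "total_deaths": [69, 20],
})

# Cast every column to str so .str.contains works on numeric columns too,
# then flag each row where any cell contains a carriage return.
rows_with_cr = (
    df.astype(str)
      .apply(lambda col: col.str.contains("\r", regex=False))
      .any(axis=1)
)

count = int(rows_with_cr.sum())

# replace() returns a new frame; the result has to be assigned to be kept.
cleaned = df.replace({"\r": ""}, regex=True)

print(count)
print(cleaned)
```

The whole-frame replace needs no column names, which matches the constraint stated in the question.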

5 Answers

3

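Use str.replace with regex=True (recent pandas versions default to regex=False). Mind the escaping: the raw pattern r'\\r' matches the literal two-character text \r (a backslash followed by r); if the cell holds an actual carriage-return character, use the pattern '\r' instead:

In [15]:
df['29'] = df['29'].str.replace(r'\\r', '', regex=True)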
df

Out[15]:
             id             29
0      location  Uttar Pradesh
1  country_name          India
2  total_deaths             20
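
To make the escaping difference concrete, here is a small sketch. The sample Series is hypothetical: one value ends with a real carriage return, another with the two literal characters backslash and r.

```python
import pandas as pd

# Hypothetical sample: element 0 ends with a real carriage return,
# element 1 ends with a literal backslash followed by 'r'.
s = pd.Series(["Uttar Pradesh\r", "Uttar Pradesh\\r", "India"])

# As a regex, r"\\r" matches a literal backslash followed by 'r'.
literal_removed = s.str.replace(r"\\r", "", regex=True)

# As a regex, r"\r" matches the carriage-return character itself.
cr_removed = s.str.replace(r"\r", "", regex=True)

print(literal_removed.tolist())
print(cr_removed.tolist())
```

Each pattern cleans only one of the two forms, which may explain why the same call works for some readers and not for others.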

- EdChum

3

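The code below removes \t tabs, \n newlines and \r carriage returns, and is great for condensing a datum onto one line. Set inplace to True or False as needed. This answer was taken from https://gist.github.com/smram/d6ded3c9028272360eb65bcab564a18a.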

df.replace(to_replace=[r"\\t|\\n|\\r", "\t|\n|\r"], value=["",""], regex=True, inplace=<INPLACE>)
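
A runnable variant of the same idea, as a sketch only: it assumes the cells hold actual control characters (not literal backslash sequences) and returns a new frame instead of modifying in place. The sample data is invented.

```python
import pandas as pd

# Hypothetical sample data with a trailing tab, carriage return and newline.
df = pd.DataFrame({
    "location": ["Uttar Pradesh\t\r\n", "Tamil Nadu"],
    "total_deaths": [20, 48],
})

# One character class covers real tab, newline and carriage-return characters.
# Numeric columns are left untouched by a regex replace.
cleaned = df.replace(to_replace=r"[\t\n\r]", value="", regex=True)

print(cleaned)
```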

- Gwen Au

1

Just assign the result of the df.replace line back to df, then print df.

df=df.replace({'\r': ''}, regex=True) 
print(df)

- user13078533

2

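That answer already exists, word for word. I suggest you delete it to avoid cluttering the answer space with duplicates, out of respect for future readers and for the users who already posted it. - Nicolas Gervais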

1

Somehow, the accepted answer did not work for me. Eventually, I found the solution by doing it as follows:

df["29"] = df["29"].replace(r'\r', '', regex=True)

The difference is that I use \r instead of \\r.
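
Which pattern is needed depends on what the cell really contains. One way to check, sketched here with an invented two-row frame, is to look at repr() of a value and to test for each form with a plain (non-regex) search.

```python
import pandas as pd

# Hypothetical frame; the first value ends with a real carriage return.
df = pd.DataFrame({
    "id": ["location", "country_name"],
    "29": ["Uttar Pradesh\r", "India"],
})

# repr() shows escape sequences: a real carriage return appears as \r,
# while a literal backslash followed by 'r' appears as \\r.
print(repr(df.loc[0, "29"]))

# Plain substring checks (regex=False) for each of the two forms.
has_cr = df["29"].str.contains("\r", regex=False)
has_literal = df["29"].str.contains("\\r", regex=False)

print(has_cr.tolist(), has_literal.tolist())
```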

- Yusril Maulidan Raji

- jezrael · Accepted Answer

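Another solution is to use str.strip. Note that str.strip treats its argument as a set of characters to trim from both ends, not as a pattern, so this also trims any plain r or backslash at either end of a value:

df['29'] = df['29'].str.strip(r'\\r')
print(df)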
             id             29
0      location  Uttar Pradesh
1  country_name          India
2  total_deaths             20
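
The character-set behaviour of str.strip is easy to miss, so here is a short sketch with made-up values. It assumes actual control characters in the first Series and a literal backslash-r suffix in the second.

```python
import pandas as pd

# Hypothetical values ending in real carriage-return / newline characters.
s = pd.Series(["Uttar Pradesh\r\n", "Kerala\r", "Assam"])

# rstrip removes any trailing characters that appear in the given set.
trimmed = s.str.rstrip("\r\n")

# Caveat: with the set {backslash, 'r'}, a value that itself ends in 'r'
# loses that letter as well.
literal = pd.Series(["Kashmir\\r"])
over_stripped = literal.str.strip("\\r")

print(trimmed.tolist())
print(over_stripped.tolist())
```

Stripping only touches the ends of each value; a carriage return in the middle of a cell would still need replace.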

If you want to use replace, add the r prefix and one extra \ to escape the backslash:

print(df.replace({r'\\r': ''}, regex=True))
             id             29
0      location  Uttar Pradesh
1  country_name          India
2  total_deaths             20

In replace you can also specify which column to replace in, for example:

print(df)
               id               29
0        location  Uttar Pradesh\r
1    country_name            India
2  total_deaths\r               20

print(df.replace({'29': {r'\\r': ''}}, regex=True))
               id             29
0        location  Uttar Pradesh
1    country_name          India
2  total_deaths\r             20

print(df.replace({r'\\r': ''}, regex=True))
             id             29
0      location  Uttar Pradesh
1  country_name          India
2  total_deaths             20

EDIT by comment:

import pandas as pd

df = pd.read_csv('data_source_test.csv')
print(df)
   id country_name           location  total_deaths
0   1        India          New Delhi           354
1   2        India         Tamil Nadu            48
2   3        India          Karnataka             0
3   4        India      Andra Pradesh            32
4   5        India              Assam           679
5   6        India             Kerala           128
6   7        India             Punjab             0
7   8        India      Mumbai, Thane             1
8   9        India  Uttar Pradesh\r\n            20
9  10        India             Orissa            69

print(df.replace({r'\r\n': ''}, regex=True))
   id country_name       location  total_deaths
0   1        India      New Delhi           354
1   2        India     Tamil Nadu            48
2   3        India      Karnataka             0
3   4        India  Andra Pradesh            32
4   5        India          Assam           679
5   6        India         Kerala           128
6   7        India         Punjab             0
7   8        India  Mumbai, Thane             1
8   9        India  Uttar Pradesh            20
9  10        India         Orissa            69

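If you need to replace only in the location column (pass regex=True explicitly, since recent pandas versions default str.replace to regex=False):

df['location'] = df.location.str.replace(r'\r\n', '', regex=True)
print(df)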
   id country_name       location  total_deaths
0   1        India      New Delhi           354
1   2        India     Tamil Nadu            48
2   3        India      Karnataka             0
3   4        India  Andra Pradesh            32
4   5        India          Assam           679
5   6        India         Kerala           128
6   7        India         Punjab             0
7   8        India  Mumbai, Thane             1
8   9        India  Uttar Pradesh            20
9  10        India         Orissa            69
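
Since the original problem was an extra line in destination.csv, a quick check after cleaning is to count the lines of the CSV output. The sketch below uses an in-memory buffer instead of a real file, and the two-row frame is invented for illustration.

```python
import io

import pandas as pd

# Hypothetical subset of the data; one location ends with "\r\n".
df = pd.DataFrame({
    "id": [9, 10],
    "country_name": ["India", "India"],
    "location": ["Uttar Pradesh\r\n", "Orissa"],
    "total_deaths": [20, 69],
})

cleaned = df.replace({r"\r\n": ""}, regex=True)

# Write to an in-memory buffer rather than destination.csv.
buffer = io.StringIO()
cleaned.to_csv(buffer, index=False)

# One header line plus one line per row is expected after cleaning.
lines = buffer.getvalue().splitlines()
print(len(lines))
print(lines[1])
```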