Replacing Header With Top Row

Question

195

I currently have a dataframe that looks like this:

       Unnamed: 1    Unnamed: 2   Unnamed: 3  Unnamed: 4
0   Sample Number  Group Number  Sample Name  Group Name
1             1.0           1.0          s_1         g_1
2             2.0           1.0          s_2         g_1
3             3.0           1.0          s_3         g_1
4             4.0           2.0          s_4         g_2

I'm looking for a way to delete the header row and make the first row the new header row, so the new dataframe would look like this:

    Sample Number  Group Number  Sample Name  Group Name
0             1.0           1.0          s_1         g_1
1             2.0           1.0          s_2         g_1
2             3.0           1.0          s_3         g_1
3             4.0           2.0          s_4         g_2

I've tried things along the lines of if 'Unnamed' in df.columns: and then writing the dataframe out without the header:

df.to_csv(newformat, header=False, index=False)

but I don't seem to be getting anywhere.

- Jeremy G

13 Answers

102

The dataframe can be changed by just doing:

df.columns = df.iloc[0]
df = df[1:]

Then

df.to_csv(path, index=False)

should do the trick.

- JoeCondron

9

This is a better answer because it has no redundant code (new_header). - Ad Infinitum

73

If you want a one-liner, you can do:

df.rename(columns=df.iloc[0]).drop(df.index[0])
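
A minimal sketch of what this one-liner does, assuming a small hypothetical frame shaped like the one in the question (the sample values are made up for illustration). The row df.iloc[0] is a Series indexed by the old column labels, so rename can use it as an old-to-new mapping:

```python
import pandas as pd

# Hypothetical frame mimicking the question: the real header sits in row 0.
df = pd.DataFrame(
    [
        ["Sample Number", "Group Number", "Sample Name", "Group Name"],
        [1.0, 1.0, "s_1", "g_1"],
        [2.0, 1.0, "s_2", "g_1"],
    ],
    columns=["Unnamed: 1", "Unnamed: 2", "Unnamed: 3", "Unnamed: 4"],
)

# rename() maps each old label to the value found in row 0;
# drop() then removes the row that was promoted to the header.
result = df.rename(columns=df.iloc[0]).drop(df.index[0])

print(list(result.columns))
print(list(result.index))  # the row labels are not renumbered by drop()
```

Neither call modifies df itself, so the result has to be assigned to a name to be kept.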

- ostrokach

4

If you don't want a gap in the index, change it to df.rename(columns=df.iloc[0]).drop(df.index[0]).reset_index(drop=True). - z33k

It worked for me after splitting it into two lines: df.rename(columns=df.iloc[0, :], inplace=True) followed by df.drop(df.index[0], inplace=True) - Marc Steffen

19

Another one-liner, using Python's tuple assignment (the same idiom used for swapping values):

df, df.columns = df[1:] , df.iloc[0]

This does not reset the index.

Note that the reverse order does not work as expected: df.columns, df = df.iloc[0], df[1:]
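
The difference between the two orders can be sketched as follows, using a hypothetical two-column frame and a helper make_df that exists only for this illustration. Python evaluates the whole right-hand side first and then assigns to the targets from left to right, so what matters is which object the name df refers to when df.columns is set:

```python
import pandas as pd


def make_df():
    # Hypothetical frame: row 0 holds the intended header.
    return pd.DataFrame(
        [["a", "b"], [1, 2], [3, 4]],
        columns=["Unnamed: 1", "Unnamed: 2"],
    )


# Working order: df is rebound to the slice first,
# so the columns are then set on that slice.
df = make_df()
df, df.columns = df[1:], df.iloc[0]
good = list(df.columns)

# Reversed order: the columns are set on the original frame,
# then df is rebound to a slice that was taken before the rename.
df = make_df()
df.columns, df = df.iloc[0], df[1:]
bad = list(df.columns)

print(good)
print(bad)
```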

- ijoel92

How does the assignment work? Is df assigned first, or df.columns? - Aman Bagrecha

1

This answer explains it very well: https://dev59.com/hVsX5IYBdhLWcg3wJcuW#34171485 - ijoel92

10

@ostrokach's answer is the best. You will most likely want the change to persist wherever the dataframe is referenced, so it benefits from inplace=True:

df.rename(columns=df.iloc[0], inplace=True)
df.drop([0], inplace=True)

- GoPackGo

7

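Here is a simple trick that sets the column index from a row of the data. Because set_index sets the row index, we can do the same thing for the columns by transposing the dataframe, setting the index, and transposing back: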

df = df.T.set_index(0).T

Note that you may have to change the 0 in set_index(0) if your rows already have a different index.
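
A short sketch of the transpose trick on a hypothetical two-column frame. One side effect worth knowing, as far as I can tell: the label of the promoted row (0 here) ends up as the name of the columns index, and the row labels are not renumbered:

```python
import pandas as pd

# Hypothetical frame: row 0 holds the intended header.
df = pd.DataFrame(
    [["Sample Name", "Group Name"], ["s_1", "g_1"], ["s_2", "g_1"]],
    columns=["Unnamed: 3", "Unnamed: 4"],
)

# Transpose so the header row becomes column 0, make that column the
# index, then transpose back so it becomes the column labels.
result = df.T.set_index(0).T

print(list(result.columns))
print(result.columns.name)  # the old row label is kept as the columns' name
print(list(result.index))
```

If the leftover name is unwanted, setting result.columns.name = None clears it.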

- Alex P. Miller

Best solution. Elegant. - Ayan Mitra

6

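Alternatively, we can do this when reading the file with pandas.

In that case we can use:

pd.read_csv('file_path', skiprows=1)

When reading the file, this skips the first row and uses the second row of the file as the column names.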
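
A sketch of the read-time approach, under the assumption that the source file really contains an unwanted line above the true header (the file contents below are invented, and io.StringIO stands in for a file path). If the stray header only exists in the dataframe and not in the file, this does not apply:

```python
import io

import pandas as pd

# Hypothetical file contents: one unwanted line above the real header.
raw = (
    "exported by some tool,,,\n"
    "Sample Number,Group Number,Sample Name,Group Name\n"
    "1,1,s_1,g_1\n"
    "2,1,s_2,g_1\n"
)

# skiprows=1 discards the first line, so the next line becomes the header.
df = pd.read_csv(io.StringIO(raw), skiprows=1)

# header=1 says the same thing differently: use line 1 (0-based) as the header.
df_alt = pd.read_csv(io.StringIO(raw), header=1)

print(list(df.columns))
print(len(df))
```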

- Ransaka Ravihara

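This does not solve the problem. The values in the second row are not supposed to be the header values. In fact, this is basically the reverse of what the solution should do. If skiprows=-1 caused the first row to be used as the header, that would be the solution. The accepted solution achieves the goal. - Anthony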

2

For some reason, I had to do it this way:

df.columns = [*df.iloc[0]]
df = table[1:]

The part where I unpack the row into a list looks redundant, but otherwise the header still shows up in the actual table.

- Moritz Gruenwald

You just overlooked that table does not exist; the correct name should be "df". - Eduardo Rauchbach

1

Another way to do it:


df.columns = df.iloc[0]
df = df.reindex(df.index.drop(0)).reset_index(drop=True)
df.columns.name = None

    Sample Number  Group Number  Sample Name  Group Name
0             1.0           1.0          s_1         g_1
1             2.0           1.0          s_2         g_1
2             3.0           1.0          s_3         g_1
3             4.0           2.0          s_4         g_2

If you like it, hit the up arrow. Thanks.

- rra

0

This seems like a task that may be needed more than once. I took rgalbo's answer and wrote a simple function that can be lifted and dropped into any project.

def promote_df_headers(df):
    '''
    Takes a df and uses the first row as the header

    Parameters
    ----------
    df : DataFrame
        Any df with one or more columns.

    Returns
    -------
    df : DataFrame
        Input df with the first row removed and used as the column names.

    '''

    new_header = df.iloc[0] 
    df = df[1:] 
    df.columns = new_header
    df = df.reset_index(drop=True)

    return df

- Matt_Haythornthwaite

- rgalbo · Accepted Answer

342

new_header = df.iloc[0] #grab the first row for the header
df = df[1:] #take the data less the header row
df.columns = new_header #set the header row as the df header

- rgalbo

3

When I do this, the 0 index also becomes part of the header. Is there a way to remove the 0 index from my header row? - Pete

@Pete What output do you get from df.columns? - rgalbo

1

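@Pete, change new_header = df.iloc[0] to new_header = df.iloc[0].tolist(). This removes the index 0 from the header. - jb12n

@jb12n, I just want to say you are spot on. I would add that you can set the columns directly with df.columns = df.iloc[0].tolist(). As far as I understand, iloc produces a Series whose name is the index position. You can verify this by printing the type of new_header. When you create a list, the Series name is lost. I find it interesting that the documentation seems to overlook this (at least I couldn't find anything about it). Really glad I came across this discussion today. - wiseass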
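
The behaviour described in these comments can be checked with a small sketch on a hypothetical two-column frame. The row returned by df.iloc[0] is a Series named after its row label, and that name is carried over as the name of the columns index unless the row is converted to a plain list first:

```python
import pandas as pd

# Hypothetical frame: row 0 holds the intended header.
df = pd.DataFrame(
    [["Sample Name", "Group Name"], ["s_1", "g_1"]],
    columns=["Unnamed: 3", "Unnamed: 4"],
)

header = df.iloc[0]
print(type(header).__name__, header.name)  # a Series named after the row label

# Assigning the Series keeps its name as the name of the columns index.
with_name = df[1:].copy()
with_name.columns = header
print(with_name.columns.name)

# A plain list has no name, so nothing extra is attached.
without_name = df[1:].copy()
without_name.columns = header.tolist()
print(without_name.columns.name)
```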