Pandas: Multilevel column names

Question

52

pandas supports multilevel column names:

>>> import numpy as np
>>> import pandas as pd
>>> x = pd.DataFrame({'instance':['first','first','first'],'foo':['a','b','c'],'bar':np.random.rand(3)})
>>> x = x.set_index(['instance','foo']).transpose()
>>> x.columns
MultiIndex
[(u'first', u'a'), (u'first', u'b'), (u'first', u'c')]
>>> x
instance     first                    
foo              a         b         c
bar       0.102885  0.937838  0.907467

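This feature is very useful because it allows multiple versions of the same DataFrame to be appended "horizontally", with the first level of the column names (instance in my example) distinguishing the instances.

Imagine I already have a DataFrame like this: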

                 a         b         c
bar       0.102885  0.937838  0.907467

Is there a convenient way to add another level to the column names, similar to this approach for the row index:

x['instance'] = 'first'
x.set_index('instance',append=True)

- LondonRob

3

I don't think there is at the moment, but there certainly should be. I think there is a feature request for this on GitHub. - Andy Hayden

1

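It does raise some interesting questions though, such as "how do you select a particular column when there are two levels of column names?" - LondonRob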

2

x['first'], x[('first', 'a')] or x.xs('a', axis=1, level=1)? :s - Andy Hayden
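
The three selections suggested in the comment above can be tried on a small frame. This is only a sketch: the frame `x`, its second instance `second`, and its values are made up for illustration and do not come from the question.

```python
import pandas as pd

# Hypothetical frame with two column levels: instance ('first'/'second') and foo ('a'/'b').
columns = pd.MultiIndex.from_product([["first", "second"], ["a", "b"]],
                                     names=["instance", "foo"])
x = pd.DataFrame([[1, 2, 3, 4]], index=["bar"], columns=columns)

# All columns under the outer label 'first'; the outer level is dropped.
outer = x["first"]

# One column addressed by its full (outer, inner) tuple; the result is a Series.
single = x[("first", "a")]

# Cross-section on the inner level: every 'a' column across all instances.
inner = x.xs("a", axis=1, level=1)
```

Here `outer` and `inner` are DataFrames with a single remaining column level, while `single` is a Series.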

3

This is coming in 0.14 and will facilitate interesting MI selection: https://github.com/pydata/pandas/pull/6134 - Jeff

7 Answers

22

There is no need to create a list of tuples; use pd.MultiIndex.from_product(iterables).

import pandas as pd
import numpy as np

df = pd.Series(np.random.rand(3), index=["a","b","c"]).to_frame().T
df.columns = pd.MultiIndex.from_product([["new_label"], df.columns])

Resulting DataFrame:

  new_label                    
          a         b         c
0   0.25999  0.337535  0.333568
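
Applied to the frame from the question, the same call can also name both levels. This is a minimal sketch, assuming illustrative values for `x`; the level names `instance` and `foo` are borrowed from the question's first example.

```python
import pandas as pd

# Stand-in for the question's single-level frame (values are illustrative).
x = pd.DataFrame([[0.1, 0.9, 0.5]], index=["bar"], columns=["a", "b", "c"])

# Prepend the outer label 'first' to every existing column and name both levels.
x.columns = pd.MultiIndex.from_product([["first"], x.columns],
                                       names=["instance", "foo"])
```

Each existing column `c` becomes `('first', c)`, so the data itself is untouched.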

Pull request from Jan 25, 2014

- Ian Zurutuza

2

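This looks better than the answer I accepted 5 years ago! I assume it is new, but I am going to make it the accepted answer. If it is wrong or bad, please let me know in the comments. - LondonRob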

2

Why am I getting the error AttributeError: module 'pandas' has no attribute 'Multiindex'..? - haneulkim

1

@Ambleu I ran into the same error; it should be written MultiIndex rather than Multiindex (the second I is capitalized). - Maria

20

You can use concat. Give it a dictionary of DataFrames in which each key is the new column level you want to add.

In [46]: d = {}

In [47]: d['first_level'] = pd.DataFrame(columns=['idx', 'a', 'b', 'c'],
                                         data=[[10, 0.89, 0.98, 0.31],
                                               [20, 0.34, 0.78, 0.34]]).set_index('idx')

In [48]: pd.concat(d, axis=1)
Out[48]:
    first_level
              a     b     c
idx
10         0.89  0.98  0.31
20         0.34  0.78  0.34

You can use the same technique to create multiple levels.

In [49]: d['second_level'] = pd.DataFrame(columns=['idx', 'a', 'b', 'c'],
                                          data=[[10, 0.29, 0.63, 0.99],
                                                [20, 0.23, 0.26, 0.98]]).set_index('idx')

In [50]: pd.concat(d, axis=1)
Out[50]:
    first_level             second_level
              a     b     c            a     b     c
idx
10         0.89  0.98  0.31         0.29  0.63  0.99
20         0.34  0.78  0.34         0.23  0.26  0.98
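
For a frame that already exists with a single column level, as in the question, the same idea can be applied by wrapping that one frame in a dictionary. This is a minimal sketch, assuming illustrative values for `x`; the `names` argument of `pd.concat` is used here to label the new outer level.

```python
import pandas as pd

# Stand-in for the question's single-level frame (values are illustrative).
x = pd.DataFrame([[0.1, 0.9, 0.5]], index=["bar"], columns=["a", "b", "c"])

# The dictionary key becomes the new outer column level; 'names' labels that level.
wide = pd.concat({"first": x}, axis=1, names=["instance"])
```

Unlike assigning to `x.columns`, this returns a new frame and leaves `x` unchanged.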

- Carl

simple and elegant! - erickfis

7

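A lot of these solutions seem a bit more complex than they need to be.

When speed is not absolutely essential, I prefer to make things look as simple and intuitive as possible. I think this solution achieves that. It was tested in pandas versions as early as 0.22.0.

Simply create a DataFrame (ignoring the columns in the first step), then set the columns to your n-dimensional list of column names.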

In [1]: import pandas as pd

In [2]: df = pd.DataFrame([[1, 1, 1, 1], [2, 2, 2, 2]])

In [3]: df
Out[3]:
   0  1  2  3
0  1  1  1  1
1  2  2  2  2

In [4]: df.columns = [['a', 'c', 'e', 'g'], ['b', 'd', 'f', 'h']]

In [5]: df
Out[5]:
   a  c  e  g
   b  d  f  h
0  1  1  1  1
1  2  2  2  2
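
Assigning a list of lists relies on pandas converting it to a MultiIndex, with one inner list per level. A more explicit spelling of the same thing might look like the sketch below; the level names `upper` and `lower` are hypothetical and only there to show that the levels can be labelled.

```python
import pandas as pd

arrays = [["a", "c", "e", "g"], ["b", "d", "f", "h"]]

# Implicit form from the answer: pandas builds the MultiIndex from the nested lists.
implicit = pd.DataFrame([[1, 1, 1, 1], [2, 2, 2, 2]])
implicit.columns = arrays

# Explicit form: one array per level, with optional (hypothetical) level names.
explicit = pd.DataFrame([[1, 1, 1, 1], [2, 2, 2, 2]])
explicit.columns = pd.MultiIndex.from_arrays(arrays, names=["upper", "lower"])
```

Both frames end up with the column tuples ('a', 'b'), ('c', 'd'), ('e', 'f') and ('g', 'h').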

- Keith

5

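import numpy as np
import pandas as pd

# (outer, inner) label pairs for the row index and for the columns
x = [('G1','a'),("G1",'b'),("G2",'a'),('G2','b')]
y = [('K1','l'),("K1",'m'),("K2",'l'),('K2','m'),("K3",'l'),('K3','m')]
row_list = pd.MultiIndex.from_tuples(x)
col_list = pd.MultiIndex.from_tuples(y)

# 4x6 frame of random integers in [2, 5) with a MultiIndex on both axes
A = pd.DataFrame(np.random.randint(2,5,(4,6)), index=row_list, columns=col_list)
A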

This is the simplest and easiest way to create multilevel columns and rows.
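
Once both axes carry a MultiIndex, selection works by outer label or by full tuple. The sketch below uses a smaller frame with deterministic values instead of random ones so the results are easy to follow; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

rows = pd.MultiIndex.from_tuples([("G1", "a"), ("G1", "b"), ("G2", "a"), ("G2", "b")])
cols = pd.MultiIndex.from_tuples([("K1", "l"), ("K1", "m"), ("K2", "l"), ("K2", "m")])

# Values 0..15 laid out row by row.
A = pd.DataFrame(np.arange(16).reshape(4, 4), index=rows, columns=cols)

# Outer row label: the two 'G1' rows, with the outer row level dropped.
rows_g1 = A.loc["G1"]

# Outer column label: the two 'K1' columns, with the outer column level dropped.
cols_k1 = A["K1"]

# Full tuples on both axes address a single cell.
cell = A.loc[("G2", "a"), ("K2", "m")]
```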

- Raj_Ame09

1

Here is a function that helps you build the tuples more generically for use with pd.MultiIndex.from_tuples(). It was inspired by @user3377361.

def create_tuple_for_for_columns(df_a, multi_level_col):
    """
    Create a list of column tuples that pd.MultiIndex.from_tuples can turn into multilevel columns

    :param df_a: pandas DataFrame whose existing columns form the inner level of the MultiIndex
    :param multi_level_col: label of the new outer level
    :return: list of (multi_level_col, existing_col) tuples
    """
    temp_columns = []
    # Pair the new outer label with every existing column name.
    for item in df_a.columns:
        temp_columns.append((multi_level_col, item))
    return temp_columns

It can be used like this:

df = pd.DataFrame({'a':[1,2,3],'b':[4,5,6]})
columns = create_tuple_for_for_columns(df, 'c')
df.columns = pd.MultiIndex.from_tuples(columns)

- Charl

0

Building on Carl's pd.concat method: what if you only get one row per iteration? This is not an optimized approach, but you can do it like this:

import pandas as pd

# initial
ds = []

# first iteration (can be inside function)
d = {}
d['first_level'] = pd.DataFrame(columns=['idx', 'a', 'b', 'c'],
                                data=[[10, 0.89, 0.98, 0.31]]).set_index('idx')
d['second_level'] = pd.DataFrame(columns=['idx', 'a', 'b', 'c'],
                                 data=[[10, 0.29, 0.63, 0.99]]).set_index('idx')
ds.append(pd.concat(d, axis=1))

# display(ds[0])

# second iteration (can be inside function)
d = {}
d['first_level'] = pd.DataFrame(columns=['idx', 'a', 'b', 'c'],
                                data=[[20, 0.34, 0.78, 0.34]]).set_index('idx')
d['second_level'] = pd.DataFrame(columns=['idx', 'a', 'b', 'c'],
                                 data=[[20, 0.23, 0.26, 0.98]]).set_index('idx')
ds.append(pd.concat(d, axis=1))

# display(ds[1])

# final concat
pd.concat(ds, axis=0)

Result:

    first_level             second_level
              a     b     c            a     b     c
idx
10         0.89  0.98  0.31         0.29  0.63  0.99

    first_level             second_level
              a     b     c            a     b     c
idx
20         0.34  0.78  0.34         0.23  0.26  0.98

    first_level             second_level
              a     b     c            a     b     c
idx
10         0.89  0.98  0.31         0.29  0.63  0.99
20         0.34  0.78  0.34         0.23  0.26  0.98
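
The two iterations above follow the same pattern, so they could be folded into a small helper. This is a sketch only: `make_row` is a hypothetical name, not part of pandas, and the values are the ones from the example.

```python
import pandas as pd

def make_row(idx, first, second):
    """Build a one-row frame with two column levels (hypothetical helper)."""
    cols = ["a", "b", "c"]
    index = pd.Index([idx], name="idx")
    parts = {
        "first_level": pd.DataFrame([first], index=index, columns=cols),
        "second_level": pd.DataFrame([second], index=index, columns=cols),
    }
    # The dictionary keys become the outer column level.
    return pd.concat(parts, axis=1)

# Collect one frame per iteration, then stack them with a single concat.
ds = [
    make_row(10, [0.89, 0.98, 0.31], [0.29, 0.63, 0.99]),
    make_row(20, [0.34, 0.78, 0.34], [0.23, 0.26, 0.98]),
]
result = pd.concat(ds, axis=0)
```

The rows are collected in a list and concatenated once at the end, as in the answer.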

- Muhammad Yasirroni

Try this:

import pandas as pd

df = pd.DataFrame({'a':[1,2,3],'b':[4,5,6]})

# Each tuple is (new outer label, existing column name).
columns = [('c','a'),('c','b')]

df.columns = pd.MultiIndex.from_tuples(columns)

- user3377361 · Accepted Answer