在pandas中使用多级索引设置值

Question

在pandas中使用多级索引设置值

5

已经有几个与此相关的SO问题，尤其是这个，但是没有一个答案对我有效，而且很多文档链接（特别是关于lexsorting的）都已经失效了，所以我会再问一个问题。

我试图做一些（看起来）非常简单的事情。考虑以下MultiIndexed数据框：

import pandas as pd; import random
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
      ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]

tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
df = pd.concat([pd.Series(np.random.randn(8), index=index), pd.Series(np.random.randn(8), index=index)], axis=1)

现在我想将类别为“one”的所有观测值中列0的值设置为某个值（比如np.NaN）。我尝试了以下方法但失败了：

df.loc(axis=0)[:, "one"][0] = 1 # setting with copy warning

并且

df.loc(axis=0)[:, "one", 0] = 1

要解决这个问题，需要注意键的长度是否超过了索引的长度，或者缺少足够深度的词法排序。正确的做法是什么呢？

- Nils Gudat

1个回答

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- jezrael · Accepted Answer

我认为您可以使用元组来选择具有MultiIndex的数据框，并使用0来选择列：loc

import pandas as pd; 
import random
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
      ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]

#add for testing
np.random.seed(0)
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
df = pd.concat([pd.Series(np.random.randn(8), index=index), pd.Series(np.random.randn(8), index=index)], axis=1)

print df
                     0         1
first second                    
bar   one     1.764052 -0.103219
      two     0.400157  0.410599
baz   one     0.978738  0.144044
      two     2.240893  1.454274
foo   one     1.867558  0.761038
      two    -0.977278  0.121675
qux   one     0.950088  0.443863
      two    -0.151357  0.333674

df.loc[('bar', "one"), 0] = 1
print df
                     0         1
first second                    
bar   one     1.000000 -0.103219
      two     0.400157  0.410599
baz   one     0.978738  0.144044
      two     2.240893  1.454274
foo   one     1.867558  0.761038
      two    -0.977278  0.121675
qux   one     0.950088  0.443863
      two    -0.151357  0.333674

如果您需要将所有二级行的值设置为one，请使用slice(None)：

df.loc[(slice(None), "one"), 0] = 1
print df
                     0         1
first second                    
bar   one     1.000000 -0.103219
      two     0.400157  0.410599
baz   one     1.000000  0.144044
      two     2.240893  1.454274
foo   one     1.000000  0.761038
      two    -0.977278  0.121675
qux   one     1.000000  0.443863
      two    -0.151357  0.333674

Docs.