Can a pandas DataFrame be selected by row position and column name?

Question

9

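For datasets whose row index carries no meaning, I find it more useful to select data by row number while still using column names. I know that .iloc only accepts row/column positions (integers) and that .loc only accepts labels. Is there a workaround that combines row positions with column names?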

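For example, I want to select the cell at row 2 and column B, without necessarily knowing that row 2 has the label 5 or that column B is the second column. What is the best way to reference that cell?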

(The row labels are usually left over from filtering or randomly sampling a larger dataset.)

- hurrikale

2

Sorry, are you after df['B'].iloc[2]? - EdChum

Would reindexing help in this case? - MaxU - stand with Ukraine

You could write df[df.columns[1]].iloc[2], but the problem is that using integers as both column and row labels can become ambiguous. - EdChum

2

@EdChum Thanks! I think df['B'].iloc[2] may be the solution, but do you know whether this kind of chained indexing has the same problem that is flagged here (http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy)? - hurrikale

For a dataset whose row index is meaningless, why not simply discard the index with df.reset_index(drop=True, inplace=True)? - smci
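
On the chained-indexing question raised in these comments: reading a value with df['B'].iloc[2] is fine, but assigning through a chained expression is not guaranteed to modify the original frame. The following is a minimal sketch, assuming the six-row frame shown in the answer (the variable names col_pos, flat and value are illustrative and not from the thread). It writes with a single indexer call and also shows the reset_index suggestion.

```python
import pandas as pd

# Illustrative frame with a non-contiguous index, as left behind by filtering.
df = pd.DataFrame(
    {"A": [1, 2, 3, 4, 5, 6], "B": ["a", "a", "c", "b", "b", "b"]},
    index=[1, 5, 6, 8, 9, 10],
)

# Single-call assignment: translate the column name to its position,
# then use iloc for both axes.
col_pos = df.columns.get_loc("B")
df.iloc[2, col_pos] = "z"

# Label-based form: translate the row position to its label instead.
# This assumes the index labels are unique.
df.loc[df.index[2], "A"] = 30

# reset_index(drop=True) discards the old labels so that labels equal positions.
# Without inplace=True it returns a new frame and leaves df untouched.
flat = df.reset_index(drop=True)
value = flat.loc[2, "B"]
```

After reset_index, .loc[2, 'B'] and .iloc[2, col_pos] refer to the same cell, at the cost of losing the original labels.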

1 Answer

- jezrael · Accepted Answer

You can use iat, which is faster than iloc:

print(df)
    A  B
1   1  a
5   2  a
6   3  c
8   4  b
9   5  b
10  6  b

print(df['B'].iat[2])
c

print(df['B'].iloc[2])
c
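
Both calls above first select column B by name and then pick a position within that column. As a hedged sketch, here are alternatives that perform the same lookup in a single indexer call. The frame is rebuilt from the printed output above, and the by_* variants other than iat are additions for illustration, not part of the original answer.

```python
import pandas as pd

# Frame rebuilt from the printed output in the answer.
df = pd.DataFrame(
    {"A": [1, 2, 3, 4, 5, 6], "B": ["a", "a", "c", "b", "b", "b"]},
    index=[1, 5, 6, 8, 9, 10],
)

# The answer's approach: select the column by name, then index by position.
by_iat = df["B"].iat[2]

# Convert the column name to a position, then index purely by position.
by_iloc = df.iloc[2, df.columns.get_loc("B")]

# Convert the row position to its label, then index purely by label.
# Both forms assume the index labels are unique.
by_loc = df.loc[df.index[2], "B"]
by_at = df.at[df.index[2], "B"]

print(by_iat, by_iloc, by_loc, by_at)
```

All four expressions address the same cell. The single-call forms are also usable on the left-hand side of an assignment.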

Timings:

In [266]: %timeit df['B'].iat[2]
The slowest run took 31.55 times longer than the fastest. This could mean that an intermediate result is being cached 
100000 loops, best of 3: 7.28 µs per loop

In [267]: %timeit df['B'].iloc[2]
The slowest run took 24.47 times longer than the fastest. This could mean that an intermediate result is being cached 
100000 loops, best of 3: 11.5 µs per loop
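
The figures above come from IPython's %timeit on the answerer's machine. A rough sketch for repeating the comparison outside IPython with the standard-library timeit module follows. The loop count n is an arbitrary choice, and absolute timings depend on hardware and pandas version, so no expected numbers are given.

```python
import timeit

import pandas as pd

# Frame rebuilt from the printed output in the answer.
df = pd.DataFrame(
    {"A": [1, 2, 3, 4, 5, 6], "B": ["a", "a", "c", "b", "b", "b"]},
    index=[1, 5, 6, 8, 9, 10],
)

n = 10_000  # number of calls per measurement (arbitrary)

# Total seconds for n calls of each accessor.
t_iat = timeit.timeit(lambda: df["B"].iat[2], number=n)
t_iloc = timeit.timeit(lambda: df["B"].iloc[2], number=n)

# Report the average cost per call in microseconds.
print(f"iat:  {t_iat / n * 1e6:.2f} us per call")
print(f"iloc: {t_iloc / n * 1e6:.2f} us per call")
```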