Finding the minimum value across multiple worksheets with Pandas

Question


How do I find the minimum value for each index across multiple worksheets?

Suppose:

worksheet 1

index    A   B   C
    0    2   3   4.28
    1    3   4   5.23

worksheet 2

index    A   B   C
    0    9   6   5.9
    1    1   3   4.1

worksheet 3

index    A   B   C
    0    9   6   6.0
    1    1   3   4.3

...................(Worksheet 4, Worksheet 5)...........
By comparing column C across the worksheets, I want a result DataFrame that looks like this:

index      min(c)
    0       4.28
    1       4.1

- rick Sarkar

Only one answer can be accepted ;) - jezrael

2 answers


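Use read_excel with the parameter sheet_name=None to load every worksheet into a dict keyed by sheet name (an OrderedDict in the older pandas version used for the output below), then combine reduce and numpy.fmin with a list comprehension over the C columns. Older pandas spelled the parameter sheetname; it was renamed to sheet_name in pandas 0.21.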

import pandas as pd

# sheet_name=None reads every worksheet: {sheet name: DataFrame}
dfs = pd.read_excel('file.xlsx', sheet_name=None)
print(dfs)
OrderedDict([('Sheet1',    A  B     C
0  2  3  4.28
1  3  4  5.23), ('Sheet2',    A  B    C
0  9  6  5.9
1  1  3  4.1), ('Sheet3',    A  B    C
0  9  6  6.0
1  1  3  4.3)])

from functools import reduce

import numpy as np

# fold np.fmin pairwise over the C column of every sheet
df = reduce(np.fmin, [v['C'] for v in dfs.values()])
print(df)
0    4.28
1    4.10
Name: C, dtype: float64
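A minimal sketch of what reduce(np.fmin, ...) does, assuming the C columns are built in memory from the question's data instead of being read from file.xlsx. reduce folds the list pairwise, fmin(fmin(c1, c2), c3), and np.fmin differs from np.minimum in how it treats NaN (for example an empty cell in one sheet): fmin keeps the number, minimum propagates the NaN. The with_gap data is made up purely to show that difference.

```python
from functools import reduce

import numpy as np
import pandas as pd

# C columns of the three worksheets from the question, built in memory
# (an assumption for this sketch; the answer reads them with read_excel).
columns = [
    pd.Series([4.28, 5.23], name="C"),
    pd.Series([5.9, 4.1], name="C"),
    pd.Series([6.0, 4.3], name="C"),
]

# reduce applies np.fmin pairwise: fmin(fmin(c1, c2), c3)
result = reduce(np.fmin, columns)
print(result.tolist())  # [4.28, 4.1]

# Made-up data with a missing value: fmin ignores the NaN when the
# other operand is a number, while np.minimum propagates it.
with_gap = [pd.Series([4.28, np.nan]), pd.Series([5.9, 4.1])]
print(reduce(np.fmin, with_gap).tolist())     # [4.28, 4.1]
print(reduce(np.minimum, with_gap).tolist())  # [4.28, nan]
```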

Solution with concat:

df = pd.concat([v['C'] for v in dfs.values()], axis=1).min(axis=1)
print(df)
0    4.28
1    4.10
dtype: float64
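To see the intermediate wide frame that min(axis=1) works on, here is a small sketch. It assumes a hypothetical in-memory dict shaped like the result of read_excel(..., sheet_name=None) and adds keys= to concat so that each column is labelled with its sheet name instead of repeating "C".

```python
import pandas as pd

# Hypothetical stand-in for read_excel('file.xlsx', sheet_name=None):
# sheet name -> DataFrame, with the data copied from the question.
dfs = {
    "Sheet1": pd.DataFrame({"A": [2, 3], "B": [3, 4], "C": [4.28, 5.23]}),
    "Sheet2": pd.DataFrame({"A": [9, 1], "B": [6, 3], "C": [5.9, 4.1]}),
    "Sheet3": pd.DataFrame({"A": [9, 1], "B": [6, 3], "C": [6.0, 4.3]}),
}

# One column per sheet; keys= replaces the repeated "C" labels with sheet names.
wide = pd.concat([df["C"] for df in dfs.values()], axis=1, keys=list(dfs))
print(list(wide.columns))  # ['Sheet1', 'Sheet2', 'Sheet3']

# Row-wise minimum across the sheets
print(wide.min(axis=1).tolist())  # [4.28, 4.1]
```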

If you need to set the index when calling read_excel:

dfs = pd.read_excel('file.xlsx', sheet_name=None, index_col='index')
print(dfs)
OrderedDict([('Sheet1',        A  B     C
index            
0      2  3  4.28
1      3  4  5.23), ('Sheet2',        A  B    C
index           
0      9  6  5.9
1      1  3  4.1), ('Sheet3',        A  B    C
index           
0      9  6  6.0
1      1  3  4.3)])


df = pd.concat([v['C'] for v in dfs.values()], axis=1).min(axis=1)
print(df)
index
0    4.28
1    4.10
dtype: float64
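Setting the index matters because concat aligns rows by index label, not by position. The sketch below assumes in-memory frames where "index" is still an ordinary column (as read_excel returns it without index_col) and uses set_index('index') as a stand-in for index_col='index'. The row order of Sheet2 is deliberately swapped, a hypothetical case not present in the question, to show what goes wrong when the sheets are aligned only by position.

```python
import pandas as pd

# Hypothetical frames as read without index_col: "index" is a plain column.
# Sheet2's rows are swapped on purpose to show the effect of alignment.
raw = {
    "Sheet1": pd.DataFrame({"index": [0, 1], "C": [4.28, 5.23]}),
    "Sheet2": pd.DataFrame({"index": [1, 0], "C": [4.1, 5.9]}),
    "Sheet3": pd.DataFrame({"index": [0, 1], "C": [6.0, 4.3]}),
}

# Aligned by position (default RangeIndex): Sheet2's rows get mismatched.
positional = pd.concat([df["C"] for df in raw.values()], axis=1).min(axis=1)
print(positional.tolist())  # [4.1, 4.3]

# set_index("index") plays the role of index_col="index",
# so concat lines the sheets up by the "index" labels instead.
dfs = {name: df.set_index("index") for name, df in raw.items()}
result = pd.concat([df["C"] for df in dfs.values()], axis=1).min(axis=1)
print(result.index.name)              # index
print(result.sort_index().tolist())  # [4.28, 4.1]
```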

- jezrael


- piRSquared · Accepted Answer

from functools import reduce

import numpy as np

# ws1, ws2, ws3 are the worksheets loaded as DataFrames
reduce(np.fmin, [ws1.C, ws2.C, ws3.C])

index
0    4.28
1    4.10
Name: C, dtype: float64

This generalizes nicely with a list comprehension:

reduce(np.fmin, [w.C for w in [ws1, ws2, ws3, ws4, ws5]])
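The same idea can be wrapped in a small reusable function that accepts any number of worksheets. This is a sketch: column_min is a hypothetical helper name, the three frames use only the question's C values, and w[column] is used instead of w.C so that the pattern also works for column names that are not valid Python attributes.

```python
from functools import reduce

import numpy as np
import pandas as pd


def column_min(frames, column="C"):
    # Hypothetical helper: element-wise minimum of `column` across any
    # number of DataFrames, folding pairwise with np.fmin.
    return reduce(np.fmin, [w[column] for w in frames])


# In-memory stand-ins for the question's worksheets (C column only).
ws1 = pd.DataFrame({"C": [4.28, 5.23]})
ws2 = pd.DataFrame({"C": [5.9, 4.1]})
ws3 = pd.DataFrame({"C": [6.0, 4.3]})

print(column_min([ws1, ws2, ws3]).tolist())  # [4.28, 4.1]

# With a single frame, reduce just returns that frame's column unchanged.
print(column_min([ws2]).tolist())  # [5.9, 4.1]
```

Note that reduce raises a TypeError on an empty list, so the caller has to pass at least one worksheet.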

If you insist on using your column name:

from functools import reduce

import numpy as np

reduce(np.fmin, [ws1.C, ws2.C, ws3.C]).to_frame('min(C)')

       min(C)
index        
0        4.28
1        4.10

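You can also call pd.concat on a dictionary and then take the minimum for each row label, i.e. level 1 of the resulting MultiIndex, with groupby(level=1).min(). Older pandas also accepted pd.Series.min(level=1), but the level argument was removed from reductions in pandas 2.0.

pd.concat(dict(enumerate([w.C for w in [ws1, ws2, ws3]]))).groupby(level=1).min()
# equivalently
# pd.concat(dict(enumerate([w.C for w in [ws1, ws2, ws3]])), axis=1).min(axis=1)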

index
0    4.28
1    4.10
Name: C, dtype: float64
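A sketch of the intermediate step, assuming in-memory worksheets whose row index is named "index" as in the question. Concatenating a dict of Series stacks them vertically: the dict keys (0, 1, 2) become the outer level of a MultiIndex and the original row labels become level 1, so grouping on level 1 collects the same row from every sheet.

```python
import pandas as pd

# In-memory stand-ins for the question's worksheets (C column only).
ws1 = pd.DataFrame({"C": [4.28, 5.23]})
ws2 = pd.DataFrame({"C": [5.9, 4.1]})
ws3 = pd.DataFrame({"C": [6.0, 4.3]})
for ws in (ws1, ws2, ws3):
    ws.index.name = "index"

# Keys 0, 1, 2 become index level 0; the row labels become level 1.
stacked = pd.concat(dict(enumerate([w.C for w in [ws1, ws2, ws3]])))
print(stacked.index.nlevels)  # 2
print(len(stacked))           # 6

# One group per original row label; take the minimum within each group.
result = stacked.groupby(level=1).min()
print(result.tolist())  # [4.28, 4.1]
```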

Note:

dict(enumerate([w.C for w in [ws1, ws2, ws3]]))

is another way of writing

{0: ws1.C, 1: ws2.C, 2: ws3.C}