Counting elements in pandas

3
Suppose I have a pandas DataFrame like this:
import pandas as pd


a=pd.Series([{'Country'='Italy','Name'='Augustina','Gender'='Female','Number'=1}])
b=pd.Series([{'Country'='Italy','Name'='Piero','Gender'='Male','Number'=2}])
c=pd.Series([{'Country'='Italy','Name'='Carla','Gender'='Female','Number'=3}])
d=pd.Series([{'Country'='Italy','Name'='Roma','Gender'='Female','Number'=4}])
e=pd.Series([{'Country'='Greece','Name'='Sophia','Gender'='Female','Number'=5}])
f=pd.Series([{'Country'='Greece','Name'='Zeus','Gender'='Male','Number'=6}])

df=pd.DataFrame([a,b,c,d,e,f])

Then I set a MultiIndex on it, like this:
df.set_index(['Country','Gender'],inplace=True)

Now I would like to know how to count the number of people from Italy in the DataFrame, or how many Greek females there are.
I tried:
df['Italy'].count()

and

df['Greece']['Female'].count()

Neither of them works.

Thanks.


2
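I assume your actual code doesn't have all those syntax errors, right? - DeepSpace
Right, it doesn't. My question is exactly what I asked, just with different data. I just realized the pd.Series lines have syntax errors. - MatMorPau22
1 Answer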

9
I think you need groupby with size as the aggregation:
What is the difference between size and count in pandas?
a=pd.DataFrame([{'Country':'Italy','Name':'Augustina','Gender':'Female','Number':1}])
b=pd.DataFrame([{'Country':'Italy','Name':'Piero','Gender':'Male','Number':2}])
c=pd.DataFrame([{'Country':'Italy','Name':'Carla','Gender':'Female','Number':3}])
d=pd.DataFrame([{'Country':'Italy','Name':'Roma','Gender':'Female','Number':4}])
e=pd.DataFrame([{'Country':'Greece','Name':'Sophia','Gender':'Female','Number':5}])
f=pd.DataFrame([{'Country':'Greece','Name':'Zeus','Gender':'Male','Number':6}])

df=pd.concat([a,b,c,d,e,f], ignore_index=True)
print (df)
  Country  Gender       Name  Number
0   Italy  Female  Augustina       1
1   Italy    Male      Piero       2
2   Italy  Female      Carla       3
3   Italy  Female       Roma       4
4  Greece  Female     Sophia       5
5  Greece    Male       Zeus       6

# store the result under a new name so df stays a DataFrame for the later steps
counts = df.groupby('Country').size()
print (counts)
Country
Greece    2
Italy     4
dtype: int64

counts = df.groupby(['Country', 'Gender']).size()
print (counts)
Country  Gender
Greece   Female    1
         Male      1
Italy    Female    3
         Male      1
dtype: int64

If you only need some of the sizes, select from the MultiIndex with xs or slicers:
df.set_index(['Country','Gender'],inplace=True)
print (df)
                     Name  Number
Country Gender                   
Italy   Female  Augustina       1
        Male        Piero       2
        Female      Carla       3
        Female       Roma       4
Greece  Female     Sophia       5
        Male         Zeus       6

print (df.xs('Italy', level='Country'))
             Name  Number
Gender                   
Female  Augustina       1
Male        Piero       2
Female      Carla       3
Female       Roma       4

print (len(df.xs('Italy', level='Country').index))
4

print (df.xs(('Greece', 'Female'), level=('Country', 'Gender')))
                  Name  Number
Country Gender                
Greece  Female  Sophia       5

print (len(df.xs(('Greece', 'Female'), level=('Country', 'Gender')).index))
1

# sort the index first; slicing an unsorted MultiIndex raises
# KeyError: 'MultiIndex Slicing requires
# the index to be fully lexsorted tuple len (2), lexsort depth (0)'
df.sort_index(inplace=True)
idx = pd.IndexSlice
print (df.loc[idx['Italy', :],:])
                     Name  Number
Country Gender                   
Italy   Female  Augustina       1
        Female      Carla       3
        Female       Roma       4
        Male        Piero       2

print (len(df.loc[idx['Italy', :],:].index))
4

print (df.loc[idx['Greece', 'Female'],:])
                  Name  Number
Country Gender                
Greece  Female  Sophia       5

print (len(df.loc[idx['Greece', 'Female'],:].index))
1
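
As a side note on why the attempts in the question fail: `df['Italy']` looks for a column named 'Italy', not an index label, so it raises a KeyError. If the frame already has the MultiIndex, a possible alternative is to group on the index levels directly, or to build a boolean mask from the level values. This is only a sketch; the name `people` is my own and the data is copied from the question.

```python
import pandas as pd

# Same data as in the question, indexed by Country and Gender
people = pd.DataFrame({
    'Country': ['Italy', 'Italy', 'Italy', 'Italy', 'Greece', 'Greece'],
    'Gender': ['Female', 'Male', 'Female', 'Female', 'Female', 'Male'],
    'Name': ['Augustina', 'Piero', 'Carla', 'Roma', 'Sophia', 'Zeus'],
    'Number': [1, 2, 3, 4, 5, 6],
}).set_index(['Country', 'Gender'])

# Group on index levels instead of columns
per_country = people.groupby(level='Country').size()
per_pair = people.groupby(level=['Country', 'Gender']).size()

n_italy = per_country.loc['Italy']
n_greek_females = per_pair.loc[('Greece', 'Female')]

# Alternative: boolean mask built from the index level values
countries = people.index.get_level_values('Country')
genders = people.index.get_level_values('Gender')
n_greek_females_mask = int(((countries == 'Greece') & (genders == 'Female')).sum())

print(n_italy, n_greek_females, n_greek_females_mask)
```

Neither variant needs the index to be sorted, which avoids the lexsort error shown above.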

I added another solution, please check. - jezrael
