How to expand a specific column into rows using Python pandas

4
I have a two-column table with a many-to-many relationship between the columns.
For example:
Animal     Food
rabbit     grass
rabbit     carrots
rabbit     cabbage
dog        carrots
horse      grass
horse      hay

I want something like this:
Animal     Food1   Food2     Food3 
rabbit     grass   carrots   cabbage  
dog        carrots
horse      grass   hay

Or something like this:
Animal     Grass     Carrots    Cabbage     Hay
rabbit      True      True       True       False
dog         False     True       False      False
horse       True      False      False      True

I have tried pivot tables, melt, and stack, but I still can't figure out how to do this. Any help would be greatly appreciated. Thanks!
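
For anyone reproducing the examples, here is a minimal sketch that rebuilds the table above as a DataFrame. The variable name `df` matches what the answers use; the construction itself is an assumption, since the question only shows the printed table.

```python
import pandas as pd

# Sample data copied from the table in the question.
# Building it from a dict of lists is an assumption; any equivalent
# construction (read_csv, read_clipboard, ...) would serve as well.
df = pd.DataFrame({
    'Animal': ['rabbit', 'rabbit', 'rabbit', 'dog', 'horse', 'horse'],
    'Food': ['grass', 'carrots', 'cabbage', 'carrots', 'grass', 'hay'],
})
print(df)
```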


Wow... so many ways to do this. I love pandas! - Scott Boston
4 Answers

4

Option 1
Using groupby and size

df.groupby(['Animal', 'Food']).size().unstack(fill_value=0).astype(bool)

Food    cabbage  carrots  grass    hay
Animal                                
dog       False     True  False  False
horse     False    False   True   True
rabbit     True     True   True  False

s = df.groupby('Animal').Food.apply(list)
pd.DataFrame(s.values.tolist(), s.index).add_prefix('Food').fillna('')

          Food0    Food1    Food2
Animal                           
dog     carrots                  
horse     grass      hay         
rabbit    grass  carrots  cabbage

Option 2
groupby and value_counts

df.groupby('Animal').Food.value_counts().unstack(fill_value=0).astype(bool)

Food    cabbage  carrots  grass    hay
Animal                                
dog       False     True  False  False
horse     False    False   True   True
rabbit     True     True   True  False

Option 3
groupby and str.get_dummies
df.groupby('Animal').Food.apply('|'.join).str.get_dummies().astype(bool)

        cabbage  carrots  grass    hay
Animal                                
dog       False     True  False  False
horse     False    False   True   True
rabbit     True     True   True  False
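
To see why Option 3 works, it may help to look at the intermediate result. The sketch below joins each animal's foods into one `|`-separated string, and `|` is the default separator that `str.get_dummies` splits on. The names `joined` and `dummies`, and the construction of `df`, are illustrative assumptions rather than part of the original answer.

```python
import pandas as pd

# Sample data from the question (construction assumed for illustration).
df = pd.DataFrame({
    'Animal': ['rabbit', 'rabbit', 'rabbit', 'dog', 'horse', 'horse'],
    'Food': ['grass', 'carrots', 'cabbage', 'carrots', 'grass', 'hay'],
})

# Step 1: one '|'-separated string of foods per animal.
joined = df.groupby('Animal')['Food'].apply('|'.join)
print(joined.to_dict())
# {'dog': 'carrots', 'horse': 'grass|hay', 'rabbit': 'grass|carrots|cabbage'}

# Step 2: split on '|' and build one indicator column per distinct food.
dummies = joined.str.get_dummies().astype(bool)
print(dummies)
```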

Option 4: pandas.factorize with numpy.bincount
f1, u1 = pd.factorize(df.Animal.values)
f2, u2 = pd.factorize(df.Food.values)

n = u1.size
m = u2.size

b = np.bincount(f1 * m + f2, minlength=n * m).reshape(n, m)

pd.DataFrame(b.astype(bool), u1, u2)

        grass  carrots  cabbage    hay
rabbit   True     True     True  False
dog     False     True    False  False
horse    True    False    False   True
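
The key step in Option 4 is turning each (animal, food) pair into a single cell number of an n-by-m grid, so that `bincount` can count the pairs in one pass. A commented sketch on the question's sample data follows; `flat`, `counts`, and `result` are illustrative names not in the original answer, and the construction of `df` is assumed.

```python
import numpy as np
import pandas as pd

# Sample data from the question (construction assumed for illustration).
df = pd.DataFrame({
    'Animal': ['rabbit', 'rabbit', 'rabbit', 'dog', 'horse', 'horse'],
    'Food': ['grass', 'carrots', 'cabbage', 'carrots', 'grass', 'hay'],
})

# factorize assigns integer codes in order of first appearance.
f1, u1 = pd.factorize(df['Animal'].values)  # codes: [0, 0, 0, 1, 2, 2]
f2, u2 = pd.factorize(df['Food'].values)    # codes: [0, 1, 2, 1, 0, 3]
n, m = u1.size, u2.size                     # 3 animals, 4 foods

# Row-major cell number of each (animal, food) pair in an n x m grid.
flat = f1 * m + f2
print(flat.tolist())  # [0, 1, 2, 5, 8, 11]

# Count occurrences per cell; minlength pads cells that never occur.
counts = np.bincount(flat, minlength=n * m).reshape(n, m)
result = pd.DataFrame(counts.astype(bool), index=u1, columns=u2)
print(result)
```

Because the codes follow order of first appearance, the rows and columns come out in the order they occur in the data rather than sorted, which is why this output is ordered differently from the groupby-based options.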

Option 5
I got bored, so I came up with one more idea

f, u = pd.factorize(df.Animal.values)
n = u.size

a = [[] for _ in range(n)]
# collect each animal's foods in order of appearance
for i, food in zip(f, df.Food):
    a[i].append(food)
pd.DataFrame(a, u).rename(columns=lambda x: x+1).add_prefix('Food').fillna('')

          Food1    Food2    Food3
rabbit    grass  carrots  cabbage
dog     carrots                  
horse     grass      hay         

4

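You can use:

# number each animal's foods 1, 2, 3, ... and pivot on that counter
# (`n` is a temporary helper column; the parentheses allow the chained calls to span lines)
df = (df.assign(n=df.groupby('Animal').cumcount().add(1).astype(str))
        .pivot(index='Animal', columns='n', values='Food')
        .add_prefix('Food')
        .rename_axis(columns=None)
        .fillna(''))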
print (df)
          Food1    Food2    Food3
Animal                           
dog     carrots                  
horse     grass      hay         
rabbit    grass  carrots  cabbage
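
The column labels in this pivot come from `cumcount`, which numbers the rows within each group starting at 0. A small sketch on the sample data shows the counter on its own; the name `position` and the construction of `df` are assumptions for illustration.

```python
import pandas as pd

# Sample data from the question (construction assumed for illustration).
df = pd.DataFrame({
    'Animal': ['rabbit', 'rabbit', 'rabbit', 'dog', 'horse', 'horse'],
    'Food': ['grass', 'carrots', 'cabbage', 'carrots', 'grass', 'hay'],
})

# Position of each row within its Animal group, shifted to start at 1.
position = df.groupby('Animal').cumcount().add(1)
print(position.tolist())  # [1, 2, 3, 1, 1, 2]

# Converted to strings, these become the column labels that add_prefix
# turns into Food1, Food2, Food3.
labels = sorted(position.astype(str).unique())
print(labels)  # ['1', '2', '3']
```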

Or:

df = pd.crosstab(df['Animal'], df['Food']).astype(bool)
print (df)
Food    cabbage  carrots  grass    hay
Animal                                
dog       False     True  False  False
horse     False    False   True   True
rabbit     True     True   True  False

4
df_out = df.set_index(['Animal','Food']).assign(Value=True).unstack().fillna(False)
df_out.columns = df_out.columns.droplevel()

Output:

Food    cabbage  carrots  grass    hay
Animal                                
dog       False     True  False  False
horse     False    False   True   True
rabbit     True     True   True  False

3
You can use crosstab.
pd.crosstab(df.Animal, df.Food).astype(bool)


Food    cabbage  carrots  grass    hay
Animal                                
dog       False     True  False  False
horse     False    False   True   True
rabbit     True     True   True  False
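
Note that `crosstab` returns counts, and `astype(bool)` then turns any nonzero count into True. A sketch of the intermediate table on the sample data follows; `counts` and `flags` are illustrative names, and the construction of `df` is assumed.

```python
import pandas as pd

# Sample data from the question (construction assumed for illustration).
df = pd.DataFrame({
    'Animal': ['rabbit', 'rabbit', 'rabbit', 'dog', 'horse', 'horse'],
    'Food': ['grass', 'carrots', 'cabbage', 'carrots', 'grass', 'hay'],
})

# crosstab counts how often each (Animal, Food) pair occurs.
counts = pd.crosstab(df['Animal'], df['Food'])
print(counts)

# Any nonzero count becomes True, so repeated pairs would still map to True.
flags = counts.astype(bool)
print(flags)
```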
