import pandas as pd
import numpy as np
import random
labels = ["c1","c2","c3"]
c1 = ["one","one","one","two","two","three","three","three","three"]
c2 = [random.random() for i in range(len(c1))]
c3 = ["alpha","beta","gamma","alpha","gamma","alpha","beta","gamma","zeta"]
# np.array coerces the mixed lists to strings, so c2 ends up stored as text
DF = pd.DataFrame(np.array([c1,c2,c3])).T
DF.columns = labels
The DataFrame looks like this:
      c1               c2     c3
0    one   0.440958516531  alpha
1    one   0.476439953723   beta
2    one   0.254235673552  gamma
3    two   0.882724336464  alpha
4    two    0.79817899139  gamma
5  three   0.677464637887  alpha
6  three   0.292927670096   beta
7  three  0.0971956881825  gamma
8  three   0.993934915508   zeta
The only way I can think of to build the dictionary is:
D_greek_value = {}
for greek in set(DF["c3"]):
    D_c1_c2 = {}
    # Scan every row and keep the ones whose c3 matches the current label
    for i in range(DF.shape[0]):
        row = DF.iloc[i, :]
        if row["c3"] == greek:
            D_c1_c2[row["c1"]] = row["c2"]
    D_greek_value[greek] = D_c1_c2
D_greek_value
The resulting dictionary looks like this:
{'alpha': {'one': '0.67919712421',
           'three': '0.67171020684',
           'two': '0.571150669821'},
 'beta': {'one': '0.895090207979', 'three': '0.489490074662'},
 'gamma': {'one': '0.964777504708',
           'three': '0.134397632659',
           'two': '0.10302290374'},
 'zeta': {'three': '0.0204226923557'}}
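I don't want to assume that c1 comes in blocks (that the "one" rows are always adjacent). I'm working with CSV files of a few hundred MB, and I feel like I'm going about this completely wrong. If you have any ideas, please help!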
groupby is very fast, but a lambda might slow it down. I'm too lazy to time it, though. - Steven Rumbalski
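For illustration, here is a minimal sketch of the groupby idea from the comment above. It iterates over the groups directly instead of using apply with a lambda; whether that is actually faster on a file of a few hundred MB has not been timed here. The frame is built from a dict of columns so that c2 stays numeric, the c2 values are made up for the example, and the names df and nested are hypothetical.

```python
import pandas as pd

# Small frame with the same columns as in the question.
# The c2 values are invented for this example.
df = pd.DataFrame({
    "c1": ["one", "one", "one", "two", "two", "three", "three", "three", "three"],
    "c2": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
    "c3": ["alpha", "beta", "gamma", "alpha", "gamma", "alpha", "beta", "gamma", "zeta"],
})

# Group the rows by c3, then map c1 -> c2 inside each group.
# The order of the c1 rows does not matter; if a (c3, c1) pair
# repeats, the last value seen wins.
nested = {
    greek: dict(zip(group["c1"], group["c2"]))
    for greek, group in df.groupby("c3")
}
```

This makes one grouping pass over the frame rather than rescanning every row once per distinct c3 value, and it does not rely on the c1 values appearing in contiguous blocks.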