>>> import numpy as NP
>>> from pprint import pprint
>>> # a sequence of dictionaries in an iterable called 'data'
>>> # assuming that not all dicts have the same keys
>>> pprint(data)
[,
,
,
,
]
>>> # get the unique keys across entire dataset
>>> keys = [list(dx.keys()) for dx in data]
>>> # flatten and coerce to 'set'
>>> keys = {key for key_list in keys for key in key_list}
>>> # create a map (look-up table) from each key
>>> # to a column in a NumPy array
>>> LuT = dict(enumerate(keys))
>>> LuT
>>> idx = list(LuT.values())
>>> # pre-allocate NumPy array (100 rows is arbitrary)
>>> # number of columns is len(LuT.keys())
>>> D = NP.empty((100, len(LuT.keys())))
>>> keys = list(LuT.keys())
>>> keys
[0, 1, 2, 3]
>>> # now populate the array from the original data using LuT
>>> for i, row in enumerate(data):
...     D[i, :] = [row.get(LuT[k], 0) for k in keys]
...
>>> D[:5, :]
array([[ 4.5  ,  2.   ,  2.773,  7.   ],
       [ 4.44 ,  2.576,  1.171,  0.081],
       [ 0.   ,  3.173,  0.671,  0.   ],
       [ 3.978,  3.791,  0.   ,  0.242],
       [ 3.602,  4.43 ,  2.088,  0.323]])
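Compare this last result (the first 5 rows of D) with data above. Notice that the column ordering is preserved for every row (a single dictionary), even when its set of keys is incomplete. In other words, column 2 of D always corresponds to the values keyed to y2, and so on, even if a given row in data stores no value for that key. For example, look at the third row in data, which has only two key/value pairs: in the third row of D, the first and last columns are both 0. Those columns correspond to the keys x and y2, which are in fact the two missing keys.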
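For reference, the same idea can be condensed into a minimal, self-contained sketch. The sample records, the name `column_of`, and the decision to sort the keys are assumptions made here for illustration, not part of the session above. Sorting is used only so that the column order is reproducible, since a set has no guaranteed order.

```python
import numpy as np

# Hypothetical sample records; keys and values are made up for illustration.
data = [
    {"x": 4.5, "y1": 2.0, "z": 2.773, "y2": 7.0},
    {"y1": 3.173, "z": 0.671},
    {"x": 3.978, "y1": 3.791, "y2": 0.242},
]

# Collect the unique keys across all records; sort for a stable column order.
keys = sorted({key for record in data for key in record})

# Look-up table mapping each key to a column index.
column_of = {key: col for col, key in enumerate(keys)}

# One row per record; columns for missing keys keep the fill value 0.
D = np.zeros((len(data), len(keys)))
for i, record in enumerate(data):
    for key, value in record.items():
        D[i, column_of[key]] = value

print(keys)
print(D)
```

Allocating with `np.zeros` and one row per record avoids leaving uninitialized rows in the array, at the cost of needing `len(data)` up front. If 0 is a legitimate value in the data, a different fill value such as `np.nan` may be a better marker for missing keys.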