How can I calculate the sum and average of all values in a Python dict by key name?

4
I have a `dict` object, and I want to calculate the sum and average of specifically named `values`.
Setup:
import yaml  # PyYAML

tree_str = """{
    'trees': [
        {
            'tree_idx': 0,
            'dimensions': (1120, 640),
            'branches': [
                'leaves': [
                    {'geometry': [[0.190673828125, 0.0859375], [0.74609375, 0.1181640625]]},
                    {'geometry': [[0.1171875, 0.1162109375], [0.8076171875, 0.15625]]}
                ],
                'leaves': [
                    {'geometry': [[0.2197265625, 0.1552734375], [0.7119140625, 0.1943359375]]},
                    {'geometry': [[0.2060546875, 0.1923828125], [0.730712890625, 0.23046875]]}
                ]
            ]
        }
    ]
}"""

tree_dict = yaml.load(tree_str, Loader=yaml.Loader)

where:

# assume for the sake of coding
{'geometry': ((xmin, ymin), (xmax, ymax))}
# where dimensions are relative to an image of a tree

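Now that I have the `dict` object, how do I:
1. Get the count of all leaves?
2. Get the average width and average height of all leaves?
I can access the values and traverse the tree structure like this: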
tree_dict['trees'][0]['branches'][0]['leaves'][0]['geometry'][1][1]

So I can do this with nested for loops:
leafcount = 0
leafwidth = 0
leafheight = 0
sumleafwidth = 0
sumleafheight = 0
avgleafwidth = 0
avgleafheight = 0

for tree in tree_dict['trees']:
    print("TREE")
    for branch in tree['branches']:
        print("\tBRANCH")
        for leaf in branch['leaves']:
            leafcount += 1
            (lxmin, lymin), (lxmax, lymax) = leaf['geometry']
            leafwidth = lxmax - lxmin
            leafheight = lymax - lymin
            print("\t\tLEAF: x1({}), y1({}), x2({}), y2({})\n\t\t\tWIDTH: {}\n\t\t\tHEIGHT: {}".format(lxmin, lymin, lxmax, lymax, leafwidth, leafheight))
            sumleafwidth += lxmax - lxmin
            sumleafheight += lymax - lymin

avgleafwidth = sumleafwidth / leafcount
avgleafheight = sumleafheight / leafcount

print("LEAVES\n\tCOUNT: {}\n\tAVERAGE WIDTH: {}\n\tAVERAGE HEIGHT: {}".format(leafcount, avgleafwidth, avgleafheight))

But is there a better way?
# pseudocode
leafcount = count(tree_dict['trees'][*]['branches'][*]['leaves'][*])
leaves = (tree_dict['trees'][*]['branches'][*]['leaves'][*])
sumleafwidth = sum(leaves[*]['geometry'][1][*]-leaves[*]['geometry'][0][*])
sumleafheight = sum(leaves[*]['geometry'][*][1]-leaves[*]['geometry'][*][0])
avgleafwidth = sumleafwidth / leafcount
avgleafheight = sumleafheight / leafcount

This works for the second line of your pseudocode: leaves = [leaf for tree in tree_dict['trees'] for branch in tree['branches'] for leaf in branch['leaves']]; and then, of course, leafcount = len(leaves)
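This question was being used as a "known good" audit item, but it is not actually suitable at the moment, because it asks multiple questions, one of which is opinion-based. Please edit your question to ask only one question, and make sure it can be answered objectively. If you have your own solution, post it as an answer rather than adding it to the question. Opinion-based questions such as "is there a better way" are off-topic.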
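Building on the comprehension suggested in the comments above, a minimal sketch of the whole calculation in plain Python might look like the following. It assumes every leaf stores `geometry` as `[[xmin, ymin], [xmax, ymax]]`, as stated in the question. The inline `tree_dict` literal is a stand-in for the parsed YAML, and `statistics.fmean` (Python 3.8+) is just one convenient way to average.

```python
from statistics import fmean

# Stand-in for the parsed data from the question (same nesting, same numbers).
tree_dict = {
    'trees': [{
        'tree_idx': 0,
        'branches': [
            {'leaves': [
                {'geometry': [[0.190673828125, 0.0859375], [0.74609375, 0.1181640625]]},
                {'geometry': [[0.1171875, 0.1162109375], [0.8076171875, 0.15625]]},
            ]},
            {'leaves': [
                {'geometry': [[0.2197265625, 0.1552734375], [0.7119140625, 0.1943359375]]},
                {'geometry': [[0.2060546875, 0.1923828125], [0.730712890625, 0.23046875]]},
            ]},
        ],
    }]
}

# Flatten trees -> branches -> leaves into one list.
leaves = [leaf
          for tree in tree_dict['trees']
          for branch in tree['branches']
          for leaf in branch['leaves']]

# Unpack each bounding box and compute its (width, height).
sizes = [(xmax - xmin, ymax - ymin)
         for (xmin, ymin), (xmax, ymax) in (leaf['geometry'] for leaf in leaves)]

leafcount = len(leaves)
avgleafwidth = fmean(w for w, _ in sizes)
avgleafheight = fmean(h for _, h in sizes)

print(leafcount, avgleafwidth, avgleafheight)
```

Keeping the flattened `leaves` list around means the count, the sums, and the averages all come from the same sequence, so they cannot drift apart the way separately maintained counters can.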
3 Answers

2
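I think that although a Python dict can serve as a tree representation in most cases, it is not the best data structure if you want to handle more complex tree-related tasks like the ones mentioned above.
There are many tree implementations for Python, for example treelib.
You can convert a dict into a tree structure like this: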
from treelib import Tree  # pip install treelib

def dict_to_tree(data, parent_node=None, tree=None):
    if tree is None:
        tree = Tree()
    
    for key, value in data.items():
        if isinstance(value, dict):
            # Create a node for the key
            tree.create_node(tag=key, identifier=key, parent=parent_node)
            # Recursively call the function to process the sub-dictionary
            dict_to_tree(value, parent_node=key, tree=tree)
        else:
            # Create a node for the key and value
            tree.create_node(tag=f"{key}: {value}", identifier=key, parent=parent_node)

    return tree 

With the right data structure, you should be able to solve your problem in a simpler and more elegant way.
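Note that `dict_to_tree` above only descends into `dict` values, while the data in the question also nests lists. As a library-free alternative, here is a minimal sketch of a recursive search by key name. `iter_values` is a hypothetical helper written for this illustration (it is not part of treelib), and the small `data` literal only mimics the question's layout with made-up numbers.

```python
def iter_values(obj, key):
    """Yield every value stored under `key`, at any depth of nested dicts/lists."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                yield v
            # Keep descending: the key may also appear deeper inside the value.
            yield from iter_values(v, key)
    elif isinstance(obj, (list, tuple)):
        for item in obj:
            yield from iter_values(item, key)


# Small example shaped like the question's data (values chosen for illustration).
data = {
    'trees': [
        {'branches': [
            {'leaves': [{'geometry': [[0.0, 0.0], [0.5, 0.25]]}]},
            {'leaves': [{'geometry': [[0.25, 0.5], [1.0, 1.0]]}]},
        ]},
    ]
}

boxes = list(iter_values(data, 'geometry'))
widths = [xmax - xmin for (xmin, _), (xmax, _) in boxes]
heights = [ymax - ymin for (_, ymin), (_, ymax) in boxes]

print(len(boxes), sum(widths) / len(boxes), sum(heights) / len(boxes))
```

Because the search is driven by the key name rather than by a fixed path, it keeps working if the nesting depth changes, at the cost of also matching any unrelated `geometry` key elsewhere in the structure.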

2
This may not be the answer you expected, but if you are comfortable with data analysis, you can use pandas and numpy to reshape your dataset.
# pip install pandas
import pandas as pd
import numpy as np

# Build trees
branches = pd.json_normalize(tree_dict['trees'], 'branches', 'tree_idx')
leaves = pd.json_normalize(branches.pop('leaves')).melt(var_name='branch_idx', value_name='geometry', ignore_index=False)
trees = leaves.merge(branches, left_on='branch_idx', right_index=True)

# Extract geometry
# One row per leaf: (x1, y1, x2, y2)
geom = np.concatenate(trees.pop('geometry').str['geometry'].values).reshape(-1, 4)
geom = pd.DataFrame(geom, columns=['x1', 'y1', 'x2', 'y2'], index=leaves.index)
trees = pd.concat([trees, geom], axis=1).sort_index().reset_index(names='leaf_idx')

# Width and Height
trees['width'] = trees['x2'] - trees['x1']
trees['height'] = trees['y2'] - trees['y1']

Output:
>>> trees
   leaf_idx  branch_idx tree_idx        x1        y1        x2        y2     width    height
0         0           0        0  0.190674  0.085938  0.746094  0.118164  0.555420  0.032227
1         0           1        0  0.117188  0.116211  0.807617  0.156250  0.690430  0.040039
2         1           0        0  0.219727  0.155273  0.711914  0.194336  0.492188  0.039062
3         1           1        0  0.206055  0.192383  0.730713  0.230469  0.524658  0.038086

Other uses:
# Average width
>>> trees['width'].mean()
0.565673828125

# Average height
>>> trees['height'].mean()
0.037353515625

# How many trees?
>>> trees['tree_idx'].nunique()
1

# How many branches?
>>> trees['branch_idx'].nunique()
2

# How many leaves?
>>> len(trees)
4

Your answer is good, although it may be overkill for my needs, and I don't fully understand it yet.
1
@skeetastax. That's why I said "if you are comfortable with data analysis" :-). If you have a lot of data, pandas can be a good alternative because its computations are vectorized.

1
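OK, here is another answer; although it isn't strictly necessary, I take advantage of numpy's vectorization to sum the leaves' widths and heights at the same time.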
import numpy as np

leaves = [leaf for tree in tree_dict['trees'] for branch in tree['branches'] for leaf in branch['leaves']]
leafsums = sum(np.array(leaf['geometry'][1]) - np.array(leaf['geometry'][0]) for leaf in leaves)

print(f"LEAVES\n\tCOUNT: {len(leaves)}\n\tAVERAGE WIDTH: {leafsums[0]/len(leaves)}\n\tAVERAGE HEIGHT: {leafsums[1]/len(leaves)}")
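As a variation on the same idea, the geometries can be stacked into a single array so that the subtraction and the averaging each happen in one vectorized step. This is only a sketch: it assumes every leaf has the same `[[xmin, ymin], [xmax, ymax]]` shape, and the two leaves below use made-up values for illustration.

```python
import numpy as np

# Flat list of leaves, as the comprehension over trees and branches would produce
# (illustrative values, not the question's data).
leaves = [
    {'geometry': [[0.0, 0.0], [0.5, 0.25]]},
    {'geometry': [[0.25, 0.5], [1.0, 1.0]]},
]

# Shape (n_leaves, 2, 2): axis 1 is (min corner, max corner), axis 2 is (x, y).
boxes = np.array([leaf['geometry'] for leaf in leaves], dtype=float)

# Max corner minus min corner gives (width, height) per leaf, shape (n_leaves, 2).
sizes = boxes[:, 1, :] - boxes[:, 0, :]

# Averaging over axis 0 yields the mean width and mean height together.
avg_width, avg_height = sizes.mean(axis=0)
print(len(leaves), avg_width, avg_height)
```

Keeping `sizes` as an array also makes per-leaf values available afterwards, for example `sizes[:, 0].max()` for the widest leaf.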
