如何实现高级的Python哈希自动初始化？

Question

如何实现高级的Python哈希自动初始化？

11

这个问题是关于在Python中实现完整的Perl自动创建数据结构的。我知道之前有类似的问题，并且到目前为止最好的答案在“如何在Python中实现嵌套字典的最佳方式?”中。然而，我希望能够做到以下几点：

a['x']['y'].append('z')

不需要先声明`a['x']['y'] = []`，或者说，也不需要声明`a['x'] = {}`。（注意，在Perl中，你可以这样做：`push @{$a->{x}{y}}, 'z';`。）

我知道`dict`和`list`类有点不搭，所以这很困难，但我很想看看是否有人有一个巧妙的解决方案，可能涉及从`dict`创建一个继承类，然后在其中定义一个新的`append`方法？

我也知道这可能会让一些Python纯粹主义者感到不适，他们会要求我坚持使用Perl。但是，即使只是为了挑战，我也想看到一些东西。

- Zhang18

5个回答

2

也许这可以满足您在字典中需要任意数量“维度”的需求：

a= collections.defaultdict(list)

你需要改变的唯一代码是:

a['x', 'y'].append('z')

当然，你想要的这个解决方案取决于两个条件：

您是否需要轻松访问所有包含“x”的“第一维度”列表
您是否被Perl神奇的方式所吸引 :)

如果这两个条件中有任何一个成立，我的解决方案将无法帮助您。

- tzot

是的，对于第一点是肯定的。对于第二点，也是遗憾的是我必须接受这样一个事实，在这种情况下，Python没有Perl的类似功能 - 有点令人失望。 - Zhang18

2

想象一下你正在约会，不知道，妮可·基德曼。你告诉她，有点失望的是她没有像莫妮卡·贝鲁奇那样的曲线。你能想象这样的声明缺乏多么无用(和优雅！)？无论如何，撇开寓言不谈，我认为你已经表达了你的观点，世界终于可以松口气了。 - tzot

2

由于我们事先不知道是否需要字典或列表，因此您不能将自动初始化与列表结合使用。除非，根据链接问题中Nosklo的答案，将列表“特性”添加到底层字典中。基本上假设键具有“排序”顺序，并始终将其与列表方法一起使用。我已经做了一个示例：

class AutoVivification(dict):
    """Implementation of perl's autovivification feature. Has features from both dicts and lists,
    dynamically generates new subitems as needed, and allows for working (somewhat) as a basic type.
    """
    def __getitem__(self, item):
        if isinstance(item, slice):
            d = AutoVivification()
            items = sorted(self.iteritems(), reverse=True)
            k,v = items.pop(0)
            while 1:
                if (item.start < k < item.stop):
                    d[k] = v
                elif k > item.stop:
                    break
                if item.step:
                    for x in range(item.step):
                        k,v = items.pop(0)
                else:
                    k,v = items.pop(0)
            return d
        try:
            return dict.__getitem__(self, item)
        except KeyError:
            value = self[item] = type(self)()
            return value

    def __add__(self, other):
        """If attempting addition, use our length as the 'value'."""
        return len(self) + other

    def __radd__(self, other):
        """If the other type does not support addition with us, this addition method will be tried."""
        return len(self) + other

    def append(self, item):
        """Add the item to the dict, giving it a higher integer key than any currently in use."""
        largestKey = sorted(self.keys())[-1]
        if isinstance(largestKey, str):
            self.__setitem__(0, item)
        elif isinstance(largestKey, int):
            self.__setitem__(largestKey+1, item)

    def count(self, item):
        """Count the number of keys with the specified item."""
        return sum([1 for x in self.items() if x == item])

    def __eq__(self, other):
        """od.__eq__(y) <==> od==y. Comparison to another AV is order-sensitive
        while comparison to a regular mapping is order-insensitive. """
        if isinstance(other, AutoVivification):
            return len(self)==len(other) and self.items() == other.items()
        return dict.__eq__(self, other)

    def __ne__(self, other):
        """od.__ne__(y) <==> od!=y"""
        return not self == other

这遵循了动态生成自身的基本自动嵌套特性，以处理不存在的键。但是它还实现了一些此处列出的方法。这使它可以像准列表/字典一样运作。

针对其余列表功能，加入列出的方法。我把它当作一个带有列表方法的字典。如果调用列表方法，则会假设所持项目的顺序，即字符串小于整数，并且键始终按“排序”顺序排列。

它还支持添加，这是这些方法的一个示例。这来自于我的使用案例。我需要从AutoVivified字典中添加项目，但如果它不存在，则创建并返回一个新的AutoVivification对象。它们没有整数“值”，因此您无法执行以下操作：

rp = AutoVivification()
rp['a']['b'] = 3
rp['a']['b'] + rp['q']

这样会失去意义，因为我不知道是否会有某个值存在，但我仍需设置一个默认值。因此，我添加了__add__和__radd__方法。它们使用基础字典的length作为整数值，因此新创建的AV对象在加法运算中的值为零。如果键中除了AV对象之外还有其他类型的数据，则我们调用那个数据类型的加法方法（如果已实现）。

- Spencer Rathbun

2

仅在Ignacio的回答基础上进行扩展，提供一些额外选项，以便您可以明确要求Python字典具有更多神奇行为。以这种方式编写的代码的可维护性仍然存在疑问，但我想让它非常清楚，这是一个“这种数据结构可维护吗？”的问题（我有疑虑），而不是“Python能否表现出这种行为？”（当然可以）。

要支持命名空间方面的任意嵌套级别，您只需要给函数命名（而不是使用lambda）并使其自引用即可：

>>> from collections import defaultdict
>>> def autodict(): return defaultdict(autodict)
...
>>> a = autodict()
>>> a[1][2][3] = []
>>> type(a[1])
<class 'collections.defaultdict'>
>>> type(a[1][2])
<class 'collections.defaultdict'>
>>> type(a[1][2][3])
<class 'list'>
>>> a[1][2][3]
[]

然而，这会引入一个“问题”，即您必须显式地设置一个列表，然后才能将其附加到其中。Python的解决方案在于setdefault方法，实际上它已经存在比collections.defaultdict更长的时间：

>>> a.setdefault(3, []).append(10)
>>> a.setdefault(3, []).append(11)
>>> a[3]
[10, 11]
>>> a[2].setdefault(3, []).append(12)
>>> a[2].setdefault(3, []).append(13)
>>> a[2][3]
[12, 13]
>>> a[1][2].setdefault(3, []).append(14)
>>> a[1][2].setdefault(3, []).append(15)
>>> a[1][2][3]
[14, 15]

collections.defaultdict的真正作用就是让你在每次调用dict.setdefault时都传递相同的第二个参数变得更加容易。对于像这样更复杂的情况，你仍然可以直接使用dict.setdefault来处理collections.defaultdict无法处理的方面。

- ncoghlan

0

这是我的版本，我称之为ArrayHash。它的行为类似于数组，除非您使用字符串作为键，然后它开始像哈希表一样运作。它支持初始化、使用+和+=进行连接，以及在左侧和右侧进行切片。在数组模式下，如果您引用或分配给arrayhash[N]，它将使用None值从0..N-1填充。在哈希模式下，它停止执行此操作。它回答isinstance(arrayhash, dict)为True，并且即使在哈希模式下，isinstance(arrayhash, collections.abc.Sequence)也会返回True。我很想听听您的反馈！

from collections import defaultdict
import collections.abc

class _ArrayHash(defaultdict, collections.abc.Sequence):
    def append(self, value):
        self[len(self)] = value

    def extend(self, lst):
        ln = len(self)
        for item in lst:
            self[ln] = item
            ln += 1

    def update(self, values):
        self.isHash = True
        for k, v in values.items():
            self[k] = v

    def __getitem__(self, index):
        if isinstance(index, slice):
            return [self[i] for i in range(*index.indices(len(self)))]
        elif isinstance(index, int):
            if index < 0:
                index += len(self)
            return super().__getitem__(index)
        else:
            self.isHash = True
            return super().__getitem__(index)

    def __setitem__(self, index, value):
        if isinstance(index, slice):
            try:
                if not hasattr(self, 'isHash'):
                    for i in range(len(self), index.start):
                        super().__setitem__(i, None)
                value = iter(value)
                ndx = index.start
                for i in range(*index.indices(len(self))):
                    super().__setitem__(i, next(value))
                    ndx += 1
                rest = list(value)
                lr = len(rest)
                if lr:
                    for i in range(len(self)-1,ndx-1,-1):  # Move everything else up
                        super().__setitem__(i+lr, super().__getitem__(i))
                for i in range(lr):
                    super().__setitem__(i+ndx, rest[i])
            except StopIteration:
                pass
        elif isinstance(index, int):
            if index < 0:
                index += len(self)
            if not hasattr(self, 'isHash'):
                for i in range(len(self), index):
                    super().__setitem__(i, None)
            super().__setitem__(index, value)
        else:
            self.isHash = True
            super().__setitem__(index, value)

    def __iter__(self):
        if hasattr(self, 'isHash'):
            for i in self.keys():
                yield i
        else:
            for i in range(len(self)):
                yield self[i]

    def __add__(self, other):
        result = ArrayHash(self)
        if hasattr(self, 'isHash') or (isinstance(other, dict) and not isinstance(other, _ArrayHash)) or hasattr(other, 'isHash'):
            result.update(other)
        else:
            result.extend(other)
        return result

    def __iadd__(self, other):
        if hasattr(self, 'isHash') or (isinstance(other, dict) and not isinstance(other, _ArrayHash)) or hasattr(other, 'isHash'):
            self.update(other)
        else:
            self.extend(other)
        return self

    def __radd__(self, other):
        result = ArrayHash()
        if hasattr(self, 'isHash') or (isinstance(other, dict) and not isinstance(other, _ArrayHash)) or hasattr(other, 'isHash'):
            result.update(other)
            result.update(self)
        else:
            result.extend(other)
            result.extend(self)
        return result


def ArrayHash(init=None):
    """Acts like an array with autovivification, unless you use a string as a key, then it becomes a hash"""
    result = _ArrayHash(ArrayHash)
    if init is not None:
        if isinstance(init, collections.abc.Sequence) and not isinstance(init, str):
            result.extend(init)
        elif isinstance(init, _ArrayHash):
            if(hasattr(init, 'isHash')):
                result.update(init)
            else:
                result.extend(init)
        elif isinstance(init, dict):
            result.update(init)
        else:
            result.append(init)
    return result

- snoopyjc

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- Ignacio Vazquez-Abrams · Accepted Answer

16

a = collections.defaultdict(lambda: collections.defaultdict(list))

- Ignacio Vazquez-Abrams

2

Python的一个例子是只有一种明显的方法来做某件事情 :) - Brian

实际上，我意识到这个答案只适用于2级字典和第3级列表。有没有一种方法可以不考虑层级？也就是说，在声明a之后，我可以执行a['x'].append('z')或a['x']['y']['w'].append('z')？谢谢。 - Zhang18

有人能给出一个经过验证的示例来按照Ignacio的建议创建一个类吗？谢谢。 - Zhang18

Ignacio，那是不正确的。你所要做的就是给函数命名，并使用命名函数作为内部defaultdict调用的参数，而不是使用列表内置函数：http://code.activestate.com/recipes/578057-recursive-defaultdict/ - ncoghlan

@ncoghlan：给函数命名怎么能突然将其变成标准库的一部分呢？ - Ignacio Vazquez-Abrams

显示剩余4条评论