如何在Python的OrderedDict中使用字符串键而不是整数进行切片？

Question

如何在Python的OrderedDict中使用字符串键而不是整数进行切片？

pythonpython-2.7dictionarypython-internalsordereddictionary

21

由于 OrderedDict 具有列表（具有有序元素）和字典（使用键而不是索引）的特性，因此似乎可以使用键来进行切片。

>>> from collections import OrderedDict
>>> cities = OrderedDict((('san francisco', 650), ('new york', 212), ('shanghai', 8621), ('barcelona', 42423)))
>>> test['shanghai':]  # I want all the cities from shanghai to the end of the list
TypeError: unhashable type

有趣的是，这不是由于OrderedDictionary.__getslice__未被实现而导致的错误。我尝试向OrderedDict添加自己的__getslice__方法，但我一直遇到这个TypeError问题。似乎Python在将切片键传递给__getslice__函数之前进行某种类型检查，以强制执行切片键仅为整数，这是多么不符合Python风格！

>>> class BetterOrderedDict(OrderedDict):
        def __getslice__(self, start=None, end=None, step=1):
            return 'potato'

>>> test = BetterOrderedDict((('one', 1), ('two', 2), ('three', 3), ('four', 4)))
>>> print test[1:4]
'potato'                           # ok this makes sense so far

>>> test['one':'four']
TypeError: unhashable type         # WTF, strings are hashable!

所以我的问题是，为什么我不能实现非整数片段，是什么类型检查阻止了切片键甚至到达我的__getslice__函数，并且我是否可以通过用绑定在C中实现我的BetterOrderedDict来覆盖它？

- Nick Sweeting

你想从键“one”创建一个切片，直到键“four”？ - Rafael Barros

但是...为什么？目的是什么？ - Rafael Barros

3

请查看此PEP文档：https://www.python.org/dev/peps/pep-0357/。 - Asad Saeeduddin

@Asad 把那个放在答案里，我会接受它作为正确的答案，感谢你提供链接！还有，哇，那是在2006年提交的！！ - Nick Sweeting

2

请注意，在Python3.4中，slice('a','b')是有效的。但是，[1,2,3][slice('a','b')]会报错：TypeError: slice indices must be integers or None or have an __index__ method。 - Adam Smith

显示剩余4条评论

3个回答

12

这是您期望的切片功能的实际实现。 OrderedDict 内部通过双向链表的形式维护键的顺序。引用 Python 2.7.9 中的实际注释。

# The internal self.__map dict maps keys to links in a doubly linked list.
# The circular doubly linked list starts and ends with a sentinel element.
# The sentinel element never gets deleted (this simplifies the algorithm).
# Each link is stored as a list of length three:  [PREV, NEXT, KEY].

现在，为了切割字典，我们需要遍历双向链表__root，它实际上是一个私有变量，由名称修饰机制保护。

注意: 这涉及到非常规的名称解封装来使用OrderedDict的内部数据结构。

from collections import OrderedDict

class SlicableDict(OrderedDict):
    def __getitem__(self, key):
        if isinstance(key, slice):
            # Unmangle `__root` to access the doubly linked list
            root = getattr(self, "_OrderedDict__root")
            # By default, make `start` as the first element, `end` as the last
            start, end = root[1][2], root[0][2]
            start = key.start or start
            end = key.stop or end
            step = key.step or 1
            curr, result, begun, counter = root[1], [], False, 0

            # Begin iterating
            curr, result, begun = root[1], [], False
            while curr is not root:
                # If the end value is reached, `break` and `return`
                if curr[2] == end:
                    break
                # If starting value is matched, start appending to `result`
                if curr[2] == start:
                    begun = True
                if begun:
                    if counter % step == 0:
                        result.append((curr[2], self[curr[2]]))
                    counter += 1

                # Make the `curr` point to the next element
                curr = curr[1]

            return result

        return super(SlicableDict, self).__getitem__(key)

几个样本运行：

>>> s = SlicableDict(a=1, b=2, c=3, d=4)
>>> s
SlicableDict([('a', 1), ('c', 3), ('b', 2), ('e', 5), ('d', 4), ('f', 6)])
>>> s['a':'c']
[('a', 1)]
>>> s['a':]
[('a', 1), ('c', 3), ('b', 2), ('e', 5), ('d', 4)]
>>> s[:'a']
[]
>>> s['a':'f':2]
[('a', 1), ('b', 2), ('d', 4)]

- thefourtheye

5

很棒的回答，揭示了“有序字典”背后的“管道”。我预计你的回答比我的更快，但是可能会因为OrderedDict实现的更改（由于名称反解或其他原因）而出现问题，而我使用的“瓷器”实现可能在版本之间不会出现故障，但远远没有这么快。 - Adam Smith

非常详细，谢谢你实际实现了这个功能，而不是草率地处理它！我查看了Python源代码中的OrderedDict实现，但错过了应该使用getitem而不是getslice的事实。 - Nick Sweeting

2

@thefoureye 为什么要用 getattr 而不是直接使用 self._OrderedDict__root？ - Veedrac

当在假值项目上进行切片时，这也会出现错误。 - Veedrac

如果您已经依赖于OrderedDict的实现细节，为什么还要进行第一个元素的线性搜索呢？您可以直接使用self.__map跳转到那里，然后对最后一个元素进行线性搜索。 - Sven Marnach

4

尝试这个（非常丑陋的）实现。

class SliceOrdered(OrderedDict):

    def __getitem__(self, key):
        if isinstance(key, slice):
            tmp = OrderedDict()
            i_self = iter(self)
            for k in i_self:
                if key.start <= k <= key.stop:
                    tmp[k] = self[k]
                    if key.step is not None and key.step > 1:
                        for _ in range(key.step-1):
                            try:
                                next(i_self)
                            except StopIteration:
                                break
            return tmp
        else:
            return super(SliceOrdered, self).__getitem__(key)

演示（Python3.4）

>>> s = SliceOrdered([('a',2), ('b',2), ('c',3), ('d',4)])
>>> s['a':'c']
OrderedDict([('a', 2), ('b', 2), ('c', 3)])
>>> s['a':'d':2]
OrderedDict([('a', 2), ('c', 3)])

注意：这个例子可能只是有效的，因为在这个示例中，OrderedDict不仅被排序了，而且还被排序了。在未排序的字典中，切片'a':'c'不一定包含'b'，所以我的if key.start <= k <= key.stop逻辑可能会失败。下面的代码应该可以解决这个问题：

class SliceOrdered(OrderedDict):
    def __getitem__(self, key):
        if not isinstance(key, slice):
            return super(SliceOrdered,self).__getitem__(key)
        tmp = OrderedDict()
        step = key.step or 1
        accumulating = False
        i_self = iter(self)
        for k in i_self:
            if k == key.start:
                accumulating = True
            if accumulating:
                tmp[k] = self[k]
                for _ in range(step-1):
                    next(i_self)
            if k == key.stop:
                accumulating = False
                break
        return tmp

- Adam Smith

谢谢，我不知道使用__getitem__而不是getslice，现在这很有意义。 - Nick Sweeting

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- zch · Accepted Answer

__getslice__ 是实现切片的一种过时方法。与其如此，你应该使用 __getitem__ 来处理 slice 对象：

from collections import OrderedDict

class SlicableDict(OrderedDict):
    def __getitem__(self, key):
        if isinstance(key, slice):
            return 'potato({},{},{})'.format(key.start, key.stop, key.step)
        return super(SlicableDict, self).__getitem__(key)

>>> s = SlicableDict(a=1, b=2, c=3)
>>> s
SlicableDict([('a', 1), ('c', 3), ('b', 2)])
>>> s['a']
1
>>> s['a':'c']
'potato(a,c,None)'

如果您需要的不仅仅是切片一个土豆，那么可以按照以下方式实现所有三个切片操作：

def _key_slice_to_index_slice(items, key_slice):
    try:
        if key_slice.start is None:
            start = None
        else:
            start = next(idx for idx, (key, value) in enumerate(items)
                         if key == key_slice.start)
        if key_slice.stop is None:
            stop = None
        else:
            stop = next(idx for idx, (key, value) in enumerate(items)
                        if key == key_slice.stop)
    except StopIteration:
        raise KeyError
    return slice(start, stop, key_slice.step)

class SlicableDict(OrderedDict):
    def __getitem__(self, key):
        if isinstance(key, slice):
            items = self.items()
            index_slice = _key_slice_to_index_slice(items, key)
            return SlicableDict(items[index_slice])
        return super(SlicableDict, self).__getitem__(key)

    def __setitem__(self, key, value):
        if isinstance(key, slice):
            items = self.items()
            index_slice = _key_slice_to_index_slice(items, key)
            items[index_slice] = value.items()
            self.clear()
            self.update(items)
            return
        return super(SlicableDict, self).__setitem__(key, value)

    def __delitem__(self, key):
        if isinstance(key, slice):
            items = self.items()
            index_slice = _key_slice_to_index_slice(items, key)
            del items[index_slice]
            self.clear()
            self.update(items)
            return
        return super(SlicableDict, self).__delitem__(key)