How can I memoize a class instantiation in Python?

Question

How can I memoize a class instantiation in Python?

30

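OK, here's the real-world scenario: I'm writing an application, and I have a class that represents a certain type of file (in my case this is photographs, but that detail is irrelevant to the problem). Each instance of the Photograph class should be unique to the photo's filename.

The problem is that when a user tells my application to load a file, I need to be able to identify files that are already loaded and use the existing instance for that filename, rather than create duplicate instances for the same filename.

To me this looks like a good case for memoization, and there are plenty of examples of that out there, but here I'm not just memoizing an ordinary function: I need to memoize __init__(). That poses a problem, because by the time __init__() is called it's already too late: a new instance has already been created.

In my research I found Python's __new__() method, and I was actually able to write a working trivial example, but it fell apart when I tried to use it on my real-world objects, and I'm not sure why (the only thing I can think of is that my real-world objects are subclasses of other objects that I can't control, so there are some incompatibilities with this approach). This is what I have: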

class Flub(object):
    # Cache of existing instances, keyed by flubid.
    instances = {}

    def __new__(cls, flubid):
        try:
            self = Flub.instances[flubid]
        except KeyError:
            # First request for this flubid: create and cache the instance.
            self = Flub.instances[flubid] = super(Flub, cls).__new__(cls)
            print('making a new one!')
            self.flubid = flubid
        print(id(self))
        return self

    @staticmethod
    def destroy_all():
        for flub in Flub.instances.values():
            print('killing', flub)


a = Flub('foo')
b = Flub('foo')
c = Flub('bar')

print(a)
print(b)
print(c)
print(a is b, b is c)

Flub.destroy_all()

The output of this code is:

making a new one!
139958663753808
139958663753808
making a new one!
139958663753872
<__main__.Flub object at 0x7f4aaa6fb050>
<__main__.Flub object at 0x7f4aaa6fb050>
<__main__.Flub object at 0x7f4aaa6fb090>
True False
killing <__main__.Flub object at 0x7f4aaa6fb050>
killing <__main__.Flub object at 0x7f4aaa6fb090>

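It's perfect! Only two instances were created for the two unique IDs given, and Flub.instances clearly lists only two.

But when I tried this approach with the objects I was actually using, I got all kinds of nonsensical errors, such as __init__() taking only 0 arguments rather than 2. So I changed some things around, and then it told me that __init__() needed an argument. Totally bizarre.

After fighting with it for a while, I basically gave up and moved all of the __new__() black magic into a staticmethod called get, so that I could call Photograph.get(filename) and it would only call Photograph(filename) if filename was not already in Photograph.instances.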
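
The get workaround described above can be sketched roughly as follows. This is a minimal illustration, not the real code: the Photograph class here is a hypothetical stand-in with a single filename attribute, and it is written for Python 3.

```python
class Photograph:
    """Hypothetical stand-in for the real Photograph class."""

    # Maps filename -> existing Photograph instance.
    instances = {}

    def __init__(self, filename):
        self.filename = filename

    @staticmethod
    def get(filename):
        # Only construct a new Photograph if this filename is not cached yet.
        if filename not in Photograph.instances:
            Photograph.instances[filename] = Photograph(filename)
        return Photograph.instances[filename]


first = Photograph.get('beach.jpg')
second = Photograph.get('beach.jpg')
other = Photograph.get('forest.jpg')
print(first is second, first is other)  # True False
```

The drawback is that nothing stops a caller from writing Photograph(filename) directly and bypassing the cache.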

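Does anybody know where I went wrong? Is there a better way to do this?

Another way to think about it is that it's similar to a singleton, except that it's not globally singleton, just singleton per filename.

If you want to see it all together, here is my real-world code using the staticmethod get.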

- robru

1

I've edited the question to remove the things you mentioned. - robru

4 Answers

9

The solution I ended up using is this:

class memoize(object):
    def __init__(self, cls):
        self.cls = cls
        self.__dict__.update(cls.__dict__)

        # This bit allows staticmethods to work as you would expect.
        for attr, val in cls.__dict__.items():
            if type(val) is staticmethod:
                self.__dict__[attr] = val.__func__

    def __call__(self, *args):
        key = '//'.join(map(str, args))
        if key not in self.cls.instances:
            self.cls.instances[key] = self.cls(*args)
        return self.cls.instances[key]

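You then decorate the class with this, not __init__. Although brandizzi provided me with that key piece of information, his example decorator didn't behave as I wanted.

I find this concept subtle, but basically, when you use decorators in Python, you need to understand that the thing being decorated (whether a method or a class) is actually replaced by whatever the decorator returns. So, for example, when I tried to access Photograph.instances or Camera.generate_id() (a staticmethod), I couldn't, because Photograph no longer refers to the original Photograph class; it refers to the memoized function (from brandizzi's example).

To get around this, I had to create a decorator class that takes all of the attributes and static methods from the decorated class and exposes them as its own. It's almost like a subclass, except that the decorator class doesn't know ahead of time which classes it will decorate, so it has to copy the attributes over after the fact.

The end result is that any instance of the memoize class becomes an almost transparent wrapper around the class it decorates, with the exception that attempting to instantiate it (which is really calling it) gives you a cached copy when one is available.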
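
As a usage sketch of the wrapper described above (Python 3): the memoize class is repeated so the example is self-contained, and the Camera class with its make and model arguments is a hypothetical illustration. Note that the decorated class must define its own instances dict, since the wrapper stores the cache there.

```python
class memoize(object):
    def __init__(self, cls):
        self.cls = cls
        # Expose the decorated class's attributes on the wrapper.
        self.__dict__.update(cls.__dict__)
        # Unwrap staticmethods so they can be called through the wrapper.
        for attr, val in cls.__dict__.items():
            if type(val) is staticmethod:
                self.__dict__[attr] = val.__func__

    def __call__(self, *args):
        key = '//'.join(map(str, args))
        if key not in self.cls.instances:
            self.cls.instances[key] = self.cls(*args)
        return self.cls.instances[key]


@memoize
class Camera(object):
    instances = {}  # required by memoize.__call__

    def __init__(self, make, model):
        self.make = make
        self.model = model

    @staticmethod
    def generate_id(make, model):
        return make + ' ' + model


a = Camera('Canon', 'EOS')
b = Camera('Canon', 'EOS')
print(a is b)                              # True
print(Camera.generate_id('Canon', 'EOS'))  # Canon EOS
print(list(Camera.instances))              # ['Canon//EOS']
```

Because the name Camera now refers to the wrapper object, the original class is only reachable as Camera.cls, for example in isinstance(a, Camera.cls).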

- robru

2

This was very helpful to me. I'd just add that my use case also involved classmethods, so I needed to add the following lines after the staticmethod check: if type(val) is classmethod: self.__dict__[attr] = functools.partial(val.__func__, cls). - MarcTheSpark
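
A sketch of how the classmethod suggestion in this comment might slot into the decorator (Python 3). The Photograph class and its loaded_count classmethod are hypothetical, added only to exercise the extra branch.

```python
import functools


class memoize(object):
    def __init__(self, cls):
        self.cls = cls
        self.__dict__.update(cls.__dict__)
        for attr, val in cls.__dict__.items():
            if type(val) is staticmethod:
                self.__dict__[attr] = val.__func__
            if type(val) is classmethod:
                # Bind the underlying function to the original class.
                self.__dict__[attr] = functools.partial(val.__func__, cls)

    def __call__(self, *args):
        key = '//'.join(map(str, args))
        if key not in self.cls.instances:
            self.cls.instances[key] = self.cls(*args)
        return self.cls.instances[key]


@memoize
class Photograph(object):
    instances = {}

    def __init__(self, filename):
        self.filename = filename

    @classmethod
    def loaded_count(cls):
        # cls is the original class, so cls.instances is the shared cache.
        return len(cls.instances)


Photograph('a.jpg')
Photograph('a.jpg')
Photograph('b.jpg')
print(Photograph.loaded_count())  # 2
```

One thing to keep in mind: inside the classmethod, cls is the undecorated class, so calling cls(...) there would bypass the cache.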

2

The parameters to __new__ also get passed to __init__, so:

def __init__(self, flubid):
    ...

You need to accept the flubid argument there, even if you don't use it in __init__.
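
To illustrate the point, here is a small Python 3 sketch based on the Flub class from the question. The init_calls counter is an addition for demonstration only. It shows that __init__ receives the same arguments as __new__, and that it runs on every Flub(...) call, even when __new__ returns a cached instance.

```python
class Flub:
    instances = {}
    init_calls = 0  # counts how often __init__ runs (added for illustration)

    def __new__(cls, flubid):
        if flubid not in cls.instances:
            cls.instances[flubid] = super().__new__(cls)
        return cls.instances[flubid]

    def __init__(self, flubid):
        # Receives the same arguments as __new__, on every Flub(...) call.
        Flub.init_calls += 1
        self.flubid = flubid


a = Flub('foo')
b = Flub('foo')
print(a is b)           # True
print(Flub.init_calls)  # 2
```

So any state set in __init__ is reset each time a cached instance is requested, unless __init__ guards against re-initialization.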

Here is the relevant comment taken from typeobject.c in Python 2.7.3.

/* You may wonder why object.__new__() only complains about arguments
   when object.__init__() is not overridden, and vice versa.

   Consider the use cases:

   1. When neither is overridden, we want to hear complaints about
      excess (i.e., any) arguments, since their presence could
      indicate there's a bug.

   2. When defining an Immutable type, we are likely to override only
      __new__(), since __init__() is called too late to initialize an
      Immutable object.  Since __new__() defines the signature for the
      type, it would be a pain to have to override __init__() just to
      stop it from complaining about excess arguments.

   3. When defining a Mutable type, we are likely to override only
      __init__().  So here the converse reasoning applies: we don't
      want to have to override __new__() just to stop it from
      complaining.

   4. When __init__() is overridden, and the subclass __init__() calls
      object.__init__(), the latter should complain about excess
      arguments; ditto for __new__().

   Use cases 2 and 3 make it unattractive to unconditionally check for
   excess arguments.  The best solution that addresses all four use
   cases is as follows: __init__() complains about excess arguments
   unless __new__() is overridden and __init__() is not overridden
   (IOW, if __init__() is overridden or __new__() is not overridden);
   symmetrically, __new__() complains about excess arguments unless
   __init__() is overridden and __new__() is not overridden
   (IOW, if __new__() is overridden or __init__() is not overridden).

   However, for backwards compatibility, this breaks too much code.
   Therefore, in 2.6, we'll *warn* about excess arguments when both
   methods are overridden; for all other cases we'll use the above
   rules.

*/

- John La Rooy

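What you say makes sense, but how does my trivial example work if I don't define __init__ at all? Shouldn't it also give me errors about the wrong number of arguments being passed? - robru

@Robru, I've updated my answer with the explanation given in typeobject.c. - John La Rooy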

0

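I was trying to figure this out too, and I put together a solution that combines tips from other StackOverflow questions (linked in the code comments).

If anyone still needs it, try this: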

import functools

def memoize(f):

    class Memoized:
        # Sentinel separating positional from keyword arguments in the key.
        _kwargs_mark = object()

        def __init__(self, func):
            self._f = func
            self._cache = {}
            # Make the Memoized class masquerade as the object we are memoizing.
            # Preserve class attributes
            functools.update_wrapper(self, func)
            # Preserve static methods
            # From https://dev59.com/3GXWa4cB1Zd3GeqPL1Wm
            for k, v in func.__dict__.items():
                self.__dict__[k] = v.__func__ if type(v) is staticmethod else v

        def __call__(self, *args, **kwargs):
            # Generate key (all arguments must be hashable)
            key = args
            if kwargs:
                key += (self._kwargs_mark,)
                for k, v in sorted(kwargs.items()):
                    key += (k, v)
            # Use the tuple itself as the key; storing only hash(key)
            # could confuse two different calls whose hashes collide.
            if key not in self._cache:
                self._cache[key] = self._f(*args, **kwargs)
            return self._cache[key]

        def __get__(self, instance, owner):
            """
            From https://dev59.com/vF0a5IYBdhLWcg3woZ83
            """
            return functools.partial(self.__call__, instance)

        def __instancecheck__(self, other):
            """Make isinstance() work"""
            return isinstance(other, self._f)

    return Memoized(f)

You can then use it like this:

@memoize
class Test:
    def __init__(self, value):
        self._value = value

    @property
    def value(self):
        return self._value

The full documentation has been uploaded to: https://github.com/spoorn/nemoize
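
The isinstance() support in the decorator above relies on __instancecheck__ being looked up on the type of the wrapper object. A stripped-down Python 3 sketch of just that mechanism follows; it is a simplified wrapper that handles positional arguments only, not the full decorator.

```python
class Memoized:
    def __init__(self, cls):
        self._cls = cls
        self._cache = {}

    def __call__(self, *args):
        if args not in self._cache:
            self._cache[args] = self._cls(*args)
        return self._cache[args]

    def __instancecheck__(self, obj):
        # isinstance(obj, wrapper) looks this method up on type(wrapper).
        return isinstance(obj, self._cls)


@Memoized
class Test:
    def __init__(self, value):
        self._value = value


t = Test(1)
print(isinstance(t, Test))  # True
print(isinstance(1, Test))  # False
print(t is Test(1))         # True
```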

- spoorn


- brandizzi · Accepted Answer

Let's look at two points in your question.

Using memoize

You can use memoization, but you should decorate the class, not the __init__ method. Suppose we have this memoizator:

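def get_id_tuple(f, args, kwargs, mark=object()):
    """
    Some quick'n'dirty way to generate a unique key for a specific call.
    Note: the key is built from id(), so equal but distinct objects
    produce different keys.
    """
    l = [id(f)]
    for arg in args:
        l.append(id(arg))
    l.append(id(mark))
    # Iterate over items(); iterating over the dict itself yields only keys.
    for k, v in kwargs.items():
        l.append(k)
        l.append(id(v))
    return tuple(l)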

_memoized = {}
def memoize(f):
    """ 
    Some basic memoizer
    """
    def memoized(*args, **kwargs):
        key = get_id_tuple(f, args, kwargs)
        if key not in _memoized:
            _memoized[key] = f(*args, **kwargs)
        return _memoized[key]
    return memoized

Now you just need to decorate the class:

@memoize
class Test(object):
    def __init__(self, somevalue):
        self.somevalue = somevalue

Let's see a test:

tests = [Test(1), Test(2), Test(3), Test(2), Test(4)]
for test in tests:
    print(test.somevalue, id(test))

Below is the output. Note that the same arguments yield the same id for the returned object:

1 3072319660
2 3072319692
3 3072319724
2 3072319692
4 3072319756
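
Because the key above is built from id(), two equal strings that happen to be separate objects (two filenames read at different times, say) would not share a cache entry. A possible variant that keys on argument values instead is sketched below in Python 3. It assumes all arguments are hashable, and make_key is a hypothetical helper name.

```python
def make_key(args, kwargs, mark=object()):
    """Build a hashable key from argument values (not their ids)."""
    key = args
    if kwargs:
        # mark separates positional from keyword arguments.
        key += (mark,) + tuple(sorted(kwargs.items()))
    return key


def memoize(f):
    cache = {}

    def memoized(*args, **kwargs):
        key = make_key(args, kwargs)
        if key not in cache:
            cache[key] = f(*args, **kwargs)
        return cache[key]
    return memoized


@memoize
class Test(object):
    def __init__(self, somevalue):
        self.somevalue = somevalue


# Two equal strings built separately at runtime.
name_a = ''.join(['photo', '.jpg'])
name_b = ''.join(['photo', '.jpg'])
print(name_a == name_b)              # True
print(Test(name_a) is Test(name_b))  # True
```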

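Anyway, I would prefer to create a function that generates the objects and memoize that. It seems cleaner to me, but that may just be an irrelevant pet peeve.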

class Test(object):
    def __init__(self, somevalue):
        self.somevalue = somevalue

@memoize
def get_test_from_value(somevalue):
    return Test(somevalue)
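
On Python 3, the same factory-function idea could also be sketched with the standard library's functools.lru_cache in place of a hand-written memoize. This is a substitution for illustration, not the decorator defined in this answer; lru_cache keys on argument values, so arguments must be hashable.

```python
import functools


class Test:
    def __init__(self, somevalue):
        self.somevalue = somevalue


@functools.lru_cache(maxsize=None)  # unbounded cache
def get_test_from_value(somevalue):
    return Test(somevalue)


tests = [get_test_from_value(v) for v in (1, 2, 3, 2, 4)]
print(tests[1] is tests[3])                     # True
print(get_test_from_value.cache_info().misses)  # 4
```

Test itself stays an ordinary class here, so class attributes, static methods, and isinstance() keep working without any extra wrapping.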

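Using __new__:

Of course, you can also override __new__. A few days ago I posted an answer about the ins, outs, and best practices of overriding __new__ that may be helpful. Basically, it says to always pass *args, **kwargs to your __new__ method.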
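
A minimal Python 3 sketch of that advice, under stated assumptions: Photograph here is a hypothetical class that inherits directly from object, the filename is the cache key, and the _initialized flag and load_count attribute are illustrative additions that stop __init__ from resetting a cached instance.

```python
class Photograph:
    _instances = {}

    def __new__(cls, filename, *args, **kwargs):
        try:
            return cls._instances[filename]
        except KeyError:
            # object.__new__ takes no extra arguments, so none are forwarded;
            # a base class with its own __new__ signature may need them.
            self = super().__new__(cls)
            cls._instances[filename] = self
            return self

    def __init__(self, filename, *args, **kwargs):
        # __init__ runs on every Photograph(...) call, cached or not.
        if getattr(self, '_initialized', False):
            return
        self.filename = filename
        self.load_count = 1  # hypothetical per-instance state
        self._initialized = True


p = Photograph('beach.jpg')
p.load_count = 5
q = Photograph('beach.jpg')
print(q is p)        # True
print(q.load_count)  # 5
```

When subclassing a class you don't control, both methods would also need to match that base class's signatures, which is a likely source of the argument-count errors mentioned in the question.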

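Personally, I would prefer to memoize a function that creates the objects, or even write a specific function that takes care of never recreating an object for the same parameter. Of course, this is mostly my opinion, not a rule.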
