Lazily loading a settings dictionary

27

Suppose I have a dictionary in Python, defined at module level (mysettings.py):

settings = {
    'expensive1' : expensive_to_compute(1),
    'expensive2' : expensive_to_compute(2),
    ...
}

I would like those values to be computed only when the keys are accessed:

from mysettings import settings # settings is only "prepared"

print(settings['expensive1']) # Now the value is really computed.

Is this possible? How?


The problem is that if you leave the module unchanged, "from mysettings import settings" evaluates the module's contents, so the dictionary is built in full. - njzk2
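
To make the comment above concrete, here is a minimal sketch of the difference between calling the function inside the dict literal and merely storing what is needed to call it later. The `calls` list and the stand-in body of `expensive_to_compute` are assumptions added for illustration only.

```python
calls = []

def expensive_to_compute(arg):
    # Stand-in for the real expensive function; records each call.
    calls.append(arg)
    return arg * 3

# Building the dict literal calls the function immediately for every key.
eager = {
    'expensive1': expensive_to_compute(1),
    'expensive2': expensive_to_compute(2),
}
calls_after_eager = list(calls)     # both values were already computed

# Storing (function, argument) pairs defers the call until someone asks.
deferred = {
    'expensive1': (expensive_to_compute, 1),
    'expensive2': (expensive_to_compute, 2),
}
calls_after_deferred = list(calls)  # nothing new was computed
```

The answers below differ mainly in how they hide that deferred call behind the normal `settings[key]` syntax.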
8 Answers

12

Don't inherit from the built-in dict. Even if you override dict.__getitem__(), dict.get() will not behave the way you expect.
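
A small sketch of that pitfall, assuming CPython's built-in dict. The names `NaiveLazyDict` and `triple` are hypothetical and exist only for this illustration.

```python
class NaiveLazyDict(dict):
    """dict subclass that only overrides __getitem__."""

    def __getitem__(self, key):
        func, arg = dict.__getitem__(self, key)
        return func(arg)


def triple(arg):
    return arg * 3


naive = NaiveLazyDict({'expensive1': (triple, 1)})

via_index = naive['expensive1']    # goes through the override
via_get = naive.get('expensive1')  # bypasses it and returns the raw tuple
```

Here `via_index` is the computed value while `via_get` is still the stored `(function, argument)` pair, which is the inconsistency the answer warns about.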

The right way is to inherit from collections.abc.Mapping:

from collections.abc import Mapping

class LazyDict(Mapping):
    def __init__(self, *args, **kw):
        self._raw_dict = dict(*args, **kw)

    def __getitem__(self, key):
        func, arg = self._raw_dict.__getitem__(key)
        return func(arg)

    def __iter__(self):
        return iter(self._raw_dict)

    def __len__(self):
        return len(self._raw_dict)

Then you can do:

settings = LazyDict({
    'expensive1': (expensive_to_compute, 1),
    'expensive2': (expensive_to_compute, 2),
})

I have also posted sample code and examples here: https://gist.github.com/gyli/9b50bb8537069b4e154fec41a4b5995a
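
As a rough illustration of what the Mapping base class buys you, the sketch below repeats the class and exercises the methods it inherits. `triple` is a hypothetical stand-in for expensive_to_compute.

```python
from collections.abc import Mapping


class LazyDict(Mapping):
    def __init__(self, *args, **kw):
        self._raw_dict = dict(*args, **kw)

    def __getitem__(self, key):
        func, arg = self._raw_dict[key]
        return func(arg)

    def __iter__(self):
        return iter(self._raw_dict)

    def __len__(self):
        return len(self._raw_dict)


def triple(arg):
    return arg * 3


settings = LazyDict({'expensive1': (triple, 1), 'expensive2': (triple, 2)})

got = settings.get('expensive1')            # get() is built on __getitem__
missing = settings.get('nope', 'fallback')  # KeyError becomes the default
all_items = dict(settings.items())          # items() computes every value
```

Note that this version recomputes the value on every access, and that the inherited `in` test is also implemented through `__getitem__`, so a membership check triggers the computation unless you override `__contains__`.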


2
What is the advantage of inheriting from abc.Mapping and overriding __iter__() and __len__() over inheriting from dict and overriding only get()? - ChaimG
1
@ChaimG It looks like get() is not the only method that has to be adapted when inheriting from dict; pop() is another example. See https://treyhunner.com/2019/04/why-you-shouldnt-inherit-from-list-and-dict-in-python/ - undefined

7

If you don't separate the arguments from the callable, I don't think this is possible. However, the following should work:

class MySettingsDict(dict):

    def __getitem__(self, item):
        function, arg = dict.__getitem__(self, item)
        return function(arg)


def expensive_to_compute(arg):
    return arg * 3

Now:

>>> settings = MySettingsDict({
'expensive1': (expensive_to_compute, 1),
'expensive2': (expensive_to_compute, 2),
})
>>> settings['expensive1']
3
>>> settings['expensive2']
6

Edit:

If the results of expensive_to_compute are going to be accessed several times, you may also want to cache them. Something like this:

class MySettingsDict(dict):

    def __getitem__(self, item):
        value = dict.__getitem__(self, item)
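        # Assumes computed results are always ints; anything else is
        # treated as a pending (function, arg) pair.
        if not isinstance(value, int):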
            function, arg = value
            value = function(arg)
            dict.__setitem__(self, item, value)
        return value

Now:

>>> settings.values()
dict_values([(<function expensive_to_compute at 0x9b0a62c>, 2),
(<function expensive_to_compute at 0x9b0a62c>, 1)])
>>> settings['expensive1']
3
>>> settings.values()
dict_values([(<function expensive_to_compute at 0x9b0a62c>, 2), 3])

Depending on how you use the dictionary, you may want to override other dict methods as well.
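
One possible variation, sketched under assumptions: instead of testing whether the stored value is an int, wrap pending entries in a marker class so that results of any type can be cached, and route get() through __getitem__. The names `Pending` and `CachingSettingsDict` are hypothetical and not part of the answer above.

```python
class Pending:
    """Hypothetical marker wrapping a value that has not been computed yet."""

    def __init__(self, function, arg):
        self.function = function
        self.arg = arg


class CachingSettingsDict(dict):
    def __getitem__(self, item):
        value = dict.__getitem__(self, item)
        if isinstance(value, Pending):
            # First access: compute, then overwrite the marker with the result.
            value = value.function(value.arg)
            dict.__setitem__(self, item, value)
        return value

    def get(self, item, default=None):
        # dict.get() does not call __getitem__, so delegate explicitly.
        try:
            return self[item]
        except KeyError:
            return default


calls = []

def expensive_to_compute(arg):
    calls.append(arg)
    return str(arg) * 3  # deliberately not an int


settings = CachingSettingsDict({'expensive1': Pending(expensive_to_compute, 1)})
first = settings['expensive1']
second = settings.get('expensive1')  # served from the cache
```

Methods such as values(), items(), and pop() would still expose the raw markers here, so they would need the same treatment if you rely on them.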


2
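Storing the function and overriding __getitem__ is clever, but I think it would be better to inherit from abc.Mapping rather than the built-in dict. Otherwise .get() is not supported. You can check my example here: https://gist.github.com/ligyxy/9b50bb8537069b4e154fec41a4b5995a - Guangyang Li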

5

Store references to the functions as the values for the keys, for example:

def A():
    return "that took ages"
def B():
    return "that took for-ever"
settings = {
    "A": A,
    "B": B,
}

print(settings["A"]())

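This way, the function associated with a key is evaluated only when you access and call it. A class that can also handle non-lazy values would be:
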
import types
class LazyDict(dict):
    def __getitem__(self,key):
        item = dict.__getitem__(self,key)
        if isinstance(item,types.FunctionType):
            return item()
        else:
            return item

Usage:

settings = LazyDict([("A",A),("B",B)])
print(settings["A"])
>>> 
that took ages
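
One caveat with the types.FunctionType check is that it only matches plain functions and lambdas. The sketch below, which is an illustration rather than part of the answer, shows that a functools.partial object fails that check but passes callable(). `expensive_to_compute` is a stand-in.

```python
import functools
import types


def expensive_to_compute(arg):
    return arg * 3


deferred = functools.partial(expensive_to_compute, 2)

is_plain_function = isinstance(deferred, types.FunctionType)
is_callable = callable(deferred)


class LazyDict(dict):
    def __getitem__(self, key):
        item = dict.__getitem__(self, key)
        # callable() also accepts partials, bound methods and callable objects.
        return item() if callable(item) else item


settings = LazyDict(A=deferred, B="plain value")
value_a = settings["A"]
value_b = settings["B"]
```

The trade-off is that callable() will also invoke anything callable that you meant to store as a plain value, such as a class.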

3
You can make expensive_to_compute a generator function:
settings = {
    'expensive1' : expensive_to_compute(1),
    'expensive2' : expensive_to_compute(2),
}

Then try:

from mysettings import settings

print(next(settings['expensive1']))
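
The answer does not show the generator function itself, so here is a minimal sketch of what it might look like. The body and the `calls` list are assumptions for illustration.

```python
calls = []

def expensive_to_compute(arg):
    # Generator function: the body does not run until next() is called.
    calls.append(arg)
    yield arg * 3


settings = {
    'expensive1': expensive_to_compute(1),  # creates a generator, runs nothing
    'expensive2': expensive_to_compute(2),
}

calls_before = list(calls)
value = next(settings['expensive1'])
calls_after = list(calls)
```

Each generator yields only once in this sketch, so a second next() on the same key raises StopIteration; callers would have to keep the result themselves.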

1
Interesting idea, but not what I am looking for. I really want to keep the dictionary API unchanged. - blueFast

3

I needed something similar recently. Combining the strategies of Guangyang Li and michaelmeyer, this is how I did it:

from collections.abc import MutableMapping


class LazyDict(MutableMapping):
    """Lazily evaluated dictionary."""

    function = None

    def __init__(self, *args, **kargs):
        self._dict = dict(*args, **kargs)

    def __getitem__(self, key):
        """Evaluate value."""
        value = self._dict[key]
        # ccData is the result type my function returns (not defined here);
        # substitute the type returned by your own function.
        if not isinstance(value, ccData):
            value = self.function(value)
        self._dict[key] = value
        return value

    def __setitem__(self, key, value):
        """Store value lazily."""
        self._dict[key] = value

    def __delitem__(self, key):
        """Delete value."""
        del self._dict[key]

    def __iter__(self):
        """Iterate over dictionary."""
        return iter(self._dict)

    def __len__(self):
        """Evaluate size of dictionary."""
        return len(self._dict)

Let's lazily evaluate the following function:

def expensive_to_compute(arg):
  return arg * 3

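The advantage is that the function does not have to be defined inside the object yet, and the stored values are the actual arguments (which is exactly what I needed):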

>>> settings = LazyDict({'expensive1': 1, 'expensive2': 2})
>>> settings.function = expensive_to_compute # function unknown until now!
>>> settings['expensive1']
3
>>> settings['expensive2']
6

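This approach only works with a single function.

I can point out the following advantages:

  • It implements the complete MutableMapping API
  • If your function is non-deterministic, you can reset a value to have it re-evaluated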
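
To illustrate the reset behaviour, here is a self-contained sketch. It is a variation, not the class above: it tracks which keys have been evaluated in a set instead of checking the result type, and `non_deterministic` is a hypothetical function whose result changes on every call.

```python
import itertools
from collections.abc import MutableMapping


class LazyDict(MutableMapping):
    function = None

    def __init__(self, *args, **kwargs):
        self._dict = dict(*args, **kwargs)
        self._evaluated = set()  # assumption: remember which keys are computed

    def __getitem__(self, key):
        value = self._dict[key]
        if key not in self._evaluated:
            value = self.function(value)
            self._dict[key] = value
            self._evaluated.add(key)
        return value

    def __setitem__(self, key, value):
        # Assigning stores a fresh argument and marks the key as pending again.
        self._dict[key] = value
        self._evaluated.discard(key)

    def __delitem__(self, key):
        del self._dict[key]
        self._evaluated.discard(key)

    def __iter__(self):
        return iter(self._dict)

    def __len__(self):
        return len(self._dict)


counter = itertools.count(1)

def non_deterministic(arg):
    # Hypothetical: multiplies by 1, then 2, then 3, ... on successive calls.
    return arg * next(counter)


settings = LazyDict({'expensive1': 10})
settings.function = non_deterministic

first = settings['expensive1']   # computed on first access
cached = settings['expensive1']  # served from the cache
settings['expensive1'] = 10      # reset the stored argument
second = settings['expensive1']  # computed again
```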

3

I would populate the dictionary values with callables and replace each one with its result when it is read.

class LazyDict(dict):
    def __getitem__(self, k):
        v = super().__getitem__(k)
        if callable(v):
            v = v()
            super().__setitem__(k, v)
        return v

    def get(self, k, default=None):
        if k in self:
            return self.__getitem__(k)
        return default

Then, with:

def expensive_to_compute(arg):
    print('Doing heavy stuff')
    return arg * 3

you can do the following:

>>> settings = LazyDict({
    'expensive1': lambda: expensive_to_compute(1),
    'expensive2': lambda: expensive_to_compute(2),
})

>>> settings.__repr__()
"{'expensive1': <function <lambda> at 0x000001A0BA2B8EA0>, 'expensive2': <function <lambda> at 0x000001A0BA2B8F28>}"

>>> settings['expensive1']
Doing heavy stuff
3

>>> settings.get('expensive2')
Doing heavy stuff
6

>>> settings.__repr__()
"{'expensive1': 3, 'expensive2': 6}"
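
If the entries are generated in a loop or comprehension rather than written out by hand, lambdas can capture the loop variable late. The sketch below is an added illustration, with `expensive_to_compute` as a stand-in, that contrasts this with functools.partial, which binds the argument when the entry is created.

```python
import functools


def expensive_to_compute(arg):
    return arg * 3


# Every lambda closes over the same variable n, which ends up as 2.
late_bound = {
    f'expensive{n}': (lambda: expensive_to_compute(n)) for n in (1, 2)
}

# partial stores the current value of n with each callable.
bound = {
    f'expensive{n}': functools.partial(expensive_to_compute, n) for n in (1, 2)
}

late_result = late_bound['expensive1']()  # uses n == 2
bound_result = bound['expensive1']()      # uses n == 1
```

Either dictionary can be passed to a class like the LazyDict above, since both lambdas and partial objects are callable.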

1

Alternatively, you can use the LazyDictionary package to create a thread-safe lazy dictionary.

Installation:

pip install lazydict

Usage:

from lazydict import LazyDictionary
import tempfile
lazy = LazyDictionary()
lazy['temp'] = lambda: tempfile.mkdtemp()

1

Pass in a function that generates the values on first attribute access:

class LazyDict(dict):
  """ Fill in the values of a dict at first access """
  def __init__(self, fn, *args, **kwargs):
    self._fn = fn
    self._fn_args = args or []
    self._fn_kwargs = kwargs or {}
    return super(LazyDict, self).__init__()
  def _fn_populate(self):
    if self._fn:
      self._fn(self, *self._fn_args, **self._fn_kwargs)
      self._fn = self._fn_args = self._fn_kwargs = None
  def __getattribute__(self, name):
    if not name.startswith('_fn'):
      self._fn_populate()
    return super(LazyDict, self).__getattribute__(name)
  def __getitem__(self, item):
    self._fn_populate()
    return super(LazyDict, self).__getitem__(item)



>>> def _fn(self, val):
...   print('lazy loading')
...   self['foo'] = val
... 
>>> d = LazyDict(_fn, 'bar')
>>> d
{}
>>> d['foo']
lazy loading
'bar'
>>> 
