Custom Dask graph function with keyword arguments that require Dask computation

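How do I build a custom dask graph with a function that needs the result of another dask task as a keyword argument?
The dask documentation and several Stack Overflow questions suggest using "partial", "toolz", or "dask.compatibility.apply". All of these solutions work for static keyword arguments. Based on the question "Include keyword arguments (kwargs) in custom Dask graphs", and on reading the source code and stepping through it in a debugger, my understanding is that dask.compatibility.apply might be able to handle keyword arguments that are themselves the results of dask computations. However, I can't seem to get the syntax right, and I couldn't find any other answers.
The example below is a relatively simple application of dask.compatibility.apply with a keyword value computed by dask. Dask correctly passes the computed values of the positional args 'a' and 'b', as well as the static keyword value 'other'. However, it passes the string 'c' to the function instead of substituting its computed value.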
import dask
from dask.compatibility import apply


def custom_func(a, b, other=None, c=None):
    print(a, b, other, c)
    return a * b / c / other


dsk = {
    'a': (sum, (1, 1)),
    'b': (sum, (2, 2)),
    'c': (sum, (3, 3)),
    'd': (apply, custom_func, ['a', 'b'], {'c': 'c', 'other': 2})
}

dask.visualize(dsk, filename='graph.png')
for key in sorted(dsk):
    print(key)
    print(dask.get(dsk, key))
    print('\n')

The output is:
a
2


b
4


c
6


d
2 4 2 c
Traceback (most recent call last):
  File "dask_kwarg.py", line 20, in <module>
    print(dask.get(dsk, key))
  File "/Users/holmgren/miniconda3/envs/pvlib36/lib/python3.6/site-packages/dask/local.py", line 562, in get_sync
    return get_async(apply_sync, 1, dsk, keys, **kwargs)
  File "/Users/holmgren/miniconda3/envs/pvlib36/lib/python3.6/site-packages/dask/local.py", line 529, in get_async
    fire_task()
  File "/Users/holmgren/miniconda3/envs/pvlib36/lib/python3.6/site-packages/dask/local.py", line 504, in fire_task
    callback=queue.put)
  File "/Users/holmgren/miniconda3/envs/pvlib36/lib/python3.6/site-packages/dask/local.py", line 551, in apply_sync
    res = func(*args, **kwds)
  File "/Users/holmgren/miniconda3/envs/pvlib36/lib/python3.6/site-packages/dask/local.py", line 295, in execute_task
    result = pack_exception(e, dumps)
  File "/Users/holmgren/miniconda3/envs/pvlib36/lib/python3.6/site-packages/dask/local.py", line 290, in execute_task
    result = _execute_task(task, data)
  File "/Users/holmgren/miniconda3/envs/pvlib36/lib/python3.6/site-packages/dask/local.py", line 271, in _execute_task
    return func(*args2)
  File "/Users/holmgren/miniconda3/envs/pvlib36/lib/python3.6/site-packages/dask/compatibility.py", line 50, in apply
    return func(*args, **kwargs)
  File "dask_kwarg.py", line 7, in custom_func
    return a * b / c / other
TypeError: unsupported operand type(s) for /: 'int' and 'str'

(Image: graph.png, the task graph drawn by dask.visualize)
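To see why the string 'c' survives, here is a deliberately simplified model of how the local scheduler substituted task arguments in Dask versions of that era. It is an assumption based on reading the old `_execute_task` logic, not Dask's actual source, and the helper names `is_task` and `resolve` are hypothetical. Lists and nested tasks are walked and key names are replaced by their results, but a dict is unhashable and is returned untouched, so the value 'c' inside it is never looked up.

```python
# Simplified model (an assumption, not Dask's actual source) of how the
# local scheduler substituted arguments inside a task at the time.


def is_task(x):
    # A task is a tuple whose first element is callable.
    return isinstance(x, tuple) and len(x) > 0 and callable(x[0])


def resolve(arg, results):
    if isinstance(arg, list):
        # Lists are walked element by element.
        return [resolve(a, results) for a in arg]
    if is_task(arg):
        # Nested tasks are resolved recursively, then called.
        func, args = arg[0], arg[1:]
        return func(*[resolve(a, results) for a in args])
    try:
        hash(arg)
    except TypeError:
        # Unhashable literals such as dicts are returned untouched,
        # so any key names inside them are never substituted.
        return arg
    # Hashable values that name a computed key are replaced by its result.
    return results.get(arg, arg)


def apply(func, args, kwargs=None):
    # Same calling convention as dask.compatibility.apply.
    return func(*args, **(kwargs or {}))


def custom_func(a, b, other=None, c=None):
    # Return the received arguments instead of dividing, to inspect them.
    return (a, b, other, c)


results = {'a': 2, 'b': 4, 'c': 6}
task = (apply, custom_func, ['a', 'b'], {'c': 'c', 'other': 2})
print(resolve(task, results))  # (2, 4, 2, 'c')
```

This reproduces the "2 4 2 c" line from the output above: the positional list is resolved, the keyword dict is not.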


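This looks quite specific and could be considered a bug: either this should work, or the documentation is unclear, or it is a feature that needs to be requested. I suggest raising it on GitHub. - mdurant
Thanks for the pointer. https://github.com/dask/dask/issues/3741 - Will Holmgren
1 Answer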

One way is to find out how dask.delayed does it :)
In [1]: import dask

In [2]: @dask.delayed
   ...: def f(*args, **kwargs):
   ...:     pass
   ...: 

In [3]: dict(f(x=1).dask)
Out[3]: 
{'f-d2cd50e7-25b1-49c5-b463-f05198b09dfb': (<function dask.compatibility.apply>,
  <function __main__.f>,
  [],
  (dict, [['x', 1]]))}
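Following that pattern, the keyword dict in the question's graph can itself be written as a task, (dict, [[name, value], ...]), so the scheduler walks the inner lists and substitutes computed keys before dict is called. The sketch below reuses the same kind of simplified resolver (an assumption modeled on the local scheduler of that era; helper names are hypothetical). One caveat follows from the same rule: every hashable item in those inner lists is looked up, including the keyword name, so if a graph key is also called 'c', the name itself is replaced by 6 in this model. Renaming the graph key (here to the hypothetical 'c-value') avoids the collision.

```python
# Simplified resolver modeling the local scheduler (an assumption).


def is_task(x):
    return isinstance(x, tuple) and len(x) > 0 and callable(x[0])


def resolve(arg, results):
    if isinstance(arg, list):
        return [resolve(a, results) for a in arg]
    if is_task(arg):
        func, args = arg[0], arg[1:]
        return func(*[resolve(a, results) for a in args])
    try:
        hash(arg)
    except TypeError:
        return arg  # dicts and other unhashables pass through untouched
    return results.get(arg, arg)


def apply(func, args, kwargs=None):
    return func(*args, **(kwargs or {}))


def custom_func(a, b, other=None, c=None):
    return a * b / c / other


# The graph key is 'c-value' (hypothetical name) so it cannot clash with
# the keyword name 'c' inside the (dict, ...) task.
results = {'a': 2, 'b': 4, 'c-value': 6}
kwargs_task = (dict, [['c', 'c-value'], ['other', 2]])
task = (apply, custom_func, ['a', 'b'], kwargs_task)
print(resolve(task, results))  # 0.6666666666666666

# With a key literally named 'c', the keyword name is substituted too:
print(resolve((dict, [['c', 'c']]), {'c': 6}))  # {6: 6}
```

The collision case would then fail inside apply, since ** requires string keys.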

Interestingly, this is also a case where the local scheduler and the distributed scheduler disagree: the distributed scheduler handles it fine.
In [1]: from dask.distributed import Client

In [2]: client = Client()

In [3]: import dask
   ...: from dask.compatibility import apply
   ...: 
   ...: 
   ...: def custom_func(a, b, other=None, c=None):
   ...:     print(a, b, other, c)
   ...:     return a * b / c / other
   ...: 
   ...: 
   ...: dsk = {
   ...:     'a': (sum, (1, 1)),
   ...:     'b': (sum, (2, 2)),
   ...:     'c': (sum, (3, 3)),
   ...:     'd': (apply, custom_func, ['a', 'b'], {'c': 'c', 'other': 2})
   ...: }
   ...: 

In [4]: for key in sorted(dsk):
   ...:     print(key, client.get(dsk, key))
   ...:     
a 2
b 4
c 6
2 4 2 6
d 0.6666666666666666
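If hand-building the graph is not a requirement, dask.delayed does the keyword packing itself and sidesteps the key/name collision, because it generates its own unique keys. A minimal sketch, assuming a current Dask install where dask.delayed and compute(scheduler="sync") are available; it runs on the synchronous local scheduler, the same one that failed in the question.

```python
import dask


def custom_func(a, b, other=None, c=None):
    return a * b / c / other


# Each delayed call becomes a task with an auto-generated key.
a = dask.delayed(sum)((1, 1))
b = dask.delayed(sum)((2, 2))
c = dask.delayed(sum)((3, 3))

# Delayed values may be passed as keyword arguments; Dask packs them into
# the task for custom_func and resolves them before the call.
d = dask.delayed(custom_func)(a, b, other=2, c=c)

result = d.compute(scheduler="sync")
print(result)  # 0.6666666666666666
```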
