Shared memory in mpi4py

Question

python-3.x mpi shared-memory mpi4py memoryview

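I am using an MPI (mpi4py) script (on a single node) that works with a very large object. To give all processes access to the object, I distribute it with comm.bcast(). This copies the object into every process and uses a lot of memory during the copying. So instead of the object itself, I would like to share something like a pointer. I found that some features of memoryview are useful for working with the object more efficiently inside a process. The object's real memory address is also accessible through the string representation of the memoryview object, and it can be distributed like this: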

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank:
    content_pointer = comm.bcast(root = 0)
    print(rank, content_pointer)
else:
    content = ''.join(['a' for i in range(100000000)]).encode()
    mv = memoryview(content)
    print(mv)
    comm.bcast(str(mv).split()[-1][: -1], root = 0)

This prints:

<memory at 0x7f362a405048>
1 0x7f362a405048
2 0x7f362a405048
...

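So I assume there must be a way to reconstruct the object in another process. However, I could not find any hint in the documentation on how to do it.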

In short, my question is: is it possible in mpi4py to share an object between processes on the same node?

- Roman

Could you mark JobJob's newer MPI 3.0 answer as the correct one? - Robin De Schepper

2 Answers

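I don't know mpi4py very well, but from the MPI point of view this should not be possible. MPI stands for Message Passing Interface, which means exactly that: passing messages between processes. You could try to use MPI one-sided communication to emulate something like globally accessible memory, but apart from that, the memory of one process is not available to other processes.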

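If you need to rely on a large block of shared memory, you will need something like OpenMP or threads, which you can certainly use on a single node. A hybrid parallelization that combines MPI with some form of shared-memory parallelization would give you one shared memory block per node while still letting you use multiple nodes.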
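For the threads-based route on a single node, a minimal Python sketch (an illustration, not from the original answer) could look like the following: all threads run inside one process, so they read the same bytes object without copying it. The object size, the number of chunks and the counting task are hypothetical placeholders. Note that CPython's global interpreter lock can limit the speed-up for CPU-bound Python code.

```python
from concurrent.futures import ThreadPoolExecutor

# One large object, created once; every thread sees this same object
content = b'a' * 10_000_000
n_chunks = 4
chunk = len(content) // n_chunks
# (start, stop) ranges covering the whole object; the last one absorbs any remainder
bounds = [(i * chunk, len(content) if i == n_chunks - 1 else (i + 1) * chunk)
          for i in range(n_chunks)]

def count_a(bound):
    start, stop = bound
    # bytes.count with start/stop scans the shared object in place (no slice copy)
    return content.count(b'a', start, stop)

with ThreadPoolExecutor(max_workers=n_chunks) as pool:
    total = sum(pool.map(count_a, bounds))

print(total)
```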

- haraldkl

According to MPI 3.0 shared memory (https://software.intel.com/en-us/articles/using-mpi-3-shared-memory-in-xeon-phi-processors), this answer is incorrect. - Ben Thompson

OK, would you like me to delete it or change the answer? I have no experience with this new feature... - haraldkl

- JobJob · Accepted Answer

Here is a simple example of using MPI shared memory, slightly modified from https://groups.google.com/d/msg/mpi4py/Fme1n9niNwQ/lk3VJ54WAQAJ

You can run it with mpirun -n 2 python3 shared_memory_test.py (assuming you save it as shared_memory_test.py).

from mpi4py import MPI 
import numpy as np 

comm = MPI.COMM_WORLD 

# create a shared array of size 1000 elements of type double
size = 1000 
itemsize = MPI.DOUBLE.Get_size() 
if comm.Get_rank() == 0: 
    nbytes = size * itemsize 
else: 
    nbytes = 0

# on rank 0, create the shared block
# on the other ranks (here rank 1), get a handle to it (known as a window in MPI speak)
win = MPI.Win.Allocate_shared(nbytes, itemsize, comm=comm) 

# create a numpy array whose data points to the shared mem
buf, itemsize = win.Shared_query(0) 
assert itemsize == MPI.DOUBLE.Get_size() 
ary = np.ndarray(buffer=buf, dtype='d', shape=(size,)) 

# in process rank 1:
# write the numbers 0.0,1.0,..,4.0 to the first 5 elements of the array
if comm.rank == 1: 
    ary[:5] = np.arange(5)

# wait in process rank 0 until process 1 has written to the array
comm.Barrier() 

# check that the array is actually shared and process 0 can see
# the changes made in the array by process 1
if comm.rank == 0: 
    print(ary[:10])

This should output the following (printed from process rank 0):

[0. 1. 2. 3. 4. 0. 0. 0. 0. 0.]