我正在处理一个已经使用MPI的Fortan代码。
现在,我面临一个情况,一组数据非常大,但对于每个进程都相同,因此我希望将其仅存储在每个节点的内存中,并且该节点上的所有进程都可以访问相同的数据。
为每个进程存储它将超出可用RAM。
是否有可能通过openMP实现这样的目标?
只需要按节点共享数据,不需要其他节点并行处理,因为这已经通过MPI完成了。
我正在处理一个已经使用MPI的Fortan代码。
现在,我面临一个情况,一组数据非常大,但对于每个进程都相同,因此我希望将其仅存储在每个节点的内存中,并且该节点上的所有进程都可以访问相同的数据。
为每个进程存储它将超出可用RAM。
是否有可能通过openMP实现这样的目标?
只需要按节点共享数据,不需要其他节点并行处理,因为这已经通过MPI完成了。
如果只是为了共享一块数据,你不需要实现混合MPI + OpenMP代码。你需要做的是:
1)将全局通信器分割成跨越相同主机/节点的组。如果你的MPI库实现了MPI-3.0,这非常容易 - 你只需要调用MPI_COMM_SPLIT_TYPE
并将split_type
设置为MPI_COMM_TYPE_SHARED
:
USE mpi_f08
TYPE(MPI_Comm) :: hostcomm
CALL MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0, &
MPI_INFO_NULL, hostcomm)
MPI-2.2或更早版本不提供MPI_COMM_SPLIT_TYPE
操作,因此需要有些创意。例如,您可以使用我在Github上找到的简单的按主机拆分实现,该实现方法在此处。
2)现在,同一节点上的进程属于同一个通信器hostcomm
,它们可以创建一块共享内存,并使用它来交换数据。再次提醒,MPI-3.0提供了一个相对容易且可移植的方法来实现这一点:
USE mpi_f08
USE, INTRINSIC :: ISO_C_BINDING, ONLY : C_PTR, C_F_POINTER
INTEGER :: hostrank
INTEGER(KIND=MPI_ADDRESS_KIND) :: size
INTEGER :: disp_unit
TYPE(C_PTR) :: baseptr
TYPE(MPI_Win) :: win
TYPE(MY_DATA_TYPE), POINTER :: shared_data
! We only want one process per host to allocate memory
! Set size to 0 in all processes but one
CALL MPI_Comm_rank(hostcomm, hostrank)
if (hostrank == 0) then
size = 10000000 ! Put the actual data size here
else
size = 0
end if
disp_unit = 1
CALL MPI_Win_allocate_shared(size, disp_unit, MPI_INFO_NULL, &
hostcomm, baseptr, win)
! Obtain the location of the memory segment
if (hostrank /= 0) then
CALL MPI_Win_shared_query(win, 0, size, disp_unit, baseptr)
end if
! baseptr can now be associated with a Fortran pointer
! and thus used to access the shared data
CALL C_F_POINTER(baseptr, shared_data)
! Use shared_data as if it was ALLOCATE'd
! ...
! Destroy the shared memory window
CALL MPI_Win_free(win)
代码的工作方式是使用MPI-3.0功能来分配共享内存窗口。 MPI_WIN_ALLOCATE_SHARED
在每个进程中分配一块共享内存。由于您希望共享一块数据,因此只有将其分配给单个进程而不是分散在多个进程中才有意义,因此在进行调用时对size
除了一个秩以外都设置为0。使用MPI_WIN_SHARED_QUERY
查找共享内存块在调用进程的虚拟地址空间中映射的地址。一旦知道了地址,C指针就可以使用C_F_POINTER()
子程序与Fortran指针相关联,并且后者可以用于访问共享内存。完成后,必须通过使用MPI_WIN_FREE
销毁共享内存窗口来释放共享内存。shm_open()
/ ftruncate()
/ mmap()
。必须编写一个可从Fortran中调用的实用C函数来执行这些操作。请参见此代码以获取一些灵感。mmap()
返回的void *
可以直接传递给Fortran代码中的C_PTR
类型变量,然后可以将其与Fortran指针相关联。program sharedmemtest
USE, INTRINSIC :: ISO_C_BINDING, ONLY : C_PTR, C_F_POINTER
use mpi
implicit none
integer, parameter :: dp = selected_real_kind(14,200)
integer :: win,win2,hostcomm,hostrank
INTEGER(KIND=MPI_ADDRESS_KIND) :: windowsize
INTEGER :: disp_unit,my_rank,ierr,total
TYPE(C_PTR) :: baseptr,baseptr2
real(dp), POINTER :: matrix_elementsy(:,:,:,:)
integer,allocatable :: arrayshape(:)
call MPI_INIT( ierr )
call MPI_COMM_RANK(MPI_COMM_WORLD,MY_RANK,IERR) !GET THE RANK OF ONE PROCESS
call MPI_COMM_SIZE(MPI_COMM_WORLD,Total,IERR) !GET THE TOTAL PROCESSES OF THE COMM
CALL MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0, MPI_INFO_NULL, hostcomm,ierr)
CALL MPI_Comm_rank(hostcomm, hostrank,ierr)
! Gratefully based on: https://dev59.com/aGAf5IYBdhLWcg3wUBNc
! and https://gcc.gnu.org/onlinedocs/gfortran/C_005fF_005fPOINTER.html
! We only want one process per host to allocate memory
! Set size to 0 in all processes but one
allocate(arrayshape(4))
arrayshape=(/ 10,10,10,10 /)
if (hostrank == 0) then
windowsize = int(10**4,MPI_ADDRESS_KIND)*8_MPI_ADDRESS_KIND !*8 for double ! Put the actual data size here
else
windowsize = 0_MPI_ADDRESS_KIND
end if
disp_unit = 1
CALL MPI_Win_allocate_shared(windowsize, disp_unit, MPI_INFO_NULL, hostcomm, baseptr, win, ierr)
! Obtain the location of the memory segment
if (hostrank /= 0) then
CALL MPI_Win_shared_query(win, 0, windowsize, disp_unit, baseptr, ierr)
end if
! baseptr can now be associated with a Fortran pointer
! and thus used to access the shared data
CALL C_F_POINTER(baseptr, matrix_elementsy,arrayshape)
!!! your code here!
!!! sample below
if (hostrank == 0) then
matrix_elementsy=0.0_dp
matrix_elementsy(1,2,3,4)=1.0_dp
end if
CALL MPI_WIN_FENCE(0, win, ierr)
print *,"my_rank=",my_rank,matrix_elementsy(1,2,3,4),matrix_elementsy(1,2,3,5)
!!! end sample code
call MPI_WIN_FENCE(0, win, ierr)
call MPI_BARRIER(MPI_COMM_WORLD,ierr)
call MPI_Win_free(win,ierr)
call MPI_FINALIZE(IERR)
end program
==== EDIT 1/7/2016 ====
这样的东西。 - Vladimir F Героям славаMPI_Win_fence
将内存初始化步骤matrix_elementsy=0.0_dp
与使用内存步骤matrix_elementsy(1,2,3,4)=1.0_dp
分开。当我尝试在下面的代码循环中进行操作时,这个问题困扰了我。这是用户错误,但也可以作为其他人的参考点。 - Jim Parkerprogram sharedmemtest
USE, INTRINSIC :: ISO_C_BINDING, ONLY : C_PTR, C_F_POINTER
use mpi
implicit none
integer, parameter :: dp = selected_real_kind(14,200)
integer :: win,win2,hostcomm,hostrank
INTEGER(KIND=MPI_ADDRESS_KIND) :: windowsize
INTEGER :: disp_unit,my_rank,ierr,total, i
TYPE(C_PTR) :: baseptr,baseptr2
real(dp), POINTER :: matrix_elementsy(:,:,:,:)
integer,allocatable :: arrayshape(:)
call MPI_INIT( ierr )
call MPI_COMM_RANK(MPI_COMM_WORLD,my_rank, ierr) !GET THE RANK OF ONE PROCESS
call MPI_COMM_SIZE(MPI_COMM_WORLD,total,ierr) !GET THE TOTAL PROCESSES OF THE COMM
CALL MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0, MPI_INFO_NULL, hostcomm,ierr)
CALL MPI_Comm_rank(hostcomm, hostrank,ierr)
! Gratefully based on: https://dev59.com/aGAf5IYBdhLWcg3wUBNc
! and https://gcc.gnu.org/onlinedocs/gfortran/C_005fF_005fPOINTER.html
! We only want one process per host to allocate memory
! Set size to 0 in all processes but one
allocate(arrayshape(4))
arrayshape=(/ 10,10,10,10 /)
if (hostrank == 0) then
windowsize = int(10**4,MPI_ADDRESS_KIND)*8_MPI_ADDRESS_KIND !*8 for double ! Put the actual data size here
else
windowsize = 0_MPI_ADDRESS_KIND
end if
disp_unit = 1
CALL MPI_Win_allocate_shared(windowsize, disp_unit, MPI_INFO_NULL, hostcomm, baseptr, win, ierr)
! Obtain the location of the memory segment
if (hostrank /= 0) then
CALL MPI_Win_shared_query(win, 0, windowsize, disp_unit, baseptr, ierr)
end if
! baseptr can now be associated with a Fortran pointer
! and thus used to access the shared data
CALL C_F_POINTER(baseptr, matrix_elementsy,arrayshape)
!!! your code here!
!!! sample below
if (hostrank == 0) then
matrix_elementsy=0.0_dp
endif
call MPI_WIN_FENCE(0, win, ierr)
do i=1, 15
if (hostrank == 0) then
matrix_elementsy(1,2,3,4)=i * 1.0_dp
matrix_elementsy(1,2,2,4)=i * 2.0_dp
elseif ((hostrank > 5) .and. (hostrank < 11)) then ! code for non-root nodes to do something different
matrix_elementsy(1,2,hostrank, 4) = hostrank * 1.0 * i
endif
call MPI_WIN_FENCE(0, win, ierr)
write(*,'(A, I4, I4, 10F7.1)') "my_rank=",my_rank, i, matrix_elementsy(1,2,:,4)
call MPI_BARRIER(MPI_COMM_WORLD, ierr)
enddo
!!! end sample code
call MPI_WIN_FENCE(0, win, ierr)
call MPI_BARRIER(MPI_COMM_WORLD,ierr)
call MPI_Win_free(win,ierr)
call MPI_FINALIZE(IERR)
end program