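The documentation says the following (link):
MPI_Reduce_scatter first performs an element-wise reduction on a vector of count = Σ_i recvcounts[i] elements in the send buffer defined by sendbuf, count, and datatype. Next, the resulting vector is split into n disjoint segments, where n is the number of processes in the group. Segment i contains recvcounts[i] elements. The i-th segment is sent to process i and stored in the receive buffer defined by recvbuf, recvcounts[i], and datatype.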
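To make the quoted description concrete, here is a minimal serial sketch of what a single rank would end up with after a max reduce-scatter. It does not call MPI at all: `reduce_scatter_max_serial` and its parameters are hypothetical names made up for illustration, and the layout (one row of `count` elements per process) is an assumption about how the per-process send buffers are gathered for the simulation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, NOT part of MPI: serially computes what `rank`
 * would receive from a reduce-scatter that uses a max reduction.
 * contrib is an nprocs x count row-major array; row p plays the role of
 * process p's send buffer, and count = sum of recvcounts[0..nprocs-1]. */
void reduce_scatter_max_serial(const int *contrib, int nprocs,
                               const int *recvcounts, int rank,
                               int *recvbuf)
{
    assert(contrib != NULL && recvcounts != NULL && recvbuf != NULL);
    assert(nprocs > 0 && rank >= 0 && rank < nprocs);

    int count = 0;   /* total length of the reduced vector */
    int offset = 0;  /* where this rank's segment starts */
    for (int p = 0; p < nprocs; p++) {
        if (p < rank)
            offset += recvcounts[p];
        count += recvcounts[p];
    }

    /* Element-wise max across all processes, restricted to the
     * segment that belongs to `rank`. */
    for (int j = 0; j < recvcounts[rank]; j++) {
        int best = contrib[offset + j];              /* process 0 */
        for (int p = 1; p < nprocs; p++) {
            int v = contrib[p * count + offset + j]; /* process p */
            if (v > best)
                best = v;
        }
        recvbuf[j] = best;
    }
}
```

For example, with three processes contributing {1, 8, 3, 4}, {5, 2, 9, 0} and {7, 6, 1, 2}, and recvcounts = {2, 1, 1}, the element-wise maximum is {7, 8, 9, 4}, so rank 0 receives {7, 8}, rank 1 receives {9}, and rank 2 receives {4}. Note that the reduction runs across processes, position by position, and not across the elements inside one process's buffer.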
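I have the following (very simple) C program, and I expect to get the maximum of the first recvcounts[i] elements, but it seems I'm doing something wrong...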
#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"
#define NUM_PE 5
#define NUM_ELEM 3
char *print(int arr[], int n);
int main(int argc, char *argv[]) {
    int rank, size, i, n;
    int sendbuf[5][3] = {
        { 1, 2, 3 },
        { 4, 5, 6 },
        { 7, 8, 9 },
        { 10, 11, 12 },
        { 13, 14, 15 }
    };
    int recvbuf[15] = {0};
    int recvcounts[5] = {
        3, 3, 3, 3, 3
    };
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    n = sizeof(sendbuf[rank]) / sizeof(int);
    printf("sendbuf (thread %d): %s\n", rank, print(sendbuf[rank], n));
    MPI_Reduce_scatter(sendbuf, recvbuf, recvcounts, MPI_INT, MPI_MAX, MPI_COMM_WORLD);
    n = sizeof(recvbuf) / sizeof(int);
    printf("recvbuf (thread %d): %s\n", rank, print(recvbuf, n)); // <--- I receive the same output as with sendbuf :(
    MPI_Finalize();
    return 0;
}
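// it returns a string formatted as the following output
// (body elided in the original post; this is one possible implementation,
// and the caller never frees the returned string)
char *print(int arr[], int n) {
    size_t cap = (size_t)n * 16 + 8; // 16 bytes per element is ample for a 32-bit int plus ", "
    char *s = malloc(cap);
    size_t len = 0;
    len += snprintf(s + len, cap - len, "[ ");
    for (int i = 0; i < n; i++)
        len += snprintf(s + len, cap - len, "%d%s", arr[i], i + 1 < n ? ", " : " ");
    snprintf(s + len, cap - len, "]");
    return s;
}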
The output of my program is the same for recvbuf and sendbuf. I expected recvbuf to contain the maximum values:
$ mpicc 03_reduce_scatter.c
$ mpirun -n 5 ./a.out
sendbuf (thread 4): [ 13, 14, 15 ]
sendbuf (thread 3): [ 10, 11, 12 ]
sendbuf (thread 2): [ 7, 8, 9 ]
sendbuf (thread 0): [ 1, 2, 3 ]
sendbuf (thread 1): [ 4, 5, 6 ]
recvbuf (thread 1): [ 4, 5, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ]
recvbuf (thread 2): [ 7, 8, 9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ]
recvbuf (thread 0): [ 1, 2, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ]
recvbuf (thread 3): [ 10, 11, 12, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ]
recvbuf (thread 4): [ 13, 14, 15, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ]