使用线程进行矩阵乘法（每个线程执行单个乘法）

Question

使用线程进行矩阵乘法（每个线程执行单个乘法）

cmultithreadingpthreadsmatrix-multiplication

5

我想使用线程进行矩阵乘法，每个线程执行一次乘法，然后主线程将所有结果相加，并将它们放在最终矩阵的适当位置（在其他线程退出后）。我尝试的方法是创建一个单行数组，其中包含每个线程的结果。然后我会遍历数组并将结果相加并放置在最终矩阵中。例如：如果您有以下矩阵：A = [{1,4}, {2,5}, {3,6}]和B = [{8,7,6}, {5,4,3}]，那么我想要一个包含[8，20，7，16，6，12，16等]的数组。然后，我会循环遍历数组，将每两个数字相加并将它们放置在我的最终数组中。这是我的作业任务，因此我不需要完整的代码，但需要一些逻辑来正确存储数组中的结果。我在如何跟踪每个矩阵中的位置以避免遗漏任何数字方面遇到了困难。谢谢。编辑2：必须为每个乘法操作分配单个线程。也就是说，对于上面的示例，将有18个线程分别执行自己的计算。编辑：我目前正在使用此代码作为基础进行工作。

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define M 3
#define K 2
#define N 3
#define NUM_THREADS 10

int A [M][K] = { {1,4}, {2,5}, {3,6} };
int B [K][N] = { {8,7,6}, {5,4,3} };
int C [M][N];

struct v {
   int i; /* row */
   int j; /* column */
};

void *runner(void *param); /* the thread */

int main(int argc, char *argv[]) {

   int i,j, count = 0;
   for(i = 0; i < M; i++) {
      for(j = 0; j < N; j++) {
         //Assign a row and column for each thread
         struct v *data = (struct v *) malloc(sizeof(struct v));
         data->i = i;
         data->j = j;
         /* Now create the thread passing it data as a parameter */
         pthread_t tid;       //Thread ID
         pthread_attr_t attr; //Set of thread attributes
         //Get the default attributes
         pthread_attr_init(&attr);
         //Create the thread
         pthread_create(&tid,&attr,runner,data);
         //Make sure the parent waits for all thread to complete
         pthread_join(tid, NULL);
         count++;
      }
   }

   //Print out the resulting matrix
   for(i = 0; i < M; i++) {
      for(j = 0; j < N; j++) {
         printf("%d ", C[i][j]);
      }
      printf("\n");
   }
}

//The thread will begin control in this function
void *runner(void *param) {
   struct v *data = param; // the structure that holds our data
   int n, sum = 0; //the counter and sum

   //Row multiplied by column
   for(n = 0; n< K; n++){
      sum += A[data->i][n] * B[n][data->j];
   }
   //assign the sum to its coordinate
   C[data->i][data->j] = sum;

   //Exit the thread
   pthread_exit(0);
}

来源：http://macboypro.wordpress.com/2009/05/20/matrix-multiplication-in-c-using-pthreads-on-linux/

这篇文章介绍了如何在Linux上使用Pthreads进行矩阵乘法计算。通过将矩阵划分为块，每个线程可以处理一个块的计算，从而提高计算效率。作者提供了完整的代码和详细的注释，方便读者学习和理解。

- Kinru

2

这个大概已经做了十万次了。你最好通过确定机器上的 CPU 核心数 C，确定需要多少行乘以列的向量乘法，将后者除以前者（大约），并将其发送给 C 个线程独立处理。任何模数（多出来的向量直到 C-1）都会作为额外的乘法发送给第一系列线程。你很难找到一个更高效和简单的算法，尤其是考虑到根本不需要任何锁定。 - WhozCraig

抱歉，我的表述不够清晰。根据作业要求，每个需要进行的乘法都必须有1个线程。也就是说，对于我给出的示例矩阵，将会有18个线程执行18次乘法。这并不意味着它是高效的。这只是一项作业练习。 - Kinru

是的，我认为这只是一个练习。当你处理类似 A[500][800] x B[800][1000] 这样的东西时，概念很快就会降级。它变得越大，你花费的时间就越多，而这些时间本可以用来进行更多的乘法运算。唉，祝你好运！ - WhozCraig

如果矩阵足够大，您可能还想调查Strassen算法来进行矩阵乘法 - 额外的奖励？ - Brett Hale

2个回答

0

不确定需要分派多少线程，也不确定是否会使用join来收集它们。我猜测你在使用C语言，所以我会使用线程ID来跟踪要处理的行...类似这样：

#define NUM_THREADS 64
/*
 * struct to pass parameters to a dispatched thread 
 */
typedef struct {
  int   value;     /* thread number */
  char  somechar[128];   /* char data passed to thread */
  unsigned long ret;
  struct foo *row;
} thread_parm_t;

在这里我猜测每个线程将在指针*row中获取它的行数据，该指针具有某种定义好的类型foo。一堆整数、浮点数甚至是复杂类型。无论您需要传递给线程的是什么。

/*
 * the thread to actually crunch the row data
 */
void *thr_rowcrunch( void *parm );

pthread_t tid[NUM_THREADS]; /* POSIX array of thread IDs */

然后在你的主代码段中，类似于：

thread_parm_t *parm=NULL;

然后使用类似以下的方式分发线程：

for ( i = 0; i < NUM_THREADS; i++) {
    parm = malloc(sizeof(thread_parm_t));
    parm->value = i;
    strcpy(parm->somechar, char_data_to-pass );
    fill_in_row ( parm->row, my_row_data );
    pthread_create(&tid[i], NULL, thr_insert, (void *)parm);
}

稍后：

for ( i = 0; i < NUM_THREADS; i++)
    pthread_join(tid[i], NULL);

然而，真正的工作需要在thr_rowcrunch（void *parm）中完成，该函数接收行数据，然后每个线程只需知道自己的线程号。然而，在分派的线程中所做的实质性工作，我只能猜测。

只是想在这里提供帮助，不确定这是否清楚明白。

- paul lanken

你能分享一下矩阵乘法的实际输入数据吗？即使只是出于个人兴趣，我也对这个问题很感兴趣。我想设计一个整洁的线程解决方案，这是一个完美的方式来享受咖啡和编写代码的过程。 - paul lanken

实际输入数据是可变的。为了完成任务，我们需要进行一些文件I/O操作来读取矩阵，然后输出结果。这与我的问题并不相关。由于代码不应该依赖于矩阵，我一直在使用我在原始帖子中提供的示例来尝试使其工作。 - Kinru

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- John Bollinger · Accepted Answer

你需要存储 M * K * N 个逐元素乘积。想法可能是所有线程都会并行运行，或者至少能够这样做，因此每个线程都需要自己的适当类型的独立存储位置。一种简单的方法是创建一个具有那么多元素的数组...但是该数组的元素类型是什么？

每个线程不仅需要知道要存储其结果的位置，还需要知道要执行哪个乘法。所有这些信息都需要通过类型为 void * 的单个参数传递。通常，您需要创建一个适合保存一个线程所需的所有数据的结构类型，为每个线程创建该结构类型的实例，并传递指向这些结构的指针。那么听起来您需要一个结构体数组。

细节可以通过各种方式处理，但我认为最自然的方法是为两个因子分别设置结构成员，并设置一个成员来存储乘积。然后，我会让主线程声明一个这样的结构体的三维数组（如果所需总数较小），否则动态分配一个。例如，

struct multiplication {
    // written by the main thread; read by the compute thread:
    int factor1;
    int factor2;

    // written by the compute thread; read by the main thread:
    int product;
} partial_result[M][K][N];

如何编写相关代码，留下的练习就是为此而设。