C信号量和线程之间的“屏障”

Question

C信号量和线程之间的“屏障”

3

我正在尝试解决我们操作系统教授在上次考试中给我们展示的问题，以便为下一次考试做准备。

这个问题是要有两个线程同时执行，并可能在不同的时间内完成。当一个特定的线程完成后，它需要阻塞直到另一个线程完成，然后它们才能继续执行。

这对我来说在概念上似乎很简单，但我的代码并没有按照我认为的方式工作。

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <semaphore.h>

#define N 10

sem_t t_1_sem;
sem_t t_2_sem;

void *thread(void *vargp);
/* shared by both threads*/
struct {
    int count;
} thread_count;

int main() {
    pthread_t tid, tid1;

    thread_count.count = 0;

    sem_init(&t_1_sem, 0, 1);
    sem_init(&t_2_sem, 0, 1);
    printf("Hello from main thread! tid:%ld pid:%d\n", pthread_self(), getpid());
    pthread_create(&tid, NULL, thread, NULL);
    pthread_create(&tid1, NULL, thread, NULL);

    pthread_join(tid, NULL);
    pthread_join(tid1, NULL);

    exit(0);
}

void *thread(void *vargp) {
    int i, tid;

    int val, val2;
    sem_getvalue(&t_1_sem, &val);
    sem_getvalue(&t_2_sem, &val2);
    printf("initial value::: %d : %d\n", val, val2);


    tid = thread_count.count;
    thread_count.count += 1;

    for(i = 0;i<N;i++){
        printf("%d, %d\n", tid, i);
        fflush(stdout);
        //sleep(0.1);
    }

    // TODO
    // barrier
    sem_getvalue(&t_1_sem, &val);
    sem_getvalue(&t_2_sem, &val2);
    printf("second value::: %d : %d\n", val, val2);
    int sem_val;
    if(tid == 0){
        // free other
        sem_getvalue(&t_1_sem, &sem_val);
        printf("posting to 2, waiting on 1 w/ %d count\n", sem_val);
        sem_post(&t_2_sem);
        // wait on this one
        sem_wait(&t_1_sem);
        printf("done waiting on 1\n");
    } else if(tid == 1){
        sem_getvalue(&t_2_sem, &sem_val);
        printf("posting to 1, waiting on 2 w/ %d count\n", sem_val);
        sem_post(&t_1_sem);
        sem_wait(&t_2_sem);
        printf("done waiting on 2\n");
    }

    sem_getvalue(&t_1_sem, &val);
    sem_getvalue(&t_2_sem, &val2);
    printf("final value::: %d : %d\n", val, val2);
    return NULL;
}

我希望看到的是两个线程都数到十，然后两个“最终值”printf相邻地发生。然而，我看到的是在线程完成计数到10之后立即发生“最终值”打印 - 它似乎没有等待。

此外，在“posting to N”printf中，我打印出了非常奇怪的sem_val整数值，例如：

Hello from main thread! tid:-1606277344 pid:5479
initial value::: 0 : 0
0, 0
initial value::: 0 : 0
1, 0
0, 1
1, 1
0, 2
1, 2
1, 3
1, 4
1, 5
0, 3
1, 6
0, 4
1, 7
0, 5
1, 8
0, 6
1, 9
0, 7
second value::: 0 : 0
posting to 1, waiting on 2 w/ -1809628646 count
0, 8
done waiting on 2
final value::: 0 : 0
0, 9
second value::: 0 : 0
posting to 2, waiting on 1 w/ -1809628646 count
done waiting on 1
final value::: 0 : 0

有什么想法/提示吗？

- ashgromnies

7个回答

6

在OSX上，sem_getvalue()并未实现。 http://discussions.apple.com/thread.jspa?messageID=9404131&tstart=0

- ghoti

哇...这解释了我在使用OS X时sem_getvalue函数的困惑。 - ashgromnies

1

这是你想要的吗？

我不确定我是否理解了你的想法，但我怀疑你可能误用了信号量。以下是生成上述现象的代码。希望能对你的问题有所帮助。

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <semaphore.h>

#define N 10

sem_t t_1_sem;
sem_t t_2_sem;

void *thread_1(void *vargp);
void *thread_2(void *vargp);
/* shared by both threads*/
struct {
    int count;
} thread_count;

 int main() {
pthread_t tid, tid1;

thread_count.count = 0;

sem_init(&t_1_sem, 0, 1);
sem_init(&t_2_sem, 0, 1);
printf("Hello from main thread! tid:%ld pid:%d\n", pthread_self(), getpid());
pthread_create(&tid, NULL, thread_1, NULL);
pthread_create(&tid1, NULL, thread_2, NULL);

pthread_join(tid, NULL);
pthread_join(tid1, NULL);

exit(0);
}

void *thread_1(void *vargp) {
int i, tid;

int val, val2;
sem_getvalue(&t_1_sem, &val);
printf("enter thread_1\n");
sem_wait(&t_1_sem);
//even thread_1 is sleeping , thread_2 will not be scheduled to run
//as thread_1 is holding the semphore
sleep(1);
tid = thread_count.count;
thread_count.count += 1;

for(i = 0;i<N;i++){
    printf("%d, %d\n", tid, i);
    fflush(stdout);
    //sleep(0.1);
}

    sem_post(&t_1_sem);
    return NULL;
}


void *thread_2(void *vargp) {
int i, tid;

int val, val2;
sem_getvalue(&t_1_sem, &val);

printf("enter thread_2\n");
sem_wait(&t_1_sem);
tid = thread_count.count;
thread_count.count += 1;

for(i = 0;i<N;i++){
    printf("%d, %d\n", tid, i);
    fflush(stdout);
    //sleep(0.1);
}

    sem_post(&t_1_sem);
    return NULL;
}

- pierrotlefou

有点。为什么您在使用两个不同的线程函数？这两个函数实际上是相同的。当我尝试用一个函数替换这两个函数时，我得到的行为是不可预测的，而不是您展示的行为。我为什么不能用一个函数代替这两个函数？等待不应该仍然有效吗？ - ashgromnies

你可以只使用一个函数，但是两个函数会使事情更明确：你有两个任务在运行，需要同步这两个任务。在现实世界中，通常有两个任务在做不同的事情。 - pierrotlefou

1

我刚刚使用互斥锁和条件变量成功地处理了类似的问题。

信号量是一种通用的线程或进程间消息传递机制，你可以通过一些困难来进行线程协调。而互斥锁是专门针对线程协调的二元信号量，提供了简化的API。

因此，如果你不受限于使用信号量，你可能会考虑使用互斥锁来更轻松地完成工作。

无论你选择哪种线程协调方法，都要使用搜索引擎找到解决类似问题的代码示例，并确保你理解它们。例如，“pthread semaphore example”和“pthread mutex example”都有很多有趣的结果。

一个重要的提示：确保你实际上需要两个信号量。一个信号量或互斥锁可以控制任意数量的线程。

作为更一般的教学性评论，不针对你的具体例子，线程是那些KISS概念真正适用的地方之一。很容易变得太花哨并让自己陷入混乱。

- Bob Murphy

0

这就是你想要的：

i'm 0 and waiting for 1 - starting
i'm 1 and waiting for 0 - starting
i'm 1, 0 arrived, lets go
i'm 0, 1 arrived, lets go
i'm 1 and waiting for 0 - stopping
i'm 0 and waiting for 1 - stopping
i'm 0, 1 stopped, go home now
i'm 1, 0 stopped, go home now

并且找到了正确的代码，自己发现了原始代码的错误。

#define SEM_INIT_V      0

static sem_t t_0_sem;
static sem_t t_1_sem;

void *thread(void *vargp);
/* shared by both threads*/
struct {
        int count;
} thread_count = {};

int main() {
        pthread_t tid0, tid1;

        thread_count.count = 0;

        sem_init(&t_0_sem, 0, SEM_INIT_V);
        sem_init(&t_1_sem, 0, SEM_INIT_V);
        pthread_create(&tid0, NULL, thread, &thread_count.count);
        thread_count.count++;
        pthread_create(&tid1, NULL, thread, &thread_count.count);

        pthread_join(tid0, NULL);
        pthread_join(tid1, NULL);

        return 0;
}

void *thread(void *vargp) {
        int tid = *(int*)vargp;

        //await to sync 0 & 1
        if (0 == tid) {
                puts("i'm 0 and waiting for 1 - starting");
                sem_post(&t_1_sem);
                sem_wait(&t_0_sem);
                puts("i'm 0, 1 arrived, lets go");
                sleep(8);
        } else {
                puts("i'm 1 and waiting for 0 - starting");
                sem_post(&t_0_sem);
                sem_wait(&t_1_sem);
                puts("i'm 1, 0 arrived, lets go");
                sleep(3);
        }

        if(tid == 0){
                puts("i'm 0 and waiting for 1 - stopping");
                sem_post(&t_1_sem);
                sem_wait(&t_0_sem);
                puts("i'm 0, 1 stopped, go home now");
        } else if(tid == 1){
                puts("i'm 1 and waiting for 0 - stopping");
                sem_post(&t_0_sem);
                sem_wait(&t_1_sem);
                puts("i'm 1, 0 stopped, go home now");
        }

        return NULL;
}

- Test

1

这个例子有问题，不能正常工作。看一下线程函数及其两个if-else块。 - fpmurphy

你正在使用哪个编译器？我已经在FC9上运行它了。有两个if-else，一个用于启动，另一个用于停止。你遇到了什么错误？你能聪明点吗？ - Test

有没有人能告诉我为什么stackoverflow上的人这么懒？初学者，请自己动手做点什么。让我失望了。 - Test

1

例如，如果助手在他们的代码中没有给您包含头文件，请自行完成它们。 - Test

0

您正在使用初始值为1初始化信号量，因此等待将立即成功（操作结束时的发布将简单地将值增加到2）。

sem_init(&t_1_sem, 0, 1);
sem_init(&t_2_sem, 0, 1);

第二个问题是你以一种非线程安全的方式生成tid变量。即使在这种情况下没有发生，两个线程也可能都以零值结束。

- Komat

0

这并不完全回答你的问题，但你可能想要从线程中提取屏障 - 如果是C＃，你会将其作为对象，而我几乎没有做过任何C，但你可能会有一个结构体和一些函数。在这种情况下，这可能有很大帮助，但它可以避免每次需要时都从头开始编写屏障。此外，如果您首先实现信号量，然后可以使用信号量编写屏障，这简化了事情。（我目前正在进行.NET并发编程课程，其中之一是编写一组并发实用程序 - 信号量、互斥锁、屏障、约会、通道等，然后我们可以将其插入到其他程序中。）

至于屏障，正如詹姆斯已经提到的那样，《信号量小书》描述了正确的实现方法。

- David Johnstone

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- James McNellis · Accepted Answer

7

您可能需要参考《信号量小书》第3.5节，该节介绍了屏障模式及其正确实现方式。虽然这并没有直接回答您的问题，但它应该能指引您朝着正确的方向前进。请参考The Little Book of Semaphores。

- James McNellis

谢谢。《信号量小书》确实很好地解释了伪代码背后的理论，但我认为我卡在了具体的C实现细节上——比如从信号量计数中返回的-1809628646。TLBoS只展示了使用互斥锁结构体的C实现，而不是我们教授希望我们使用的sem_t。 - ashgromnies

此外，我知道我的实现与TLBoS中描述的模式不同。我会尝试重新实现它，但现在我的问题是针对我正在尝试的实现具体而言的。 - ashgromnies

“mutex”结构体只是您的实现所使用的任何东西的替代品；该书与语言无关。通常，所有同步原语，无论实现方式如何，都支持相同的基本操作集。在POSIX中，您需要查看pthread_mutex_t以获取互斥锁和sem_t以获取信号量（请注意，互斥锁也可以使用sem_t实现）。我脑海中没有-1809628646的含义。 - James McNellis

另外还有一个注释-问题的说明中写道："您的任务是仅使用二进制信号量编写Barrier()函数。您不得使用除信号量以外的共享变量。一定要显示信号量的初始值。为了简化问题，我们假设系统中有N个线程。线程ID是从0到N-1的整数。"这意味着我们不能使用TLBoS中的解决方案，因为它使用了共享计数器（我猜我的解决方案也是这样）。 - ashgromnies

在 POSIX 中，互斥锁（mutex）和信号量（semaphore）的区别是：锁定 mutex 的进程必须解锁 mutex，而任何进程都可以发布（解锁）semaphore。请就此留言。 - pierrotlefou

@pierr - 这是互斥锁和信号量之间的概念差异。前者是必须由同一“所有者”获取和释放的锁，而后者是可等待计数器。 - D.Shawley