Why are Go sockets slower than C++ sockets?

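I benchmarked a simple socket ping-pong test in Go and C++. The client begins by sending 0 to the server. The server adds 1 to whatever number it receives and sends it back to the client. The client echoes the number back to the server and stops once the number reaches 1,000,000.
Both the client and the server run on the same computer, so I use a Unix socket in both cases. (I also tried same-host TCP sockets, which showed similar results.)
The Go test takes 14 seconds, whereas the C++ test takes 8 seconds. This surprised me, because I have run a fair number of Go vs. C++ benchmarks, and in general Go performs as well as C++ as long as I don't trigger the garbage collector.
I am on a Mac, though commenters have also reported that the Go version is slower on Linux.
I am wondering whether I am missing a way to optimize the Go program, or whether there are inefficiencies under the hood.
Below are the commands I ran for the test, along with the test results. All code files are pasted at the bottom of this question.
Run the Go server: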
$ rm /tmp/go.sock
$ go run socketUnixServer.go

Run the Go client:

$ go build socketUnixClient.go; time ./socketUnixClient

real    0m14.101s
user    0m5.242s
sys     0m7.883s

Run the C++ server:
$ rm /tmp/cpp.sock
$ clang++ -std=c++11 tcpServerIncUnix.cpp -O3; ./a.out

Run the C++ client:

$ clang++ -std=c++11 tcpClientIncUnix.cpp -O3; time ./a.out

real    0m8.690s
user    0m0.835s
sys     0m3.800s

Code files

Go server:

// socketUnixServer.go

package main

import (
    "log"
    "net"
    "encoding/binary"
)

func main() {
    ln, err := net.Listen("unix", "/tmp/go.sock")
    if err != nil {
        log.Fatal("Listen error: ", err)
    }

    c, err := ln.Accept()
    if err != nil {
        panic(err)
    }
    log.Println("Connected with client!")

    readbuf := make([]byte, 4)
    writebuf := make([]byte, 4)
    for {
        c.Read(readbuf)
        clientNum := binary.BigEndian.Uint32(readbuf)
        binary.BigEndian.PutUint32(writebuf, clientNum+1)

        c.Write(writebuf)
    }
}

Go client:

// socketUnixClient.go

package main

import (
    "log"
    "net"
    "encoding/binary"
)

const N = 1000000

func main() {
    c, err := net.Dial("unix", "/tmp/go.sock")
    if err != nil {
        log.Fatal("Dial error", err)
    }
    defer c.Close()

    readbuf := make([]byte, 4)
    writebuf := make([]byte, 4)

    var currNumber uint32 = 0
    for currNumber < N {
        binary.BigEndian.PutUint32(writebuf, currNumber)
        c.Write(writebuf)

        // Read the incremented number from server
        c.Read(readbuf[:])
        currNumber = binary.BigEndian.Uint32(readbuf)
    }
}

C++ server:

// tcpServerIncUnix.cpp

// Server side C/C++ program to demonstrate Socket programming
// #include <iostream>
#include <unistd.h>
#include <stdio.h>
#include <sys/un.h>
#include <sys/socket.h>
#include <stdlib.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <unistd.h>

// Big Endian (network order)
unsigned int fromBytes(unsigned char b[4]) {
    return b[3] | b[2]<<8 | b[1]<<16 | b[0]<<24;
}

void toBytes(unsigned int x, unsigned char (&b)[4]) {
    b[3] = x;
    b[2] = x>>8;
    b[1] = x>>16;
    b[0] = x>>24;
}

int main(int argc, char const *argv[])
{
    int server_fd, new_socket, valread;
    struct sockaddr_un saddr;
    int saddrlen = sizeof(saddr);
    unsigned char recv_buffer[4] = {0};
    unsigned char send_buffer[4] = {0};

    server_fd = socket(AF_UNIX, SOCK_STREAM, 0);

    saddr.sun_family = AF_UNIX;
    strncpy(saddr.sun_path, "/tmp/cpp.sock", sizeof(saddr.sun_path));
    saddr.sun_path[sizeof(saddr.sun_path)-1] = '\0';
    bind(server_fd, (struct sockaddr *)&saddr, sizeof(saddr));

    listen(server_fd, 3);

    // Accept one client connection
    new_socket = accept(server_fd, (struct sockaddr *)&saddr, (socklen_t*)&saddrlen);
    printf("Connected with client!\n");

    // Note: if /tmp/cpp.sock already exists, you'll get the Connected with client!
    // message before running the client. Delete this file first.

    unsigned int x = 0;

    while (true) {
        valread = read(new_socket, recv_buffer, 4);
        x = fromBytes(recv_buffer);
        toBytes(x+1, send_buffer);

        write(new_socket, send_buffer, 4);
    }
}

C++ client:

// tcpClientIncUnix.cpp

// Client side C/C++ program to demonstrate Socket programming
// #include <iostream>
#include <unistd.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <stdlib.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <unistd.h>

// Big Endian (network order)
unsigned int fromBytes(unsigned char b[4]) {
    return b[3] | b[2]<<8 | b[1]<<16 | b[0]<<24;
}

void toBytes(unsigned int x, unsigned char (&b)[4]) {
    b[3] = x;
    b[2] = x>>8;
    b[1] = x>>16;
    b[0] = x>>24;
}

int main(int argc, char const *argv[])
{
    int sock, valread;
    struct sockaddr_un saddr;
    int opt = 1;
    int saddrlen = sizeof(saddr);

    // We'll be passing uint32's back and forth
    unsigned char recv_buffer[4] = {0};
    unsigned char send_buffer[4] = {0};

    sock = socket(AF_UNIX, SOCK_STREAM, 0);

    saddr.sun_family = AF_UNIX;
    strncpy(saddr.sun_path, "/tmp/cpp.sock", sizeof(saddr.sun_path));
    saddr.sun_path[sizeof(saddr.sun_path)-1] = '\0';

    // Connect to the server
    if (connect(sock, (struct sockaddr *)&saddr, sizeof(saddr)) != 0) {
        throw("connect failed");
    }

    int n = 1000000;

    unsigned int currNumber = 0;
    while (currNumber < n) {
        toBytes(currNumber, send_buffer);
        write(sock, send_buffer, 4);

        // Read the incremented number from server
        valread = read(sock, recv_buffer, 4);
        currNumber = fromBytes(recv_buffer);
    }
}

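My results: go version devel +90dca98d33 Thu Dec 20 22:11:45 2018 +0000 linux/amd64; time ./client: real 0m7.653s, user 0m1.803s, sys 0m5.150s - peterSO
go version go1.10.4 linux/amd64; time go client: real 0m7.242s. clang version 3.8.0-2ubuntu4; time c++ client: real 0m4.942s ... not that much of a difference. - tink
My results: g++ (Ubuntu 8.2.0-7ubuntu1) 8.2.0; time ./a.out: real 0m6.013s, user 0m0.253s, sys 0m4.564s - peterSO
Comparing two languages this way is almost meaningless. The functionality may be the same, but implementation differences skew the results. It is like "my database is x times faster than competitors x, y and z": that may be true for some workloads, but it tells you nothing about real-world productivity, stability, maintainability, or ease of use. - RickyA
This is my workstation: a Dell Optiplex, i7-6700 @ 3.4GHz, 16GB. @rampatowl - tink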
1 Answer

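First, I can confirm that the Go programs from this question run noticeably slower than the C++ ones. I think it is genuinely interesting to understand why.
I profiled the Go client and server with pprof and found that syscall.Syscall accounts for about 70% of the total execution time. According to this ticket, system calls in Go are roughly 1.4 times slower than in C.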
(pprof) top -cum
Showing nodes accounting for 18.78s, 67.97% of 27.63s total
Dropped 44 nodes (cum <= 0.14s)
Showing top 10 nodes out of 44
  flat  flat%   sum%        cum   cum%
 0.11s   0.4%   0.4%     22.65s 81.98%  main.main
     0     0%   0.4%     22.65s 81.98%  runtime.main
18.14s 65.65% 66.05%     19.91s 72.06%  syscall.Syscall
 0.03s  0.11% 66.16%     12.91s 46.72%  net.(*conn).Read
 0.10s  0.36% 66.52%     12.88s 46.62%  net.(*netFD).Read
 0.16s  0.58% 67.10%     12.78s 46.25%  internal/poll.(*FD).Read
 0.06s  0.22% 67.32%     11.87s 42.96%  syscall.Read
 0.11s   0.4% 67.72%     11.81s 42.74%  syscall.read
 0.02s 0.072% 67.79%      9.30s 33.66%  net.(*conn).Write
 0.05s  0.18% 67.97%      9.28s 33.59%  net.(*netFD).Write

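I gradually reduced the number of calls to Conn.Write and Conn.Read, increasing the buffer size accordingly so that the number of bytes transferred stayed the same. The result: the fewer of these calls the program makes, the closer its performance gets to the C++ version.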
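
The experiment described above might look roughly like the following sketch. It is not the answerer's code: it uses a plain echo server instead of the increment protocol, a temporary socket path, and a smaller byte total (256 KiB, an arbitrary choice) so that it finishes quickly. The names `echoServer`, `measure`, and `totalBytes` are made up for illustration.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net"
	"os"
	"path/filepath"
	"time"
)

// totalBytes is the amount of data sent in each direction per run.
// It stays fixed while the chunk size (and so the call count) varies.
const totalBytes = 1 << 18

// echoServer accepts one connection and echoes chunk-sized messages.
func echoServer(ln net.Listener, chunk int) {
	c, err := ln.Accept()
	if err != nil {
		return
	}
	defer c.Close()
	buf := make([]byte, chunk)
	for {
		if _, err := io.ReadFull(c, buf); err != nil {
			return // client closed the connection
		}
		if _, err := c.Write(buf); err != nil {
			return
		}
	}
}

// measure sends totalBytes in chunk-sized messages over a Unix socket
// and returns the number of round trips and the elapsed time.
func measure(chunk int) (int, time.Duration, error) {
	dir, err := os.MkdirTemp("", "pingpong")
	if err != nil {
		return 0, 0, err
	}
	defer os.RemoveAll(dir)

	sock := filepath.Join(dir, "s.sock")
	ln, err := net.Listen("unix", sock)
	if err != nil {
		return 0, 0, err
	}
	defer ln.Close()
	go echoServer(ln, chunk)

	c, err := net.Dial("unix", sock)
	if err != nil {
		return 0, 0, err
	}
	defer c.Close()

	buf := make([]byte, chunk)
	trips := 0
	start := time.Now()
	for sent := 0; sent < totalBytes; sent += chunk {
		if _, err := c.Write(buf); err != nil {
			return trips, time.Since(start), err
		}
		if _, err := io.ReadFull(c, buf); err != nil {
			return trips, time.Since(start), err
		}
		trips++
	}
	return trips, time.Since(start), nil
}

func main() {
	for _, chunk := range []int{4, 64, 1024} {
		trips, elapsed, err := measure(chunk)
		if err != nil {
			log.Fatal(err)
		}
		fmt.Printf("chunk=%4d B  round trips=%6d  elapsed=%v\n", chunk, trips, elapsed)
	}
}
```

Each round trip costs one write and at least one read system call on each side, so a larger chunk size means proportionally fewer calls for the same number of bytes. Timings will vary by machine and operating system.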


Content provided by Stack Overflow.