Python + C is slightly faster than pure C

Question

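I have been implementing the same code (counting the number of ways to deal cards in blackjack without busting) in a variety of languages and implementations. One oddity I noticed is that the Python implementation that calls the partitions function in C is actually slightly faster than the entire program written in C. The same holds for other languages (Ada vs. Python calling Ada, Nim vs. Python calling Nim). This seems counterintuitive to me. Any ideas?

All of the code is in my GitHub repository: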

https://github.com/octonion/puzzles/tree/master/blackjack

Here is the C code, compiled with 'gcc -O3 outcomes.c'.

#include <stdio.h>

int partitions(int cards[10], int subtotal)
{
    //writeln(cards,subtotal);
    int m = 0;
    int total;
    // Hit
    for (int i = 0; i < 10; i++)
    {
        if (cards[i] > 0)
        {
            total = subtotal + i + 1;
            if (total < 21)
            {
                // Stand
                m += 1;
                // Hit again
                cards[i] -= 1;
                m += partitions(cards, total);
                cards[i] += 1;
            }
            else if (total == 21)
            {
                // Stand; hit again is an automatic bust
                m += 1;
            }
        }
    }
    return m;
}

int main(void)
{
    int deck[] =
    { 4, 4, 4, 4, 4, 4, 4, 4, 4, 16 };
    int d = 0;

    for (int i = 0; i < 10; i++)
    {
        // Dealer showing
        deck[i] -= 1;
        int p = 0;
        for (int j = 0; j < 10; j++)
        {
            deck[j] -= 1;
            int n = partitions(deck, j + 1);
            deck[j] += 1;
            p += n;
        }

        printf("Dealer showing %i partitions = %i\n", i, p);
        d += p;
        deck[i] += 1;
    }
    printf("Total partitions = %i\n", d);
    return 0;
}

Here is the C function, compiled with 'gcc -O3 -fPIC -shared -o libpartitions.so partitions.c'.

int partitions(int cards[10], int subtotal)
{
    int m = 0;
    int total;
    // Hit
    for (int i = 0; i < 10; i++)
    {
        if (cards[i] > 0)
        {
            total = subtotal + i + 1;
            if (total < 21)
            {
                cards[i] -= 1;
                // Stand
                m += 1;
                // Hit again
                m += partitions(cards, total);
                cards[i] += 1;
            }
            else if (total == 21)
            {
                // Stand; hit again is an automatic bust
                m += 1;
            }
        }
    }
    return m;
}

And here is the Python wrapper for the C function:

#!/usr/bin/env python

from ctypes import *
import os

test_lib = cdll.LoadLibrary(os.path.abspath("libpartitions.so"))
test_lib.partitions.argtypes = [POINTER(c_int), c_int]
test_lib.partitions.restype = c_int

deck = ([4]*9)
deck.append(16)

d = 0

for i in range(10):
    # Dealer showing
    deck[i] -= 1
    p = 0
    for j in range(10):
        deck[j] -= 1
        nums_arr = (c_int*len(deck))(*deck)
        n = test_lib.partitions(nums_arr, c_int(j+1))
        deck[j] += 1
        p += n
    print('Dealer showing ', i,' partitions =',p)
    d += p
    deck[i] += 1

print('Total partitions =',d)

- Christopher D. Long

Is it still faster if you remove all the printing? - 101

How did you time it? - abarnert

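Also, have you compared against C code that is actually equivalent, i.e. code that calls the function through a pointer obtained with dlopen and dlsym rather than calling it normally? I can't imagine why that would be faster, but I can't imagine why Python would be faster either, so it would at least narrow down what we have to imagine. :) - abarnert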

Thanks! As I mentioned, the C program that makes the call through dlopen and dlsym runs faster than the standalone C program. Here is my code. - Christopher D. Long

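Every optimizer has corner cases where it occasionally pessimizes, and I'd guess you have found one in gcc 5. I wonder whether compiling with a lower -O setting would make the static version faster? Or, for a less silly solution, what if you do PGO on the static version and run it a few times? - abarnert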

1 Answer

- hgminh · Accepted Answer

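I think the reason lies in how GCC compiles the function partitions in the two cases. You can use objdump to compare the assembly in the outcomes executable with the assembly in libpartitions.so and see the difference.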

objdump -d -M intel <file name>
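If reading two full disassemblies side by side is tedious, the comparison can be scripted. The following is only a minimal sketch, assuming objdump is on the PATH and prints its usual layout (a `<name>:` label line followed by tab-separated address, byte, and instruction columns). The helper names `disassemble`, `extract_function`, and `diff_function` are made up for this illustration. Note that GCC may also emit specialized clones under different labels, and that absolute addresses in operands will show up as differences.

```python
import difflib
import re
import subprocess


def disassemble(path):
    """Return the text printed by `objdump -d -M intel <path>`."""
    result = subprocess.run(
        ["objdump", "-d", "-M", "intel", path],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def extract_function(disassembly, name):
    """Return the instructions listed under the `<name>:` label.

    Address and raw-byte columns are dropped so that two builds can be
    compared on mnemonics and operands alone.
    """
    label = re.compile(r"^[0-9a-f]+ <(.+)>:$")
    instructions = []
    inside = False
    for line in disassembly.splitlines():
        match = label.match(line)
        if match:
            # A new label starts; track whether it is the one we want.
            inside = match.group(1) == name
            continue
        if inside and line.strip():
            # Assumed layout: address, raw bytes, instruction, tab-separated.
            fields = line.split("\t")
            if len(fields) >= 3:
                instructions.append(fields[2].strip())
    return instructions


def diff_function(path_a, path_b, name="partitions"):
    """Return a unified diff of one function's instructions in two files."""
    a = extract_function(disassemble(path_a), name)
    b = extract_function(disassemble(path_b), name)
    return "\n".join(
        difflib.unified_diff(a, b, fromfile=path_a, tofile=path_b, lineterm="")
    )


# Example (file names depend on how you built the two versions):
# print(diff_function("outcomes", "libpartitions.so"))
```

An empty diff would suggest the slowdown comes from somewhere other than the body of partitions itself, for example from how main calls it.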

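When building the shared library, GCC does not know how partitions will be called. In the C program, however, GCC knows exactly when partitions is called (which in this case leads to worse performance). This difference in context makes GCC optimize differently.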

You can try different compilers and compare the results. I checked with GCC 5.4 and Clang 6.0. With GCC 5.4 the Python script runs faster, while with Clang the C program runs faster.
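To make a comparison like this repeatable, it helps to time each variant several times rather than once. Here is a rough sketch, not the method used for the numbers above: `time_command` and `summarize` are hypothetical helpers, and the binary and script names in the usage comment are placeholders for whatever you built.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, repeats=5):
    """Run `cmd` `repeats` times; return wall-clock durations in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        # Discard program output so terminal speed does not skew the timing.
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(label, durations):
    """Format the minimum and median of a list of durations."""
    return "%s: min %.3fs, median %.3fs" % (
        label, min(durations), statistics.median(durations)
    )


# Example (placeholder names; adjust to your own builds):
# print(summarize("gcc C", time_command(["./outcomes_gcc"])))
# print(summarize("clang C", time_command(["./outcomes_clang"])))
# print(summarize("Python + C", time_command([sys.executable, "outcomes.py"])))
```

Keep in mind that the Python timing includes interpreter startup and library loading, so a Python + C run that still comes out ahead is ahead despite that overhead.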