CUDA: LNK2005 error on a __device__ function used in a header file

Question


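I have a __device__ function defined in a header file. It is in the header because it is used by a __global__ kernel, and that kernel has to be in the header because it is a template kernel.

When this header is included in two or more .cu files, I get an LNK2005 error at link time: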

FooDevice.cu.obj : error LNK2005: "int __cdecl getCurThreadIdx(void)" (?getCurThreadIdx@@YAHXZ) already defined in Main.cu.obj

What causes this error, and how do I fix it?

Here is sample code that produces the error above:

FooDevice.h:

#ifndef FOO_DEVICE_H
#define FOO_DEVICE_H

__device__ int getCurThreadIdx()
{
    return ( ( blockIdx.x * blockDim.x ) + threadIdx.x );
}

template< typename T >
__global__ void fooKernel( const T* inArr, int num, T* outArr )
{
    const int threadNum = ( gridDim.x * blockDim.x );

    for ( int idx = getCurThreadIdx(); idx < num; idx += threadNum )
        outArr[ idx ] = inArr[ idx ];

    return;
}

__global__ void fooKernel2( const int* inArr, int num, int* outArr );

#endif // FOO_DEVICE_H

FooDevice.cu:

#include "FooDevice.h"

// One other kernel that uses getCurThreadIdx()
__global__ void fooKernel2( const int* inArr, int num, int* outArr )
{
    const int threadNum = ( gridDim.x * blockDim.x );

    for ( int idx = getCurThreadIdx(); idx < num; idx += threadNum )
        outArr[ idx ] = inArr[ idx ];

    return;
}

Main.cu:

#include "FooDevice.h"

int main()
{
    int num             = 10;
    int* dInArr         = NULL;
    int* dOutArr        = NULL;
    const int arrSize   = num * sizeof( *dInArr );

    cudaMalloc( &dInArr, arrSize );
    cudaMalloc( &dOutArr, arrSize );

    // Using template kernel
    fooKernel<<< 10, 10 >>>( dInArr, num, dOutArr );

    return 0;
}

- Ashwin Nanjappa

2 Answers

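This should be inlined. Try adding the inline keyword.

Could you strip out the unnecessary code and post a minimal example for us to look at? The problem is usually in the details...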

- CygnusX1

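CygnusX1: I have added sample code to the question above. Both the inline and the __forceinline__ qualifiers seem to work. But why do they work? Why do both of them fix the problem, when one is a C/C++ qualifier and the other is a CUDA qualifier? - Ashwin Nanjappa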

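CUDA supports __inline__ and __forceinline__ (see host_defines.h). They work because getCurThreadIdx() is inlined into its callers, so it is no longer emitted as a separate definition in every file that includes the header, which is what violated the one definition rule (ODR). - Ade Miller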
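A minimal sketch of the same idea, assuming you also want the helper to be callable from host code (for example, to unit-test it). The header name, the THI_* macros and flatThreadIdx() are hypothetical; the CUDA-specific qualifiers __host__, __device__ and __forceinline__ are used only when __CUDACC__ is defined, i.e. when nvcc compiles the file. With either inline or __forceinline__, every .cu file can include the header without a duplicate-symbol error at link time.

```cpp
// Hypothetical header, e.g. "ThreadIdxHelpers.h", safe to include from
// any number of .cu or .cpp files.
#ifndef THREAD_IDX_HELPERS_H
#define THREAD_IDX_HELPERS_H

#ifdef __CUDACC__
// Compiled by nvcc: callable from host and device, forced inlining.
// __forceinline__ comes from CUDA's host_defines.h.
#define THI_HOST_DEVICE __host__ __device__
#define THI_INLINE __forceinline__
#else
// Plain C++ compiler: no CUDA qualifiers; ordinary inline satisfies the ODR.
#define THI_HOST_DEVICE
#define THI_INLINE inline
#endif

// Hypothetical helper: same arithmetic as getCurThreadIdx(), but the
// block/thread values are passed in so it can also run on the host.
THI_HOST_DEVICE THI_INLINE int flatThreadIdx( int block, int blockSize, int thread )
{
    return ( block * blockSize ) + thread;
}

#endif // THREAD_IDX_HELPERS_H
```

Inside a kernel it would be called as flatThreadIdx( blockIdx.x, blockDim.x, threadIdx.x ), which computes the same value as getCurThreadIdx().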


- Ade Miller · Accepted Answer

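Why does this error occur?

Because you include the header, which defines the function, in both FooDevice.cu and Main.cu, each object file gets its own copy of the same function, and the linker detects that there are now two definitions.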

How do you fix it?

If you define the following in foo.h:

template<typename T> __device__ T foo(T x)
{
    return x;
}

and you have two .cu files that both include foo.h and both call it, for example:

int x = foo<int>(1);

then you can force foo() to be inlined:

template<typename T>
inline __device__ T foo(T x)
{
    return x;
}

and call it in the same way:

int x = foo<int>(1);

This prevents it from being defined more than once.

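Function templates are an exception to the one definition rule, so they may have multiple definitions in different translation units. A full function template specialization is not a template but an ordinary function, so if you want to put it in a header file that is included in several translation units, you need the inline keyword to comply with the one definition rule.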

Quoted from http://www.velocityreviews.com/forums/t447911-why-does-explicit-specialization-of-function-templates-cause-generation-of-code.html

See also: http://en.wikipedia.org/wiki/One_Definition_Rule
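The quoted rule can be illustrated with a small sketch. The header contents and the specialization's body (doubling its argument) are hypothetical, chosen only so that the specialization is visibly selected. The rule itself is plain C++, so the sketch is host-only; under nvcc you would add __device__ to both functions.

```cpp
// Hypothetical contents of a header such as foo.h, included by several files.
#ifndef FOO_H
#define FOO_H

// Primary function template: each including translation unit may contain
// this definition; the ODR exception for templates makes that legal.
template <typename T>
T foo( T x )
{
    return x;
}

// Full (explicit) specialization: an ordinary function, not a template.
// Without inline, every including translation unit would emit its own
// external definition and the linker would report a duplicate symbol.
template <>
inline int foo<int>( int x )
{
    return x * 2; // hypothetical behaviour, only to show this overload is picked
}

#endif // FOO_H
```

Calling foo( 3 ) deduces T = int and therefore uses the specialization, while foo<double>( 1.5 ) still goes through the primary template.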

I changed your code as follows:

inline __device__ int getCurThreadIdx()
{
    return ( ( blockIdx.x * blockDim.x ) + threadIdx.x );
}

template< typename T >
__global__ void fooKernel( const T* inArr, int num, T* outArr )
{
    const int threadNum = ( gridDim.x * blockDim.x );

    for ( int idx = getCurThreadIdx(); idx < num; idx += threadNum )
        outArr[ idx ] = inArr[ idx ];

    return;
}

The code now compiles. You had not declared getCurThreadIdx() inline, which violated the one definition rule.
