What is the fastest way to build a 64-bit integer from four 16-bit values?

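Basically, I'm trying to build a unique 64-bit ID from coordinates that can be split apart again later. These operations will run billions of times in a short period, so speed is critical. Here's what I want: I have four 32-bit integers, but only the bottom 16 bits of each are relevant. I want to concatenate those bottom 16 bits into one 64-bit long (signed or unsigned doesn't matter, since the bits are the same). So if I have: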

largeId = 0000 0000 0000 0000   0000 0000 1000 1000
x       = 0000 0000 0000 0000   0000 0000 1100 1100
y       = 0000 0000 0000 0000   0000 0000 1110 1110
z       = 0000 0000 0000 0000   0000 0000 1111 1111

it would become:

Id = 0000 0000 1000 1000   0000 0000 1100 1100   0000 0000 1110 1110   0000 0000 1111 1111
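Here is a minimal sketch of the layout in the example above. It assumes the upper 16 bits of each input may hold arbitrary values that must be discarded, since the question says only the bottom 16 bits are relevant. MaskedPack is a hypothetical name, and this version has not been timed against the routines below:

```csharp
using System;

public static class MaskedPack
{
    // Layout from the example: largeId -> bits 63..48, x -> 47..32,
    // y -> 31..16, z -> 15..0. "& 0xFFFF" keeps only the low 16 bits of
    // each int, so stray upper bits cannot leak into a neighbouring field.
    public static long Pack(int largeId, int x, int y, int z)
    {
        return ((long)(largeId & 0xFFFF) << 48)
             | ((long)(x & 0xFFFF) << 32)
             | ((long)(y & 0xFFFF) << 16)
             |  (long)(z & 0xFFFF);
    }
}
```

With the example values (0x88, 0xCC, 0xEE, 0xFF), Pack returns 0x0088_00CC_00EE_00FF, which is the Id shown above.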

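I've written several routines that produce the desired result (i.e., building and splitting) and timed them over 500^3 iterations to find the fastest. The routine that decodes the 64-bit number back into 4 int variables runs in about 43% of the time the encoding takes. How can I speed up the encoding?

Routines (updated with several variants based on Paul Smith's suggestion):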

        public static long GetCombinedId(int largeId, int x, int y, int z)
        {

            var _largeId = (long)largeId;
            var _x = (long)x;
            var _y = (long)y;
            var _z = (long)z;

            return (_largeId << 48) | (_x << 32) | (_y << 16) | _z;

        }

        public static long GetCombinedId2(int largeId, int x, int y, int z)
        {
            return ((long)largeId << 48) | ((long)x << 32) | ((long)y << 16) | (long)z;
        }

        public static long GetCombinedId3(int largeId, int x, int y, int z)
        {
            unchecked
            {
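                // (uint) zero-extends the low half; without it, y >= 0x8000 sign-extends into the upper 32 bits
                return ((long)(largeId << 16 | x) << 32) | (uint)(y << 16 | z);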
            }

        }



        public static void GetCoordinates(long id, out int largeId, out int x, out int y, out int z)
        {

            largeId = (int)(id >> 48);

            x = (int)((id >> 32) & 0x0000_0000_0000_FFFF);
            y = (int)((id >> 16) & 0x0000_0000_0000_FFFF);
            z = (int)(id & 0x0000_0000_0000_FFFF);


        }

        public static void GetCoordinates2(long id, out int largeId, out int x, out int y, out int z)
        {

            largeId = (int)(id >> 48);

            x = (int)((id << 16 ) >> 48);
            y = (int)((id << 32 ) >> 48);
            z = (int)((id << 48 ) >> 48);

        }
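None of the benchmark inputs (255, 170, 187, 204) reaches 0x8000, so the results below never exercise the sign bit of a field. If the low half has no zero-extending cast, a routine shaped like GetCombinedId3 ORs a negative int into a long whenever y >= 0x8000. That sign-extends and sets all of the upper 32 bits. On the decode side, GetCoordinates2 (and the largeId line of GetCoordinates) returns such a field as a negative number, because >> on a long is an arithmetic shift. Below is a minimal round-trip sketch that treats every field as an unsigned 16-bit value. RoundTripSketch, Encode and Decode are hypothetical names, and this version is untimed:

```csharp
using System;

public static class RoundTripSketch
{
    // Same shape as GetCombinedId3. Masks keep stray upper bits of x and z out,
    // and (uint) zero-extends the low half when it is OR'd into the long.
    public static long Encode(int largeId, int x, int y, int z)
    {
        unchecked
        {
            return ((long)(largeId << 16 | (x & 0xFFFF)) << 32)
                 | (uint)(y << 16 | (z & 0xFFFF));
        }
    }

    // Casting to ushort keeps exactly 16 bits and zero-extends them, so
    // fields 0x8000..0xFFFF come back as 32768..65535 instead of negatives.
    public static void Decode(long id, out int largeId, out int x, out int y, out int z)
    {
        unchecked
        {
            largeId = (ushort)(id >> 48);
            x = (ushort)(id >> 32);
            y = (ushort)(id >> 16);
            z = (ushort)id;
        }
    }
}
```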

Variants of Paul Smith's technique (described in the answers below):

        [StructLayout(LayoutKind.Explicit)]
        public struct Mapper
        {
            [FieldOffset(0)] public Int64 Combined;
            [FieldOffset(0)] public Int16 Short0;
            [FieldOffset(2)] public Int16 Short1;
            [FieldOffset(4)] public Int16 Short2;
            [FieldOffset(6)] public Int16 Short3;
        }

        public static long GetId4(int largeId, int x, int y, int z)
        {

            Mapper mapper = new Mapper()
            {
                Short0 = (Int16)z,
                Short1 = (Int16)y,
                Short2 = (Int16)x,
                Short3 = (Int16)largeId
            };

            return mapper.Combined;

        }

        private static Mapper _mapper = new Mapper();
        public static long GetId5(int largeId, int x, int y, int z)
        {

            _mapper.Short0 = (Int16)z;
            _mapper.Short1 = (Int16)y;
            _mapper.Short2 = (Int16)x;
            _mapper.Short3 = (Int16)largeId;

            return _mapper.Combined;
        }

        [StructLayout(LayoutKind.Explicit)]
        public struct Mapper2
        {
            [FieldOffset(0)] public Int64 Combined;
            [FieldOffset(0)] public Int32 Integer0;
            [FieldOffset(4)] public Int32 Integer1;
        }

        private static Mapper2 _mapper2 = new Mapper2();
        public static long GetId6(int largeId, int x, int y, int z)
        {


            _mapper2.Integer0 = y << 16 | z;   //dangerous because we aren't checking upper bits of z
            _mapper2.Integer1 = largeId << 16 | x; //dangerous because we aren't checking upper bits of x


            return _mapper2.Combined;
        }
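Mapper and Mapper2 reinterpret the same 8 bytes through overlapping fields. Short0 and Integer0 end up in the low-order bits only on a little-endian machine. GetId5 and GetId6 also write to a single static instance, so concurrent callers on different threads would overwrite each other's fields. The same copy-instead-of-shift idea (Dave S's suggestion in the comments) can be written with a per-call stack buffer. This sketch assumes .NET Core 2.1 or later for Span<T> and System.Runtime.InteropServices.MemoryMarshal, and falls back to shifts on big-endian hardware. SpanPack is a hypothetical name, and the version is untimed:

```csharp
using System;
using System.Runtime.InteropServices;

public static class SpanPack
{
    // Writes the four 16-bit fields into a stack buffer and reads the same
    // 8 bytes back as one long (a C# take on "copy instead of shift").
    public static long Pack(int largeId, int x, int y, int z)
    {
        unchecked
        {
            if (!BitConverter.IsLittleEndian)
            {
                // Element 0 is the low-order field only on little-endian
                // hardware, so fall back to shifts anywhere else.
                return ((long)(ushort)largeId << 48) | ((long)(ushort)x << 32)
                     | ((long)(ushort)y << 16) | (ushort)z;
            }

            Span<ushort> parts = stackalloc ushort[4];
            parts[0] = (ushort)z;        // bits 15..0
            parts[1] = (ushort)y;        // bits 31..16
            parts[2] = (ushort)x;        // bits 47..32
            parts[3] = (ushort)largeId;  // bits 63..48
            return MemoryMarshal.Read<long>(MemoryMarshal.AsBytes(parts));
        }
    }
}
```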

Results:

GetId1 = 2168ms
GetId2 = 1824ms
GetId3 = 1679ms
GetId4 = 2217ms
GetId5 = 2008ms
GetId6 = 1757ms
GetCoord1 = 785ms
GetCoord2 = 865ms
Routine1: 71776849217913036   binary: 11111111000000001010101000000000101110110000000011001100
Routine2: 71776849217913036   binary: 11111111000000001010101000000000101110110000000011001100
Routine3: 71776849217913036   binary: 11111111000000001010101000000000101110110000000011001100
Routine4: 71776849217913036   binary: 11111111000000001010101000000000101110110000000011001100
Routine5: 71776849217913036   binary: 11111111000000001010101000000000101110110000000011001100
Routine6: 71776849217913036   binary: 11111111000000001010101000000000101110110000000011001100
255, 170, 187, 204
255, 170, 187, 204
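The question doesn't show the timing harness behind these numbers. Below is a minimal sketch of a 500^3-style loop, assuming System.Diagnostics.Stopwatch timing. TimingSketch is a hypothetical name. The harness folds every result into a returned checksum, so the loop's work is observable and cannot be discarded as dead code. Calling through a Func<> delegate adds per-call overhead and prevents inlining. For routines this small, a dedicated tool such as BenchmarkDotNet is likely to give more trustworthy comparisons:

```csharp
using System;
using System.Diagnostics;

public static class TimingSketch
{
    // Times one encoder over n^3 calls (largeId fixed at 1, x/y/z each 0..n-1).
    // XOR-ing every result into the returned checksum keeps the work observable.
    public static (long Checksum, TimeSpan Elapsed) Time(Func<int, int, int, int, long> encode, int n)
    {
        long checksum = 0;
        var sw = Stopwatch.StartNew();
        for (int x = 0; x < n; x++)
            for (int y = 0; y < n; y++)
                for (int z = 0; z < n; z++)
                    checksum ^= encode(1, x, y, z);
        sw.Stop();
        return (checksum, sw.Elapsed);
    }
}
```

For example, Time(GetCombinedId2, 500) would make the same 125 million calls as a 500^3 loop.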

Is there a better/faster way to encode these 4 integers into a 64-bit long?

(FYI... the BitConverter class was extremely slow, so I removed it as not viable.)

- Mike S.

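In C/C++ I would cheat and cast the address of the int64 Id64 to a short x[4], so I could copy into it instead of shifting. Or just use memcpy. This may be faster or slower than shifting, depending on the situation. - Dave S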

If you're using .NET Core 3.0, see whether you can use hardware intrinsics to improve performance. - Tanveer Badar

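@DaveS Paul Smith's answer below matches that technique. I tried it, and after a few versions it's almost on par with my third routine, but not quite. I've updated my question based on his suggestion. Thanks for the suggestion. - Mike S.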

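@TanveerBadar Thanks for the suggestion. I hadn't heard of intrinsics before. Unfortunately, I have no control over the hardware this will run on, so I don't think I can rely on direct hardware access. Thanks anyway. - Mike S.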

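Maybe you should look at this from a different angle. If you need a 64-bit ID and also need to get the coordinates back quickly, why not use a struct that holds both the 64-bit ID and the 4 coordinates? - Optional Option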
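Optional Option's suggestion trades memory for speed. Each key carries both the packed long and the four coordinates, so reading a coordinate is a field load rather than a shift. Below is a sketch of what such a type might look like. CoordinateKey is a hypothetical name, and the sketch assumes the coordinates fit in short. With a long plus four shorts, each key occupies 16 bytes instead of 8:

```csharp
using System;

// Hypothetical type: keeps the packed ID and the unpacked coordinates side by side.
public readonly struct CoordinateKey : IEquatable<CoordinateKey>
{
    public readonly long Id;
    public readonly short LargeId, X, Y, Z;

    public CoordinateKey(short largeId, short x, short y, short z)
    {
        LargeId = largeId; X = x; Y = y; Z = z;
        // Packed once at construction; the ushort casts zero-extend each field.
        Id = unchecked(((long)(ushort)largeId << 48) | ((long)(ushort)x << 32)
                     | ((long)(ushort)y << 16) | (ushort)z);
    }

    // Equality and hashing use only the packed ID, which already encodes all four fields.
    public bool Equals(CoordinateKey other) => Id == other.Id;
    public override bool Equals(object obj) => obj is CoordinateKey k && Equals(k);
    public override int GetHashCode() => Id.GetHashCode();
}
```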

2 Answers

I haven't timed it, but this seems to avoid the shifting and masking.

[StructLayout(LayoutKind.Explicit)]
public struct Mapper
{
   [FieldOffset(0)] public UInt64 Combined;
   [FieldOffset(1)] public UInt16 Short0;
   [FieldOffset(2)] public UInt16 Short1;
   [FieldOffset(3)] public UInt16 Short2;
   [FieldOffset(4)] public UInt16 Short3;
}

Create a Mapper, assign the various Shortx values, and then read the combined value.

var test = new Mapper();
test.Short0 = 1;
test.Short1 = 16;
test.Short2 = 256;
test.Short3 = 4096;

test.Combined will then be the 64-bit concatenation.

- Paul Smith

Short0 has to start at 0. Also keep in mind that each one is two bytes, so the offsets should increase by 2. - Domi

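I was really hopeful about this solution because it looks slick. However, allocating the struct on every call hurt your approach as written. Then I made the struct a static member of the class, and that was faster. After that, I combined bit shifting with the approach, and it was faster still. My third method still slightly beats the best field-offset version, though. Still, I'll update my question with your solution and the results. Thanks for giving me something else to try! - Mike S.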

Since each UInt16 value takes 2 bytes, shouldn't the FieldOffset values be 0, 0, 2, 4, 6? When I use this code, the struct members have the values 4097, 16, 0, 4096. - Chris Dunaway

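Relevant documentation (see the second example): How to: Create a C/C++ Union by Using Attributes (C#). Also, as the other comments note, with the current offsets the Combined and Shortx fields don't occupy the same 64 bits. - Lance U. Matthews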

Not sure why everyone is downvoting this. It has a few holes, but it's not a bad idea. It just doesn't perform quite as well. - Mike S.
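For reference, here is Paul Smith's struct with the offsets the comments call for. Each 16-bit field starts at 0, 2, 4 or 6, so all four overlap Combined. On a little-endian machine such as x86/x64, his sample values (Short0 = 1, Short1 = 16, Short2 = 256, Short3 = 4096) give Combined == 0x1000_0100_0010_0001. FixedMapper is a hypothetical name for the corrected struct:

```csharp
using System;
using System.Runtime.InteropServices;

// Every field overlaps the same 8 bytes as Combined.
[StructLayout(LayoutKind.Explicit)]
public struct FixedMapper
{
    [FieldOffset(0)] public UInt64 Combined;
    [FieldOffset(0)] public UInt16 Short0; // low-order 16 bits on little-endian
    [FieldOffset(2)] public UInt16 Short1;
    [FieldOffset(4)] public UInt16 Short2;
    [FieldOffset(6)] public UInt16 Short3; // high-order 16 bits on little-endian
}
```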

- Mike S. · Accepted Answer

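Five days later, a thought popped into my head in the shower... what if moving the return value from the local stack to the calling procedure's stack was costing time? As it turns out... it was.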

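The new method below takes the fastest method above (method 3). Instead of returning a variable, which causes a "stack-to-stack" copy, it passes the result back as an out parameter. This lets the computation write directly into the result variable in the calling procedure.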

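With this change, encoding is now faster than decoding, which was the goal all along. Below are the new routine and the speed comparison.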

        public static void GetId7(int largeId, int x, int y, int z, out long id)
        {

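            // (uint) zero-extends the low half; without it, y >= 0x8000 sign-extends into the upper 32 bits
            id = unchecked(((long)(largeId << 16 | x) << 32) | (uint)(y << 16 | z));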

        }
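The "stack-to-stack copy" explanation is the author's hypothesis. On x64, a long return value normally comes back in a register, so part of the gap may come from inlining or from how the benchmark loop uses the result. One way to check is to time GetId7 against a returning version marked with MethodImplOptions.AggressiveInlining. This sketch assumes the benchmark calls the methods directly rather than through a delegate. InlineSketch and GetIdInlined are hypothetical names:

```csharp
using System;
using System.Runtime.CompilerServices;

public static class InlineSketch
{
    // Same arithmetic as GetId7, returned normally but with an inlining hint,
    // so the call can be folded into the caller's loop.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static long GetIdInlined(int largeId, int x, int y, int z)
    {
        unchecked
        {
            return ((long)(largeId << 16 | x) << 32) | (uint)(y << 16 | z);
        }
    }

    // out-parameter form, for timing side by side with the version above.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static void GetIdInlined(int largeId, int x, int y, int z, out long id)
    {
        id = GetIdInlined(largeId, x, y, z);
    }
}
```

Both overloads produce the same bits. If they also time the same, the difference seen with GetId7 more likely comes from the benchmark than from the return itself.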

Speed comparison (GetId7 is the new result):

GetId1 = 2282ms
GetId2 = 1910ms
GetId3 = 1782ms
GetId4 = 2306ms
GetId5 = 2092ms
GetId6 = 1816ms
GetId7 = 831ms
GetCoord1 = 828ms
GetCoord2 = 930ms
Routine1: 71776849217913036   binary: 11111111000000001010101000000000101110110000000011001100
Routine2: 71776849217913036   binary: 11111111000000001010101000000000101110110000000011001100
Routine3: 71776849217913036   binary: 11111111000000001010101000000000101110110000000011001100
Routine4: 71776849217913036   binary: 11111111000000001010101000000000101110110000000011001100
Routine5: 71776849217913036   binary: 11111111000000001010101000000000101110110000000011001100
Routine6: 71776849217913036   binary: 11111111000000001010101000000000101110110000000011001100
Routine7: 71776849217913036   binary: 11111111000000001010101000000000101110110000000011001100
255, 170, 187, 204
255, 170, 187, 204

It's not necessary, but I'm curious whether anyone can make it even faster.