Fast conversion of a byte array to a short array of audio data

Question

5

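I need the fastest way to convert a byte array into an array of shorts holding audio data.

The audio byte array contains data from two audio channels, laid out like this: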

C1C1C2C2 C1C1C2C2 C1C1C2C2 ...

where

C1C1 - two bytes of first channel

C2C2 - two bytes of second channel

At the moment I am using the algorithm below, but I feel there is a better way to perform this task.

byte[] rawData = //from audio device
short[] shorts = new short[rawData.Length / 2];
short[] channel1 = new short[rawData.Length / 4];
short[] channel2 = new short[rawData.Length / 4];
System.Buffer.BlockCopy(rawData, 0, shorts, 0, rawData.Length);
for (int i = 0, j = 0; i < shorts.Length; i+=2, ++j)
{
    channel1[j] = shorts[i];
    channel2[j] = shorts[i+1];
}

- anth

1

I'll add that, if you are feeling adventurous, you can use the "struct" trick from https://dev59.com/SXRB5IYBdhLWcg3wcm6d to skip the BlockCopy. - xanatos

1

@anth Why do you need the fastest way? This code is already fast enough for real-time use in many cases. - CodesInChaos

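@CodeInChaos I'm not "suggesting" it. He asked for options and I gave him some. I have always believed in giving people plenty of choices and letting them pick the worst one :-) - xanatos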

1

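If performance really matters, I would seriously consider reusing all the buffers, especially since they are large enough to land on the LOH (I think that happens above 85 kB, but that is an implementation detail). - CodesInChaos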

1

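You should also think about endianness. Some of the code posted here assumes native endianness, while other code assumes a fixed endianness. Which one is correct depends on the endianness of your input data. - CodesInChaos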


3 Answers

3

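You can use unsafe code to avoid array indexing and bit shifts. But as PVitt says, on a recent PC with a large amount of data you may be better off using standard managed code and the TPL.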

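short[] channel1 = new short[rawData.Length / 4];
short[] channel2 = new short[rawData.Length / 4];

// Needs an unsafe context; assumes rawData.Length is a multiple of 4.
fixed (byte* pRawData = rawData)
fixed (short* pChannel1 = channel1)
fixed (short* pChannel2 = channel2)
{
    // Pointers declared in a fixed statement are read-only, so advance local copies.
    byte* src = pRawData;
    short* dst1 = pChannel1;
    short* dst2 = pChannel2;
    byte* end = src + rawData.Length;
    while (src < end)
    {
        *dst1++ = *(short*)src;   // two bytes of channel 1
        src += sizeof(short);
        *dst2++ = *(short*)src;   // two bytes of channel 2
        src += sizeof(short);
    }
}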

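As with every optimization question, you need to time things carefully, paying particular attention to buffer allocation: channel1 and channel2 could be static (large) buffers that grow automatically, of which you only use the first n elements. That lets you skip two large array allocations on every call to this function and gives the GC less work (always better when time matters).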
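The buffer-reuse idea could be sketched roughly as follows. `ChannelSplitter` and its members are hypothetical names made up for this illustration, and the sketch assumes little-endian input; it has not been benchmarked against the other versions in this thread.

```csharp
using System;

// Hypothetical helper: keeps two channel buffers alive between calls and only
// reallocates when an incoming block needs more room than is available.
public sealed class ChannelSplitter
{
    private short[] _channel1 = Array.Empty<short>();
    private short[] _channel2 = Array.Empty<short>();

    public short[] Channel1 => _channel1;
    public short[] Channel2 => _channel2;

    // Splits interleaved 16-bit stereo data and returns the number of valid
    // samples per channel; entries beyond that count are stale.
    public int Split(byte[] rawData)
    {
        int frames = rawData.Length / 4;
        if (_channel1.Length < frames)
        {
            // Grow geometrically so a slowly increasing block size does not
            // force a reallocation on every call.
            int capacity = Math.Max(frames, _channel1.Length * 2);
            _channel1 = new short[capacity];
            _channel2 = new short[capacity];
        }

        for (int i = 0, j = 0; j < frames; i += 4, ++j)
        {
            // Assumption: little-endian input (low byte first).
            _channel1[j] = (short)(rawData[i] | (rawData[i + 1] << 8));
            _channel2[j] = (short)(rawData[i + 2] | (rawData[i + 3] << 8));
        }
        return frames;
    }
}
```

A caller would keep one `ChannelSplitter` per audio stream, call `Split` for each incoming block, and read only the first `frames` entries of `Channel1` and `Channel2`.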

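As CodeInChaos pointed out, endianness may matter. If your data is not in the right byte order you will need to convert it. For example, assuming 8-bit atomic elements, code that converts between big endian and little endian while splitting the channels would look like this: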

short[] channel1 = new short[rawData.Length / 4];
short[] channel2 = new short[rawData.Length / 4];

// Needs an unsafe context; assumes rawData.Length is a multiple of 4.
fixed (byte* pRawData = rawData)
fixed (short* pChannel1 = channel1)
fixed (short* pChannel2 = channel2)
{
    byte* src = pRawData;
    byte* end = src + rawData.Length;
    // View the destination arrays as bytes so the two bytes of each sample can be swapped.
    byte* dst1 = (byte*)pChannel1;
    byte* dst2 = (byte*)pChannel2;

    while (src < end)
    {
        // Channel 1: write the two input bytes in reverse order.
        dst1[1] = *src++;
        dst1[0] = *src++;
        dst1 += sizeof(short);

        // Channel 2: same swap.
        dst2[1] = *src++;
        dst2[0] = *src++;
        dst2 += sizeof(short);
    }
}

I have not run any of the code in this post through an actual compiler, so if you spot a mistake, feel free to edit.
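For readers who would rather state the input byte order explicitly than rely on the host's, here is a minimal safe-code sketch. It assumes a runtime that provides `Span<T>` and `System.Buffers.Binary.BinaryPrimitives` (.NET Core 2.1 or later); `EndianAwareSplitter` and the `bigEndian` flag are illustrative names, not part of the original answers, and the sketch makes no claim about speed relative to the unsafe version.

```csharp
using System;
using System.Buffers.Binary;

// Illustrative helper: de-interleaves 16-bit stereo data whose byte order is
// stated by the caller instead of being taken from the host CPU.
public static class EndianAwareSplitter
{
    public static void Split(ReadOnlySpan<byte> rawData, Span<short> channel1,
                             Span<short> channel2, bool bigEndian)
    {
        int frames = rawData.Length / 4;
        for (int j = 0; j < frames; j++)
        {
            // One frame = 2 bytes of channel 1 followed by 2 bytes of channel 2.
            ReadOnlySpan<byte> frame = rawData.Slice(j * 4, 4);
            channel1[j] = bigEndian
                ? BinaryPrimitives.ReadInt16BigEndian(frame)
                : BinaryPrimitives.ReadInt16LittleEndian(frame);
            channel2[j] = bigEndian
                ? BinaryPrimitives.ReadInt16BigEndian(frame.Slice(2))
                : BinaryPrimitives.ReadInt16LittleEndian(frame.Slice(2));
        }
    }
}
```

Arrays convert implicitly to spans, so existing `byte[]` and `short[]` buffers can be passed directly.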

- Julien Roncaglia

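In my experience, collections on the LOH are very expensive, because objects on the LOH are only collected during a Gen2 GC. So reducing large allocations often gives a big performance gain. - CodesInChaos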

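Like I said, it requires a Gen2 GC. In non-trivial applications those are expensive, because they have to crawl all managed objects rather than only the new ones. Another problem is that, since Gen2 collections happen less often than Gen0/1 collections, memory usage grows considerably. - CodesInChaos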

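@CodeInChaos, maybe I am misunderstanding something here. Although LOH objects are reported as being in generation 2, they are not actually "in" the same management structures as the SOH objects that are reported as generation 2. They are managed at the same time as a generation 2 sweep, but they are completely separate and are managed in an entirely different way. - Tim Lloyd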

@CodeInChaos, how expensive it is depends heavily on how expensive the Gen2 SOH collection is. If there is not much in the Gen2 SOH, then all is well :) - Tim Lloyd

+1 This is the fastest solution so far on both x86 and x64. - Tim Lloyd


3

You can benchmark it yourself! Remember to build in Release mode and run without debugging (Ctrl + F5).

using System;
using System.Diagnostics;
using System.Linq;
using System.Runtime.InteropServices;

class Program
{
    [StructLayout(LayoutKind.Explicit)]
    struct UnionArray
    {
        [FieldOffset(0)]
        public byte[] Bytes;

        [FieldOffset(0)]
        public short[] Shorts;
    }

    unsafe static void Main(string[] args)
    {
        Process.GetCurrentProcess().PriorityClass = ProcessPriorityClass.High;

        byte[] rawData = new byte[10000000];
        new Random().NextBytes(rawData);

        Stopwatch sw1 = Stopwatch.StartNew();

        short[] shorts = new short[rawData.Length / 2];
        short[] channel1 = new short[rawData.Length / 4];
        short[] channel2 = new short[rawData.Length / 4];
        System.Buffer.BlockCopy(rawData, 0, shorts, 0, rawData.Length);
        for (int i = 0, j = 0; i < shorts.Length; i += 2, ++j)
        {
            channel1[j] = shorts[i];
            channel2[j] = shorts[i + 1];
        }

        sw1.Stop();

        Stopwatch sw2 = Stopwatch.StartNew();

        short[] channel1b = new short[rawData.Length / 4];
        short[] channel2b = new short[rawData.Length / 4];

        for (int i = 0, j = 0; i < rawData.Length; i += 4, ++j)
        {
            channel1b[j] = BitConverter.ToInt16(rawData, i);
            channel2b[j] = BitConverter.ToInt16(rawData, i + 2);
        }

        sw2.Stop();

        Stopwatch sw3 = Stopwatch.StartNew();

        short[] shortsc = new UnionArray { Bytes = rawData }.Shorts;
        short[] channel1c = new short[rawData.Length / 4];
        short[] channel2c = new short[rawData.Length / 4];

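        // The reinterpreted array still reports the byte count as its Length,
        // so bound the loop by the number of 16-bit samples instead.
        for (int i = 0, j = 0; i < rawData.Length / 2; i += 2, ++j)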
        {
            channel1c[j] = shortsc[i];
            channel2c[j] = shortsc[i + 1];
        }

        sw3.Stop();

        Stopwatch sw4 = Stopwatch.StartNew();

        short[] channel1d = new short[rawData.Length / 4];
        short[] channel2d = new short[rawData.Length / 4];

        for (int i = 0, j = 0; i < rawData.Length; i += 4, ++j)
        {
            channel1d[j] = (short)((short)(rawData[i + 1]) << 8 | (short)rawData[i]);
            channel2d[j] = (short)((short)(rawData[i + 3]) << 8 | (short)rawData[i + 2]);
            //Equivalent warning-less version
            //channel1d[j] = (short)(((ushort)rawData[i + 1]) << 8 | (ushort)rawData[i]);
            //channel2d[j] = (short)(((ushort)rawData[i + 3]) << 8 | (ushort)rawData[i + 2]);

        }

        sw4.Stop();

        Stopwatch sw5 = Stopwatch.StartNew();

        short[] channel1e = new short[rawData.Length / 4];
        short[] channel2e = new short[rawData.Length / 4];

        fixed (byte* pRawData = rawData)
        fixed (short* pChannel1 = channel1e)
        fixed (short* pChannel2 = channel2e)
        {
            byte* pRawData2 = pRawData;
            short* pChannel1e = pChannel1;
            short* pChannel2e = pChannel2;

            byte* end = pRawData2 + rawData.Length;

            while (pRawData2 < end)
            {
                (*(pChannel1e++)) = *((short*)pRawData2);
                pRawData2 += sizeof(short);
                (*(pChannel2e++)) = *((short*)pRawData2);
                pRawData2 += sizeof(short);
            }
        }

        sw5.Stop();

        Stopwatch sw6 = Stopwatch.StartNew();

        short[] shortse = new short[rawData.Length / 2];
        short[] channel1f = new short[rawData.Length / 4];
        short[] channel2f = new short[rawData.Length / 4];
        System.Buffer.BlockCopy(rawData, 0, shortse, 0, rawData.Length);

        System.Threading.Tasks.Parallel.For(0, shortse.Length / 2, (i) =>
        {
            channel1f[i] = shortse[i * 2];
            channel2f[i] = shortse[i * 2 + 1];
        });

        sw6.Stop();


        if (!channel1.SequenceEqual(channel1b) || !channel1.SequenceEqual(channel1c) || !channel1.SequenceEqual(channel1d) || !channel1.SequenceEqual(channel1e) || !channel1.SequenceEqual(channel1f))
        {
            throw new Exception();
        }

        if (!channel2.SequenceEqual(channel2b) || !channel2.SequenceEqual(channel2c) || !channel2.SequenceEqual(channel2d) || !channel2.SequenceEqual(channel2e) || !channel2.SequenceEqual(channel2f))
        {
            throw new Exception();
        }

        Console.WriteLine("Original: {0}ms", sw1.ElapsedMilliseconds);
        Console.WriteLine("BitConverter: {0}ms", sw2.ElapsedMilliseconds);
        Console.WriteLine("Super-unsafe struct: {0}ms", sw3.ElapsedMilliseconds);
        Console.WriteLine("PVitt shifts: {0}ms", sw4.ElapsedMilliseconds);
        Console.WriteLine("unsafe VirtualBlackFox: {0}ms", sw5.ElapsedMilliseconds);
        Console.WriteLine("TPL: {0}ms", sw6.ElapsedMilliseconds);
        Console.ReadKey();
        return;
    }
}

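On x86, the fastest is VirtualBlackFox's unsafe code, second is the "super-unsafe" struct trick (from "C# unsafe value type array to byte array conversions"), and third is PVitt's.
On x64, the fastest is VirtualBlackFox's unsafe code, followed by PVitt's.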

- xanatos

@chibacity Are you running the program in Release mode without the debugger (CTRL-F5)? It makes a big difference. But yes, I would say PVitt's is the "best", "safest" solution. - xanatos

@Xantos No, in your original post you marked your version as the fastest, when it is actually third, behind PVitt :P - Tim Lloyd

@chibacity I tried it inside a while (true) with a GC added; the end result is the same. But the difference is very small (around 10%), so PVitt's is better. - xanatos

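@chibacity Maybe you are running on 64 bits? Running the code without the debugger, I see a 25% difference. But in the end I would not use the struct version. And it is not "my" version :-) I don't want to be associated with that hack :-) :-) - xanatos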

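Very interesting: the scratch project I was using was indeed x64. When I switched to x86, the unsafe version went from ~5ms to ~7ms, and your version was about 25% faster than PVitt's, though it went from ~8.5ms to ~12.5ms. The difference between x86 and x64 is quite large. On x86, PVitt's is not much better than the original solution - the gain is marginal. - Tim Lloyd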


- PVitt · Accepted Answer

You can skip the step that copies the buffer:

byte[] rawData = //from audio device
short[] channel1 = new short[rawData.Length / 4];
short[] channel2 = new short[rawData.Length / 4];
for (int i = 0, j = 0; i < rawData.Length; i+=4, ++j)
{
    channel1[j] = (short)(((ushort)rawData[i + 1]) << 8 | (ushort)rawData[i]);
    channel2[j] = (short)(((ushort)rawData[i + 3]) << 8 | (ushort)rawData[i + 2]);
}
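On runtimes that provide `Span<T>` (an assumption: .NET Core 2.1 or later, which is newer than this thread), the copy can also be skipped by reinterpreting the bytes in place. The sketch below uses `MemoryMarshal.Cast`, which reads samples in the host's native byte order, so unlike the shift version above it does not fix the input to little endian. `SpanSplitter` is a hypothetical name, and no timing claim is made for it.

```csharp
using System;
using System.Runtime.InteropServices;

// Illustrative helper: views the byte buffer as 16-bit samples without copying,
// then de-interleaves the two channels.
public static class SpanSplitter
{
    public static void Split(byte[] rawData, short[] channel1, short[] channel2)
    {
        // No copy happens here; the span points at the same memory as rawData.
        // Samples are interpreted in the native byte order of the host.
        ReadOnlySpan<short> samples =
            MemoryMarshal.Cast<byte, short>(new ReadOnlySpan<byte>(rawData));

        int frames = samples.Length / 2;
        for (int j = 0; j < frames; j++)
        {
            channel1[j] = samples[2 * j];
            channel2[j] = samples[2 * j + 1];
        }
    }
}
```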

To make the loop faster, you can take a look at the Task Parallel Library, in particular Parallel.For:

[Edit]

System.Threading.Tasks.Parallel.For( 0, shorts.Length/2, ( i ) =>
{
    channel1[i] = shorts[i*2];
    channel2[i] = shorts[i*2+1];
} );
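The Parallel.For call above invokes the delegate once per frame. A variant that hands each worker a contiguous range of frames may reduce that per-element overhead; whether it actually helps should be measured. This is only a sketch: `ParallelSplitter` is a hypothetical name, and `shorts` is assumed to be the interleaved sample array from the question.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Illustrative helper: parallel de-interleave in chunks rather than per element.
public static class ParallelSplitter
{
    public static void Split(short[] shorts, short[] channel1, short[] channel2)
    {
        int frames = shorts.Length / 2;
        if (frames == 0)
        {
            return; // Partitioner.Create rejects an empty range.
        }

        // Each invocation of the delegate receives a [from, to) range of frames.
        Parallel.ForEach(Partitioner.Create(0, frames), range =>
        {
            for (int i = range.Item1; i < range.Item2; i++)
            {
                channel1[i] = shorts[i * 2];
                channel2[i] = shorts[i * 2 + 1];
            }
        });
    }
}
```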

Another approach would be loop unrolling, but I think the TPL will improve the performance as well.
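A loop-unrolling variant of the shift-based loop might look like the sketch below, which handles two frames per iteration plus a tail for an odd frame count. `UnrolledSplitter` is a hypothetical name and little-endian input is assumed, as in the shift code above. The JIT may already perform similar optimizations, so any gain needs to be measured.

```csharp
using System;

// Illustrative helper: shift-based de-interleave, unrolled by a factor of two.
public static class UnrolledSplitter
{
    public static void Split(byte[] rawData, short[] channel1, short[] channel2)
    {
        int frames = rawData.Length / 4;
        int j = 0;

        // Main loop: two frames (8 bytes) per iteration.
        for (; j + 1 < frames; j += 2)
        {
            int i = j * 4;
            channel1[j] = (short)(rawData[i] | (rawData[i + 1] << 8));
            channel2[j] = (short)(rawData[i + 2] | (rawData[i + 3] << 8));
            channel1[j + 1] = (short)(rawData[i + 4] | (rawData[i + 5] << 8));
            channel2[j + 1] = (short)(rawData[i + 6] | (rawData[i + 7] << 8));
        }

        // Tail: one leftover frame when the frame count is odd.
        if (j < frames)
        {
            int i = j * 4;
            channel1[j] = (short)(rawData[i] | (rawData[i + 1] << 8));
            channel2[j] = (short)(rawData[i + 2] | (rawData[i + 3] << 8));
        }
    }
}
```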