How do I decode binary audio data?

Question



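I'm still a beginner at web development and I'm building a chatbot. I want to run each response through Google's text-to-speech first and then play the sound on the client. So: the client sends a message to the server -> the server creates a response -> the server sends that response to Google -> gets the audio data back -> sends it to the client -> the client plays it. Everything works up to that last step, but I can't get past this problem.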

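I've done some searching and found plenty of information about playing audio from binary data, audio contexts and so on, and I wrote a function based on it, but it doesn't work. Here's what I'm doing: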

export const SendMessage: Client.Common.Footer.API.SendMessage = async message => {
    const baseRoute = process.env.REACT_APP_BASE_ROUTE;
    const port = process.env.REACT_APP_SERVER_PORT;
    const audioContext = new AudioContext();
    let audio: any;
    const url = baseRoute + ":" + port + "/ChatBot";
    console.log("%c Sending post request...", "background: #1fa67f; color: white", url, JSON.stringify(message));
    let responseJson = await fetch(url, {
        method: "POST",
        mode: "cors",
        headers: {
            Accept: "application/json",
            "Content-Type": "application/json"
        },
        body: JSON.stringify(message)
    });
    let response = await responseJson.json();
    await audioContext.decodeAudioData(
        new ArrayBuffer(response.data.audio.data),
        buffer => {
            audio = buffer;
        },
        error => console.log("===ERROR===\n", error)
    );
    const source = audioContext.createBufferSource();
    source.buffer = audio;
    source.connect(audioContext.destination);
    source.start(0);
    console.log("%c Post response:", "background: #1fa67f; color: white", url, response);
};

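This function sends the message to the server and gets back the response message and the audio data. response.data.audio.data contains some kind of binary data, but I get an error saying the audio data can't be decoded (the error callback of decodeAudioData fires). I know the data itself is valid, because on my server I use the following code to write it to an mp3 file that plays fine: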

const writeFile = util.promisify(fs.writeFile);
await writeFile("output/TTS.mp3", response.audioContent, "binary");

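I barely understand how binary data is handled here or what could be going wrong. Do I need to specify more parameters to decode the binary data correctly? How would I find that out? I want to understand what's actually happening here, not just copy and paste a solution.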

Edit:

It looks like the ArrayBuffer isn't being created correctly. If I run the following code:

    console.log(response);
    const audioBuffer = new ArrayBuffer(response.data.audio.data);
    console.log("===audioBuffer===", audioBuffer);
    audio = await audioContext.decodeAudioData(audioBuffer);

the response is:

{message: "Message successfully sent.", status: 1, data: {…}}
    message: "Message successfully sent."
    status: 1
    data:
        message: "Sorry, I didn't understand your question, try rephrasing."
        audio:
            type: "Buffer"
            data: Array(14304)
                [0 … 9999]
                [10000 … 14303]
                length: 14304
            __proto__: Array(0)
        __proto__: Object
    __proto__: Object
__proto__: Object

but the buffer is logged as:

===audioBuffer=== 
ArrayBuffer(0) {}
    [[Int8Array]]: Int8Array []
    [[Uint8Array]]: Uint8Array []
    [[Int16Array]]: Int16Array []
    [[Int32Array]]: Int32Array []
    byteLength: 0
__proto__: ArrayBuffer

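Apparently JS doesn't understand the format in my response object, but this is what I get from Google's speech synthesis API. Maybe something goes wrong when I send it from the server? As I said before, on my server the following code turns that array into an mp3 file: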

    const writeFile = util.promisify(fs.writeFile);
    await writeFile("output/TTS.mp3", response.audioContent, "binary");
    return response.audioContent;

Meanwhile, response.audioContent is also sent to the client like this:


//in index.ts
...
const app = express();
app.use(bodyParser.json());
app.use(cors(corsOptions));

app.post("/TextToSpeech", TextToSpeechController);
...
//textToSpeech.ts
export const TextToSpeechController = async (req: Req<Server.API.TextToSpeech.RequestQuery>, res: Response) => {
    let response: Server.API.TextToSpeech.ResponseBody = {
        message: null,
        status: CONSTANTS.STATUS.ERROR,
        data: undefined
    };
    try {
        console.log("===req.body===", req.body);
        if (!req.body) throw new Error("No message received");
        const audio = await TextToSpeech({ message: req.body.message });
        response = {
            message: "Audio file successfully created!",
            status: CONSTANTS.STATUS.SUCCESS,
            data: audio
        };
        res.send(response);
    } catch (error) {
        response = {
            message: "Error converting text to speech: " + error.message,
            status: CONSTANTS.STATUS.ERROR,
            data: undefined
        };
        res.json(response);
    }
};
...

What I find strange is that on my server, the log output of response.audioContent is:

===response.audioContent=== <Buffer ff f3 44 c4 00 00 00 03 48 01 40 00 00 f0 a3 0f fc 1a 00 11 e1 48 7f e0 e0 87 fc b8 88 40 1c 7f e0 4c 03 c1 d9 ef ff ec 3e 4c 02 c7 88 7f ff f9 ff ff ... >
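One rough way to answer "how would I know?" is to look at the first bytes of the data before handing it to decodeAudioData. The log starts with `ff f3`: an MPEG audio frame header begins with 11 sync bits all set to 1, and many MP3 files instead begin with an ID3v2 tag, the ASCII text "ID3". The sketch below is a heuristic under those assumptions, not a full MP3 parser, and the helper name `looksLikeMp3` is hypothetical.

```typescript
// Hypothetical helper: a quick heuristic, not a full MP3 parser.
// An MPEG audio frame starts with 11 set sync bits: 0xFF, then the top 3 bits of the next byte.
// Many MP3 files instead start with an ID3v2 tag, the ASCII bytes "ID3".
function looksLikeMp3(bytes: ArrayLike<number>): boolean {
  if (bytes.length < 3) return false;
  const hasId3Tag = bytes[0] === 0x49 && bytes[1] === 0x44 && bytes[2] === 0x33; // "ID3"
  const hasFrameSync = bytes[0] === 0xff && (bytes[1] & 0xe0) === 0xe0;
  return hasId3Tag || hasFrameSync;
}

// The first bytes from the server log: ff f3 44 c4 ...
console.log(looksLikeMp3([0xff, 0xf3, 0x44, 0xc4])); // true
console.log(looksLikeMp3(new Uint8Array(0))); // false
```

Running the same check on the array the client actually receives shows whether the bytes survived the trip; if they did, the problem is in how they are put into a buffer, not in the decoding parameters.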

But on the client it looks like this:

audio:
            type: "Buffer"
            data: Array(14304)
                [0 … 9999]
                [10000 … 14303]
                length: 14304
            __proto__: Array(0)
        __proto__: Object

I tried passing response.data, response.data.audio and response.data.audio.data to new ArrayBuffer(), but every one of them gives the same empty buffer.
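All three attempts give an empty buffer because new ArrayBuffer() only takes a length in bytes, never the data. Passing an array makes JavaScript convert it to a number via its string form ("255,243,..."), which is NaN, and NaN becomes a length of 0. To get the bytes in, allocate a buffer of the right length and copy the numbers into it through a Uint8Array. The `SerializedBuffer` type and the `toArrayBuffer` helper below are hypothetical names matching the shape in the client log.

```typescript
// Shape produced by JSON-serializing a Node Buffer, as in the client log.
interface SerializedBuffer {
  type: "Buffer";
  data: number[];
}

// new ArrayBuffer() takes a byte length. A multi-element array is coerced to a number
// through its string form ("255,243,68,196"), which is NaN, which becomes 0.
const wrong = new ArrayBuffer([255, 243, 68, 196] as unknown as number);
console.log(wrong.byteLength); // 0

// Allocate the right size, then copy the numbers in (one Uint8Array element per byte).
function toArrayBuffer(serialized: SerializedBuffer): ArrayBuffer {
  const buffer = new ArrayBuffer(serialized.data.length);
  new Uint8Array(buffer).set(serialized.data);
  return buffer;
}

const right = toArrayBuffer({ type: "Buffer", data: [255, 243, 68, 196] });
console.log(right.byteLength); // 4
```

In the question's client code this would mean passing `toArrayBuffer(response.data.audio)` to decodeAudioData instead of `new ArrayBuffer(response.data.audio.data)`.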

- Supperhero

2 Answers


const audioBuffer = new ArrayBuffer(response.data.audio.data);
console.log("===audioBuffer===", audioBuffer);

Maybe try this instead:

const audioBuffer = Buffer.from(response.data.audio);
console.log("===audioBuffer===", audioBuffer);

- Renaud Reguieg


- jeeves · Accepted Answer

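There are a few problems in your code. You can't populate an ArrayBuffer through that constructor: new ArrayBuffer() only accepts a length in bytes, not the data itself. Your call to decodeAudioData is also asynchronous, and the way you call it can leave audio undefined by the time you assign it to source.buffer. I suggest switching to the newer promise-based form of decodeAudioData.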
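The asynchrony problem can be shown without any audio. With the callback form, the result arrives later, and `await` on a call that does not return a promise continues immediately; with the promise form (`const audio = await audioContext.decodeAudioData(arrayBuffer);`), await really waits. In this sketch `decodeWithCallback` is a hypothetical stand-in for a callback-only decoder, not the Web Audio API, and it "decodes" by returning the input length.

```typescript
// Hypothetical stand-in for a callback-only decoder: returns nothing, reports the result later.
function decodeWithCallback(input: number[], onSuccess: (result: number) => void): void {
  setTimeout(() => onSuccess(input.length), 10);
}

async function compareStyles(): Promise<{ callbackStyle: number | undefined; promiseStyle: number }> {
  let audio: number | undefined;
  // `await` on a non-promise (void) continues right away; the callback has not run yet.
  await decodeWithCallback([1, 2, 3], result => {
    audio = result;
  });
  const callbackStyle = audio;

  // Promise style: wrap the callback so that await waits for the result.
  const promiseStyle = await new Promise<number>(resolve => decodeWithCallback([1, 2, 3], resolve));
  return { callbackStyle, promiseStyle };
}

compareStyles().then(result => console.log(result)); // { callbackStyle: undefined, promiseStyle: 3 }
```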

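Edit: The example I posted earlier works both with an mp3 file and with the response returned by Google, as long as you pass it the correct buffer reference. So if it doesn't work for you, you are probably doing something unusual when calling Google Text to Speech or returning its result.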

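In fact, the reason you can get the mp3 file to work but not the text-to-speech playback is probably that you aren't referencing the correct property of the result returned by the Google API call. The API call resolves to an array, so make sure you reference index 0 of the result array (see textToSpeech.js below).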

Here is the complete application:

// textToSpeech.js
const textToSpeech = require('@google-cloud/text-to-speech');
const client = new textToSpeech.TextToSpeechClient();

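module.exports = {
    say: async function(text) {
        const request = {
            input: { text },
            voice: { languageCode: 'en-US', ssmlGender: 'NEUTRAL' },
            audioConfig: { audioEncoding: 'MP3' },
        };
        // synthesizeSpeech resolves to an array; the response object is at index 0
        const response = await client.synthesizeSpeech(request);
        // audioContent holds the encoded MP3 bytes (a Buffer in Node)
        return response[0].audioContent;
    }
}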

// server.js
const express = require('express');
const path = require('path');
const app = express();
const textToSpeechService = require('./textToSpeech');

app.get('/', (req, res) => {
    res.sendFile(path.join(__dirname, 'index.html'));
});

app.get('/speech', async (req, res) => {
    const buffer = await textToSpeechService.say('hello world');
    // res.json serializes the Buffer as { type: 'Buffer', data: [byte, byte, ...] }
    res.json({
        status: `y'all good :)`,
        data: buffer
    });
});

app.listen(3000);

// index.html
<!DOCTYPE html>
<html>
    <script>
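        async function play() {
            const audioContext = new AudioContext();
            // fetch resolves to a Response; .json() parses its body
            const res = await fetch('/speech');
            const body = await res.json();
            // body.data is the serialized Buffer { type: 'Buffer', data: [...] }; copy its bytes
            const arr = Uint8Array.from(body.data.data);
            // decodeAudioData needs an ArrayBuffer; arr.buffer holds exactly these bytes
            const audio = await audioContext.decodeAudioData(arr.buffer);
            const source = audioContext.createBufferSource();
            source.buffer = audio;
            source.connect(audioContext.destination);
            source.start(0);
        }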
    </script>
    <body>
        <h1>Hello Audio</h1>
        <button onclick="play()">play</button>
    </body>
</html>
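Sending the audio as a JSON array of numbers works, but it costs several characters per byte ("196," is four characters for one byte). One alternative, not part of the answer above, is to encode the Buffer as a base64 string on the server and decode it on the client with atob. The helper names `encodeAudio` and `decodeAudio` are hypothetical; the sketch runs the round trip in Node, which also provides a global atob (Node 16 and later).

```typescript
import { Buffer } from "node:buffer";

// Server side (Node): hypothetical helper, e.g. res.json({ data: encodeAudio(buffer) }).
function encodeAudio(audioContent: Buffer): string {
  return audioContent.toString("base64");
}

// Client side (browser): turn the base64 string back into an ArrayBuffer
// suitable for audioContext.decodeAudioData().
function decodeAudio(base64: string): ArrayBuffer {
  const binary = atob(base64); // one character per byte, char codes 0-255
  const buffer = new ArrayBuffer(binary.length);
  const bytes = new Uint8Array(buffer);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return buffer;
}

const original = Buffer.from([0xff, 0xf3, 0x44, 0xc4]);
const encoded = encodeAudio(original);
console.log(encoded); // //NExA==
const restored = new Uint8Array(decodeAudio(encoded));
console.log(Array.from(restored)); // [ 255, 243, 68, 196 ]
```

Another option is to keep the audio out of JSON entirely: a separate route can respond with the raw bytes (Express's res.send() accepts a Buffer), and the client can read them with fetch() followed by response.arrayBuffer().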