PCM Wave file - stereo to mono

9

I have a stereo audio file. Is converting it to mono just a case of skipping every other byte (after the header)? It's encoded as 16-bit signed PCM. I have access to javax.sound.sampled.

Here's the code I've tried, which doesn't work:

WaveFileWriter wfw = new WaveFileWriter();
AudioFormat format = new AudioFormat(Encoding.PCM_SIGNED, 44100, 16, 2, 2, 44100, false);
AudioFormat monoFormat = new AudioFormat(Encoding.PCM_SIGNED, 44100, 16, 1, 2, 44100, false);

byte[] audioData = dataout.toByteArray();
int length = audioData.length;
ByteArrayInputStream bais = new ByteArrayInputStream(audioData);

AudioInputStream stereoStream = new AudioInputStream(bais,format,length);
AudioInputStream monoStream = new AudioInputStream(stereoStream,format,length/2);

wfw.write(monoStream, Type.WAVE, new File(Environment.
                 getExternalStorageDirectory().getAbsolutePath()+"/stegDroid/un-ogged.wav"));

This code is being used to convert a .ogg file to PCM data after reading it with Jorbis. The only problem is that the result is stereo and I need mono, so if there's an alternative solution I'd be glad to hear it!

3 Answers

10

I have a stereo audio file. Is converting it to mono just a case of skipping every other byte (after the header)?

Almost - you need to skip every other sample, not every other byte. In your case each sample appears to be 16 bits = 2 bytes, so you take 2 bytes, skip 2 bytes, take 2 bytes, and so on.

AudioInputStream monoStream = new AudioInputStream(stereoStream,format,length/2);

wfw.write(monoStream, Type.WAVE, new File(Environment.getExternalStorageDirectory().getAbsolutePath()+"/stegDroid/un-ogged.wav"));

It looks like you're writing out only the first half of the file rather than alternate samples of it. You also need to fix the WAV header so it specifies mono (see your monoFormat).
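As a rough sketch of that take-2-skip-2 idea (assuming raw interleaved 16-bit little-endian PCM in the question's audioData array, header excluded, left sample first in each frame), keeping just the left channel could look like:

// Keep only the left channel: copy the first 2 bytes of every 4-byte frame.
byte[] monoData = new byte[audioData.length / 2];
for (int i = 0, j = 0; i + 1 < audioData.length; i += 4, j += 2) {
    monoData[j] = audioData[i];         // low byte of the left sample
    monoData[j + 1] = audioData[i + 1]; // high byte of the left sample
}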


11
If you skip every other sample, you'll only get one of the stereo (left/right) channels. You probably want to mix the left/right channels down to mono. One simple approach is to add the samples and divide by 2 (an average). Be careful to use an appropriate type to avoid overflow: for 16-bit samples, do the averaging with 32-bit integers. Note that there are other ways to mix samples as well. - basszero
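A minimal sketch of the averaging basszero describes (same assumptions as above: raw interleaved 16-bit little-endian signed PCM, no header; stereoToMono is a hypothetical helper name):

// Mix interleaved 16-bit little-endian stereo PCM down to mono by averaging
// the left and right samples of each 4-byte frame.
public static byte[] stereoToMono(byte[] stereo) {
    byte[] mono = new byte[stereo.length / 2];
    for (int i = 0, j = 0; i + 3 < stereo.length; i += 4, j += 2) {
        // decode both 16-bit little-endian samples of this frame
        int left  = (short) ((stereo[i + 1] << 8) | (stereo[i] & 0xff));
        int right = (short) ((stereo[i + 3] << 8) | (stereo[i + 2] & 0xff));
        int mixed = (left + right) / 2;  // average in 32-bit ints to avoid overflow
        mono[j] = (byte) (mixed & 0xff); // re-encode as little-endian
        mono[j + 1] = (byte) ((mixed >> 8) & 0xff);
    }
    return mono;
}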

4

Take a look at this code. It helped me out greatly when I needed to manipulate the bytes in a wav file.


package GlobalUtilities;

import java.applet.Applet;
import java.applet.AudioClip;
import java.net.URISyntaxException;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.io.*;
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import javax.sound.sampled.*;
/**
 * This class handles the reading, writing, and playing of wav files. It is
 * also capable of converting the file to its raw byte[] form.
 *
 * based on code by Evan Merz, modified by Dan Vargo
 * @author dvargo
 */
public class Wav {
    /*
    WAV File Specification
    FROM http://ccrma.stanford.edu/courses/422/projects/WaveFormat/

    The canonical WAVE format starts with the RIFF header:
    0  4  ChunkID    Contains the letters "RIFF" in ASCII form (0x52494646 big-endian form).
    4  4  ChunkSize  36 + SubChunk2Size, or more precisely:
                     4 + (8 + SubChunk1Size) + (8 + SubChunk2Size)
                     This is the size of the rest of the chunk following this number,
                     i.e. the size of the entire file in bytes minus 8 bytes for the
                     two fields not included in this count: ChunkID and ChunkSize.
    8  4  Format     Contains the letters "WAVE" (0x57415645 big-endian form).

    The "WAVE" format consists of two subchunks: "fmt " and "data".
    The "fmt " subchunk describes the sound data's format:
    12 4  Subchunk1ID   Contains the letters "fmt " (0x666d7420 big-endian form).
    16 4  Subchunk1Size 16 for PCM. This is the size of the rest of the subchunk
                        which follows this number.
    20 2  AudioFormat   PCM = 1 (i.e. linear quantization). Values other than 1
                        indicate some form of compression.
    22 2  NumChannels   Mono = 1, Stereo = 2, etc.
    24 4  SampleRate    8000, 44100, etc.
    28 4  ByteRate      == SampleRate * NumChannels * BitsPerSample/8
    32 2  BlockAlign    == NumChannels * BitsPerSample/8
                        The number of bytes for one sample including all channels.
                        (I wonder what happens when this number isn't an integer?)
    34 2  BitsPerSample 8 bits = 8, 16 bits = 16, etc.

    The "data" subchunk contains the size of the data and the actual sound:
    36 4  Subchunk2ID   Contains the letters "data" (0x64617461 big-endian form).
    40 4  Subchunk2Size == NumSamples * NumChannels * BitsPerSample/8
                        This is the number of bytes in the data.
    44 *  Data          The actual sound data.

    The thing that makes reading wav files tricky is that java has no unsigned
    types. This means that the binary data can't just be read and cast
    appropriately. Also, we have to use larger types than are normally necessary.

    In many languages including java, an integer is represented by 4 bytes. The
    issue here is that in most languages, integers can be signed or unsigned, and
    in wav files the integers are unsigned. So, to make sure that we can store
    the proper values, we have to use longs to hold integers, and integers to
    hold shorts. Then, we have to convert back when we want to save our wav data.

    It's complicated, but ultimately, it just results in a few extra functions
    at the bottom of this file. Once you understand the issue, there is no
    reason to pay any more attention to it.

    ALSO:
    This code won't read ALL wav files. It does not use the full specification,
    just a trimmed-down version that most wav files adhere to.
    */
    ByteArrayOutputStream byteArrayOutputStream;
    AudioFormat audioFormat;
    TargetDataLine targetDataLine;
    AudioInputStream audioInputStream;
    SourceDataLine sourceDataLine;
    float frequency = 8000.0F; // 8000,11025,16000,22050,44100
    int samplesize = 16;
    private String myPath;
    private long myChunkSize;
    private long mySubChunk1Size;
    private int myFormat;
    private long myChannels;
    private long mySampleRate;
    private long myByteRate;
    private int myBlockAlign;
    private int myBitsPerSample;
    private long myDataSize;
    // I made this public so that you can toss whatever you want in here,
    // maybe a recorded buffer, maybe just whatever you want
    public byte[] myData;
    public Wav() {
        myPath = "";
    }

    // constructor takes a wav path
    public Wav(String tmpPath) {
        myPath = tmpPath;
    }

    // get/set for the Path property
    public String getPath() {
        return myPath;
    }

    public void setPath(String newPath) {
        myPath = newPath;
    }
    // read a wav file into this class
    public boolean read() {
        DataInputStream inFile = null;
        myData = null;
        byte[] tmpLong = new byte[4];
        byte[] tmpInt = new byte[2];

        try {
            inFile = new DataInputStream(new FileInputStream(myPath));

            //System.out.println("Reading wav file...\n"); // for debugging only

            String chunkID = "" + (char) inFile.readByte() + (char) inFile.readByte()
                    + (char) inFile.readByte() + (char) inFile.readByte();

            inFile.read(tmpLong); // read the ChunkSize
            myChunkSize = byteArrayToLong(tmpLong);

            String format = "" + (char) inFile.readByte() + (char) inFile.readByte()
                    + (char) inFile.readByte() + (char) inFile.readByte();

            // print what we've read so far
            //System.out.println("chunkID:" + chunkID + " chunk1Size:" + myChunkSize + " format:" + format); // for debugging only

            String subChunk1ID = "" + (char) inFile.readByte() + (char) inFile.readByte()
                    + (char) inFile.readByte() + (char) inFile.readByte();

            inFile.read(tmpLong); // read the SubChunk1Size
            mySubChunk1Size = byteArrayToLong(tmpLong);

            inFile.read(tmpInt); // read the audio format - this should be 1 for PCM
            myFormat = byteArrayToInt(tmpInt);

            inFile.read(tmpInt); // read the # of channels (1 or 2)
            myChannels = byteArrayToInt(tmpInt);

            inFile.read(tmpLong); // read the samplerate
            mySampleRate = byteArrayToLong(tmpLong);

            inFile.read(tmpLong); // read the byterate
            myByteRate = byteArrayToLong(tmpLong);

            inFile.read(tmpInt); // read the blockalign
            myBlockAlign = byteArrayToInt(tmpInt);

            inFile.read(tmpInt); // read the bitspersample
            myBitsPerSample = byteArrayToInt(tmpInt);

            // print what we've read so far
            //System.out.println("SubChunk1ID:" + subChunk1ID + " SubChunk1Size:" + mySubChunk1Size + " AudioFormat:" + myFormat + " Channels:" + myChannels + " SampleRate:" + mySampleRate);

            // read the data chunk header - reading this IS necessary, because
            // not all wav files will have the data chunk here - for now, we're
            // just assuming that the data chunk is here
            String dataChunkID = "" + (char) inFile.readByte() + (char) inFile.readByte()
                    + (char) inFile.readByte() + (char) inFile.readByte();

            inFile.read(tmpLong); // read the size of the data
            myDataSize = byteArrayToLong(tmpLong);

            // read the data chunk
            myData = new byte[(int) myDataSize];
            inFile.read(myData);

            // close the input stream
            inFile.close();
        } catch (Exception e) {
            return false;
        }

        return true; // this should probably be something more descriptive
    }
    // write out the wav file
    public boolean save() {
        try {
            DataOutputStream outFile = new DataOutputStream(new FileOutputStream(myPath + "temp"));

            // write the wav file per the wav file format
            outFile.writeBytes("RIFF");                                     // 00 - RIFF
            outFile.write(intToByteArray((int) myChunkSize), 0, 4);         // 04 - how big is the rest of this file?
            outFile.writeBytes("WAVE");                                     // 08 - WAVE
            outFile.writeBytes("fmt ");                                     // 12 - fmt
            outFile.write(intToByteArray((int) mySubChunk1Size), 0, 4);     // 16 - size of this chunk
            outFile.write(shortToByteArray((short) myFormat), 0, 2);        // 20 - what is the audio format? 1 for PCM = Pulse Code Modulation
            outFile.write(shortToByteArray((short) myChannels), 0, 2);      // 22 - mono or stereo? 1 or 2? (or 5 or ???)
            outFile.write(intToByteArray((int) mySampleRate), 0, 4);        // 24 - samples per second (numbers per second)
            outFile.write(intToByteArray((int) myByteRate), 0, 4);          // 28 - bytes per second
            outFile.write(shortToByteArray((short) myBlockAlign), 0, 2);    // 32 - # of bytes in one sample, for all channels
            outFile.write(shortToByteArray((short) myBitsPerSample), 0, 2); // 34 - how many bits in a sample? usually 16 or 24
            outFile.writeBytes("data");                                     // 36 - data
            outFile.write(intToByteArray((int) myDataSize), 0, 4);          // 40 - how big is this data chunk
            outFile.write(myData);                                          // 44 - the actual data itself
        } catch (Exception e) {
            System.out.println(e.getMessage());
            return false;
        }

        return true;
    }
    // return a printable summary of the wav file
    public String getSummary() {
        //String newline = System.getProperty("line.separator");
        String newline = "\n";
        String summary = "Format: " + myFormat + newline
                + "Channels: " + myChannels + newline
                + "SampleRate: " + mySampleRate + newline
                + "ByteRate: " + myByteRate + newline
                + "BlockAlign: " + myBlockAlign + newline
                + "BitsPerSample: " + myBitsPerSample + newline
                + "DataSize: " + myDataSize;
        return summary;
    }
    public byte[] getBytes() {
        read();
        return myData;
    }
    /**
     * Plays back audio stored in the byte array using an audio format given by
     * freq, sample rate, etc.
     * @param data The byte array to play
     */
    public void playAudio(byte[] data) {
        try {
            byte audioData[] = data;
            // get an input stream on the byte array containing the data
            InputStream byteArrayInputStream = new ByteArrayInputStream(audioData);
            AudioFormat audioFormat = getAudioFormat();
            audioInputStream = new AudioInputStream(byteArrayInputStream, audioFormat,
                    audioData.length / audioFormat.getFrameSize());
            DataLine.Info dataLineInfo = new DataLine.Info(SourceDataLine.class, audioFormat);
            sourceDataLine = (SourceDataLine) AudioSystem.getLine(dataLineInfo);
            sourceDataLine.open(audioFormat);
            sourceDataLine.start();

            // create a thread to play back the data and start it running;
            // it will run until all the data has been played back
            Thread playThread = new Thread(new PlayThread());
            playThread.start();
        } catch (Exception e) {
            System.out.println(e);
        }
    }
    /**
     * This method creates and returns an AudioFormat object for a given set
     * of format parameters. If these parameters don't work well for you,
     * try some of the other allowable parameter values, which are shown in
     * comments following the declarations.
     * @return
     */
    private AudioFormat getAudioFormat() {
        float sampleRate = frequency;      // 8000,11025,16000,22050,44100
        int sampleSizeInBits = samplesize; // 8,16
        int channels = 1;                  // 1,2
        boolean signed = true;             // true,false
        boolean bigEndian = false;         // true,false
        //return new AudioFormat(AudioFormat.Encoding.PCM_SIGNED, 8000.0f, 8, 1, 1, 8000.0f, false);
        return new AudioFormat(sampleRate, sampleSizeInBits, channels, signed, bigEndian);
    }
    public void playWav(String filePath) {
        try {
            AudioClip clip = (AudioClip) Applet.newAudioClip(new File(filePath).toURI().toURL());
            clip.play();
        } catch (Exception e) {
            Logger.getLogger(Wav.class.getName()).log(Level.SEVERE, null, e);
        }
    }
    // ===========================
    // CONVERT BYTES TO JAVA TYPES
    // ===========================

    // convert a byte array to an unsigned short (stored in an int)
    public static int byteArrayToInt(byte[] b) {
        int start = 0;
        int low = b[start] & 0xff;
        int high = b[start + 1] & 0xff;
        return (high << 8) | low;
    }

    // convert a byte array to an unsigned int (stored in a long)
    public static long byteArrayToLong(byte[] b) {
        long accum = 0;
        for (int shiftBy = 0, i = 0; shiftBy < 32; shiftBy += 8, i++) {
            accum |= ((long) (b[i] & 0xff)) << shiftBy;
        }
        return accum;
    }

    // convert an int to a byte array (little-endian)
    public static byte[] intToByteArray(int i) {
        byte[] b = new byte[4];
        b[0] = (byte) (i & 0x000000FF);
        b[1] = (byte) ((i >> 8) & 0x000000FF);
        b[2] = (byte) ((i >> 16) & 0x000000FF);
        b[3] = (byte) ((i >> 24) & 0x000000FF);
        return b;
    }

    // convert a short to a byte array (little-endian)
    public static byte[] shortToByteArray(short data) {
        return new byte[]{(byte) (data & 0xff), (byte) ((data >>> 8) & 0xff)};
    }
    /**
     * Inner class to play back the data that was saved
     */
    class PlayThread extends Thread {

        byte tempBuffer[] = new byte[10000];

        public void run() {
            try {
                int cnt;
                // keep looping until the input read method returns -1 for empty stream
                while ((cnt = audioInputStream.read(tempBuffer, 0, tempBuffer.length)) != -1) {
                    if (cnt > 0) {
                        // write data to the internal buffer of the data line,
                        // where it will be delivered to the speaker
                        sourceDataLine.write(tempBuffer, 0, cnt);
                    }
                }
                // block and wait for the internal buffer of the data line to empty
                sourceDataLine.drain();
                sourceDataLine.close();
            } catch (Exception e) {
                System.out.println(e);
                System.exit(0);
            }
        }
    }
}
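For context, a hypothetical round trip with this class might look like the following (the path is illustrative, and stereoToMono refers to the averaging sketch under the first answer). Note that myData is public but the header fields are private, so a real stereo-to-mono pass would also need the class extended to rewrite NumChannels, BlockAlign, ByteRate, and the chunk sizes before saving:

Wav wav = new Wav("/tmp/input.wav");       // illustrative path
if (wav.read()) {
    System.out.println(wav.getSummary());
    wav.myData = stereoToMono(wav.myData); // mix down the raw sample bytes
    // the header fields still describe stereo at this point
    wav.save();                            // writes to myPath + "temp", i.e. "/tmp/input.wavtemp"
}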

2

It's 2018 as I answer this question. I had a similar situation and realized I had made an obvious mistake: the "format" argument in your constructor call is incorrect.

    AudioFormat format = new AudioFormat(Encoding.PCM_SIGNED, 44100, 16, 2, 2, 44100, false);

The fifth argument (the second "2" in your case) is the frame size. Frame size = sample size * number of channels. Since your bit depth is 16, the sample size is 2 bytes.
Sample size = 2
Channels = 2
Frame size = sample size * channels = 4
So your line of code should be:
    AudioFormat format = new AudioFormat(Encoding.PCM_SIGNED, 44100, 16, 2, 4, 44100, false);

Also, have you tried using a FormatConversionProvider?
    javax.sound.sampled.spi.FormatConversionProvider

https://docs.oracle.com/javase/tutorial/sound/converters.html

That tutorial helped me a lot, though I think it assumes you've already imported the class above.
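As a minimal sketch of that route (assuming the conversion providers installed on your platform actually support a stereo-to-mono PCM conversion; writeMonoWav is a hypothetical helper name, and note that the AudioInputStream length argument is in frames, not bytes):

import javax.sound.sampled.*;
import java.io.*;

// Downmix stereo PCM bytes to a mono WAV via the installed conversion providers.
static void writeMonoWav(byte[] audioData, File out) throws IOException {
    AudioFormat stereo = new AudioFormat(AudioFormat.Encoding.PCM_SIGNED, 44100, 16, 2, 4, 44100, false);
    AudioFormat mono   = new AudioFormat(AudioFormat.Encoding.PCM_SIGNED, 44100, 16, 1, 2, 44100, false);
    if (!AudioSystem.isConversionSupported(mono, stereo)) {
        throw new IllegalArgumentException("no installed provider converts stereo to mono");
    }
    AudioInputStream stereoStream = new AudioInputStream(
            new ByteArrayInputStream(audioData), stereo,
            audioData.length / stereo.getFrameSize()); // length in frames
    AudioInputStream monoStream = AudioSystem.getAudioInputStream(mono, stereoStream);
    AudioSystem.write(monoStream, AudioFileFormat.Type.WAVE, out);
}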

I didn't see these solutions posted in this thread, but maybe you've already worked it out. Either way, hope this helps!


At some point in the past 7 years I managed to figure it out. I have no idea what I was doing at the time, and I haven't written any Java in about 5 years. Thanks for the insight though! - fredley
Haha, fair enough, but maybe this answer will help someone who searches for it in the future. Glad you got it sorted - this stuff can be a headache. I'm still fairly new to programming, but I hope I'm making progress. - David Boudreaux
