Generating a thumbnail for an arbitrary audio file

Question

python audio visualization

6

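I'd like to represent audio files with an image no larger than 180×180 pixels.

I want to generate this image so that it gives some kind of representation of the audio file, something like SoundCloud's waveform (an amplitude graph).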

Screenshot of Soundcloud's player

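I was wondering if any of you have something for this. I've been searching for a while, mostly for "audio visualization" and "audio thumbnail", but I haven't found anything useful.

I first posted this on ux.stackexchange.com; this is my attempt to reach any programmers who are working on this.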

- joar

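Do you want to build a tool that does this, or are you looking for a ready-made solution? - Koof

That's not a spectrogram, it's an amplitude graph. A spectrogram of audio is usually three-dimensional: the x-axis usually represents time, the y-axis frequency, and the color represents amplitude. - jscs

Thanks Josh Caswell. As you can see, I wasn't sure what this kind of waveform representation is called. - joar

@Koof - Either is fine, any ideas would help. - joar

You're welcome. I figured the clarification might help your search. - jscs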

2 Answers

1

Building on Jiaaro's answer (thanks for writing pydub!), and built for web2py, here's my two cents:

import io
import os

import pydub
from PIL import Image, ImageDraw, ImageFilter


def generate_waveform():
    # web2py controller action: `request` and `response` are web2py globals
    img_width = 1170
    img_height = 140
    line_color = 180
    filename = os.path.join(request.folder, 'static', 'sounds', 'adg3.mp3')

    # first I'll open the audio file
    sound = pydub.AudioSegment.from_mp3(filename)

    # break the sound into img_width even chunks (one per pixel column);
    # len(sound) is the duration in milliseconds
    chunk_length = len(sound) // img_width

    loudness_of_chunks = [
        sound[i * chunk_length:(i + 1) * chunk_length].rms
        for i in range(img_width)
    ]
    # fall back to 1.0 so a completely silent file doesn't divide by zero
    max_rms = float(max(loudness_of_chunks)) or 1.0
    scaled_loudness = [int(round(loudness * img_height / max_rms))
                       for loudness in loudness_of_chunks]

    # now convert the scaled_loudness to an image
    im = Image.new('L', (img_width, img_height), color=255)
    draw = ImageDraw.Draw(im)
    for x, rms in enumerate(scaled_loudness):
        y0 = img_height - rms
        y1 = img_height
        draw.line((x, y0, x, y1), fill=line_color, width=1)
    del draw
    im = im.filter(ImageFilter.SMOOTH).filter(ImageFilter.DETAIL)

    # PNG data is binary, so write it to an in-memory bytes buffer
    buffer = io.BytesIO()
    im.save(buffer, 'PNG')
    buffer.seek(0)
    return response.stream(buffer, filename=filename + '.png')

- Remco

- Jiaaro · Accepted Answer

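You can split the audio into chunks and measure the RMS of each one (RMS is a measure of loudness). Let's say you want an image that is 180 pixels wide.

I'd use pydub, a lightweight wrapper I wrote around the standard library's wave module: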

from pydub import AudioSegment

# first I'll open the audio file
sound = AudioSegment.from_mp3("some_song.mp3")

# break the sound into 180 even chunks (or however
# many pixels wide the image should be);
# len(sound) is the duration in milliseconds
chunk_length = len(sound) // 180

loudness_of_chunks = []
for i in range(180):
    start = i * chunk_length
    end = start + chunk_length

    chunk = sound[start:end]
    loudness_of_chunks.append(chunk.rms)

The for loop can be written as the following list comprehension; I just wanted it to be clearer:

loudness_of_chunks = [
    sound[ i*chunk_length : (i+1)*chunk_length ].rms
    for i in range(180)]

Now the only thing left to do is scale the RMS values down to a 0-180 range (since you want the image to be 180px tall).

# convert to float so the division below isn't truncated to an integer
max_rms = float(max(loudness_of_chunks))

scaled_loudness = [(loudness / max_rms) * 180 for loudness in loudness_of_chunks]
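To try the chunking and scaling steps without an audio file, here is a minimal sketch that works on a plain list of sample values. The helper names `chunk_rms` and `scale_to_height` are made up for illustration and are not part of pydub; with pydub, the `.rms` attribute does the per-chunk measurement for you. The scaling helper also guards against a completely silent input, where `max_rms` would be 0.

```python
import math


def chunk_rms(samples, n_chunks):
    """Split samples into n_chunks roughly even chunks; return each chunk's RMS."""
    chunk_length = len(samples) / float(n_chunks)
    result = []
    for i in range(n_chunks):
        chunk = samples[int(i * chunk_length):int((i + 1) * chunk_length)]
        if not chunk:
            # more chunks than samples: treat the empty chunk as silence
            result.append(0.0)
            continue
        mean_square = sum(s * s for s in chunk) / float(len(chunk))
        result.append(math.sqrt(mean_square))
    return result


def scale_to_height(values, height):
    """Scale values so the loudest one maps to `height` pixels."""
    peak = max(values) if values else 0
    if peak == 0:
        # silent input: every bar has zero height
        return [0 for _ in values]
    return [int(round(v / float(peak) * height)) for v in values]
```

Calling `scale_to_height(chunk_rms(samples, 180), 180)` would give one bar height per pixel column, in the same shape as `scaled_loudness` above.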

I'll leave the actual pixel drawing to you, since I'm not very familiar with PIL or ImageMagick :/
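For the drawing step, one possible approach is Pillow (the maintained fork of PIL). This is only a sketch: the function name `draw_waveform` and its default colors are assumptions for illustration, and it expects a list like `scaled_loudness` with one bar height per pixel column.

```python
from PIL import Image, ImageDraw


def draw_waveform(scaled_loudness, height=180, bar_color=180, background=255):
    """Draw one vertical bar per value on a grayscale image."""
    width = len(scaled_loudness)
    im = Image.new('L', (width, height), color=background)
    draw = ImageDraw.Draw(im)
    for x, bar in enumerate(scaled_loudness):
        bar = min(int(round(bar)), height)
        if bar <= 0:
            continue  # nothing to draw for a silent chunk
        # bars grow upward from the bottom edge of the image
        draw.line((x, height - bar, x, height - 1), fill=bar_color)
    return im
```

With 180 values in `scaled_loudness`, `draw_waveform(scaled_loudness).save("thumbnail.png")` would write a 180×180 PNG.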