Tracking file upload progress to S3 with the Ruby aws-sdk

Question

ruby-on-rails file-upload amazon-s3 progress-bar

8

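First of all, I know there are many similar questions on SO. Over the past week I have read most (if not all) of them, but I still can't get this to work for me.

I am developing a Ruby on Rails application that lets users upload mp3 files to Amazon S3. The upload itself works perfectly, but a progress bar would greatly improve the site's user experience.

I am using the aws-sdk gem, which is the official gem from Amazon. I looked through its documentation for a callback fired during the upload, but I couldn't find anything.

Files are uploaded straight to S3 in one go, so they don't need to be loaded into memory. Multiple-file upload isn't needed either.

I figured I might need jQuery to make this work, and I'm fine with that. I found this, which looks very promising: https://github.com/blueimp/jQuery-File-Upload I even tried following the example here: https://github.com/ncri/s3_uploader_example but I just can't get it to work for me.

The aws-sdk documentation also briefly describes streaming uploads with a block: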

  obj.write do |buffer, bytes|
     # writing fewer than the requested number of bytes to the buffer
     # will cause write to stop yielding to the block
  end

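But that barely helps. How do you "write to the buffer"? I tried a few intuitive options, but they always ended in timeouts. And how would I update the browser based on the buffer?

Is there a better or simpler solution?

Thanks in advance. I would be grateful for any help on this topic.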

- DaedalusCoder

2 Answers

2

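After reading the source code of the AWS gem, I adapted (or mostly copied) its multipart upload method so that it yields the current progress based on the number of chunks uploaded so far.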

s3 = AWS::S3.new.buckets['your_bucket']

file = File.open(filepath, 'r', encoding: 'BINARY')
file_to_upload = "#{s3_dir}/#{filename}"
upload_progress = 0

opts = {
  content_type: mime_type,
  cache_control: 'max-age=31536000',
  estimated_content_length: file.size,
}

part_size = compute_part_size(opts)

parts_number = (file.size.to_f / part_size).ceil.to_i
obj          = s3.objects[file_to_upload]

begin
  obj.multipart_upload(opts) do |upload|
    until file.eof?
      break if upload.aborted?

      upload.add_part(file.read(part_size))
      upload_progress += 1.0 / parts_number

      # Yields the Float progress and the upload object for the
      # file that is currently being uploaded
      yield(upload_progress, upload) if block_given?
    end
  end
ensure
  # Close the file even if the upload raises
  file.close
end
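
The progress arithmetic in the loop above can be tried without S3. The following is only a sketch: `each_part_with_progress` and the `sink` lambda are hypothetical names, and the lambda stands in for `upload.add_part`. It also reports progress as parts done divided by total parts, which avoids the small rounding drift that can build up when `1.0 / parts_number` is added repeatedly.

```ruby
require 'stringio'

# Reads +io+ in chunks of +part_size+ bytes, hands each chunk to +sink+
# (a hypothetical stand-in for upload.add_part), and yields the progress
# as a Float between 0.0 and 1.0 after every part.
def each_part_with_progress(io, total_size, part_size, sink)
  parts_total = (total_size.to_f / part_size).ceil
  parts_done  = 0

  until io.eof?
    sink.call(io.read(part_size))
    parts_done += 1
    # Dividing a running count keeps the final value at exactly 1.0.
    yield parts_done.to_f / parts_total
  end
end

data     = 'x' * 2_500 # pretend this is the file body
uploaded = []          # collects the "uploaded" parts
progress = []          # collects every reported fraction

each_part_with_progress(StringIO.new(data), data.bytesize, 1_000,
                        ->(part) { uploaded << part }) do |fraction|
  progress << fraction
end

puts progress.map { |f| format('%.0f%%', f * 100) }.join(' ')
```

In real use the block would push `fraction` somewhere the browser can read it, rather than collecting it in an array.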

The original compute_part_size method is defined in the gem's source; I modified it to:

def compute_part_size options

  max_parts = 10000
  min_size  = 5242880 #5 MB
  estimated_size = options[:estimated_content_length]

  [(estimated_size.to_f / max_parts).ceil, min_size].max.to_i

end
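
To see what this rule produces, here is a small worked example. It is a sketch only: `part_size_for` and `parts_for` are hypothetical helper names, and the rule from compute_part_size is repeated so the snippet runs on its own. For small files the 5 MB minimum wins; for very large files the part size grows so that the upload stays within 10,000 parts.

```ruby
# Same rule as compute_part_size above, repeated so the example is standalone.
MAX_PARTS     = 10_000
MIN_PART_SIZE = 5 * 1024 * 1024 # 5 MB

def part_size_for(file_size)
  [(file_size.to_f / MAX_PARTS).ceil, MIN_PART_SIZE].max
end

def parts_for(file_size)
  (file_size.to_f / part_size_for(file_size)).ceil
end

small = 18 * 1024 * 1024 # 18 MB
large = 100 * 1024**3    # 100 GB

puts "18 MB:  part size #{part_size_for(small)}, parts #{parts_for(small)}"
puts "100 GB: part size #{part_size_for(large)}, parts #{parts_for(large)}"
```

An 18 MB file is split into 4 parts of at most 5 MB, so progress reported per part moves in steps of 25%.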

This code was tested on Ruby 2.0.0p0.

- emartini

- Trevor Rowe · Accepted Answer

10

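The "buffer" object yielded when you pass a block to #write is an instance of StringIO. You can write to the buffer using #write or #<<. Here is an example that uploads a file using the block form.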

s3   = AWS::S3.new
file = File.open('/path/to/file', 'rb') # binary mode, e.g. for mp3 data

obj = s3.buckets['my-bucket'].objects['object-key']
obj.write(:content_length => file.size) do |buffer, bytes|
  # bytes is the number of bytes requested for this call
  buffer.write(file.read(bytes))
  # you could do some interesting things here to track progress
end

file.close
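
One way to fill in the "track progress" comment is sketched below. `ProgressTracker` is a hypothetical helper, not part of aws-sdk, and the loop at the bottom is a stand-in driver that plays the role of `obj.write` so the snippet runs without S3. It counts the bytes actually read from the file, not the requested `bytes`, because the last chunk is usually shorter than the request.

```ruby
require 'stringio'

# Wraps a source IO and reports how much of it has been consumed.
class ProgressTracker
  attr_reader :bytes_sent

  def initialize(source, total_size, &on_progress)
    @source      = source
    @total_size  = total_size
    @on_progress = on_progress
    @bytes_sent  = 0
  end

  # Meant to be called from inside the write block with the yielded
  # buffer and the requested byte count.
  def fill(buffer, requested_bytes)
    chunk = @source.read(requested_bytes)
    return if chunk.nil? # source exhausted

    buffer.write(chunk)
    # Count what was actually read, not what was requested.
    @bytes_sent += chunk.bytesize
    @on_progress.call(@bytes_sent.to_f / @total_size) if @on_progress
  end
end

# Stand-in driver: requests up to 5 bytes at a time from an 18-byte source.
source  = StringIO.new('a' * 18)
seen    = []
tracker = ProgressTracker.new(source, 18) { |fraction| seen << fraction }

sink = StringIO.new
4.times { tracker.fill(sink, 5) }

puts tracker.bytes_sent
```

With the block form shown in the answer, the block body would presumably reduce to `tracker.fill(buffer, bytes)`.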

- Trevor Rowe

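Thank you very much for this. I'm still not sure how to update the page in real time from the loop, but it seems to be working. One thing: does streaming make the upload noticeably slower? - DaedalusCoder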

1

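One option is to track the progress somewhere else (memcache, a database, etc.). You can then have the web browser hit a separate action to poll for the progress. Streaming shouldn't slow the upload down much. Anything you do inside the block will affect the speed, so make sure those operations are fast. - Trevor Rowe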

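This behaviour seems to be broken in Ruby 2.0.0 and has been deprecated (although I couldn't find a deprecation notice in the code). See https://github.com/aws/aws-sdk-ruby/issues/192, where Trevor says that the block form is deprecated, but that Ruby 2 is supported and he will look into why it is failing. - Andy Triggs

I had some success with the code above on 1.9.3, but for reasons I don't understand, the sum of the bytes uploaded is sometimes larger than the file size. - Andy Triggs

@AndyTriggs I'm guessing you are printing "bytes"? That is just the chunk size, so if you are using 5M chunks, the bytes variable will still be 5M on the last iteration. For example, for an 18M file you end up with 5M + 5M + 5M + 5M, which is 20M for an 18M file. - ggez44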
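
Trevor Rowe's suggestion in the comments above (keep the progress somewhere else and let the browser poll a separate action) could look roughly like the sketch below. `ProgressStore` is a hypothetical in-memory stand-in for memcache or a database table, and the upload ID `'upload-1'` is made up for illustration. An in-memory hash is only visible within one process, so a multi-process Rails deployment would need a shared store instead.

```ruby
require 'json'

# Thread-safe in-memory progress store. A hypothetical stand-in for
# memcache or a database shared between the upload and the poller.
class ProgressStore
  def initialize
    @mutex    = Mutex.new
    @progress = {}
  end

  # Called by the upload code each time a chunk has been sent.
  def update(upload_id, fraction)
    @mutex.synchronize { @progress[upload_id] = fraction.clamp(0.0, 1.0) }
  end

  # Called by the polling action; unknown uploads report 0.0.
  def fetch(upload_id)
    @mutex.synchronize { @progress.fetch(upload_id, 0.0) }
  end

  # JSON body that a polling action could render for the browser.
  def to_json_for(upload_id)
    JSON.generate(upload_id: upload_id, percent: (fetch(upload_id) * 100).round)
  end
end

store = ProgressStore.new

# Simulated upload running in a background thread.
uploader = Thread.new do
  4.times { |i| store.update('upload-1', (i + 1) / 4.0) }
end
uploader.join

puts store.to_json_for('upload-1')
```

The update call is a single hash write under a mutex, which keeps the work done inside the upload block fast, as the comment recommends.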