使用Azure Storage SDK for Python将文件夹中的多个文件上传到Azure Blob存储。

Question

使用Azure Storage SDK for Python将文件夹中的多个文件上传到Azure Blob存储。

pythonazureazure-storageazure-blob-storage

5

我在Windows机器上有一些图片，它们存储在本地文件夹中。我想将所有图片上传到同一个容器的同一个blob中。

我知道如何使用Azure Storage SDKs的BlockBlobService.create_blob_from_path()上传单个文件，但我没有看到一次上传文件夹中所有图片的可能性。

然而，Azure Storage Explorer提供了这个功能，所以肯定有一种方法可以实现这个功能。

是否有函数提供此服务，或者我必须循环遍历文件夹中的所有文件，并多次运行create_blob_from_path()来上传到同一个blob中？

- gehbiszumeis

3

你需要循环处理 :-) 你的意思是对同一个数据块进行操作吗？你会将所有图像上传到同一个容器中吗？ - Thomas

2个回答

1

你可以通过使用多线程来提高上传性能。以下是一些代码示例：

from azure.storage.blob import BlobClient
from threading import Thread
import os


# Uploads a single blob. May be invoked in thread.
def upload_blob(container, file, index=0, result=None):
    if result is None:
        result = [None]

    try:
        # extract blob name from file path
        blob_name = ''.join(os.path.splitext(os.path.basename(file)))

        blob = BlobClient.from_connection_string(
            conn_str='CONNECTION STRING',
            container_name=container,
            blob_name=blob_name
        )

        with open(file, "rb") as data:
            blob.upload_blob(data, overwrite=True)

        print(f'Upload succeeded: {blob_name}')
        result[index] = True # example of returning result
    except Exception as e:
        print(e) # do something useful here
        result[index] = False # example of returning result


# container: string of container name. This example assumes the container exists.
# files: list of file paths.    
def upload_wrapper(container, files):
    # here, you can define a better threading/batching strategy than what is written
    # this code just creates a new thread for each file to be uploaded
    parallel_runs = len(files)
    threads = [None] * parallel_runs
    results = [None] * parallel_runs
    for i in range(parallel_runs):
        t = Thread(target=upload_blob, args=(container, files[i], i, results))
        threads[i] = t
        threads[i].start()

    for i in range(parallel_runs):  # wait for all threads to finish
        threads[i].join()

    # do something with results here

可能有更好的分块策略 - 这只是一个示例，说明对于某些情况，使用线程可能会实现更高的Blob上传性能。

以下是顺序循环方法与上述线程方法之间的一些基准测试（482个图像文件，总共26 MB）：

顺序上传：89秒
线程上传：28秒

我还应该补充一点，您可以考虑通过Python调用azcopy，因为这个工具可能更适合您的特定需求。

- ryanpfalz

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- Ivan Glasenberg · Accepted Answer

没有直接的方法可以做到这一点。您可以通过 Azure 存储 Python SDK 的 blockblobservice.py 和 baseblobservice.py 进行操作，具体详情请参考链接：blockblobservice.py 和 baseblobservice.py。

正如您所提到的，您应该循环遍历。以下是示例代码：

from azure.storage.blob import BlockBlobService, PublicAccess
import os

def run_sample():
    block_blob_service = BlockBlobService(account_name='your_account', account_key='your_key')
    container_name ='t1s'

    local_path = "D:\\Test\\test"

    for files in os.listdir(local_path):
        block_blob_service.create_blob_from_path(container_name,files,os.path.join(local_path,files))


# Main method.
if __name__ == '__main__':
    run_sample()

本地文件：

代码执行后，它们被上传到Azure：