Accessing folders, subfolders, and their files with PyDrive (Python)

13
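I have the following code from the PyDrive documentation, which accesses the top-level folders in my Google Drive. I would like to access all folders, subfolders, and files from it. How can I do that? (I have only just started using PyDrive.)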
#!/usr/bin/python
# -*- coding: utf-8 -*-
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive


gauth = GoogleAuth()
gauth.LocalWebserverAuth() # Creates local webserver and auto handles authentication

#Make GoogleDrive instance with Authenticated GoogleAuth instance
drive = GoogleDrive(gauth)

# Google_Drive_Tree =
# Auto-iterate through all files that match this query
top_list = drive.ListFile({'q': "'root' in parents and trashed=false"}).GetList()
for file in top_list:
    print('title: %s, id: %s' % (file['title'], file['id']))
    print("---------------------------------------------")

# Paginate file lists by specifying the maximum number of results
for file_list in drive.ListFile({'q': 'trashed=true', 'maxResults': 10}):
    print('Received %s files from Files.list()' % len(file_list))  # <= 10
    for file1 in file_list:
        print('title: %s, id: %s' % (file1['title'], file1['id']))

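I have looked at the page "How to list all files, folders, subfolders and subfiles of a Google Drive folder", which seems to be the answer I am looking for, but the code there no longer exists.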

3 Answers

10

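This requires iterating over the file list. The code below collects the title and URL link of every file in a folder, recursing into subfolders. It can be pointed at a specific folder by passing that folder's id, e.g. ListFolder('id'). The example below queries root.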

#!/usr/bin/python
# -*- coding: utf-8 -*-
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive

gauth = GoogleAuth()
gauth.LocalWebserverAuth() # Creates local webserver and auto handles authentication

#Make GoogleDrive instance with Authenticated GoogleAuth instance
drive = GoogleDrive(gauth)

def ListFolder(parent):
    # Recursively collect the contents of the folder with id `parent`.
    filelist = []
    file_list = drive.ListFile({'q': "'%s' in parents and trashed=false" % parent}).GetList()
    for f in file_list:
        if f['mimeType'] == 'application/vnd.google-apps.folder':  # if folder
            filelist.append({"id": f['id'], "title": f['title'], "list": ListFolder(f['id'])})
        else:
            filelist.append({"title": f['title'], "title1": f['alternateLink']})
    return filelist

ListFolder('root')
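
The nested list returned by ListFolder can be awkward to read directly. Below is a minimal sketch that flattens it into file paths, assuming the same dictionary shape as above (folders carry a "list" key, files do not). The name flatten_tree and the sample data are hypothetical and only serve as illustration.

```python
def flatten_tree(entries, prefix=""):
    """Return slash-separated paths for every file in a ListFolder-style tree."""
    paths = []
    for entry in entries:
        path = f"{prefix}/{entry['title']}"
        if "list" in entry:  # folders carry their children under "list"
            paths.extend(flatten_tree(entry["list"], path))
        else:
            paths.append(path)
    return paths


# Hypothetical sample shaped like the return value of ListFolder('root').
sample_tree = [
    {"id": "folder_id_1", "title": "Docs", "list": [
        {"title": "notes_1.txt", "title1": "https://example.invalid/notes_1"},
        {"id": "folder_id_2", "title": "Old", "list": []},
    ]},
    {"title": "nothing.txt", "title1": "https://example.invalid/nothing"},
]

for path in flatten_tree(sample_tree):
    print(path)
```

Empty folders contribute no paths in this sketch, since only files are emitted.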

2
This may need some rate limiting, but it is worth considering. - Ian Warner
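
One way to address the rate-limiting concern is to retry a failed request after an increasing delay. The following is a sketch only, assuming that rate-limit failures surface as exceptions from the call being wrapped. The helper with_backoff and its parameters are hypothetical and are not part of PyDrive.

```python
import random
import time


def with_backoff(func, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying with exponential backoff plus jitter on any exception."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Wait base_delay * 2^attempt seconds, plus up to 1 s of random jitter.
            sleep(base_delay * (2 ** attempt) + random.random())
```

Inside ListFolder this could wrap the listing call, for example with_backoff(lambda: drive.ListFile({'q': query}).GetList()). In real use the except clause should probably be narrowed to the error type the client actually raises for rate limits.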

9

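Your code is completely correct. However, with PyDrive's default settings you can only access files and folders at the root level. Changing oauth_scope in the settings.yaml file fixes this.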

client_config_backend: settings
client_config:
  client_id: XXX
  client_secret: XXXX

save_credentials: True
save_credentials_backend: file
save_credentials_file: credentials.json

get_refresh_token: True

oauth_scope:
  - https://www.googleapis.com/auth/drive
  - https://www.googleapis.com/auth/drive.metadata

3
After changing oauth_scope, I had to delete the credentials.json file, create a new empty credentials.json, and then re-authorize so that the application could access the new scopes. - Ivan Ogai
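
The reset described in this comment can be scripted. This is a sketch under the assumption that the path matches save_credentials_file in settings.yaml. The helper reset_saved_credentials is hypothetical, and whether PyDrive strictly needs the empty file (rather than no file at all) is not verified here; the sketch simply mirrors what the comment describes.

```python
from pathlib import Path


def reset_saved_credentials(path="credentials.json"):
    """Replace the saved credentials with an empty file, as the comment describes.

    Returns True if non-empty credentials were present before the reset.
    """
    credentials_file = Path(path)
    had_credentials = credentials_file.exists() and credentials_file.stat().st_size > 0
    credentials_file.write_text("")  # truncate the file, or create it if missing
    return had_credentials
```

After calling it, the next run of gauth.LocalWebserverAuth() should prompt for authorization again, this time with the scopes listed in settings.yaml.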

2
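Here is my take on getting all files in subfolders. It lets you query by a given path. The difference is that it does not issue one request per folder; instead, it builds batches of folders to query.
Batch snippet: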
'some_id_1234' in parents or 'some_id_1235' in parents or 'some_id_1236' in parents or 'some_id_1237' in parents or 'some_id_1238' in parents or 'some_id_1239' in parents or 'some_id_1240' in parents and trashed=false

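You can query the files of more than one folder at a time. The query must not get too large: with more than about 300 folders (each one a 'some_id_1234' in parents clause) you will start getting errors, so keep the batch size at around 250.
Suppose the folder you want to scan contains 1,110 folders and the batch size is set to 250. It will then make 5 separate requests to query all of them:
- Request 1 queries 250 folders
- Request 2 queries 250 folders
- Request 3 queries 250 folders
- Request 4 queries 250 folders
- Request 5 queries 110 folders
Any subfolders found are then batched in the same way and queried recursively.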
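
The batching arithmetic above can be checked with a small sketch that needs no Drive access. The function build_batched_queries and the generated ids are hypothetical. The parentheses around the "or" group are an addition not present in the snippet above; the Drive query syntax accepts parentheses for grouping, and they make explicit that the trashed filter applies to every folder in the batch.

```python
def build_batched_queries(folder_ids, batch_size=250):
    """Split folder ids into batches and build one Drive query string per batch."""
    queries = []
    for start in range(0, len(folder_ids), batch_size):
        batch = folder_ids[start:start + batch_size]
        parents = " or ".join(f"'{folder_id}' in parents" for folder_id in batch)
        # Parentheses keep the "or" group separate from the trashed filter.
        queries.append(f"({parents}) and trashed=false")
    return queries


# Hypothetical ids standing in for 1,110 real folder ids.
folder_ids = [f"some_id_{n}" for n in range(1110)]
queries = build_batched_queries(folder_ids, batch_size=250)
print(len(queries))  # 5
print([q.count(" in parents") for q in queries])  # [250, 250, 250, 250, 110]
```

Each resulting string can be passed as the 'q' value of a ListFile call.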

from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive



def parse_gdrive_path(gd_path):
    if ':' in gd_path:
        gd_path = gd_path.split(':')[1]
    gd_path = gd_path.replace('\\', '/').replace('//', '/')
    if gd_path.startswith('/'):
        gd_path = gd_path[1:]
    if gd_path.endswith('/'):
        gd_path = gd_path[:-1]
    return gd_path.split('/')


def resolve_path_to_id(folder_path):
    # Walk the path one folder at a time, starting from the Drive root.
    _id = 'root'
    for folder in parse_gdrive_path(folder_path):
        folder_list = gdrive.ListFile({'q': f"'{_id}' in parents and title='{folder}' and trashed=false and mimeType='application/vnd.google-apps.folder'", 'fields': 'items(id, title, mimeType)'}).GetList()
        if not folder_list:
            # Fail with a clear message instead of an IndexError on folder_list[0].
            raise FileNotFoundError(f"Folder not found: {folder}")
        _id = folder_list[0]['id']
    return _id


def get_folder_files(folder_ids, batch_size=100):
    # Query the folders in batches so that no single query grows too large.
    for start in range(0, len(folder_ids), batch_size):
        batch = folder_ids[start:start + batch_size]
        parents = " or ".join(f"'{folder_id}' in parents" for folder_id in batch)
        # Parentheses make the trashed filter apply to every folder in the batch.
        query = f"({parents}) and trashed=false"
        yield from gdrive.ListFile({'q': query, 'fields': 'items(id, title, mimeType, version)'}).GetList()


def get_files(folder_path=None, target_ids=None, files=None):
    # Use None as the default: a mutable default list would be shared across calls.
    if files is None:
        files = []

    if target_ids is None:
        target_ids = [resolve_path_to_id(folder_path)]

    file_list = get_folder_files(folder_ids=target_ids, batch_size=250)

    subfolder_ids = []

    for f in file_list:
        if f['mimeType'] == 'application/vnd.google-apps.folder':
            subfolder_ids.append(f['id'])
        else:
            files.append(f['title'])

    # Recurse into all subfolders of this level, accumulating into the same list.
    if len(subfolder_ids) > 0:
        get_files(target_ids=subfolder_ids, files=files)

    return files


gauth = GoogleAuth()
gauth.LocalWebserverAuth()

gdrive = GoogleDrive(gauth)


file_list = get_files('/Some/Folder/Path')

for f in file_list:
    print(f)

For example:
Your Google Drive contains the following:
(folder) Root
    (folder) Docs
        (subfolder) Notes
            (subfolder) School
                (file) notes_1.txt
                (file) notes_2.txt
                (file) notes_3.txt
                (file) notes_4.txt
                (file) notes_5.txt
                (subfolder) Important
                    (file) important_notes_1.txt
                    (file) important_notes_2.txt
                    (file) important_notes_3.txt
                (subfolder) Old Notes
                    (file) old_1.txt
                    (file) old_2.txt
                    (file) old_3.txt
                    (subfolder) Secrets
                        (file) secret_1.txt
                        (file) secret_2.txt
                        (file) secret_3.txt
    (folder) Stuff
        (file) nothing.txt
        (file) this-will-not-be-found.txt

If you want to get all files in the "Notes" folder and its subfolders, you can do the following:

file_list = get_files('/Docs/Notes')

for f in file_list:
    print(f)

Output:

>> notes_1.txt
>> notes_2.txt
>> notes_3.txt
>> notes_4.txt
>> notes_5.txt
>> important_notes_1.txt
>> important_notes_2.txt
>> important_notes_3.txt
>> old_1.txt
>> old_2.txt
>> old_3.txt
>> secret_1.txt
>> secret_2.txt
>> secret_3.txt

Hope this helps someone :)
