I have the URL of a SharePoint directory (intranet) and need an API that, given that URL, returns the list of files in the directory. How can I do this with Python?
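I'm posting this in case anyone else runs into trouble getting files from a SharePoint folder. This link really helped me solve it: https://github.com/vgrem/Office365-REST-Python-Client/issues/98. I found plenty of information on how to do this over HTTP but not much for Python, so hopefully someone else can use more Python reference material.
I'm assuming you already have a client_id and client_secret set up for the SharePoint API. If not, you can use this link for reference: https://learn.microsoft.com/en-us/sharepoint/dev/solution-guidance/security-apponly-azureacs
Basically, I wanted to get the names/relative URLs of the files in a folder, then take the most recent file in that folder and load it into a dataframe. I'm sure this isn't the "Pythonic" way to do it, but it works, and that is good enough for me.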
!pip install Office365-REST-Python-Client
from office365.runtime.auth.client_credential import ClientCredential
from office365.runtime.client_request_exception import ClientRequestException
from office365.sharepoint.client_context import ClientContext
from office365.sharepoint.files.file import File
import io
import datetime
import pandas as pd

sp_site = 'https://<org>.sharepoint.com/sites/<my_site>/'
relative_url = "/sites/<my_site>/Shared Documents/<folder>/<sub_folder>"
# credentials is assumed to be a dict you have already defined with your client_id and client_secret
client_credentials = ClientCredential(credentials['client_id'], credentials['client_secret'])
ctx = ClientContext(sp_site).with_credentials(client_credentials)
libraryRoot = ctx.web.get_folder_by_server_relative_path(relative_url)
ctx.load(libraryRoot)
ctx.execute_query()

# if you want to get the folders within <sub_folder>
folders = libraryRoot.folders
ctx.load(folders)
ctx.execute_query()
for myfolder in folders:
    print("Folder name: {0}".format(myfolder.properties["ServerRelativeUrl"]))

# if you want to get the files in the folder
files = libraryRoot.files
ctx.load(files)
ctx.execute_query()

# collect the file properties that matter to me for each file in the folder
rows = []
for myfile in files:
    # use mod_time to get a better date format
    mod_time = datetime.datetime.strptime(myfile.properties['TimeLastModified'], '%Y-%m-%dT%H:%M:%SZ')
    # create a dict of all of the info for this file and add it to the list of rows
    rows.append({'Name': myfile.properties['Name'], 'ServerRelativeUrl': myfile.properties['ServerRelativeUrl'], 'TimeLastModified': myfile.properties['TimeLastModified'], 'ModTime': mod_time})
    # print statements if needed
    # print("File name: {0}".format(myfile.properties["Name"]))
    # print("File link: {0}".format(myfile.properties["ServerRelativeUrl"]))
    # print("File last modified: {0}".format(myfile.properties["TimeLastModified"]))

# DataFrame.append was removed in pandas 2.0, so build the dataframe from the rows in one go
df_files = pd.DataFrame(rows, columns=['Name', 'ServerRelativeUrl', 'TimeLastModified', 'ModTime'])

# get the index of the most recently modified file and the ServerRelativeUrl associated with that index
newest_index = df_files['ModTime'].idxmax()
newest_file_url = df_files.loc[newest_index, 'ServerRelativeUrl']

# get the Excel file by the newest_file_url identified above
response = File.open_binary(ctx, newest_file_url)
# save data to a BytesIO stream
bytes_file_obj = io.BytesIO()
bytes_file_obj.write(response.content)
bytes_file_obj.seek(0)  # set file object to start
# load the Excel file from the BytesIO stream
df = pd.read_excel(bytes_file_obj, sheet_name='Sheet1', header=0)
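If you only need the newest file and not the whole dataframe, the selection step can probably be done with the standard library alone. This is a minimal sketch, not part of the original answer: newest_file_url_from is a hypothetical helper, the sample entries are made up, and it assumes TimeLastModified arrives as a string in the same '%Y-%m-%dT%H:%M:%SZ' format used above.

```python
from datetime import datetime


def newest_file_url_from(file_properties):
    """Return the ServerRelativeUrl of the most recently modified file.

    file_properties is a list of dicts shaped like myfile.properties above
    (hypothetical helper; returns None for an empty list).
    """
    def modified(props):
        # Same timestamp format as in the dataframe version above.
        return datetime.strptime(props["TimeLastModified"], "%Y-%m-%dT%H:%M:%SZ")

    if not file_properties:
        return None
    return max(file_properties, key=modified)["ServerRelativeUrl"]


# Made-up sample data for illustration only.
sample_files = [
    {"ServerRelativeUrl": "/sites/demo/Shared Documents/old.xlsx",
     "TimeLastModified": "2021-01-05T10:00:00Z"},
    {"ServerRelativeUrl": "/sites/demo/Shared Documents/new.xlsx",
     "TimeLastModified": "2021-03-01T08:30:00Z"},
]
newest = newest_file_url_from(sample_files)
```

With the real objects you would pass something like [f.properties for f in files].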
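Here is another useful link where you can look up the file properties: https://learn.microsoft.com/en-us/previous-versions/office/developer/sharepoint-rest-reference/dn450841(v=office.15). Scroll to the File properties section.
Hopefully this helps someone. Again, I am not a pro, and most of the time I need things spelled out more explicitly and in more detail. Maybe others feel the same way.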
What does my_site in sites/<my_site> refer to? - anjanesh

def getFilesList(directoryName):
    ...
    return filesList

# This will tell you if the item is a file or a directory.
def isDirectory(item):
    ...
    return True  # or False
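One way the two stubs above might be filled in, as a rough sketch only. get_files_list takes an already authenticated context object (ctx) as an extra parameter and uses only the calls that appear in the Office365-REST-Python-Client answer in this thread. is_directory is a guess at what the asker wants: it assumes the item's properties carry SharePoint's FileSystemObjectType field, where 1 means folder and 0 means file. Both function names and signatures are hypothetical.

```python
def get_files_list(ctx, directory_name):
    """Return the names of the files in the folder at directory_name.

    ctx is assumed to be an authenticated ClientContext (or anything with
    the same web / load / execute_query interface).
    """
    folder = ctx.web.get_folder_by_server_relative_path(directory_name)
    files = folder.files
    ctx.load(files)
    ctx.execute_query()
    return [f.properties["Name"] for f in files]


def is_directory(item_properties):
    """Return True if a list item's properties describe a folder.

    Assumption: the properties include FileSystemObjectType
    (0 = file, 1 = folder in SharePoint's enumeration).
    """
    return item_properties.get("FileSystemObjectType") == 1
```

Passing ctx in explicitly keeps the functions free of credentials and makes them easy to exercise with a stand-in object.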
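In the SharePoint REST API you cannot use "server name/sites/Folder name/Subfolder name/_api/web/lists/getbytitle('Documents')/items?$select=Title" as the URL.
Given that WebSiteURL is the URL of the site/subsite containing the document library you are trying to get files from, and Documents is the display name of that document library, the URL should be structured like this: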
WebSiteURL/_api/web/lists/getbytitle('Documents')/items?$select=Title
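If you want to list metadata field values, add the field names to $select, separated by commas.
Tip: if you are not sure about the format of the REST API URL, try pasting it into Chrome (you must be logged in to the SharePoint site with appropriate permissions) and check whether you get the correct result back as XML. If that works, update the REST URL in your code and run it. This saves you the time of running the Python code just to test the URL.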
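The URL structure described above, including the comma-separated $select, can be assembled with a small helper. This is a sketch under assumptions: build_items_url is a hypothetical name, and "Modified" is used only as an example of a second field.

```python
from urllib.parse import quote


def build_items_url(web_site_url, library_title, fields=("Title",)):
    """Build WebSiteURL/_api/web/lists/getbytitle('<library>')/items?$select=..."""
    base = web_site_url.rstrip("/")
    # A single quote inside an OData string literal is escaped by doubling it;
    # the result is then percent-encoded so spaces and quotes are URL-safe.
    title = quote(library_title.replace("'", "''"))
    select = ",".join(fields)
    return f"{base}/_api/web/lists/getbytitle('{title}')/items?$select={select}"


url = build_items_url("https://yourServer/sites/yourSite", "Documents",
                      ["Title", "Modified"])
```

Note that the first argument is the site or subsite URL, not the folder path, which is the point made above.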
https://yourServer/sites/yourSite/_api/web/lists/getbytitle('Documents')/items?$select=Title
This returns the list of documents at: https://yourServer/sites/yourSite/Documents
See: https://msdn.microsoft.com/en-us/library/office/dn531433.aspx
Of course, you need the appropriate permissions/credentials to access the library.
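Once an authenticated request to that URL succeeds, the titles still have to be pulled out of the response. The sketch below assumes you asked for JSON instead of XML: with an "Accept: application/json;odata=verbose" header the items arrive under d.results, while the lighter odata=nometadata variant puts them under value. extract_titles is a hypothetical helper and the sample payload is made up; how you authenticate depends on your environment and is not shown.

```python
import json


def extract_titles(response_text):
    """Return the Title of every item in a SharePoint REST list response."""
    payload = json.loads(response_text)
    if "d" in payload:
        # odata=verbose wraps the items in {"d": {"results": [...]}}
        items = payload["d"]["results"]
    else:
        # odata=nometadata / minimalmetadata use {"value": [...]}
        items = payload.get("value", [])
    return [item.get("Title") for item in items]


# Made-up response body for illustration only.
sample_body = '{"d": {"results": [{"Title": "Report"}, {"Title": "Budget"}]}}'
titles = extract_titles(sample_body)
```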