Naming a file when downloading with Selenium Webdriver

Question


python selenium webdriver selenium-webdriver

13

I see that you can set where a file is downloaded to through Webdriver, as follows:

from os import getcwd
from selenium import webdriver

fp = webdriver.FirefoxProfile()
fp.set_preference("browser.download.folderList", 2)  # 2 = use the directory set in browser.download.dir
fp.set_preference("browser.download.manager.showWhenStarting", False)
fp.set_preference("browser.download.dir", getcwd())
fp.set_preference("browser.helperApps.neverAsk.saveToDisk", "text/csv")  # save CSV files without prompting

browser = webdriver.Firefox(firefox_profile=fp)

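But I was wondering whether there is a similar way to give the file a name when it is downloaded. Preferably it would not be something tied to the profile, because I will be downloading about 6000 files through one browser instance and do not want to reinitialize the driver for each download.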

- user1253952

3 Answers

3

I don't know whether there is a pure Selenium handler for this, but here is what I have done when I needed to do something with the downloaded file.

Set a loop that polls your download directory for the latest file that does not have a .part extension (this indicates a partial download and would occasionally trip things up if not accounted for). Put a timer on this to ensure that you don't go into an infinite loop if a timeout or some other error keeps the download from completing. I used the output of the ls -t <dirname> command in Linux (my old code uses the commands module, which is deprecated, so I won't show it here :) ) and got the first file by using
```
# result = output of ls -t
result = result.split('\n')[1].split(' ')[-1]
```
If the while loop exits successfully, the topmost file in the directory will be your file, which you can then modify using os.rename (or anything else you like).
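
The polling loop described above could be sketched in pure Python roughly like this, without shelling out to ls. This is only a sketch under some assumptions: the function name `wait_for_new_file` and its parameters are made up for illustration, and it assumes the browser marks unfinished downloads with a .part file in the same directory (as Firefox does).

```python
import os
import time


def wait_for_new_file(directory, known_files, timeout=30.0, poll_interval=0.5):
    """Return the path of the newest completed file in `directory` that is
    not listed in `known_files`.

    Raises TimeoutError if no completed download shows up within `timeout`
    seconds, so the loop can never run forever.
    """
    deadline = time.monotonic() + timeout
    while True:
        names = os.listdir(directory)
        # Any .part file means a download is still in progress.
        downloading = any(name.endswith(".part") for name in names)
        candidates = [
            os.path.join(directory, name)
            for name in names
            if name not in known_files
            and not name.endswith(".part")
            and os.path.isfile(os.path.join(directory, name))
        ]
        if candidates and not downloading:
            # Most recently modified new file.
            return max(candidates, key=os.path.getmtime)
        if time.monotonic() >= deadline:
            raise TimeoutError("no completed download in %r" % directory)
        time.sleep(poll_interval)
```

With Selenium you would take `known = set(os.listdir(download_dir))` just before clicking the download link, call `wait_for_new_file(download_dir, known)` afterwards, and pass the returned path to os.rename.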

Maybe this isn't the answer you were looking for, but hopefully it points you in the right direction.

- RocketDonkey

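Thanks, I think I'll go with something like this: save the preferred file names to a file, then, once everything has been downloaded, list the files by creation date and rename them at that point. - user1253952

@user1253952 If it works better for you, you could actually rename them at download time. Happy to help. - RocketDonkey

+1, this is what I did to get around this problem, basically polling the download directory constantly. - Arran

@Arran Ha, glad to hear someone else does this too :) - RocketDonkey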

1

Here is a solution with code, following the suggestion in the selected answer. It renames the file after each download.

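import os

# SAVE_TO_DIRECTORY and docName are assumed to be defined earlier in the script.
os.chdir(SAVE_TO_DIRECTORY)
files = filter(os.path.isfile, os.listdir(SAVE_TO_DIRECTORY))
files = [os.path.join(SAVE_TO_DIRECTORY, f) for f in files]  # add path to each file
files.sort(key=os.path.getmtime)  # oldest first, newest last
newest_file = files[-1]
os.rename(newest_file, docName + ".pdf")  # relative target, so it stays in SAVE_TO_DIRECTORY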

(This answer was posted as an edit to the question "Naming a file when downloading with Selenium Webdriver" by the OP user1253952 under CC BY-SA 3.0.)

- vvvvv


- Pavel Daynyak · Accepted Answer

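I suggest a somewhat odd approach: if possible, don't use Selenium to download the file at all. I mean, get the file URL and download the file "manually" with the urllib library, saving it to disk yourself. The problem is that Selenium has no tool for handling Windows dialogs such as the "Save as" dialog. I'm not sure, but I suspect it cannot handle any OS dialogs at all; please correct me if I'm wrong. :)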

Here is a simple example:

import urllib

# Python 2 API; in Python 3 the same function is urllib.request.urlretrieve.
urllib.urlretrieve("http://www.yourhost.com/yourfile.ext", "your-file-name.ext")

The only thing left for us to do is make sure all urllib exceptions are handled. See http://docs.python.org/2/library/urllib.html#urllib.urlretrieve for more information.
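
As a rough illustration of the exception handling mentioned above, here is a Python 3 sketch. It uses urllib.request.urlopen rather than urlretrieve, and the helper name `download_as` is hypothetical. It also assumes the file URL can be fetched without the browser's session; if the site needs the cookies or login from the Selenium session, this will not work as written.

```python
import shutil
import urllib.request


def download_as(url, dest_path, timeout=30):
    """Download `url` and save it under `dest_path`.

    Returns True on success and False if the request or the write failed.
    """
    try:
        # The destination is opened only after the URL has been opened
        # successfully, so a bad URL does not leave an empty file behind.
        with urllib.request.urlopen(url, timeout=timeout) as response, \
                open(dest_path, "wb") as out:
            shutil.copyfileobj(response, out)
    except OSError as exc:
        # urllib.error.URLError and HTTPError are subclasses of OSError,
        # as are socket timeouts and local file errors.
        print("failed to download %s: %s" % (url, exc))
        return False
    return True
```

Because the file name is chosen by the caller, a batch of downloads can be named in a plain loop over (url, name) pairs, with failed URLs collected for a retry.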