How to print a PDF in A4 paper format with Selenium

Question

Tags: python, selenium, google-chrome, selenium-chromedriver

I have the following code for printing to PDF (it works), and I only use Google Chrome for printing.

import json


def send_devtools(driver, command, params=None):
    # pylint: disable=protected-access
    if params is None:
        params = {}
    resource = "/session/%s/chromium/send_command_and_get_result" % driver.session_id
    url = driver.command_executor._url + resource
    body = json.dumps({"cmd": command, "params": params})
    resp = driver.command_executor._request("POST", url, body)
    return resp.get("value")


def export_pdf(driver):
    command = "Page.printToPDF"
    params = {"format": "A4"}
    result = send_devtools(driver, command, params)
    data = result.get("data")
    return data

As we can see, I am using Page.printToPDF to print in base64 format, and I pass "A4" as the format entry in the params argument.
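For reference, the base64 string returned in the data field has to be decoded before it is written to disk. The sketch below shows only that decoding step and a basic sanity check. It uses a made-up stand-in payload instead of a real browser response, and decode_pdf_payload is a hypothetical helper name, not part of Selenium.

```python
import base64


def decode_pdf_payload(result):
    """Decode the base64 'data' field of a Page.printToPDF-style result dict."""
    pdf_bytes = base64.b64decode(result["data"])
    # Every PDF file starts with the magic bytes "%PDF-".
    if not pdf_bytes.startswith(b"%PDF-"):
        raise ValueError("decoded payload does not look like a PDF")
    return pdf_bytes


# Stand-in for a real response: only a PDF header, not a complete document.
fake_result = {"data": base64.b64encode(b"%PDF-1.4\n").decode("ascii")}
pdf_bytes = decode_pdf_payload(fake_result)
print(pdf_bytes)
```

With a real driver, the dict returned by the DevTools call would take the place of fake_result.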

Unfortunately, this parameter seems to be ignored. I have seen some puppeteer code that uses it (the A4 format), and I thought it might help me.

Even with a hard-coded width and height (see below), I have had no luck.

"paperWidth": 8.27,  # inches
"paperHeight": 11.69,  # inches

Using the code above, is it possible to set the page to A4 format?

- Rodrigo

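After more research, I found a way to accomplish your goal. I posted an updated answer with a working example for your use case. Please let me know how it works for you. - Life is complex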

1 Answer

- Life is complex · Accepted Answer

Updated 07-17-2021

I decided to verify the output of my original code with the Python package pdfminer.six.

from pdfminer.pdfdocument import PDFDocument
from pdfminer.pdfpage import PDFPage
from pdfminer.pdfparser import PDFParser

# Print the MediaBox (page size in points) of every page in the PDF.
with open('test_1.pdf', 'rb') as pdf_file:
    parser = PDFParser(pdf_file)
    doc = PDFDocument(parser)
    for page in PDFPage.create_pages(doc):
        print(page.mediabox)
        # output
        # [0, 0, 612, 792]

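When I converted these point sizes to inches, I was surprised. The size was 8.5 x 11 (US Letter), which is not the 8.27 x 11.69 of A4 paper. When I saw this, I decided to dig further by looking at the chromium and selenium source code.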
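The conversion itself is simple arithmetic: by default a PDF point is 1/72 of an inch. A minimal sketch follows, in which mediabox_to_inches is a hypothetical helper name and the MediaBox is assumed to be the usual [x0, y0, x1, y1] list.

```python
POINTS_PER_INCH = 72  # default PDF user space: 1 point = 1/72 inch


def mediabox_to_inches(mediabox):
    """Return (width, height) in inches for a [x0, y0, x1, y1] MediaBox."""
    x0, y0, x1, y1 = mediabox
    return ((x1 - x0) / POINTS_PER_INCH, (y1 - y0) / POINTS_PER_INCH)


print(mediabox_to_inches([0, 0, 612, 792]))  # (8.5, 11.0)
```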

In the chromium source code, the command Page.printToPDF is implemented in the file page_handler.cc.

void PageHandler::PrintToPDF(Maybe<bool> landscape,
                             Maybe<bool> display_header_footer,
                             Maybe<bool> print_background,
                             Maybe<double> scale,
                             Maybe<double> paper_width,
                             Maybe<double> paper_height,
                             Maybe<double> margin_top,
                             Maybe<double> margin_bottom,
                             Maybe<double> margin_left,
                             Maybe<double> margin_right,
                             Maybe<String> page_ranges,
                             Maybe<bool> ignore_invalid_page_ranges,
                             Maybe<String> header_template,
                             Maybe<String> footer_template,
                             Maybe<bool> prefer_css_page_size,
                             Maybe<String> transfer_mode,
                             std::unique_ptr<PrintToPDFCallback> callback)

The function accepts the parameters paper_width and paper_height, both typed as double. In C++, double is a double-precision floating-point type, so it can hold fractional values such as 8.27 and 11.69.

These parameters have default values, which the Chrome DevTools Protocol defines as follows:

paperWidth: Paper width in inches. Defaults to 8.5 inches.
paperHeight: Paper height in inches. Defaults to 11 inches.
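Because both parameters are expressed in inches, a metric paper size has to be converted first. A4 is 210 x 297 mm, which gives the 8.27 x 11.69 figures used in this answer. The sketch below shows that arithmetic. paper_params_from_mm is a hypothetical helper, and rounding to two decimals is an assumption made only to match the values quoted here.

```python
MM_PER_INCH = 25.4


def paper_params_from_mm(width_mm, height_mm):
    """Build paperWidth/paperHeight (in inches) from a size in millimetres."""
    return {
        "paperWidth": round(width_mm / MM_PER_INCH, 2),
        "paperHeight": round(height_mm / MM_PER_INCH, 2),
    }


a4_params = paper_params_from_mm(210, 297)
print(a4_params)  # {'paperWidth': 8.27, 'paperHeight': 11.69}
```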

Note the difference in parameter naming between the chromium source code and the Chrome DevTools Protocol documentation:

paper_width in the chromium source code
paperWidth in the Chrome DevTools Protocol
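To illustrate the naming difference, the sketch below converts snake_case names of the kind seen in the C++ signature into the camelCase spelling that the protocol expects. to_camel_case is a hypothetical helper, not part of Selenium or Chromium, and it is deliberately naive. It only covers simple names such as paper_width or margin_top; names containing acronyms do not follow this rule (the protocol spells prefer_css_page_size as preferCSSPageSize), so the protocol documentation remains the reference for exact spellings.

```python
def to_camel_case(name):
    """Naively convert a snake_case name to camelCase (no acronym handling)."""
    head, *tail = name.split("_")
    return head + "".join(part.capitalize() for part in tail)


snake_params = {"paper_width": 8.27, "paper_height": 11.69, "margin_top": 0.4}
cdp_params = {to_camel_case(key): value for key, value in snake_params.items()}
print(cdp_params)
```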

According to the chromium source code, the Page.printToPDF command is invoked through SendCommandAndGetResultWithTimeout.

Status WebViewImpl::PrintToPDF(const base::DictionaryValue& params,
                               std::string* pdf) {
  // https://bugs.chromium.org/p/chromedriver/issues/detail?id=3517
  if (!browser_info_->is_headless) {
    return Status(kUnknownError,
                  "PrintToPDF is only supported in headless mode");
  }
  std::unique_ptr<base::DictionaryValue> result;
  Timeout timeout(base::TimeDelta::FromSeconds(10));
  Status status = client_->SendCommandAndGetResultWithTimeout(
      "Page.printToPDF", params, &timeout, &result);
  if (status.IsError()) {
    if (status.code() == kUnknownError) {
      return Status(kInvalidArgument, status);
    }
    return status;
  }
  if (!result->GetString("data", pdf))
    return Status(kUnknownError, "expected string 'data' in response");
  return Status(kOk);
}

In my original answer I used the send_command_and_get_result endpoint, which is similar to SendCommandAndGetResultWithTimeout.

// stub_devtools_client.h
 
Status SendCommandAndGetResult(
     const std::string& method,
     const base::DictionaryValue& params,
     std::unique_ptr<base::DictionaryValue>* result) override;

Status SendCommandAndGetResultWithTimeout(
     const std::string& method,
     const base::DictionaryValue& params,
     const Timeout* timeout,
     std::unique_ptr<base::DictionaryValue>* result) override;

After reviewing the selenium source code, it was not clear to me how to correctly pass the commands send_command_and_get_result or send_command_and_get_result_with_timeout.

I did notice this function in the selenium webdriver source code:

def execute_cdp_cmd(self, cmd, cmd_args):
     """
     Execute Chrome Devtools Protocol command and get returned result

     The command and command args should follow chrome devtools protocol domains/commands, refer to link
     https://chromedevtools.github.io/devtools-protocol/

     :Args:
      - cmd: A str, command name
      - cmd_args: A dict, command args. empty dict {} if there is no command args

     :Usage:
         driver.execute_cdp_cmd('Network.getResponseBody', {'requestId': requestId})

     :Returns:
         A dict, empty dict {} if there is no result to return.
         For example to getResponseBody:

         {'base64Encoded': False, 'body': 'response body string'}

     """
     return self.execute("executeCdpCommand", {'cmd': cmd, 'params': cmd_args})['value']

After some research and testing, I found that this function can be used to accomplish your use case.

import base64
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from pdfminer.pdfdocument import PDFDocument
from pdfminer.pdfpage import PDFPage
from pdfminer.pdfparser import PDFParser

chrome_options = Options()
chrome_options.add_argument("--disable-infobars")
chrome_options.add_argument("--disable-extensions")
chrome_options.add_argument("--disable-popup-blocking")
chrome_options.add_argument('--headless')

browser = webdriver.Chrome('/usr/local/bin/chromedriver', options=chrome_options)
browser.get('http://www.google.com')

# you can define additional parameters if needed
params = {'landscape': False,
          'paperWidth': 8.27,
          'paperHeight': 11.69}

# call the function "execute_cdp_cmd" with the command "Page.printToPDF" with
# parameters defined above
data = browser.execute_cdp_cmd("Page.printToPDF", params)

# save the output to a file.
with open('file_name.pdf', 'wb') as file:
    file.write(base64.b64decode(data['data']))

browser.quit()

# verify the page size of the PDF file created
with open('file_name.pdf', 'rb') as pdf_file:
    parser = PDFParser(pdf_file)
    doc = PDFDocument(parser)
    for page in PDFPage.create_pages(doc):
        print(page.mediabox)
        # output
        # [0, 0, 594.95996, 840.95996]

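The output is in points, which need to be converted to inches.

594.95996 points equals 8.263332777783 inches
840.95996 points equals 11.6799994445 inches

8.263332777783 x 11.6799994445 matches the A4 paper size (8.27 x 11.69 inches) to within about 0.01 inch.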
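This comparison can also be done in code. The sketch below classifies a MediaBox as A4 or Letter with a small tolerance. guess_paper_size, the size table, and the 0.02 inch tolerance are all illustrative assumptions rather than anything defined by pdfminer.six or Chrome.

```python
POINTS_PER_INCH = 72

# Nominal sizes in inches (width, height), portrait orientation.
KNOWN_SIZES_INCHES = {"A4": (8.27, 11.69), "Letter": (8.5, 11.0)}


def guess_paper_size(mediabox, tolerance=0.02):
    """Return the name of the first known size matching the MediaBox, else None."""
    x0, y0, x1, y1 = mediabox
    width = (x1 - x0) / POINTS_PER_INCH
    height = (y1 - y0) / POINTS_PER_INCH
    for name, (known_width, known_height) in KNOWN_SIZES_INCHES.items():
        if abs(width - known_width) <= tolerance and abs(height - known_height) <= tolerance:
            return name
    return None


print(guess_paper_size([0, 0, 594.95996, 840.95996]))  # A4
print(guess_paper_size([0, 0, 612, 792]))  # Letter
```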

Original post 07-13-2021

Several parameters can be passed when calling the function Page.printToPDF. Two of them are:

paper_width
paper_height

The following code passes these parameters to Page.printToPDF.

import json
import base64
from selenium import webdriver
from selenium.webdriver.chrome.options import Options


def send_devtools(driver, command, params=None):
    if params is None:
        params = {}
    resource = "/session/%s/chromium/send_command_and_get_result" % driver.session_id
    url = driver.command_executor._url + resource
    body = json.dumps({"cmd": command, "params": params})
    resp = driver.command_executor._request("POST", url, body)
    return resp.get("value")


def create_pdf(driver, file_name):
    command = "Page.printToPDF"
    params = {'paper_width': '8.27', 'paper_height': '11.69'}
    result = send_devtools(driver, command,  params)
    save_pdf(result, file_name)
    return


def save_pdf(data, file_name):
    with open(file_name, 'wb') as file:
        file.write(base64.b64decode(data['data']))
    print('PDF created')


chrome_options = Options()
chrome_options.add_argument("--disable-infobars")
chrome_options.add_argument("--disable-extensions")
chrome_options.add_argument("--disable-popup-blocking")
chrome_options.add_argument('--headless')

browser = webdriver.Chrome('/usr/local/bin/chromedriver', options=chrome_options)
browser.get('http://www.google.com')

create_pdf(browser, 'test_pdf_1.pdf')

----------------------------------------
My system information
----------------------------------------
Platform:       macOS
OS Version:     10.15.7
Python Version: 3.9
Selenium:       3.141.0
pdfminer.six:   20201018
----------------------------------------