Convert HTML to PDF in .NET

Question

c# html pdf itext

510

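I want to generate a PDF document by passing HTML content to a function. I have been using iTextSharp for this, but it handles tables poorly and the layout comes out garbled.

Is there a better way?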

- SandHurst

You can use GemBox.Document for this. There is also example code for converting an HTML file to a PDF file. - Mario Z

Which version of iTextSharp are you using, and can you share your HTML? - Amedee Van Gasse

1

What about .NET Core? - Piero Alberto

7

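Could this question please be reopened? Many new products now offer this feature and others have become outdated, and without new answers the question cannot keep up. In 2022 I would recommend https://github.com/hardkoded/puppeteer-sharp#generate-pdf-files. It is mature, well maintained, easy to use, and built on a stable foundation. - Agyss

One upvote is not enough! - Osias Jota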

26 Answers

125

Last updated: October 2020

This is the list of HTML-to-PDF options for .NET that I have put together (some free, some paid).

GemBox.Document
- https://www.nuget.org/packages/GemBox.Document/ (free version limited to 20 paragraphs)
- $680 - https://www.gemboxsoftware.com/document/pricelist
- https://www.gemboxsoftware.com/document/examples/c-sharp-convert-html-to-pdf/307
PDF Metamorphosis .Net
HtmlRenderer.PdfSharp
- https://www.nuget.org/packages/HtmlRenderer.PdfSharp/1.5.1-beta1 (BSD-UNSPECIFIED license)
PuppeteerSharp
- https://www.puppeteersharp.com/examples/index.html (MIT license)
- https://github.com/kblok/puppeteer-sharp
EO.Pdf
- https://www.nuget.org/packages/EO.Pdf/
- $799 - https://www.essentialobjects.com/Purchase.aspx?f=3
WnvHtmlToPdf_x64
- https://www.nuget.org/packages/WnvHtmlToPdf_x64/
- $750 - $1600 - http://www.winnovative-software.com/Buy.aspx
- Demo - http://www.winnovative-software.com/demo/default.aspx
IronPdf
Spire.PDF
- https://www.nuget.org/packages/Spire.PDF/ (free version limited to 10 pages)
- $599 - $1799 - https://www.e-iceblue.com/Buy/Spire.PDF.html
- https://www.e-iceblue.com/Tutorials/Spire.PDF/Spire.PDF-Program-Guide/Convert-HTML-to-PDF-Customize-HTML-to-PDF-Conversion-by-Yourself.html
Aspose.Html
EvoPDF
- https://www.nuget.org/packages/EvoPDF/
- $450 - $1200 - http://www.evopdf.com/buy.aspx
ExpertPdfHtmlToPdf
- https://www.nuget.org/packages/ExpertPdfHtmlToPdf/
- $550 - $1200 - https://www.html-to-pdf.net/Pricing.aspx
Zetpdf
- https://zetpdf.com
- $299 - $599 - https://zetpdf.com/pricing/
- Not a well-known or well-supported library - see "ZetPDF - Does anyone know the background of this Product?"
PDFtron
- https://www.pdftron.com/documentation/samples/cs/HTML2PDFTes
- $4000/year - https://www.pdftron.com/licensing/
WkHtmlToXSharp
- https://github.com/pruiz/WkHtmlToXSharp
- Free
- Concurrent conversions are implemented as a processing queue.
SelectPDF
- https://www.nuget.org/packages/Select.HtmlToPdf/ (free version limited to 5 pages)
- $499 - $799 - https://selectpdf.com/pricing/
- https://selectpdf.com/pdf-library-for-net/

If none of the options above helps, you can always search the NuGet packages:
https://www.nuget.org/packages?q=html+pdf

- Mauricio Gracia Gutierrez

3

Have you tested performance? We want to improve our current conversion time and are evaluating other libraries for better performance. - frno

2

Another wkhtmltopdf-based solution, one that even runs on Azure web services, is this DinkToPdf fork: https://github.com/hakanl/DinkToPdf, available on NuGet: https://www.nuget.org/packages/Haukcode.DinkToPdf. - Marko Prcać

4

DinkToPdf is free and works with .NET Core. https://www.nuget.org/packages/DinkToPdf/ - Ali Rasouli

2

@FritsJ, there are plenty of options in the list ;-) - Mauricio Gracia Gutierrez

2

Update this list!! Also, check out this solution: https://github.com/eKoopmans/html2pdf.js#getting-started. It took me a long way until .NET 6 broke it and I had to start over. - unJordi

30

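I highly recommend NReco, seriously. It has free and paid versions and is well worth trying. It uses wkhtmltopdf under the hood, but you only need a single assembly. Fantastic.

Usage example:

Install via NuGet.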

var htmlContent = String.Format("<body>Hello world: {0}</body>", DateTime.Now);
var pdfBytes = (new NReco.PdfGenerator.HtmlToPdfConverter()).GeneratePdf(htmlContent);

Disclaimer: I am not the developer, just a fan of the project :)

- Kim Tranjan

3

It does look very useful. Worth noting that as of today (05/10/15) it is the most downloaded .NET wrapper for wkhtmltopdf among NuGet packages. - ken2k

3

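I tried it; unfortunately I could not get it to run on Azure Websites. - gabriel14

This library works fine locally, but on the hosted server I intermittently see the error below. Sometimes the PDF is generated, and sometimes it throws: "Error. An error occurred while processing your request. Cannot generate PDF: (exit code: 1)". - user2347528

It works well and as expected, but I see some quality issues in my PDF files. Can that be improved? - Bharat

@VitaliyFedorchenko, I tried NReco today but got an error saying I need a license, even though I am trying the free version. This is really painful, since there is no way to test the product before buying it, not to mention that my use case is one that NReco supposedly supports for free. - DoctorPrisme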

25

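Most HTML-to-PDF converters rely on IE for HTML parsing and rendering, which can break when the user updates IE. Here is one that does not depend on IE.

The code looks roughly like this: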

EO.Pdf.HtmlToPdf.ConvertHtml(htmlText, pdfFileName);

Like many other converters, you can pass text, a file name, or a URL. The result can be saved to a file or a stream.

- Jason

42

In other words: it is useless, because you have to buy the library. - d1jhoni1b

54

@d1jhoni1b How does that make the tool useless? If it is a paid tool, you might consider it expensive, but that criterion alone does not make it useless. - Don Rolling

4

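True, EO.Pdf does not use IE. But it appears to spawn 32-bit WebKit browser instances in the background. Look at the process list and you will see rundll32.exe instances pointing at the EO.PDF DLL. So, in my opinion, it is still a bit hacky. - Matt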

2

It does not support media="print", which is really painful. - Marat Faskhiev

21

A single-developer license costs $650. That is expensive. - Abhijeet Nagre

22

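For everyone looking for a working solution on .NET 5 and above, here you go.

Below are my working solutions.

Using `wkhtmltopdf`:

Download and install the latest version of wkhtmltopdf from here.
Use the code below.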

public static string HtmlToPdf(string outputFilenamePrefix, string[] urls,
    string[] options = null,
    string pdfHtmlToPdfExePath = @"C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe")
{
    string urlsSeparatedBySpaces = string.Empty;
    try
    {
        //Determine inputs
        if ((urls == null) || (urls.Length == 0))
            throw new Exception("No input URLs provided for HtmlToPdf");
        else
            urlsSeparatedBySpaces = String.Join(" ", urls); //Concatenate URLs

        string outputFilename = outputFilenamePrefix + "_" + DateTime.Now.ToString("yyyy-MM-dd-HH-mm-ss-fff") + ".PDF"; // assemble destination PDF file name (HH = 24-hour clock, so AM and PM runs cannot collide)

        var p = new System.Diagnostics.Process()
        {
            StartInfo =
            {
                FileName = pdfHtmlToPdfExePath,
                Arguments = ((options == null) ? "" : string.Join(" ", options)) + " " + urlsSeparatedBySpaces + " " + outputFilename,
                UseShellExecute = false, // needs to be false in order to redirect output
                RedirectStandardOutput = true,
                RedirectStandardError = true,
                RedirectStandardInput = true, // redirect all 3, as it should be all 3 or none
                WorkingDirectory = Path.Combine(Path.GetDirectoryName(Assembly.GetEntryAssembly().Location))
            }
        };

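        p.Start();

        // Read both streams concurrently; draining them one after the other can
        // deadlock if the child fills the buffer of the stream not being read.
        var outputTask = p.StandardOutput.ReadToEndAsync();
        var errorTask = p.StandardError.ReadToEndAsync();

        // Wait up to 60 seconds; ExitCode throws if the process is still running.
        if (!p.WaitForExit(60000))
        {
            p.Kill();
            throw new TimeoutException("wkhtmltopdf did not exit within 60 seconds");
        }

        var output = outputTask.Result;
        var errorOutput = errorTask.Result;

        // read the exit code, close process
        int returnCode = p.ExitCode;
        p.Close();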

        // if 0 or 2, it worked so return path of pdf
        if ((returnCode == 0) || (returnCode == 2))
            return outputFilename;
        else
            throw new Exception(errorOutput);
    }
    catch (Exception exc)
    {
        throw new Exception("Problem generating PDF from HTML, URLs: " + urlsSeparatedBySpaces + ", outputFilename: " + outputFilenamePrefix, exc);
    }
}

Then call the method above: HtmlToPdf("test", new string[] { "https://www.google.com" }, new string[] { "-s A5" });
If you need to convert an HTML string to PDF, tweak the method above and replace the Arguments of the Process StartInfo with the following (the /C switch implies that FileName is cmd.exe): $@"/C echo | set /p=""{htmlText}"" | ""{pdfHtmlToPdfExePath}"" {((options == null) ? "" : string.Join(" ", options))} - ""C:\Users\xxxx\Desktop\{outputFilename}""";
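An alternative to piping through the shell is to write the HTML straight to the standard input of wkhtmltopdf, which accepts `-` as the input document. The following is only a sketch under that assumption: the class and method names (`WkHtmlStdin`, `BuildStartInfo`, `ConvertHtml`) are hypothetical, and unlike the method above it treats only exit code 0 as success.

```csharp
using System;
using System.Diagnostics;
using System.Text;

public static class WkHtmlStdin
{
    // Builds start info for: wkhtmltopdf [options] - "<output>"
    // The "-" tells wkhtmltopdf to read the document from standard input.
    public static ProcessStartInfo BuildStartInfo(string exePath, string outputPath, string[] options = null)
    {
        string opts = options == null ? "" : string.Join(" ", options) + " ";
        return new ProcessStartInfo
        {
            FileName = exePath,
            Arguments = opts + "- \"" + outputPath + "\"",
            UseShellExecute = false, // required for stream redirection
            RedirectStandardInput = true,
            RedirectStandardError = true
        };
    }

    public static void ConvertHtml(string html, string exePath, string outputPath, string[] options = null)
    {
        using (var p = Process.Start(BuildStartInfo(exePath, outputPath, options)))
        {
            // Start draining stderr before writing so neither pipe can fill up and block.
            var errorTask = p.StandardError.ReadToEndAsync();

            // Write raw UTF-8 bytes, then close stdin to signal end of input.
            byte[] bytes = Encoding.UTF8.GetBytes(html);
            p.StandardInput.BaseStream.Write(bytes, 0, bytes.Length);
            p.StandardInput.Close();

            p.WaitForExit();
            if (p.ExitCode != 0)
                throw new InvalidOperationException("wkhtmltopdf failed: " + errorTask.Result);
        }
    }
}
```

This avoids escaping the HTML for cmd.exe, which tends to break on quotes, `<`, `>` and `|` inside the markup.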

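Drawbacks of this approach:

As of this answer, the latest wkhtmltopdf release does not support the latest HTML5 and CSS3. If you export HTML that uses CSS Grid, the output will not match what you expect.
You need to handle concurrency yourself.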
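One simple way to handle the concurrency point is to let only one conversion run at a time. The sketch below assumes that serializing calls is acceptable for your load; `ConversionGate` and `RunAsync` are hypothetical names, and the conversion itself is passed in as a delegate.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ConversionGate
{
    // A semaphore with a single slot: at most one conversion runs at any time.
    private static readonly SemaphoreSlim Gate = new SemaphoreSlim(1, 1);

    public static async Task<T> RunAsync<T>(Func<T> conversion)
    {
        await Gate.WaitAsync();
        try
        {
            // Run the blocking conversion on a thread-pool thread.
            return await Task.Run(conversion);
        }
        finally
        {
            Gate.Release();
        }
    }
}
```

With the method from this answer, a call could look like `string pdfPath = await ConversionGate.RunAsync(() => HtmlToPdf("test", urls));`. Raising the initial count of the semaphore would allow a bounded number of parallel conversions instead.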

Using `chrome headless`:

Download and install the latest Chrome browser from here.
Use the code below.

var p = new System.Diagnostics.Process()
{
    StartInfo =
    {
        FileName = "C:/Program Files (x86)/Google/Chrome/Application/chrome.exe",
        Arguments = @"/C --headless --disable-gpu --run-all-compositor-stages-before-draw --print-to-pdf-no-header --print-to-pdf=""C:/Users/Abdul Rahman/Desktop/test.pdf"" ""C:/Users/Abdul Rahman/Desktop/grid.html""",
    }
};

p.Start();

// ...then wait n milliseconds for exit (as after exit, it can't read the output)
p.WaitForExit(60000);

// read the exit code, close process
int returnCode = p.ExitCode;
p.Close();

This converts the HTML file to a PDF file.
To convert a URL to PDF instead, use the following as the Arguments of the Process StartInfo:

@"/C --headless --disable-gpu --run-all-compositor-stages-before-draw --print-to-pdf-no-header --print-to-pdf=""C:/Users/Abdul Rahman/Desktop/test.pdf"" ""https://www.google.com""",

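Drawbacks of this approach:

This works as expected with the latest HTML5 and CSS3 features, and the output matches what you see in the browser. However, when running it under IIS, you need to run your application's ApplicationPool under the LocalSystem identity, or grant read/write access to IIS_IUSRS.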

Using `Selenium WebDriver`:

Install the NuGet packages Selenium.WebDriver and Selenium.WebDriver.ChromeDriver.
Use the code below.

public async Task<byte[]> ConvertHtmlToPdf(string html)
{
    var directory = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.CommonDocuments), "ApplicationName");
    Directory.CreateDirectory(directory);
    var filePath = Path.Combine(directory, $"{Guid.NewGuid()}.html");
    await File.WriteAllTextAsync(filePath, html);

    var driverOptions = new ChromeOptions();
    // In headless mode, PDF writing is enabled by default (tested with driver major version 85)
    driverOptions.AddArgument("headless");
    using var driver = new ChromeDriver(driverOptions);
    driver.Navigate().GoToUrl(filePath);

    // Output a PDF of the first page in A4 size at 90% scale
    var printOptions = new Dictionary<string, object>
    {
        { "paperWidth", 210 / 25.4 },
        { "paperHeight", 297 / 25.4 },
        { "scale", 0.9 },
        { "pageRanges", "1" }
    };
    var printOutput = driver.ExecuteChromeCommandWithResult("Page.printToPDF", printOptions) as Dictionary<string, object>;
    var pdf = Convert.FromBase64String(printOutput["data"] as string);

    File.Delete(filePath);

    return pdf;
}

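Advantages of this approach:

It only needs a NuGet install and works as expected with the latest HTML5 and CSS3 features. The output matches what you see in the browser.

Drawbacks of this approach:

It requires the latest Chrome browser to be installed on the server where the application runs.
If Chrome on the server is updated, the Selenium.WebDriver.ChromeDriver NuGet package must be updated as well. Otherwise a runtime error is thrown because of the version mismatch.

These drawbacks can be overcome by running the application in Docker: all we need to do is install Chrome when building the application image with the Dockerfile.

With this approach, make sure to add <PublishChromeDriver>true</PublishChromeDriver> to the .csproj file, as shown below: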

<PropertyGroup>
  <TargetFramework>net5.0</TargetFramework>
  <LangVersion>latest</LangVersion>
  <Nullable>enable</Nullable>
  <PublishChromeDriver>true</PublishChromeDriver>
</PropertyGroup>

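When the project is published, the chrome driver is published with it.

Here is the link to my working project repository - HtmlToPdf

Generating a PDF from the browser with `window.print()` in `JavaScript`

If users access your application from a browser, you can rely on JavaScript and generate the PDF from the browser with window.print() and the necessary print media CSS. For example, generating an invoice from the browser in an inventory application.

Advantages of this approach:

No dependency on any tool.
The PDF is generated directly in the browser from HTML, CSS, and JS.
Faster.
Supports all the latest CSS properties.

Drawbacks of this approach:

In an SPA such as Blazor, we need to print part of the page through an iframe.

It took me almost two days of trying the available options before I arrived at the answer above, eventually implementing the Selenium-based solution, and it is working. I hope this helps you and saves you time.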

- fingers10

I will come back to this. - ttugates

What about https://github.com/ststeiger/PdfSharpCore? - airstrike

Does the Selenium WebDriver option require Chrome to be installed on the server hosting the application? - dalemac

1

@dalemac Yes, Chrome needs to be installed on the server. I have updated the answer with this information. - fingers10

1

@KJ That depends on the operating system the application is hosted or running on. - fingers10

11

You can use Google Chrome's headless print-to-PDF feature. I found this to be the simplest yet most robust method.

var url = "https://dev59.com/AnRB5IYBdhLWcg3wpYmo";
var chromePath = @"C:\Program Files (x86)\Google\Chrome\Application\chrome.exe";
var output = Path.Combine(Environment.CurrentDirectory, "printout.pdf");
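using (var p = new Process())
{
    p.StartInfo.FileName = chromePath;
    // Quote the output path and the URL so that values containing spaces survive argument parsing
    p.StartInfo.Arguments = $"--headless --disable-gpu --print-to-pdf=\"{output}\" \"{url}\"";
    p.Start();
    p.WaitForExit();
}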
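The same command line can also print an HTML string if the string is first written to a temporary file and Chrome is pointed at its file:// URL. This is a sketch under that assumption, reusing only the flags shown above; `ChromeHtmlString`, `WriteTempHtml` and `Print` are hypothetical names.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class ChromeHtmlString
{
    // Writes the HTML to a uniquely named temporary .html file and returns its path.
    public static string WriteTempHtml(string html)
    {
        string path = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N") + ".html");
        File.WriteAllText(path, html);
        return path;
    }

    public static void Print(string html, string chromePath, string outputPdf)
    {
        string htmlPath = WriteTempHtml(html);
        try
        {
            // Convert the local path to a file:// URL that Chrome can load.
            string fileUrl = new Uri(htmlPath).AbsoluteUri;
            using (var p = new Process())
            {
                p.StartInfo.FileName = chromePath;
                p.StartInfo.Arguments = $"--headless --disable-gpu --print-to-pdf=\"{outputPdf}\" \"{fileUrl}\"";
                p.Start();
                p.WaitForExit();
            }
        }
        finally
        {
            File.Delete(htmlPath); // clean up the temporary file even on failure
        }
    }
}
```

Relative links to stylesheets or images inside the HTML would resolve against the temporary directory, so absolute URLs or inlined assets are the safer choice here.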

- Leonard AB

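Hey, this is really cool for owned servers and VPSes. Thanks for sharing. - mjb

3

For ASP.NET in IIS to run an external program with write permission, set the Identity to "LocalSystem" in the application pool's advanced settings. - mjb

I like this approach, but how do I handle a URL that needs more specific settings, such as headers, cookies, or even the POST method? - Tấn Nguyên

Can it handle an HTML string instead of a URL? - FritsJ

I have a problem: the PDF conversion does not load the page completely. - Bibin

@TấnNguyên You would probably need to set up your own web service that does that and outputs the HTML content, then give Chrome your web service's URL. - Nacht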

7

Most projects will most likely wrap a C/C++ engine rather than implement a C# solution from scratch. Try the Gotenberg project.

To test it:

docker run --rm -p 3000:3000 thecodingmachine/gotenberg:6

Curl example

curl --request POST \
    --url http://localhost:3000/convert/url \
    --header 'Content-Type: multipart/form-data' \
    --form remoteURL=https://brave.com \
    --form marginTop=0 \
    --form marginBottom=0 \
    --form marginLeft=0 \
    --form marginRight=0 \
    -o result.pdf

C# sample.cs

using System;
using System.Net.Http;
using System.Threading.Tasks;
using System.IO;
using static System.Console;

namespace Gotenberg
{
    class Program
    {
        public static async Task Main(string[] args)
        {
            try
            {
                var client = new HttpClient();            
                var formContent = new MultipartFormDataContent
                    {
                        {new StringContent("https://brave.com/"), "remoteURL"},
                        {new StringContent("0"), "marginTop" }
                    };
                var result = await client.PostAsync(new Uri("http://localhost:3000/convert/url"), formContent);
                await File.WriteAllBytesAsync("brave.com.pdf", await result.Content.ReadAsByteArrayAsync());
            }
            catch (Exception ex)
            {
                WriteLine(ex);
            }
        }
    }
}

Compile:

csc sample.cs -langversion:latest -reference:System.Net.Http.dll && mono ./sample.exe
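For an HTML string rather than a URL, Gotenberg 6 also exposes a `/convert/html` route that expects the markup as an uploaded file named `index.html`. The following is a sketch under that assumption, against the same local container as above; `GotenbergHtml`, `BuildForm` and `ConvertAsync` are hypothetical names.

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

public static class GotenbergHtml
{
    // Packs the HTML into a multipart form as a file part named index.html.
    public static MultipartFormDataContent BuildForm(string html)
    {
        var file = new ByteArrayContent(Encoding.UTF8.GetBytes(html));
        var form = new MultipartFormDataContent();
        form.Add(file, "files", "index.html");
        return form;
    }

    // Posts the form and returns the PDF bytes; baseUrl assumes the docker command above.
    public static async Task<byte[]> ConvertAsync(HttpClient client, string html, string baseUrl = "http://localhost:3000")
    {
        using (var form = BuildForm(html))
        using (var response = await client.PostAsync(baseUrl + "/convert/html", form))
        {
            response.EnsureSuccessStatusCode(); // surface HTTP errors instead of saving an error body as a PDF
            return await response.Content.ReadAsByteArrayAsync();
        }
    }
}
```

The returned bytes can be written out with `File.WriteAllBytesAsync`, as in the sample above.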

- Alex Nolasco

7

This is a free library that is very easy to use: OpenHtmlToPdf

string timeStampForPdfName = DateTime.Now.ToString("yyMMddHHmmssff");

string serverPath = System.Web.Hosting.HostingEnvironment.MapPath("~/FolderName");
string pdfSavePath = Path.Combine(@serverPath, "FileName" + timeStampForPdfName + ".FileExtension");


// OpenHtmlToPdf library performs the PDF conversion
var pdf = Pdf.From(HTML_String).Content();

// Write the resulting byte array to a file
File.WriteAllBytes(pdfSavePath, pdf.ToArray()); // Requires System.Linq

- Abhishek Sengupta

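That seems to be a Java library, not a .NET/C# one. - Andreas Reiff

@AndreasReiff No, I just posted this snippet from .NET code. - Abhishek Sengupta

OK, thanks. It looks like there is a Java library of the same name on GitHub as well, but there is also a NuGet package with that name. - Andreas Reiff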

1

That's right, @AndreasReiff - Abhishek Sengupta

6

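2018 update: let's use the standard HTML + CSS = PDF equation!

There is good news for HTML-to-PDF needs. As this answer showed, the W3C standard css-break-3 will solve the problem... It is a Candidate Recommendation, planned to become a final Recommendation in 2017 or 2018 after testing.

There are also solutions that are not quite standard, such as plugins for C#, as print-css.rocks shows.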

- Peter Krauss

2

The solutions listed on print-css.rocks are priced as follows: PDFreactor $2,950.00, Prince $3,800, Antenna House Formatter V7 $5,000.00. And WeasyPrint appears to be for Python. - MDave

4

Here is an example of converting HTML + CSS to PDF with iTextSharp (iTextSharp + itextsharp.xmlworker).

using System.IO; // File, MemoryStream
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.tool.xml;


byte[] pdf; // result will be here

var cssText = File.ReadAllText(MapPath("~/css/test.css"));
var html = File.ReadAllText(MapPath("~/css/test.html"));

using (var memoryStream = new MemoryStream())
{
        var document = new Document(PageSize.A4, 50, 50, 60, 60);
        var writer = PdfWriter.GetInstance(document, memoryStream);
        document.Open();

        using (var cssMemoryStream = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(cssText)))
        {
            using (var htmlMemoryStream = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(html)))
            {
                XMLWorkerHelper.GetInstance().ParseXHtml(writer, document, htmlMemoryStream, cssMemoryStream);
            }
        }

        document.Close();

        pdf = memoryStream.ToArray();
}

- Sergey Malyutin

1

Note that iTextSharp works with XHTML and is very sensitive to the quality of your HTML. It throws errors where SelectPdf and HiqPdf do not. - Savage

- Anestis Kivranoglou · Accepted Answer

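EDIT: New suggestion: HTML Renderer for PDF using PdfSharp

(After trying wkhtmltopdf and suggesting to avoid it)

HtmlRenderer.PdfSharp is a fully managed C# solution that is easy to use, thread safe and, most importantly, free (New BSD License).

Usage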

Download HtmlRenderer.PdfSharp nuget package.

Use Example Method.

public static Byte[] PdfSharpConvert(String html)
{
    Byte[] res = null;
    using (MemoryStream ms = new MemoryStream())
    {
        var pdf = TheArtOfDev.HtmlRenderer.PdfSharp.PdfGenerator.GeneratePdf(html, PdfSharp.PageSize.A4);
        pdf.Save(ms);
        res = ms.ToArray();
    }
    return res;
}

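A very good alternative is the free version of iTextSharp.

Up to version 4.1.6, iTextSharp was licensed under the LGPL, and versions up to 4.1.6 (or possibly forks) are available as packages and free to use. Of course, you can also use the continued, paid 5+ versions.

I tried to integrate wkhtmltopdf solutions into my projects and hit a bunch of hurdles.

I would personally avoid wkhtmltopdf-based solutions in hosted enterprise applications for the following reasons.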

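First of all, wkhtmltopdf is implemented in C++, not C#, and you will run into various problems embedding it in your C# code, especially when switching between 32-bit and 64-bit builds of your project. I had to try several workarounds, including conditional project builds, to avoid "invalid format exceptions" on different machines.

If you manage your own virtual machine, that is fine. But if your project runs in a constrained environment such as Azure (as the TuesPechkin author actually points out) or Elastic Beanstalk, it is a nightmare to configure that environment just to get wkhtmltopdf working.

wkhtmltopdf creates files inside your server, so you have to manage user permissions and grant "write" access to the location where wkhtmltopdf runs.

wkhtmltopdf runs as a standalone application, so it is not managed by your IIS application pool. You therefore have to host it as a service on another machine, or you will experience processing spikes and memory consumption on your production server.

It uses temp files to generate the PDF, which is a big performance problem in cases such as AWS EC2, where disk I/O is really slow.

The most hated "Unable to load DLL 'wkhtmltox.dll'" error, reported by many users.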
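As an illustration of the 32-bit versus 64-bit point, one common workaround is to ship both native builds and pick the one that matches the running process. The sketch below assumes a hypothetical folder layout with `x64` and `x86` subfolders; `NativeLibraryLocator` and its methods are illustrative names, and the actual loading step depends on the wrapper library in use.

```csharp
using System;
using System.IO;

public static class NativeLibraryLocator
{
    // Hypothetical layout: <baseDir>/x64/wkhtmltox.dll and <baseDir>/x86/wkhtmltox.dll.
    public static string Resolve(string baseDir, bool is64BitProcess)
    {
        return Path.Combine(baseDir, is64BitProcess ? "x64" : "x86", "wkhtmltox.dll");
    }

    // Picks the build that matches the bitness of the current process,
    // since loading a native DLL of the wrong bitness fails at run time.
    public static string ResolveForCurrentProcess(string baseDir)
    {
        return Resolve(baseDir, Environment.Is64BitProcess);
    }
}
```

This only selects the file; it does not remove the permission, hosting, and temp-file concerns listed above.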

--- PRE-EDIT section ---

For anyone who wants to generate PDF from HTML in simpler applications/environments, I leave my old post as a suggestion.

TuesPechkin

https://www.nuget.org/packages/TuesPechkin/

Especially for MVC web applications (but I think you can use it in any .NET application)

Rotativa

https://www.nuget.org/packages/Rotativa/

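They both use the wkhtmltopdf binary to convert HTML to PDF. It renders pages with the WebKit engine, so it can also parse CSS style sheets.

They provide easy-to-use, seamless integration with C#.

Rotativa can also generate PDFs directly from any Razor view.

Additionally, for real-world web applications, they also manage thread safety etc...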
