How to set image size when converting markdown to docx with pandoc

Question

22

I am writing a report with Rmarkdown in Rstudio. When knitr converts it to html, it also produces a markdown file. I convert that file to docx with pandoc as follows:

pandoc -f markdown -t docx input.md -o output.docx

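The output.docx file is fine except for one problem: the sizes of the figures are changed, and I have to resize them manually in Word. Is there anything I can do (for example, a pandoc option) so that the figures come out at the right size?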

- Stéphane Laurent

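Which version of Pandoc are you using? If it is an outdated one, a possible workaround is to render smaller images in knitr. - daroczig

It is version 1.9.4.2. I don't want to change the sizes inside knitr because they are fine in the HTML output. - Stéphane Laurent

I have now tried the latest (Windows) version of Pandoc, but that did not change anything. - Stéphane Laurent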

3

I would love to find an answer to this question... - Tal Galili

1

@TalGalili Please see my solution using ImageMagick. - Stéphane Laurent

4 Answers

4

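I also wanted to convert an R markdown document to both html and .docx/.odt with figures of good size and resolution. The best approach I have found so far is to define the resolution and size of the figures explicitly with the knitr options dpi, fig.width and fig.height. That gives publication-quality figures, and the odt/docx output is usable. The catch is that with a dpi higher than the default 72, the figures look far too large in the html file. Here are three ways I have dealt with this (note that I use R scripts with the spin() syntax):

1) Use out.extra = 'WIDTH="75%"' in the knitr options. This forces every figure in the html to take up 75% of the window width. It is a quick fix, but not optimal if your plots have different sizes. (Note: I prefer centimetres to inches, hence the /2.54 everywhere.)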

library(knitr)
opts_chunk$set(echo = FALSE, dev = c("png", "pdf"), dpi = 400,
               fig.width = 8/2.54, fig.height = 8/2.54,
               out.extra ='WIDTH="75%"'
)

data(iris)

#' # Iris dataset
summary(iris)
boxplot(iris[,1:4])

#+ fig.width=14/2.54, fig.height=10/2.54
par(mar = c(2,2,2,2))
pairs(iris[,-5])

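2) Use out.width and out.height to specify the size of the figures in pixels in the html file. I use a constant "sc" to scale the plots down for the html output. This is the more precise approach, but fig.width/fig.height and out.width/out.height have to be defined for every figure, which is really tedious. Ideally you would be able to specify something like out.width = 150*fig.width in the global options (with fig.width varying from chunk to chunk). Maybe something like that is possible, but I don't know how.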

#+ echo = FALSE
library(knitr)
sc <- 150
opts_chunk$set(echo = FALSE, dev = c("png", "pdf"), dpi = 400,
                fig.width = 8/2.54, fig.height = 8/2.54,
                out.width = sc*8/2.54, out.height = sc*8/2.54
)

data(iris)

#' # Iris dataset
summary(iris)
boxplot(iris[,1:4])

#+ fig.width=14/2.54, fig.height=10/2.54, out.width= sc * 14/2.54, out.height= sc * 10/2.54
par(mar = c(2,2,2,2))
pairs(iris[,-5])
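
The arithmetic behind options 1 and 2 can be checked with a few lines of R. This is only a sketch and not part of the answer: figure_pixels is a hypothetical helper, and it assumes that a PNG is rendered at roughly fig.width × dpi pixels and that out.width is read as a pixel count in the html.

```r
# Hypothetical helper (not a knitr function): compare the rendered pixel size
# of a figure with the size at which option 2 displays it in the html page.
figure_pixels <- function(width_cm, height_cm, dpi = 400, sc = 150) {
  width_in  <- width_cm / 2.54   # knitr's fig.width is given in inches
  height_in <- height_cm / 2.54
  data.frame(
    rendered_w = round(width_in * dpi),   # pixels written to the PNG (assumed)
    rendered_h = round(height_in * dpi),
    display_w  = round(width_in * sc),    # pixels requested through out.width
    display_h  = round(height_in * sc)
  )
}

figure_pixels(8, 8)     # the default 8 cm x 8 cm figure
figure_pixels(14, 10)   # the wider pairs() plot
```

Under these assumptions the 8 cm figure is about 1260 px wide at 400 dpi, which is why it looks oversized in a browser unless it is displayed at about 472 px.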

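Note that with these two solutions I don't think you can convert the md file directly to odt with pandoc (the figures are not included). I convert the md to html and then the html to odt (I have not tried docx). Something like this (assuming the previous R script is named "figsize1.R"):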

library(knitr)
setwd("/home/gilles/")
spin("figsize1.R")

system("pandoc figsize1.md -o figsize1.html")
system("pandoc figsize1.html -o figsize1.odt")

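3) Simply compile the document twice: once with a low dpi value (about 96) for the html output and once with a high resolution (about 300) for the odt/docx output. This is now my preferred way. The main drawback is the double compilation, but that is not a real problem for me because I usually only need to give the odt file to end users at the very end of the work. In the meantime I compile the html regularly with the html notebook button in Rstudio.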

#+ echo = FALSE
library(knitr)

opts_chunk$set(echo = FALSE, dev = c("png", "pdf"), 
               fig.width = 8/2.54, fig.height = 8/2.54
)

data(iris)

#' # Iris dataset
summary(iris)
boxplot(iris[,1:4])

#+ fig.width=14/2.54, fig.height=10/2.54
par(mar = c(2,2,2,2))
pairs(iris[,-5])

Then compile the two outputs with the following script (note that you can convert the md file to html directly):

library(knitr)
setwd("/home/gilles")

opts_chunk$set(dpi=96)
spin("figsize3.R", knit=FALSE)
knit2html("figsize3.Rmd")

opts_chunk$set(dpi=400)
spin("figsize3.R")
system("pandoc figsize3.md -o figsize3.odt")

- Gilles San Martin

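Good idea, thanks. I will try it as soon as possible. And welcome to the site :) - Stéphane Laurent

Which of the solutions did you try? Do you specify different fig.width/fig.height and/or out.width/out.height in different chunks? - Gilles San Martin

The third solution. Yes, I always specify fig.width and fig.height in each chunk (but not out.width and out.height). - Stéphane Laurent

In the end I use dpi=400 for better quality and then resize with ImageMagick, as described in my answer. - Stéphane Laurent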

1

Hi. OK, I was not running opts_chunk$set before knitting. I will try that next time. - Stéphane Laurent

3

Here is a solution that resizes the figures from an R script using ImageMagick. A ratio of 70% seems to be a good choice.

# the path containing the Rmd file :
wd <- "..."
setwd(wd)

# the folder containing the figures :
fig.path <- paste0(wd, "/figure")
# all png figures :
figures <- list.files(fig.path, pattern = "\\.png$")

# (safety) create copies of the original files
copy.path <- paste0(fig.path, "_copy")
dir.create(copy.path)
file.copy(file.path(fig.path, figures), copy.path)

# resize all figures in place
# (shell() exists only on Windows; use system() on other platforms)
for (fig in file.path(fig.path, figures)) {
    comm <- paste("convert -resize 70%", shQuote(fig), shQuote(fig))
    shell(comm)
}

# then run pandoc from a command line  
# or from the pandoc() function :
library(knitr)
pandoc("MyReport.md", "docx")

For more information about ImageMagick's resize option, see www.perturb.org

- Stéphane Laurent

1

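@TalGalili Thanks. But I am afraid there is some loss of quality when resizing with convert -resize 70% fig1.png fig1.png. - Stéphane Laurent

Why did someone downvote this answer? It works, I now use it regularly, and it answers the question. So why? - Stéphane Laurent

1

Sorry, my mistake, I somehow clicked by accident. By the time I noticed, it could no longer be undone, too bad. @Tal Galili: please upvote to compensate for me. The "loss of quality" argument is valid, though. - Dieter Menne

1

No, there isn't. Pixel graphics scale badly, and I would not recommend sending them to a high-end journal after resizing. - Dieter Menne

1

Yes, but it is not worth the effort. The only perfect way is to create pdf or eps and convert from that to the final size. This will always give perfect output (well, not for a final size of 10x10 pixels :]). You should be able to adapt your approach this way, but note that you need Ghostscript installed in addition to ImageMagick. - Dieter Menne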

2

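Here is my solution: hack the docx produced by Pandoc. Since a docx is really just a zipped bundle of xml files, resizing the figures is quite simple.

This is what a figure looks like in word/document.xml extracted from the converted docx: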

<w:p>
  <w:r>
    <w:drawing>
      <wp:inline>
        <wp:extent cx="1524000" cy="1524000" />
        ...
        <a:graphic>
          <a:graphicData uri="http://schemas.openxmlformats.org/drawingml/2006/picture">
            <pic:pic>
              ...
              <pic:blipFill>
                <a:blip r:embed="rId23" />
                ...
              </pic:blipFill>
              <pic:spPr bwMode="auto">
                <a:xfrm>
                  <a:off x="0" y="0" />
                  <a:ext cx="1524000" cy="1524000" />
                </a:xfrm>
                ...
              </pic:spPr>
            </pic:pic>
          </a:graphicData>
        </a:graphic>
      </wp:inline>
    </w:drawing>
  </w:r>
</w:p>

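So replacing the cx and cy attributes of the wp:extent and a:ext nodes with the desired values does the resizing. The following R code works for me. The widest figure takes up a full line width, as given by the variable out.width, and the others are resized proportionally.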

require(XML)

## default linewidth (inch) for Word 2003
out.width <- 5.77
docx.file <- "report.docx"

## unzip the docx converted by Pandoc
system(paste("unzip", docx.file, "-d temp_dir"))
document.xml <- "temp_dir/word/document.xml"
doc <- xmlParse(document.xml)
wp.extent <- getNodeSet(xmlRoot(doc), "//wp:extent")
a.blip <- getNodeSet(xmlRoot(doc), "//a:blip")
a.ext <- getNodeSet(xmlRoot(doc), "//a:ext")

figid <- sapply(a.blip, xmlGetAttr, "r:embed")
figname <- dir("temp_dir/word/media/")
stopifnot(length(figid) == length(figname))
pdffig <- paste("temp_dir/word/media/",
                ## in case figure ids in docx are not in dir'ed order
                sort(figname)[match(figid, substr(figname, 1, nchar(figname) - 4))], sep="")

## get dimension info of included pdf figures
pdfsize <- do.call(rbind, lapply(pdffig, function (x) {
    fig.ext <- substr(x, nchar(x) - 2, nchar(x))
    pp <- pipe(paste(ifelse(fig.ext == 'pdf', "pdfinfo", "file"), x, sep=" "))
    pdfinfo <- readLines(pp); close(pp)
    sizestr <- unlist(regmatches(pdfinfo, gregexpr("[[:digit:].]+ X [[:digit:].]+", pdfinfo, ignore.case=T)))
    as.numeric(strsplit(sizestr, split=" x ")[[1]])
}))

## resizing pdf figures in xml DOM, with the widest figure taking up a line's width
wp.cx <- round(out.width*914400*pdfsize[,1]/max(pdfsize[,1]))
wp.cy <- round(wp.cx*pdfsize[, 2]/pdfsize[, 1])
wp.cx <- as.character(wp.cx)
wp.cy <- as.character(wp.cy)
sapply(1:length(wp.extent), function (i)
       xmlAttrs(wp.extent[[i]]) <- c(cx = wp.cx[i], cy = wp.cy[i]));
sapply(1:length(a.ext), function (i)
       xmlAttrs(a.ext[[i]]) <- c(cx = wp.cx[i], cy = wp.cy[i]));

## save hacked xml back to docx
saveXML(doc, document.xml, indent = F)
setwd("temp_dir")
system(paste("zip -r ../", docx.file, " *", sep=""))
setwd("..")
system("rm -fr temp_dir")
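
The scaling step in the script above can be tried in isolation. The sketch below is an illustration rather than part of the answer: emu_extents is a hypothetical helper that mirrors the wp.cx / wp.cy computation, and the figure sizes are made-up values. The cx and cy attributes are expressed in EMU, with 914400 EMU to the inch, so the 1524000 in the XML sample is about 1.67 inches.

```r
# Hypothetical helper mirroring the wp.cx / wp.cy lines of the script above.
# size: two-column matrix, one row per figure (width, height), in any unit.
emu_extents <- function(size, out.width = 5.77) {
  emu_per_inch <- 914400
  # the widest figure spans out.width inches, the others scale with it
  cx <- round(out.width * emu_per_inch * size[, 1] / max(size[, 1]))
  # height follows from each figure's own aspect ratio
  cy <- round(cx * size[, 2] / size[, 1])
  cbind(cx = cx, cy = cy)
}

# two made-up figures: 800 x 600 and 400 x 400 pixels
size <- rbind(c(800, 600), c(400, 400))
emu_extents(size)

# size in inches of the extent shown in the XML sample
1524000 / 914400
```

With these inputs the 800-pixel figure gets the full 5.77 inches and the 400-pixel figure gets half of that, which illustrates the caveat that one very wide figure makes all the others small.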

- lcn

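Hi. I haven't tried it yet, but this looks like great work anyway (+1). - Stéphane Laurent

@StéphaneLaurent I hope my code is self-explanatory. Only PDF figures are considered here. Because the rescaled figures keep their relative sizes, the other figures will look very small in the final docx if the widest one is much larger. Keep that in mind. - lcn

PDF figures? Only PNG figures are generated. - Stéphane Laurent

@StéphaneLaurent Please see my updated code. It now handles both PDF and non-PDF figures; the widest one still takes up the full line width and the rest are resized proportionally. - lcn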

- Stéphane Laurent · Accepted Answer

A simple way is to include a scale factor k in the individual chunk options:

{r, fig.width=8*k, fig.height=6*k}

and a variable dpi in the global chunk options:

opts_chunk$set(dpi = dpi)

Then you can set the values of dpi and k in the global environment before knitting the Rmd file:

dpi <<- 96    
k <<- 1

You can also set them in a chunk inside the Rmd file (for example, set k in the first chunk).