Writing non-English characters to an .html file in R with the "cat" command


Here is code that shows the problem:

myPath = getwd()
cat("abcd", append = T, file =paste(myPath,"temp1.html", sep = "\\")) # This is fine
cat("<BR/><BR/><BR/>", append = T, file =paste(myPath,"temp1.html", sep = "\\")) # This is fine
cat("שלום", append = F, file =paste(myPath,"temp1.html", sep = "\\")) # This text gets garbled when the HTML file is opened in Google Chrome on Windows 7.
cat("שלום", append = F, file =paste(myPath,"temp1.txt", sep = "\\")) # But if I open this file in a text editor, the text looks fine.

# The text in the HTML file looks as if I were to run this in R:
(x <- iconv("שלום", from = "CP1252", to = "UTF8") )
# But if I try to put it into the file, nothing gets written:
cat(x, append = T, file =paste(myPath,"temp1.html", sep = "\\")) # empty

Edit: I also tried the following encodings (without success):

ff <-file(paste(myPath,"temp1.html", sep = "\\"), encoding="CP1252")
cat("שלום", append = F, file =ff)
ff<-file(paste(myPath,"temp1.html", sep = "\\"), encoding="utf-8")
cat("שלום", append = F, file =ff)
ff<-file(paste(myPath,"temp1.html", sep = "\\"), encoding="ANSI_X3.4-1986")
cat("שלום", append = F, file =ff)
ff<-file(paste(myPath,"temp1.html", sep = "\\"), encoding="iso8859-8")
cat("שלום", append = F, file =ff)

Any suggestions? Thanks.
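When trying encodings like this, it can help to look at the bytes that actually reached the disk instead of judging by how a browser renders them. A minimal sketch, assuming a session whose native encoding can represent Hebrew (UTF-8 or Windows-1255): `cat()` converts its input to the native encoding, and the connection's `encoding=` argument then re-encodes from native to the target. It writes to a `tempfile()` rather than `getwd()`, and closes the connection explicitly.

```r
# "shalom" via Unicode escapes, independent of the script's own encoding.
shalom <- "\u05e9\u05dc\u05d5\u05dd"
path <- tempfile(fileext = ".html")

# Open the connection explicitly; cat() output is converted from the
# session's native encoding to the connection's encoding (UTF-8 here).
# Note: cat()'s append= argument is ignored when file= is a connection.
con <- file(path, open = "w", encoding = "UTF-8")
cat(shalom, file = con)
close(con)

# Inspect the raw bytes on disk rather than how a browser renders them.
written <- readBin(path, what = "raw", n = file.size(path))
print(written)
```

If the bytes come out as `d7 a9 d7 9c d7 95 d7 9d`, R wrote valid UTF-8 and any garbling happens when the file is read.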

Looks like you could use some sleep... =) Sys.sleep(sample(3600 * .5:8.5, 1)) - aL3xa
Please see this question about saving a CSV file with UTF-8 encoding - Marek
Hi Marek, when I try that, the text comes out as "\xf9\xec\xe5\xed". - Tal Galili
3 Answers
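The four bytes in Tal's comment are not random: they are שלום encoded in Windows-1255 (the Hebrew Windows code page), which suggests the R session's native encoding was CP1255 rather than UTF-8. A minimal sketch to check this, assuming the platform's `iconv` accepts the name "CP1255"; the Hebrew word is written with `\u` escapes so the script itself does not depend on the encoding it is saved in.

```r
# The Hebrew word "shalom" written with Unicode escapes.
shalom <- "\u05e9\u05dc\u05d5\u05dd"

# Bytes of the word in Windows-1255 (Hebrew code page): one byte per letter.
cp1255_bytes <- charToRaw(iconv(shalom, from = "UTF-8", to = "CP1255"))
print(cp1255_bytes)   # f9 ec e5 ed, the bytes quoted in the comment

# Bytes of the same word in UTF-8: two bytes per Hebrew letter.
utf8_bytes <- charToRaw(enc2utf8(shalom))
print(utf8_bytes)
```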


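The problem isn't R (R is correctly producing UTF-8-encoded output)... it's just that, when no encoding is specified explicitly, your web browser assumes the wrong one. Write out the following snippet instead (from within R):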

<html>
    <head>
        <meta http-equiv="content-type" content="text/html; charset=utf-8">
    </head>
    <body>
        שלום
    </body>
</html>

This declares the correct encoding (UTF-8), so the browser interprets the text that follows correctly.
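The snippet above is plain HTML; to produce it from within R, one possible sketch follows. It assumes the file should be UTF-8 to match the `<meta>` declaration, the function name `write_utf8_html` is hypothetical, and the file goes to `tempdir()` rather than `getwd()`. Writing the raw bytes with `writeBin()` skips the native-encoding conversion that `cat()` performs, which is one way to keep the bytes UTF-8 even when the session's native encoding is not.

```r
# Hypothetical helper: wraps `body` in a minimal HTML page that declares
# charset=utf-8, and writes the page as UTF-8 bytes.
write_utf8_html <- function(body, path) {
  page <- paste0(
    "<html>\n<head>\n",
    "<meta http-equiv=\"content-type\" content=\"text/html; charset=utf-8\">\n",
    "</head>\n<body>\n",
    enc2utf8(body),
    "\n</body>\n</html>\n"
  )
  # writeBin() writes the bytes exactly; no connection re-encoding happens.
  writeBin(charToRaw(enc2utf8(page)), path)
  invisible(path)
}

path <- file.path(tempdir(), "temp1.html")
write_utf8_html("\u05e9\u05dc\u05d5\u05dd", path)
```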



Your code is somewhat redundant. Is temp1.txt on line 5 a typo (should it be .html)? Anyway, perhaps you should set the charset in the <meta> tag.

Take this as an example:

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<%
cat("abcd")
cat("<BR/><BR/><BR/>")
cat("שלום")
cat("שלום")
(x <- iconv("שלום", from = "CP1252", to = "UTF8") )
cat(x)
-%>
</body>
</html>

This is brew code, so if you run it through brew you will get the right result. In short, the keyword is charset.


Try it this way:

cat("abcd", file = (con <- file("temp1.html", "w", encoding="UTF-8"))); close(con)

Thanks gd047, but it doesn't work. It gives me this: שלו×. instead of שלום. - Tal Galili
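A UTF-8 file opened without a declared charset is typically decoded with a legacy single-byte code page, which is consistent with the first answer's point that the bytes are fine and only the charset declaration is missing. The sketch below assumes the fallback is Windows-1252 (a common default for Western-locale browsers) and shows what the first three letters of שלום turn into: each Hebrew letter becomes two characters, the first being ×. The last letter is left out because its second UTF-8 byte, 0x9D, has no Windows-1252 mapping.

```r
# UTF-8 bytes of the first three letters of "shalom" (shin, lamed, vav).
utf8_bytes <- charToRaw(enc2utf8("\u05e9\u05dc\u05d5"))

# Decode those same bytes as if they were Windows-1252.
misread <- iconv(rawToChar(utf8_bytes), from = "CP1252", to = "UTF-8")
print(misread)   # the string ×©×œ×•
```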

Content provided by Stack Overflow.