How can I write a UTF-8 encoded string to a text file from VBA? For example:
Dim fnum As Integer
fnum = FreeFile
Open "myfile.txt" For Output As fnum
Print #fnum, "special characters: äöüß" 'latin-1 or something by default
Close fnum
Is there some setting at the application level?
I found the answer on the web:
Dim fsT As Object
Set fsT = CreateObject("ADODB.Stream")
fsT.Type = 2 'Specify stream type - we want to save text/string data.
fsT.Charset = "utf-8" 'Specify charset for the source text data.
fsT.Open 'Open the stream and write text data to the object
fsT.WriteText "special characters: äöüß"
fsT.SaveToFile sFileName, 2 'Save the data to disk (2 = overwrite an existing file)
fsT.Close
Definitely not what I expected...
object.CreateTextFile(filename[, overwrite[, unicode]])
object.OpenTextFile(filename[, iomode[, create[, format]]])
CreateTextFile:
fileName = "filename"
Set fso = CreateObject("Scripting.FileSystemObject")
Set out = fso.CreateTextFile(fileName, True, True)
out.WriteLine ("Hello world!")
...
out.close
'OpenTextFile
Set fso = CreateObject("Scripting.FileSystemObject")
Set out = fso.OpenTextFile("filename", 8, True, 1) '8 = ForAppending (the named constant is not defined with late binding)
out.Write "Hello world!"
...
out.Close
See the MSDN documentation for more details.
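This is the FileSystemObject class, right? How should I write data to this file? With .Write? - Karsten W.
A byte order mark (BOM) is written at the start of the file; it is unnecessary in UTF-8 files, and some applications (SAP, for example) do not like it.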
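To find out whether a given file actually starts with a byte order mark, the first three bytes can be compared with the UTF-8 BOM sequence EF BB BF. The following is only a minimal sketch; the function name HasUtf8Bom is hypothetical and not part of any library.

```vba
' Hypothetical helper: returns True when the file starts with the UTF-8 BOM (EF BB BF).
Public Function HasUtf8Bom(ByVal sPath As String) As Boolean
    Dim fnum As Integer
    Dim b(0 To 2) As Byte

    fnum = FreeFile
    Open sPath For Binary Access Read As #fnum
    If LOF(fnum) >= 3 Then
        Get #fnum, , b   ' fills the fixed-size array with the first three bytes
        HasUtf8Bom = (b(0) = &HEF And b(1) = &HBB And b(2) = &HBF)
    End If
    Close #fnum
End Function
```

It could be called from the Immediate window as ? HasUtf8Bom("C:\Temp\TestUtf8.txt") after running one of the writers in this thread.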
Here is another way to do it, using the API function WideCharToMultiByte:
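Option Explicit
' VBA7 (Office 2010 and later) needs PtrSafe and LongPtr so that the declaration also compiles on 64-bit Office.
#If VBA7 Then
Private Declare PtrSafe Function WideCharToMultiByte Lib "kernel32.dll" ( _
ByVal CodePage As Long, _
ByVal dwFlags As Long, _
ByVal lpWideCharStr As LongPtr, _
ByVal cchWideChar As Long, _
ByVal lpMultiByteStr As LongPtr, _
ByVal cbMultiByte As Long, _
ByVal lpDefaultChar As LongPtr, _
ByVal lpUsedDefaultChar As LongPtr) As Long
#Else
Private Declare Function WideCharToMultiByte Lib "kernel32.dll" ( _
ByVal CodePage As Long, _
ByVal dwFlags As Long, _
ByVal lpWideCharStr As Long, _
ByVal cchWideChar As Long, _
ByVal lpMultiByteStr As Long, _
ByVal cbMultiByte As Long, _
ByVal lpDefaultChar As Long, _
ByVal lpUsedDefaultChar As Long) As Long
#End If
' Converts s to UTF-8 and returns the bytes (without a BOM) in b.
Private Sub getUtf8(ByRef s As String, ByRef b() As Byte)
Const CP_UTF8 As Long = 65001
Dim len_s As Long
#If VBA7 Then
Dim ptr_s As LongPtr
#Else
Dim ptr_s As Long
#End If
Dim size As Long
Erase b
len_s = Len(s)
If len_s = 0 Then _
Err.Raise 30030, , "Len(WideChars) = 0"
ptr_s = StrPtr(s)
' First call: pass no output buffer to ask for the required size in bytes.
size = WideCharToMultiByte(CP_UTF8, 0, ptr_s, len_s, 0, 0, 0, 0)
If size = 0 Then _
Err.Raise 30030, , "WideCharToMultiByte() = 0"
ReDim b(0 To size - 1)
' Second call: perform the conversion into the allocated buffer.
If WideCharToMultiByte(CP_UTF8, 0, ptr_s, len_s, VarPtr(b(0)), size, 0, 0) = 0 Then _
Err.Raise 30030, , "WideCharToMultiByte(" & Format$(size) & ") = 0"
End Sub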
Public Sub writeUtf()
Dim file As Integer
Dim s As String
Dim b() As Byte
s = "äöüßµ@€|~{}[]²³\ .." & _
" OMEGA" & ChrW$(937) & ", SIGMA" & ChrW$(931) & _
", alpha" & ChrW$(945) & ", beta" & ChrW$(946) & ", pi" & ChrW$(960) & vbCrLf
file = FreeFile
Open "C:\Temp\TestUtf8.txt" For Binary Access Write Lock Read Write As #file
getUtf8 s, b
Put #file, , b
Close #file
End Sub
CreateTextFile(filename, [overwrite [, unicode]])
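creates a file "as a Unicode or ASCII file. The value is True if the file is created as a Unicode file; False if it's created as an ASCII file. If omitted, an ASCII file is assumed." Fine, the file stores Unicode characters, but in what encoding? Unencoded Unicode cannot be represented in a file.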
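One way to answer the encoding question empirically is to write a single character with unicode set to True and look at the raw bytes on disk. This is a rough diagnostic sketch: the sub name DumpFsoUnicodeBytes and the scratch file name are assumptions made for illustration only.

```vba
' Hypothetical diagnostic: write one character with unicode:=True and print the file's raw bytes.
Public Sub DumpFsoUnicodeBytes()
    Dim fso As Object, ts As Object
    Dim sPath As String
    Dim fnum As Integer
    Dim b() As Byte
    Dim i As Long
    Dim sHex As String

    sPath = Environ$("TEMP") & "\fso_unicode_test.txt"   ' assumed scratch location

    Set fso = CreateObject("Scripting.FileSystemObject")
    Set ts = fso.CreateTextFile(sPath, True, True)        ' overwrite = True, unicode = True
    ts.Write "A"
    ts.Close

    ' Read the whole file back as bytes.
    fnum = FreeFile
    Open sPath For Binary Access Read As #fnum
    ReDim b(0 To LOF(fnum) - 1)
    Get #fnum, , b
    Close #fnum

    ' Print each byte as two hex digits.
    For i = LBound(b) To UBound(b)
        sHex = sHex & Right$("0" & Hex$(b(i)), 2) & " "
    Next i
    Debug.Print Trim$(sHex)
End Sub
```

If the file is UTF-16 little-endian with a byte order mark, the output would begin with FF FE and each ASCII character would occupy two bytes; a UTF-8 file would instead hold a single byte 41 for "A", possibly preceded by EF BB BF.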
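The VBA documentation page for OpenTextFile(filename[, iomode[, create[, format]]]) offers a third option for format.
UnicodeEncoding is UTF-16. The documentation also describes UTF-8 as a "Unicode encoding", which makes sense to me. But I still do not know how to specify UTF-8 for VBA output, nor can I be sure that data written to disk with OpenTextFile(,,,1) is encoded as UTF-16. Tamalek's post was helpful.
I did not want to change all my code just to support a few UTF-8 strings, so I let my code run as it was. After the file has been saved (in ANSI, since that is Excel's default), I convert it to UTF-8 with the following code: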
Sub convertTxttoUTF(sInFilePath As String, sOutFilePath As String)
Dim objFS As Object
Dim iFile As Integer 'FreeFile returns an Integer file number
Dim sFileData As String
'Init
iFile = FreeFile
Open sInFilePath For Input As #iFile
sFileData = Input$(LOF(iFile), iFile)
sFileData = sFileData & vbCrLf
Close iFile
'Open & Write
Set objFS = CreateObject("ADODB.Stream")
objFS.Charset = "utf-8"
objFS.Open
objFS.WriteText sFileData
'Save & Close
objFS.SaveToFile sOutFilePath, 2 '2: Create Or Update
objFS.Close
'Completed
Application.StatusBar = "Completed"
End Sub
I call this sub as follows (an example):
Call convertTxttoUTF("c:\my.json", "c:\my-UTF8.json")
Sub convertTxttoUTF(sInFilePath As String, sOutFilePath As String)
Dim objStreamUTF8 As Object
Dim objStreamUTF8NoBOM As Object
Dim iFile As Integer 'FreeFile returns an Integer file number
Dim sFileData As String
Const adSaveCreateOverWrite = 2
Const adTypeBinary = 1
Const adTypeText = 2
'Init
iFile = FreeFile
Open sInFilePath For Input As #iFile
sFileData = Input(LOF(iFile), iFile)
Close iFile
'Open files
Set objStreamUTF8 = CreateObject("ADODB.Stream")
Set objStreamUTF8NoBOM = CreateObject("ADODB.Stream")
' write the files
With objStreamUTF8
.Charset = "UTF-8"
.Open
.WriteText sFileData
.Position = 0
.SaveToFile sOutFilePath, adSaveCreateOverWrite
.Type = adTypeText
.Position = 3
End With
With objStreamUTF8NoBOM
.Type = adTypeBinary
.Open
objStreamUTF8.CopyTo objStreamUTF8NoBOM
.SaveToFile sOutFilePath, adSaveCreateOverWrite
End With
' close the files
objStreamUTF8.Close
objStreamUTF8NoBOM.Close
End Sub
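I used this answer to get rid of the unknown BOM characters at the start of the file.
The traditional way to convert a string to a UTF-8 string is the following: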
StrConv("hello world",vbFromUnicode)
Put simply:
Dim fnum As Integer
fnum = FreeFile
Open "myfile.txt" For Output As fnum
Print #fnum, StrConv("special characters: äöüß", vbFromUnicode)
Close fnum
No special COM objects are needed.
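StrConv(vbFromUnicode) returns a byte array containing the result of converting the given Unicode string (all strings in VBA are Unicode) to the current system code page for non-Unicode programs. Any "special characters" in the string that are not on that code page are lost at this point (you could not have had them in the original literal in the first place, because the VBA code editor is not Unicode). - GSerg
That byte array is then passed to Print, which is designed for strings and therefore assumes the data is a regular Unicode string, so it converts it "from Unicode" a second time (https://dev59.com/kIDba4cB1Zd3GeqPBTnZ#23980044), dropping half of the characters. The result is mangled garbage in the file: the code shown above creates a text file 14 bytes long, whereas the original string literal contains 24 characters. Everywhere above, "Unicode" means "UTF-16". UTF-8 does not enter the picture in any form. - GSerg
The documentation for StrConv is indeed poor, but you can prove its effect by assigning its result to a byte array. The Remarks section is more accurate. Otherwise see https://www.joelonsoftware.com/articles/Unicode.html. You can also run the code as is and observe that the string "special characters: äöüß" cannot be recovered from the resulting file. - GSerg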
Ã. - Panagiotis Kanavos
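The suggestion in the comments to assign the StrConv result to a byte array can be tried with a few lines. This is only an illustrative sketch; the sub name ShowStrConvBytes is hypothetical, and the numbers printed depend on the system code page for non-Unicode programs.

```vba
' Hypothetical demo: inspect what StrConv(..., vbFromUnicode) actually produces.
Public Sub ShowStrConvBytes()
    Dim s As String
    Dim b() As Byte

    s = "abc" & ChrW$(937)           ' three ASCII letters plus GREEK CAPITAL LETTER OMEGA
    b = StrConv(s, vbFromUnicode)     ' convert to the system ANSI code page, not to UTF-8

    Debug.Print "characters: " & Len(s)
    Debug.Print "ANSI bytes: " & (UBound(b) - LBound(b) + 1)
    ' Converting back shows whether any character was lost on the way.
    Debug.Print "round trip equal: " & (StrConv(b, vbUnicode) = s)
End Sub
```

On a code page that has no omega, the round trip would be expected to fail, which is the data loss described above; UTF-8 would encode that character as two bytes instead.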