在Excel单元格中将带有标签的HTML文本转换为格式化文本

48
有没有一种方法可以将HTML导入Excel,使其呈现为富文本格式(最好使用VBA)?基本上,当我将其粘贴到Excel单元格时,我希望将其转换为这样的格式:
<html><p>This is a test. Will this text be <b>bold</b> or <i>italic</i></p></html>

转化为:

这是一个测试。这段文本会是 粗体 还是 斜体

7个回答

30

是的,这是可能的。实际上,让Internet Explorer为您完成这项工作。

我的假设

  1. 我假设html文本位于Sheet1的单元格A1中。您也可以使用变量。
  2. 如果您有一个充满html值的列,则只需将下面的代码放入循环中即可。

代码

Sub Sample()
    Dim Ie As Object
    
    Set Ie = CreateObject("InternetExplorer.Application")
    
    With Ie
        .Visible = False
        
        .Navigate "about:blank"
        
        .document.body.InnerHTML = Sheets("Sheet1").Range("A1").Value
        
        .document.body.createtextrange.execCommand "Copy"
        ActiveSheet.Paste Destination:=Sheets("Sheet1").Range("A1")
        
        .Quit
    End With
End Sub

快照

在此输入图片描述


9
这个 VBA 脚本在我的电脑上执行到了 execCommand "Copy" 这一行出错了,我怀疑是因为我安装了 IE11,而该版本不支持这个命令。 - BornToCode
同样的事情也发生在我身上了,BornToCode。 - Dave.Gugg
1
这个解决方案似乎在最新版本的IE和Edge中已经失效了。 - punkcoder
它并没有死,看看tiQu的回答! - David G

15

我遇到了BornToCode在原始解决方案的评论中首先发现的同样的错误。由于不熟悉Excel和VBA,我花了一点时间才想出如何实施tiQU的解决方案。因此,我将其发布为下面的“白痴”解决方案。

  1. 首先在Excel中启用开发人员模式:链接
  2. 选择“开发人员”选项卡>“Visual Basic”
  3. 单击“查看”>“代码”
  4. 粘贴下面的代码,更新需要正确的单元格引用的行。
  5. 点击绿色运行箭头或按F5
Sub Sample()
    Dim Ie As Object
    Set Ie = CreateObject("InternetExplorer.Application")
    With Ie
        .Visible = False
        .Navigate "about:blank"
        .document.body.InnerHTML = Sheets("Sheet1").Range("I2").Value
             'update to the cell that contains HTML you want converted
        .ExecWB 17, 0
             'Select all contents in browser
        .ExecWB 12, 2
             'Copy them
        ActiveSheet.Paste Destination:=Sheets("Sheet1").Range("J2")
             'update to cell you want converted HTML pasted in
        .Quit
    End With
End Sub

请注意,这不适用于 Mac 版 Office(我花了一些时间才发现):ActiveX 不受支持。例如:https://answers.microsoft.com/en-us/mac/forum/macoffice2011-macexcel/macrovba-issue-in-excel-2011-for-mac/ec986aa8-b75e-4b20-888d-ed82536e8a76 - user3445853
这对我非常有效,但如何将图像固定到相关单元格? - Digerkam

11
你可以将HTML代码复制到剪贴板,然后将其以Unicode文本的形式粘贴回来。Excel会在单元格中呈现HTML。请查看此帖子:http://www.dailydoseofexcel.com/archives/2005/02/23/html-in-cells-ii/ 该文章中相关的宏代码如下:
Private Sub Worksheet_Change(ByVal Target As Range)

   Dim objData As DataObject
   Dim sHTML As String
   Dim sSelAdd As String

   Application.EnableEvents = False

   If Target.Cells.Count = 1 Then
      If LCase(Left(Target.Text, 6)) = "<html>" Then
         Set objData = New DataObject

         sHTML = Target.Text

         objData.SetText sHTML
         objData.PutInClipboard

         sSelAdd = Selection.Address
         Target.Select
         Me.PasteSpecial "Unicode Text"
         Me.Range(sSelAdd).Select

      End If
   End If

   Application.EnableEvents = True

End Sub

3
感谢您的询问。在 Excel 2010 中,我必须手动添加“FM20.DLL”到 “工具..引用” 中,以使“Microsoft Forms 2.0 对象库”中的“DataObject”可用。然后我遇到了一个问题,即“Me.PasteSpecial”不能接受“Me”(请不要开玩笑)。虽然“Target.PasteSpecial”出现了错误,但是“Sheets(1).Range("B51").PasteSpecial”可以正常工作。我放弃了尝试将粘贴内容合并到单元格中,只能使用一些 BR 标签来配置 HTML,以强制换行并控制输出宽度。 - AjV Jsy
这里的答案似乎展示了如何在没有添加引用的情况下将数据复制到剪贴板:https://dev59.com/b2Yq5IYBdhLWcg3wwDRA - Mark E.

9
如果IE示例无法工作,请使用此代码。不管怎样,这比启动IE实例更快。 以下是完整解决方案,基于:http://www.dailydoseofexcel.com/archives/2005/02/23/html-in-cells-ii/ 请注意,如果您的innerHTML全部是数字,例如“12345”,则在Excel中HTML格式不会完全生效,因为它对待数字不同。但是如果添加一个字符,例如在末尾添加一个空格,即 12345 + "& nbsp;",则可以正确格式化。
Sub test()
    Cells(1, 1).Value = "<HTML>1<font color=blue>a</font>" & _
                        "23<font color=red>4</font></HTML>"
    Dim rng As Range
    Set rng = ActiveSheet.Cells(1, 1)
    Worksheet_Change rng, ActiveSheet
End Sub


Private Sub Worksheet_Change(ByVal Target As Range, ByVal sht As Worksheet)

    Dim objData As DataObject ' Set a reference to MS Forms 2.0
    Dim sHTML As String
    Dim sSelAdd As String

    Application.EnableEvents = False

    If Target.Cells.Count = 1 Then

            Set objData = New DataObject
            sHTML = Target.Text
            objData.SetText sHTML
            objData.PutInClipboard
            Target.Select
            sht.PasteSpecial Format:="Unicode Text"
    End If

    Application.EnableEvents = True

End Sub

5
提醒其他人,我的默认引用库中没有MS Forms 2.0,因此我不得不添加它。在我的电脑上,它的路径是C:\WINDOWS\system32\FM20.dll。 - Dave.Gugg
3
太棒了!在德国版的Excel中,格式字符串是本地化的,所以我需要使用sht.PasteSpecial Format:="Unicode-Text"。对于其他语言版本,可以在宏中记录一次“Paste Special”(或“Paste Contents”)操作。 - Andre

9

我知道这个帖子很古老,但在分配innerHTML后,ExecWB对我起作用了:

.ExecWB 17, 0
'Select all contents in browser
.ExecWB 12, 2
'Copy them

然后将内容粘贴到Excel中。由于这些方法容易发生运行时错误,但在调试模式下经过一两次尝试后效果良好,如果它遇到错误,您可能需要告诉Excel再试一次。我通过向子程序添加此错误处理程序来解决这个问题,现在它工作正常:

Sub ApplyHTML()
  On Error GoTo ErrorHandler
    ...
  Exit Sub

ErrorHandler:
    Resume 
    'I.e. re-run the line of code that caused the error
Exit Sub
     
End Sub


1
不错。我已经采纳了你的建议并更新了我的帖子。 - Siddharth Rout

5

不错!非常流畅。

我很失望 Excel 不允许我们粘贴到合并单元格,而且将包含换行符的结果粘贴到“目标”单元格下面的连续行中,这意味着它对我根本没有用。我尝试了一些调整(取消合并/重新合并等),但是 Excel 删除了任何包含换行符的内容,所以那条路走不通。

最终,我想出了一个处理简单标签的例程,并且不使用与合并字段导致问题的“本地”Unicode转换器。希望其他人也会发现这有用:

Public Sub AddHTMLFormattedText(rngA As Range, strHTML As String, Optional blnShowBadHTMLWarning As Boolean = False)
    ' Adds converts text formatted with basic HTML tags to formatted text in an Excel cell
    ' NOTE: Font Sizes not handled perfectly per HTML standard, but I find this method more useful!

    Dim strActualText As String, intSrcPos As Integer, intDestPos As Integer, intDestSrcEquiv() As Integer
    Dim varyTags As Variant, varTag As Variant, varEndTag As Variant, blnTagMatch As Boolean
    Dim intCtr As Integer
    Dim intStartPos As Integer, intEndPos As Integer, intActualStartPos As Integer, intActualEndPos As Integer
    Dim intFontSizeStartPos As Integer, intFontSizeEndPos As Integer, intFontSize As Integer

    varyTags = Array("<b>", "</b>", "<i>", "</i>", "<u>", "</u>", "<sub>", "</sub>", "<sup>", "</sup>")

    ' Remove unhandled/unneeded tags, convert <br> and <p> tags to line feeds
    strHTML = Trim(strHTML)
    strHTML = Replace(strHTML, "<html>", "")
    strHTML = Replace(strHTML, "</html>", "")
    strHTML = Replace(strHTML, "<p>", "")
    While LCase(Right$(strHTML, 4)) = "</p>" Or LCase(Right$(strHTML, 4)) = "<br>"
        strHTML = Left$(strHTML, Len(strHTML) - 4)
        strHTML = Trim(strHTML)
    Wend
    strHTML = Replace(strHTML, "<br>", vbLf)
    strHTML = Replace(strHTML, "</p>", vbLf)

    strHTML = Trim(strHTML)

    ReDim intDestSrcEquiv(1 To Len(strHTML))
    strActualText = ""
    intSrcPos = 1
    intDestPos = 1
    Do While intSrcPos <= Len(strHTML)
        blnTagMatch = False
        For Each varTag In varyTags
            If LCase(Mid$(strHTML, intSrcPos, Len(varTag))) = varTag Then
                blnTagMatch = True
                intSrcPos = intSrcPos + Len(varTag)
                If intSrcPos > Len(strHTML) Then Exit Do
                Exit For
            End If
        Next
        If blnTagMatch = False Then
            varTag = "<font size"
            If LCase(Mid$(strHTML, intSrcPos, Len(varTag))) = varTag Then
                blnTagMatch = True
                intEndPos = InStr(intSrcPos, strHTML, ">")
                intSrcPos = intEndPos + 1
                If intSrcPos > Len(strHTML) Then Exit Do
            Else
                varTag = "</font>"
                If LCase(Mid$(strHTML, intSrcPos, Len(varTag))) = varTag Then
                    blnTagMatch = True
                    intSrcPos = intSrcPos + Len(varTag)
                    If intSrcPos > Len(strHTML) Then Exit Do
                End If
            End If
        End If
        If blnTagMatch = False Then
            strActualText = strActualText & Mid$(strHTML, intSrcPos, 1)
            intDestSrcEquiv(intSrcPos) = intDestPos
            intDestPos = intDestPos + 1
            intSrcPos = intSrcPos + 1
        End If
    Loop

    ' Clear any bold/underline/italic/superscript/subscript formatting from cell
    rngA.Font.Bold = False
    rngA.Font.Underline = False
    rngA.Font.Italic = False
    rngA.Font.Subscript = False
    rngA.Font.Superscript = False

    rngA.Value = strActualText

    ' Now start applying Formats!"
    ' Start with Font Size first
    intSrcPos = 1
    intDestPos = 1
    Do While intSrcPos <= Len(strHTML)
        varTag = "<font size"
        If LCase(Mid$(strHTML, intSrcPos, Len(varTag))) = varTag Then
            intFontSizeStartPos = InStr(intSrcPos, strHTML, """") + 1
            intFontSizeEndPos = InStr(intFontSizeStartPos, strHTML, """") - 1
            If intFontSizeEndPos - intFontSizeStartPos <= 3 And intFontSizeEndPos - intFontSizeStartPos > 0 Then
                Debug.Print Mid$(strHTML, intFontSizeStartPos, intFontSizeEndPos - intFontSizeStartPos + 1)
                If Mid$(strHTML, intFontSizeStartPos, 1) = "+" Then
                    intFontSizeStartPos = intFontSizeStartPos + 1
                    intFontSize = 11 + 2 * Mid$(strHTML, intFontSizeStartPos, intFontSizeEndPos - intFontSizeStartPos + 1)
                ElseIf Mid$(strHTML, intFontSizeStartPos, 1) = "-" Then
                    intFontSizeStartPos = intFontSizeStartPos + 1
                    intFontSize = 11 - 2 * Mid$(strHTML, intFontSizeStartPos, intFontSizeEndPos - intFontSizeStartPos + 1)
                Else
                    intFontSize = Mid$(strHTML, intFontSizeStartPos, intFontSizeEndPos - intFontSizeStartPos + 1)
                End If
            Else
                ' Error!
                GoTo HTML_Err
            End If
            intEndPos = InStr(intSrcPos, strHTML, ">")
            intSrcPos = intEndPos + 1
            intStartPos = intSrcPos
            If intSrcPos > Len(strHTML) Then Exit Do
            While intDestSrcEquiv(intStartPos) = 0 And intStartPos < Len(strHTML)
                intStartPos = intStartPos + 1
            Wend
            If intStartPos >= Len(strHTML) Then GoTo HTML_Err ' HTML is bad!
            varEndTag = "</font>"
            intEndPos = InStr(intSrcPos, LCase(strHTML), varEndTag)
            If intEndPos = 0 Then GoTo HTML_Err ' HTML is bad!
            While intDestSrcEquiv(intEndPos) = 0 And intEndPos > intSrcPos
                intEndPos = intEndPos - 1
            Wend
            If intEndPos > intSrcPos Then
                intActualStartPos = intDestSrcEquiv(intStartPos)
                intActualEndPos = intDestSrcEquiv(intEndPos)
                rngA.Characters(intActualStartPos, intActualEndPos - intActualStartPos + 1) _
                    .Font.Size = intFontSize
            End If
        End If
        intSrcPos = intSrcPos + 1
    Loop

    'Now do remaining tags
    intSrcPos = 1
    intDestPos = 1
    Do While intSrcPos <= Len(strHTML)
        If intDestSrcEquiv(intSrcPos) = 0 Then
            ' This must be a Tag!
            For intCtr = 0 To UBound(varyTags) Step 2
                varTag = varyTags(intCtr)
                intStartPos = intSrcPos + Len(varTag)
                While intDestSrcEquiv(intStartPos) = 0 And intStartPos < Len(strHTML)
                    intStartPos = intStartPos + 1
                Wend
                If intStartPos >= Len(strHTML) Then GoTo HTML_Err ' HTML is bad!
                If LCase(Mid$(strHTML, intSrcPos, Len(varTag))) = varTag Then
                    varEndTag = varyTags(intCtr + 1)
                    intEndPos = InStr(intSrcPos, LCase(strHTML), varEndTag)
                    If intEndPos = 0 Then GoTo HTML_Err ' HTML is bad!
                    While intDestSrcEquiv(intEndPos) = 0 And intEndPos > intSrcPos
                        intEndPos = intEndPos - 1
                    Wend
                    If intEndPos > intSrcPos Then
                        intActualStartPos = intDestSrcEquiv(intStartPos)
                        intActualEndPos = intDestSrcEquiv(intEndPos)
                        With rngA.Characters(intActualStartPos, intActualEndPos - intActualStartPos + 1).Font
                            If varTag = "<b>" Then
                                .Bold = True
                            ElseIf varTag = "<i>" Then
                                .Italic = True
                            ElseIf varTag = "<u>" Then
                                .Underline = True
                            ElseIf varTag = "<sup>" Then
                                .Superscript = True
                            ElseIf varTag = "<sub>" Then
                                .Subscript = True
                            End If
                        End With
                    End If
                    intSrcPos = intSrcPos + Len(varTag) - 1
                    Exit For
                End If
            Next
        End If
        intSrcPos = intSrcPos + 1
        intDestPos = intDestPos + 1
    Loop
Exit_Sub:
    Exit Sub
HTML_Err:
    ' There was an error with the Tags. Show warning if requested.
    If blnShowBadHTMLWarning Then
        MsgBox "There was an error with the Tags in the HTML file. Could not apply formatting."
    End If
End Sub

请注意,这并不关心标签嵌套的情况,而是要求每个打开的标签都有一个关闭的标签,并且假定最接近打开标签的关闭标签适用于打开标签。正确嵌套的标签将正常工作,而未正确嵌套的标签将不被拒绝,并且可能会或可能不会正常工作。

1
将HTML/Word放入Excel形状并定位到Excel单元格中的方法:
  1. 将我的HTML写入临时文件。
  2. 通过Word Interop打开临时文件。
  3. 从Word复制到剪贴板。
  4. 通过Interop打开Excel。
  5. 将单元格设置为范围并选择。
  6. 以“Microsoft Word文档对象”的形式进行PasteSpecial操作。
  7. 调整Excel行以适应形状高度。
这样,即使HTML中有表格和其他内容,也不会被分割成多个单元格。
    private void btnPutHTMLIntoExcelShape_Click(object sender, EventArgs e)
    {
        var fFile = new FileInfo(@"C:\Temp\temp.html");
        StreamWriter SW = fFile.CreateText();
        SW.Write(hecNote.DocumentHtml);
        SW.Close();

        Word.Application wrdApplication;
        Word.Document wrdDocument;
        wrdApplication = new Word.Application();
        wrdApplication.Visible = true;

        wrdDocument = wrdApplication.Documents.Add(@"C:\Temp\temp.html");
        wrdDocument.ActiveWindow.Selection.WholeStory();
        wrdDocument.ActiveWindow.Selection.Copy();

        Excel.Application excApplication;
        Excel.Workbook excWorkbook;
        Excel._Worksheet excWorksheet;
        Excel.Range excRange = null;

        excApplication = new Excel.Application();
        excApplication.Visible = true;
        excWorkbook = excApplication.Workbooks.Add(Type.Missing);
        excWorksheet = (Excel.Worksheet)excWorkbook.Worksheets.get_Item(1);
        excWorksheet.Name = "Work";
        excRange = excWorksheet.get_Range("A1");
        excRange.Select();

        excWorksheet.PasteSpecial("Microsoft Word Document Object");

        Excel.Shape O = excWorksheet.Shapes.Item(1);

        this.Text = $"{O.Height} x {O.Width}";
        ((Excel.Range)excWorksheet.Rows[1, Type.Missing]).RowHeight = O.Height;
    }

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接