从HTML正文的表格中提取电子邮件地址

3

我希望回复一个网页表单,并从表单中提取电子邮件地址。

由于该网页表单位于表格中,因此ParseTextLinePair()函数返回标签旁边的电子邮件地址为空白。

如何从网页表单中提取电子邮件地址?

Sub ReplywithTemplatev2()
Dim Item As Outlook.MailItem
Dim oRespond As Outlook.MailItem

'Get Email
    Dim intLocAddress As Integer
    Dim intLocCRLF As Integer
    Dim strAddress As String

Set Item = GetCurrentItem()

If Item.Class = olMail Then

        ' find the requestor address
        strAddress = ParseTextLinePair(Item.Body, "Email-Adresse des Ansprechpartners *")


' This sends a response back using a template
Set oRespond = Application.CreateItemFromTemplate("C:\Users\Reply.oft")

With oRespond
    .Recipients.Add Item.SenderEmailAddress
    .Subject = "Your Subject Goes Here"
    .HTMLBody = oRespond.HTMLBody & vbCrLf & _
              "---- original message below ---" & vbCrLf & _
               Item.HTMLBody & vbCrLf

' includes the original message as an attachment
   ' .Attachments.Add Item

   oRespond.To = strAddress

' use this for testing, change to .send once you have it working as desired
    .Display


End With

End If
Set oRespond = Nothing

End Sub

Function GetCurrentItem() As Object
    Dim objApp As Outlook.Application

    Set objApp = Application
    On Error Resume Next
    Select Case TypeName(objApp.ActiveWindow)
        Case "Explorer"
            Set GetCurrentItem = objApp.ActiveExplorer.Selection.Item(1)
        Case "Inspector"
            Set GetCurrentItem = objApp.ActiveInspector.CurrentItem
    End Select

    Set objApp = Nothing
End Function

Function ParseTextLinePair(strSource As String, strLabel As String)
    Dim intLocLabel As Integer
    Dim intLocCRLF As Integer
    Dim intLenLabel As Integer
    Dim strText As String

    ' locate the label in the source text
    intLocLabel = InStr(strSource, strLabel)
    intLenLabel = Len(strLabel)
        If intLocLabel > 0 Then
        intLocCRLF = InStr(intLocLabel, strSource, vbCrLf)
        If intLocCRLF > 0 Then
            intLocLabel = intLocLabel + intLenLabel
            strText = Mid(strSource, _
                            intLocLabel, _
                            intLocCRLF - intLocLabel)
        Else
            intLocLabel = Mid(strSource, intLocLabel + intLenLabel)
        End If
    End If
    ParseTextLinePair = Trim(strText)
End Function

为了澄清,这里有一张表格的图片。

enter image description here


你可以尝试使用 Item.HTMLBody,因为它返回一个结构化的HTML字符串,你可以用它来解析出 <Table> 中适当的 <TD> 元素。不要使用字符串函数来解析HTML,因为有专门设计的库更适合这个任务。否则,如果你能截屏展示一下这个邮件表格的样子,可能会有更简单的方法。 - David Zemens
1
添加了截图。表格有2列9行。 - user3772665
1个回答

3

你是否研究过VBA中的正则表达式?虽然我已经有一段时间没有使用它了,但这里有一个例子。


Option Explicit
Sub Example()
    Dim Item As MailItem
    Dim RegExp As Object
    Dim Search_Email As String
    Dim Pattern As String     
    Dim Matches As Variant

    Set RegExp = CreateObject("VbScript.RegExp")

    Pattern = "\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b"

    For Each Item In ActiveExplorer.Selection

        Search_Email = Item.body

        With RegExp
            .Global = False
            .Pattern = Pattern
            .IgnoreCase = True
            Set Matches = .Execute(Search_Email)
        End With

        If Matches.Count > 0 Then
            Debug.Print Matches(0)
        Else
            Debug.Print "Not Found "
        End If

    Next

    Set RegExp = Nothing

End Sub

或者Pattern = "(\S*@\w+\.\w+)" 或者"(\w+(?:\W+\w+)*@\w+\.\w+)"


Regular-expressions.info/tutorial

\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b 简单的模式,描述了一个电子邮件地址。

一系列字母、数字、点、下划线、百分号和连字符,后跟一个at符号,再后跟另一系列字母、数字和连字符,最后跟一个单独的点和两个或更多字母

[A-Z0-9._%+-]+ 匹配列表中出现的单个字符

A-Z 在A和Z之间(区分大小写)的单个字符

0-9 在0和9之间的单个字符

._%+- 列表中的单个字符

@ 匹配@字符


量词

Udemy.com/vba-regex/

+---------+---------------------------------------------+------------------------------------------------------------+
| Pattern |                   Meaning                   |                          Example                           |
+---------+---------------------------------------------+------------------------------------------------------------+
|         |                                             |                                                            |
| –       | Stands for  a range                         | a-z means all the letters a to z                           |
| []      | Stands for any one of the characters quoted | [abc] means either a, b or c.[A-Z] means either A, B, …, Z |
| ()      | Used for grouping purposes                  |                                                            |
| |       | Meaning is ‘or’                             | X|Y, means X or Y                                          |
| +       | Matches the character one or more times     | zo+ matches ‘zoo’, but not ‘z’                             |
| *       | Matches the character zero or more times    | “lo*” matches either “l” or “loo”                          |
| ?       | Matches the character zero or once          | “b?ve?” matches the “ve” in “never”.                       |
+---------+---------------------------------------------+------------------------------------------------------------+

Wikibooks.org/wiki/Visual_Basic/Regular_Expressions

https://regex101.com/r/oP2yR0/1


太棒了,谢谢,这对于电子邮件地址有效。我想使用基于表格的解决方案或上面的解决方案的原因是我还想从第一行第二列中检索条目。 - user3772665
1
看起来提取电子邮件地址的问题已经得到了令人满意的回答。请考虑接受它并创建一个新的问题。http://stackoverflow.com/help/accepted-answer - niton
当有多个电子邮件地址时怎么办? - pablo808

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接