How to get the href attribute using Excel VBA

3

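I want to retrieve the href attribute for every <h3> tag in an HTML page. So far I can read the innerText, but I don't know how to access the href attribute. The document contains several <h3> tags, but for now I only need the first one. I'll deal with the rest later...

Here is the code I have so far: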

Sub Scrap()
    ' Requires references to "Microsoft Internet Controls" and
    ' "Microsoft HTML Object Library" (Tools > References).
    Dim IE As New InternetExplorer
    Dim sDD As String
    Dim Doc As HTMLDocument

    IE.Visible = True
    IE.navigate "https://www.oneoiljobsearch.com/senior-reservoir-engineer-jobs/?page=1"
    Do
        DoEvents
    Loop Until IE.readyState = READYSTATE_COMPLETE
    Set Doc = IE.document
    sDD = Trim(Doc.getElementsByTagName("h3")(0).innerText)
    'sDD contains the string "Senior Reservoir Engineer"
End Sub

Here is the part of the HTML document I want to extract the data from:
  <div class="front_job_details">

    <h3>
        <a href="/jobs/senior-reservoir-engineer-oslo-norway-7?cmp=js&from=job-search-form-2" target="_blank">

        Senior Reservoir Engineer

        </a>
    </h3>

The text I need to retrieve is: "/jobs/senior-reservoir-engineer-oslo-norway-7?cmp=js&from=job-search-form-2"

Thank you very much for your help.

3 Answers

2
Try this:
dim hr as string

hr = Doc.getElementsByTagName("h3")(0).getElementsByTagName("a")(0).href

debug.print hr

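The collection returned by getElementsByTagName is zero-based, whereas .Length (the number of H3 elements, called Count in other methods) is a one-based count, so the last valid index is .Length - 1.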

dim i as long

for i=0 to Doc.getElementsByTagName("h3").length - 1
    debug.print Doc.getElementsByTagName("h3")(i).getElementsByTagName("a")(0).href
next i

This code takes the first <A> tag from each H3 tag. You can replicate the approach to get multiple A tags within each H3.
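The loop above queries the DOM again on every iteration and fails with a runtime error if an H3 has no link inside it. A minimal sketch of the same idea follows, with the collection cached and the hrefs gathered into a VBA Collection. The function name FirstLinksInH3 is hypothetical, and late binding (As Object) is an assumption made here so that the sketch does not depend on any library reference.

```vba
Option Explicit

' Hypothetical helper: returns the href of the first <a> inside every <h3>.
' Doc is expected to be an HTML document object such as IE.document.
Public Function FirstLinksInH3(ByVal Doc As Object) As Collection
    Dim result As New Collection
    Dim headings As Object, anchors As Object
    Dim i As Long

    Set headings = Doc.getElementsByTagName("h3")    ' query the DOM once
    For i = 0 To headings.Length - 1                 ' zero-based indexes
        Set anchors = headings.Item(i).getElementsByTagName("a")
        If anchors.Length > 0 Then                   ' skip headings without a link
            result.Add anchors.Item(0).href
        End If
    Next i

    Set FirstLinksInH3 = result
End Function

' Example call, assuming Doc already holds a loaded document.
Public Sub PrintFirstLinks(ByVal Doc As Object)
    Dim link As Variant
    For Each link In FirstLinksInH3(Doc)
        Debug.Print link
    Next link
End Sub
```

Returning a Collection keeps the DOM traversal separate from whatever is done with the links afterwards, such as writing them to a worksheet.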


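Now, how do I loop through all the <h3> tags in the document to get every href? I need to define some kind of collection, but I'm not sure how to do it. Can you help? - Pegaso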

1
I would go with the following, more robust CSS selector approach to get all the hrefs within the class.
Option Explicit
Public Sub GetLinks()
    Dim ie As New InternetExplorer, i As Long, aNodeList As Object
    With ie
        .Visible = True
        .navigate "https://www.oneoiljobsearch.com/senior-reservoir-engineer-jobs/?page=1"

        While .Busy Or .readyState < 4: DoEvents: Wend

        Set aNodeList = .document.querySelectorAll(".front_job_details [href]")
        For i = 0 To aNodeList.Length - 1
            Debug.Print aNodeList.item(i)
        Next
        Stop                                     '<=delete me after
        'other stuff
        .Quit
    End With
End Sub
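Note that the href property normally yields the resolved, absolute URL, while the question asks for the path as written in the markup ("/jobs/..."). The sketch below is one possible way to read the raw attribute instead. It assumes the MSHTML-specific second argument of getAttribute (flag 2) returns the value as authored, and that the page renders in a document mode that supports querySelectorAll. The Sub name PrintRawHrefs and the narrowed selector are illustrative, not part of the answer above.

```vba
Option Explicit

' Sketch: print each href as written in the markup rather than the resolved URL.
' Doc is expected to be an HTML document object such as ie.document.
Public Sub PrintRawHrefs(ByVal Doc As Object)
    Dim links As Object, i As Long

    ' Only anchors inside an <h3> within the front_job_details block.
    Set links = Doc.querySelectorAll(".front_job_details h3 a")
    For i = 0 To links.Length - 1
        ' Assumption: flag 2 asks MSHTML for the attribute text as authored.
        Debug.Print links.Item(i).getAttribute("href", 2)
    Next i
End Sub
```

If the flag is not honored in a given document mode, the output may still be the absolute URL, so it is worth checking the first result in the Immediate window before relying on it.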

0

Here is the final code; hopefully it helps someone...

Sub MultiScrap()

    Dim IE As New InternetExplorer
    Dim Doc As HTMLDocument
    Dim myHTTP As String, hyperAddress As String
    ' Each variable needs its own type; "Dim i, j, s As Long" leaves i and j as Variant
    Dim i As Long, j As Long, s As Long
    Dim lval As Long, uval As Long

    Sheets("LNK0").Activate
    myHTTP = Cells(1, 2) 'http address root
    lval = Cells(2, 2)   'min number to add to root (page=1..)
    uval = Cells(3, 2)   'max number to add to root (page=10..)
    s = 5                'first output row

    For i = lval To uval 'loop through all pages

        'IE.Visible = True
        IE.navigate myHTTP & i
        Do
            DoEvents
        Loop Until IE.readyState = READYSTATE_COMPLETE
        Set Doc = IE.document

        For j = 0 To Doc.getElementsByTagName("h3").Length - 1
            Cells(s, 1) = s - 4 'Correl
            Cells(s, 2) = i     'Page
            Cells(s, 3) = j     'Row in page
            hyperAddress = Doc.getElementsByTagName("h3")(j).getElementsByTagName("a")(0).href 'Http
            Cells(s, 4) = hyperAddress
            Cells(s, 4).Hyperlinks.Add _
                Anchor:=Cells(s, 4), _
                Address:=hyperAddress, _
                TextToDisplay:=hyperAddress 'Hyperlink
            s = s + 1
        Next j
    Next i

    IE.Quit 'release the browser instance once all pages are done
    MsgBox "Dishes ready Sir!"

End Sub
