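I want to extract a number from HTML that sits between <td>...</td> tags. I tried the following code:
$views = "/<td id=\"adv-result-views-(?:.*)\" class=\"spec\">(.*?)<\/td>/";
After -views- there is a random number. How do I write the pattern correctly so that the random number is ignored in the search?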
<?php
$htm = '<td id="adv-result-views-190147977" class="spec"> 4 </td>';
$dom = new DOMDocument;
$dom->loadHTML($htm);
// nodeValue keeps the whitespace around the number, so trim it
echo $content = trim($dom->getElementsByTagName('td')->item(0)->nodeValue); // 4
$html = '<td id="adv-result-views-190147977" class="spec"> 4 </td>';
// get the value of the element
echo trim( strip_tags( $html ) ); // 4
// get the number in the id attribute: replace the whole string with capture group $1
echo preg_replace( '/^.*?id="[\p{Ll}-]+(\d+).*$/s', '$1', $html ); // 190147977
/*
^.*?       Any characters from the beginning of the string, not greedy
id="       Find 'id="'
[\p{Ll}-]+ Lower case letter or '-' ( 1 or more times ); note that \pLl without braces would mean "any letter (\pL), or a literal l"
(\d+)      Group and capture to \1 -> digits (0-9) (1 or more times) -> end of \1
.*$        Any characters, greedy, until the end of the string
*/
// get the html without the number in the id attribute
echo preg_replace( '/(^.*?id="[\p{Ll}-]+)(\d+)(.*$)/s', '$1$3', $html );
Since the question is tagged regex, this is a regex example, but DOM is the preferred way to parse HTML (especially in the SO community).
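For the DOM route, one way to pick the cell by its id prefix rather than by position is an XPath `starts-with()` query. This is only a sketch: the surrounding table and the second cell (`adv-result-price-1`) are hypothetical additions to the sample markup, included to show that unrelated cells are skipped.

```php
<?php
// Sample cell from the answers above, plus a hypothetical second cell.
$html = '<table><tr>'
      . '<td id="adv-result-views-190147977" class="spec"> 4 </td>'
      . '<td id="adv-result-price-1" class="spec"> 99 </td>'
      . '</tr></table>';

$dom = new DOMDocument;
libxml_use_internal_errors(true);   // keep parser warnings out of the output
$dom->loadHTML($html);
libxml_clear_errors();

// Select every <td> whose id begins with the fixed prefix,
// whatever number follows it.
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//td[starts-with(@id, "adv-result-views-")]');

$views = [];
foreach ($nodes as $node) {
    $views[] = (int) trim($node->nodeValue);
}

print_r($views);
```

Matching on the prefix means the random number never has to appear in the query at all.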
adv-result-views-\d+
- bansi
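Applied to the pattern from the question, that suggestion might look like the sketch below. It assumes the id always ends in digits and that the attributes appear in exactly this order; unlike `(?:.*)`, `\d+` cannot run past the closing quote of the id.

```php
<?php
$html = '<td id="adv-result-views-190147977" class="spec"> 4 </td>';

// \d+ stands in for the random number after "-views-".
$pattern = '/<td id="adv-result-views-\d+" class="spec">(.*?)<\/td>/';

$views = null;
if (preg_match($pattern, $html, $matches)) {
    // Group 1 still carries the whitespace around the number.
    $views = trim($matches[1]);
}

echo $views;
```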