Extracting an image URL from an HTML page with PHP

Question

6

How can I extract the article image from this link using PHP?

I have read that this cannot be done with regular expressions. http://www.huffingtonpost.it/2013/07/03/stupri-piazza-tahrir-durante-proteste-anti-morsi_n_3538921.html?utm_hp_ref=italy Thanks a lot.

- michele

https://dev59.com/X3I-5IYBdhLWcg3wq6do#1732454 - Ignacio Vazquez-Abrams

Thanks, so what should I do instead? - michele

3 Answers

2

You can (and should) parse the HTML with DOM. Here is an example tailored to your case:

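// Download the page, following redirects.
$curlResource = curl_init('http://www.huffingtonpost.it/2013/07/03/stupri-piazza-tahrir-durante-proteste-anti-morsi_n_3538921.html?utm_hp_ref=italy');
curl_setopt($curlResource, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curlResource, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curlResource, CURLOPT_AUTOREFERER, true);

$page = curl_exec($curlResource);
curl_close($curlResource);

if ($page === false) {
    die('Could not download the page.');
}

// Real-world HTML is rarely valid, so collect libxml warnings instead of printing them.
libxml_use_internal_errors(true);
$domDocument = new DOMDocument();
$domDocument->loadHTML($page);
libxml_clear_errors();

$xpath = new DOMXPath($domDocument);

// Select the src attribute of the article image, identified by its id.
$urlXpath = $xpath->query("//img[@id='img_caption_3538921']/@src");

// item(0) is null when the query matched nothing.
$node = $urlXpath->item(0);
$url = $node !== null ? $node->nodeValue : null;

echo $url;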

Take your time; learning some DOM and XPath is worth it.
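If the image id is not known in advance, the same DOM and XPath approach can list every image on a page. The following is only a minimal sketch: the helper name extractImageUrls and the sample markup are made up for illustration and are not part of the answer above.

```php
<?php
// Hypothetical helper (not from the answer): returns the src of every <img> in $html.
function extractImageUrls(string $html): array
{
    // Collect libxml parse warnings instead of printing them for sloppy markup.
    $previous = libxml_use_internal_errors(true);

    $document = new DOMDocument();
    $document->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($document);

    $urls = [];
    // "//img/@src" selects the src attribute node of every img element.
    foreach ($xpath->query('//img/@src') as $attribute) {
        $urls[] = $attribute->nodeValue;
    }

    return $urls;
}

// Made-up sample markup, used only to exercise the helper.
$sampleHtml = '<html><body><p>Intro</p>'
    . '<img id="lead" src="http://example.com/lead.jpg" alt="Lead">'
    . "<img src='/images/second.png'>"
    . '</body></html>';

print_r(extractImageUrls($sampleHtml));
```

Because the parser handles quoting, attribute order and whitespace, both single-quoted and double-quoted src values are returned, and images without a src are simply skipped.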

- Aurimas Ličkus

1

Try this...

$content=file_get_contents($url);
// Match src="..." or src='...'; the value may not contain either kind of quote.
if (preg_match("/src=[\"\'][^\"\']+[\"\']/", $content, $matches))
{
    echo "Match was found <br />";
    echo $matches[0];
}

- Krishna

- Nidhin Joseph · Accepted Answer

4

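$content = file_get_contents($url);
// [^>]* and [^\"]* keep the match inside a single tag and a single attribute value.
if (preg_match("/<img[^>]*src=\"([^\"]*)\"[^>]*class=\"[^\"]*pinit\"[^>]*>/", $content, $matches))
{
    echo "Match was found <br />";
    echo $matches[0];
}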

$matches[0] prints the whole image tag. If you only want the URL, use $matches[1] instead :)
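To make the difference between the two array entries concrete, here is a small sketch that runs against a made-up tag rather than a downloaded page. The sample markup is an assumption, and the pattern is a tightened variant of the one in this answer, not a general HTML parser.

```php
<?php
// Made-up tag, shaped like the one this answer targets (class ends in "pinit").
$content = '<p><img src="http://example.com/photo.jpg" class="image pinit"></p>';

if (preg_match('/<img[^>]*src="([^"]*)"[^>]*class="[^"]*pinit"[^>]*>/', $content, $matches)) {
    echo $matches[0], PHP_EOL; // the whole <img ...> tag
    echo $matches[1], PHP_EOL; // only the URL captured by the parentheses
}
```

Index 0 always holds the full text matched by the pattern; indexes 1 and up hold the capture groups in the order their opening parentheses appear.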

- Nidhin Joseph

I tried the same thing on "http://techcrunch.com/2014/05/09/facebook-is-down-for-many/", but it returns nothing. I know the <img> is there: <img src="http://tctechcrunch2011.files.wordpress.com/2014/05/screen-shot-2014-05-09-at-5-09-36-pm.png?w=738" class="" /> but even after several changes it still returns nothing. Any help would be appreciated _/_ - Saurabh Rana

That regex is very specific to the pattern used on one particular page. Try this.

if (preg_match("/…/", $content, $matches)) // pattern elided in the source
{
    echo "Match was found <br />";
    echo $matches[0];
}

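How it works: the regex looks for the src attribute inside an image tag and extracts the image URL, which is assumed to be enclosed in double quotes. You can adapt it to your own requirements. - Nidhin Joseph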
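As a rough illustration of that description, the sketch below pulls the double-quoted src out of every img tag and is exercised on the TechCrunch tag quoted in the earlier comment. The helper name findImageSources and the exact pattern are assumptions written for this example; they are not the commenter's original expression.

```php
<?php
// Hypothetical helper: collects the double-quoted src of every <img> tag in $content.
function findImageSources(string $content): array
{
    // <img, any attributes, whitespace, src="...", then the rest of the tag.
    // The "i" flag makes the match case-insensitive.
    preg_match_all('/<img\b[^>]*?\ssrc="([^"]*)"[^>]*>/i', $content, $matches);

    return $matches[1]; // capture group 1 holds the URLs
}

// The tag quoted in the comment above.
$content = '<img src="http://tctechcrunch2011.files.wordpress.com/2014/05/screen-shot-2014-05-09-at-5-09-36-pm.png?w=738" class="" />';

print_r(findImageSources($content));
```

The pattern no longer depends on a particular class value, but it still misses single-quoted or unquoted attributes, which is one reason the DOM approach from the other answer tends to be more robust.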