Getting title and meta tags from an external website

Question

I'm trying to figure out how to get the following tags:

<title>A common title</title>
<meta name="keywords" content="Keywords blabla" />
<meta name="description" content="This is the description" />

They may appear in any order. I have heard of the PHP Simple HTML DOM Parser, but I would rather not use it. Is there a solution other than the PHP Simple HTML DOM Parser?

Will preg_match fail to do the job if the HTML is invalid?

Can cURL do something like this together with preg_match?

Facebook does something similar, but it relies on pages declaring this tag properly:

<meta property="og:description" content="Description blabla" />

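I want a feature where, when someone posts a link, the title and meta tags of that link are fetched. If there are no meta tags, they are ignored or the user can set them manually (but I will handle that part myself later).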

- MacMac

22 Answers

- Khandad Niazi · Answer 1

Here is a short snippet using the PHP Simple HTML DOM class to get a page's meta details.

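// Requires the PHP Simple HTML DOM Parser library (simple_html_dom.php)
$html = file_get_html($link);
// find(..., 0) returns null when the tag is missing, so check before relying on ->content
$meta_description = $html->find('head meta[name=description]', 0)->content;
$meta_keywords = $html->find('head meta[name=keywords]', 0)->content;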
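
Since the question asks for an alternative to the Simple HTML DOM Parser, the same lookup can be sketched with PHP's built-in DOM extension. This is only a minimal sketch, not the method from this answer: the helper name extract_page_meta() is made up for illustration, and it assumes the HTML has already been fetched into a string.

```php
<?php
// Sketch: read <title> and selected <meta name="..."> tags with the built-in DOM extension.
// extract_page_meta() is a hypothetical helper name, not part of any library.
function extract_page_meta(string $html): array
{
    $result = ['title' => '', 'description' => '', 'keywords' => ''];
    if (trim($html) === '') {
        return $result; // nothing to parse
    }

    $doc = new DOMDocument();
    // Collect parse errors internally instead of emitting warnings for invalid HTML.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $titles = $doc->getElementsByTagName('title');
    if ($titles->length > 0) {
        $result['title'] = trim($titles->item(0)->textContent);
    }

    // Tag order does not matter: every <meta> element is inspected.
    foreach ($doc->getElementsByTagName('meta') as $meta) {
        $name = strtolower($meta->getAttribute('name'));
        if ($name === 'description' || $name === 'keywords') {
            $result[$name] = $meta->getAttribute('content');
        }
    }
    return $result;
}

$sample = '<html><head><title>A common title</title>'
    . '<meta name="keywords" content="Keywords blabla" />'
    . '<meta name="description" content="This is the description" />'
    . '</head><body></body></html>';
print_r(extract_page_meta($sample));
```

Missing tags simply stay as empty strings, which matches the "ignore it or let the user set it" requirement in the question.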

- Keith Turkowski · Answer 2

If you want to handle bad HTML or invalid URLs concisely, this is what I use.

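$url = 'https://www.google.com/';
$headers = @get_headers($url); // false when the URL cannot be reached
// The status line looks like "HTTP/1.1 200 OK"; the three characters from offset 9 are the status code
if ($headers && substr($headers[0], 9, 3) == 200)
{
    // Keep only the <title> text, or an empty string when there is no title
    $title = preg_replace('/.*<title>(.*)<\/title>.*|.*/si', '\1', @file_get_contents($url), 1);
    $desc = @get_meta_tags($url)['description'] ?? '';
}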
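
One caveat with checking only the first header line: as far as I know, get_headers() follows redirects by default and returns the headers of every response in one array, so element 0 can be a 3xx status line even when the final page loads fine. A rough sketch of a more tolerant check follows; final_status_code() is a hypothetical helper name, and the sample array stands in for real get_headers() output.

```php
<?php
// Sketch: pick the last HTTP status code out of a get_headers()-style array.
// final_status_code() is a hypothetical helper name.
function final_status_code($headers): int
{
    if (!is_array($headers)) {
        return 0; // get_headers() returns false on failure
    }
    $code = 0;
    foreach ($headers as $line) {
        // Status lines look like "HTTP/1.1 200 OK" or "HTTP/2 200".
        if (is_string($line) && preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $line, $m)) {
            $code = (int) $m[1]; // later status lines overwrite earlier ones
        }
    }
    return $code;
}

// Assumed sample data imitating a redirect followed by a successful response.
$headers = [
    'HTTP/1.1 301 Moved Permanently',
    'Location: https://www.example.com/',
    'HTTP/1.1 200 OK',
    'Content-Type: text/html; charset=UTF-8',
];
var_dump(final_status_code($headers) === 200);
```

The condition in the snippet above could then become final_status_code(@get_headers($url)) === 200.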