Extracting URLs from a string using PHP

Question

php url

47

How can I use PHP to find the URLs in a string and store them in an array?

I cannot get correct results with explode() when a URL itself contains a comma.

- Azraar Azward
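
To illustrate the problem described in the question, here is a minimal sketch of why a plain explode() on commas is not enough. The sample string and the URL https://example.com/a,b are assumptions made up for this illustration.

```php
<?php
// Hypothetical input: the second URL contains a comma in its path.
$string = "http://google.com,https://example.com/a,b";

// explode() splits on every comma, including the one inside the URL.
$parts = explode(",", $string);
print_r($parts);
// $parts has 3 elements: the second URL is cut into
// "https://example.com/a" and "b".
```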

See https://dev59.com/v2bWa4cB1Zd3GeqPV173#11588614 - Avatar

2

Use the following code:

preg_match_all("/\b((https?):\/\/)?([a-z0-9-.]*)\.([a-z]{2,3})([-A-Z0-9+&@#\/%?=~_|$!:,.;]*[A-Z0-9+&@#\/%=~_|$])/i", $string, $match);

- Harsh Patel

6 Answers

9

Try the following regular expression:

$regex = '/https?\:\/\/[^\",]+/i';
preg_match_all($regex, $string, $matches);
echo "<pre>";
print_r($matches[0]);
echo "</pre>";

Hope this works for you.

- JiteshNK

1

This pattern is "greedy" when the URLs are not separated by commas. - patrick
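
The greediness mentioned in this comment can be sketched as follows. The input string and the stricter pattern in $stricter are assumptions for illustration only; the first pattern is the one from the answer.

```php
<?php
// Pattern from the answer: it stops only at a double quote or a comma.
$regex = '/https?\:\/\/[^\",]+/i';

// Hypothetical input: two URLs separated by words, not by a comma.
$string = 'See http://example.com and https://example.org/page for details';

preg_match_all($regex, $string, $matches);
print_r($matches[0]);
// One match only: it starts at "http://example.com" and runs to the end
// of the string, because spaces are not excluded by the character class.

// Assumed variant: also excluding whitespace ends each match at the first space.
$stricter = '/https?:\/\/[^\s",]+/i';
preg_match_all($stricter, $string, $stricterMatches);
print_r($stricterMatches[0]);
// Two matches: http://example.com and https://example.org/page
```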

5

You can try this regular expression:

$string = "The text you want to filter goes here. http://google.com, https://www.youtube.com/watch?v=K_m7NEDMrV0,https://instagram.com/hellow/";

preg_match_all('#\bhttps?://[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/))#', $string, $match);

echo "<pre>";
print_r($match[0]); 
echo "</pre>";

This produces the following output:

Array
(
  [0] => http://google.com
  [1] => https://www.youtube.com/watch?v=K_m7NEDMrV0,https://instagram.com/hellow/
)

- Object Manipulator

4

The output array should contain 3 results, not 2. The three results are http://google.com, https://www.youtube.com/watch?v=K_m7NEDMrV0 and https://instagram.com/hellow/. - Azraar Azward

1

[\w\d]+ === [\w]+ - ion
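
A quick way to check this equivalence, as a sketch: \w already covers letters, digits and the underscore, so adding \d to the class changes nothing. The subject string below is an assumption chosen for the test.

```php
<?php
// Hypothetical subject mixing letters, digits, underscore and punctuation.
$subject = 'abc_123 (x9)';

preg_match_all('/[\w\d]+/', $subject, $withDigitClass);
preg_match_all('/[\w]+/', $subject, $wordClassOnly);

// Both patterns find the same tokens: "abc_123" and "x9".
var_dump($withDigitClass[0] === $wordClassOnly[0]); // bool(true)
```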

4

Try this.

function getUrls($string)
{
    $regex = '/https?\:\/\/[^\" ]+/i';
    preg_match_all($regex, $string, $matches);
    return ($matches[0]);
}
$urls = getUrls($string);
print_r($urls);

Or

$str = '<a href="http://foobar.com"> | Hello world Im a http://google.fr |     Did you mean:http://google.fr/index.php?id=1&b=6#2310';
$pattern = '`.*?((http|ftp)://[\w#$&+,\/:;=?@.-]+)[^\w#$&+,\/:;=?@.-]*?`i';
if (preg_match_all($pattern, $str, $matches))
{
    print_r($matches[1]);
}

It will work.

- khan

No, it still gives 2 results. There are 3 URLs, but only 2 are returned. Can you see it? Array ( [0] => http://google.com, [1] => https://www.youtube.com/watch?v=K_m7NEDMrV0,https://instagram.com/hellow/ ) - Azraar Azward

Maybe this link will help you: https://dev59.com/-m855IYBdhLWcg3wXC5- - khan

Can you provide an example that uses that regular expression? - Azraar Azward

No, it does not work for my string.

$string = "The text you want to filter goes here. http://google.com, https://www.youtube.com/watch?v=K_m7NEDMrV0,https://instagram.com/hellow/";

- Azraar Azward

4

$urlstring = "The text you want to filter goes here. http://google.com, https://www.youtube.com/watch?v=K_m7NEDMrV0,https://instagram.com/hellow/";

preg_match_all('#\bhttps?://[^,\s()<>]+(?:\([\w\d]+\)|([^,[:punct:]\s]|/))#', $urlstring , $result);

print_r($result[0]);

- Prassd Nidode

No, it still gives only 2 URLs. The result should contain 3 URLs. - Azraar Azward

2

$string = "The text you want to filter goes here. http://google.com, https://www.youtube.com/watch?v=K_m7NEDMrV0,https://instagram.com/hellow/";

preg_match_all('#\bhttps?://[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/))#', $string, $match);

echo "<pre>";
$arr = explode(",", $match[0][1]);
print_r($match[0][0]);
print_r($arr);
echo "</pre>";

- Prassd Nidode

- aampudia · Accepted Answer

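A regular expression is the answer to your problem. Building on Object Manipulator's answer, the only thing left to exclude is the comma, so you can try this code, which excludes commas and gives 3 separate URLs as output: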

$string = "The text you want to filter goes here. http://google.com, https://www.youtube.com/watch?v=K_m7NEDMrV0,https://instagram.com/hellow/";

preg_match_all('#\bhttps?://[^,\s()<>]+(?:\([\w\d]+\)|([^,[:punct:]\s]|/))#', $string, $match);

echo "<pre>";
print_r($match[0]); 
echo "</pre>";

The output is:

Array
(
    [0] => http://google.com
    [1] => https://www.youtube.com/watch?v=K_m7NEDMrV0
    [2] => https://instagram.com/hellow/
)
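
One caveat follows from excluding commas: a URL that itself contains a comma gets cut at that comma. The sketch below wraps the pattern above in a helper and compares it with an assumed variant that uses a negative lookahead, treating a comma as a separator only when it is followed by whitespace, the end of the string, or another http(s):// scheme. The function names extractUrls and extractUrlsKeepingCommas, the variant pattern, and the URL https://example.com/a,b are all hypothetical and not part of the original answer.

```php
<?php
// Hypothetical helper wrapping the pattern from the answer above.
function extractUrls(string $text): array
{
    $pattern = '#\bhttps?://[^,\s()<>]+(?:\([\w\d]+\)|([^,[:punct:]\s]|/))#';
    preg_match_all($pattern, $text, $match);
    return $match[0];
}

// Assumed variant: a comma ends the URL only if whitespace, the end of the
// string, or a new http(s):// scheme comes right after it.
function extractUrlsKeepingCommas(string $text): array
{
    $pattern = '#\bhttps?://(?:(?!,(?:\s|$|https?://))[^\s()<>])+#i';
    preg_match_all($pattern, $text, $match);
    return $match[0];
}

// The string from the question plus one made-up URL containing a comma.
$text = "http://google.com, https://www.youtube.com/watch?v=K_m7NEDMrV0,"
      . "https://instagram.com/hellow/ https://example.com/a,b";

print_r(extractUrls($text));              // last entry is cut to https://example.com/a
print_r(extractUrlsKeepingCommas($text)); // last entry stays https://example.com/a,b
```

Unlike the pattern in the answer, the variant does not trim other trailing punctuation (for example a full stop at the end of a sentence), so it is a trade-off rather than a drop-in replacement.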