Converting plain-text URLs into HTML hyperlinks in PHP

65

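I have a simple commenting system where people can submit hyperlinks in a plain-text field. When I display these records from the database on a web page, what RegExp should I use in PHP to convert these links into HTML anchor links?

I don't want the algorithm to do this for any other kind of link, only http and https.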


1
A similar question was asked today: https://dev59.com/vnI-5IYBdhLWcg3wO1rl - BalusC
3
See also this question: https://dev59.com/PHM_5IYBdhLWcg3w3nbz - Søren Løvborg
15 answers

1
I'm using a function derived from question2answer. It accepts plain text, and even plain-text links that sit inside HTML markup:
// $html holds the string
$htmlunlinkeds = array_reverse(preg_split('|<[Aa]\s+[^>]+>.*?</[Aa]\s*>|', $html, -1, PREG_SPLIT_OFFSET_CAPTURE)); // start from end so we substitute correctly; .*? keeps each existing <a>...</a> a separate delimiter
foreach ($htmlunlinkeds as $htmlunlinked)
{ // and that we don't detect links inside HTML, e.g. <img src="http://...">
    $thishtmluntaggeds = array_reverse(preg_split('/<[^>]*>/', $htmlunlinked[0], -1, PREG_SPLIT_OFFSET_CAPTURE)); // again, start from end
    foreach ($thishtmluntaggeds as $thishtmluntagged)
    {
        $innerhtml = $thishtmluntagged[0];
        if (strpos($innerhtml, '://') !== false)
        { // quick test first
            $newhtml = qa_html_convert_urls($innerhtml, qa_opt('links_in_new_window'));
            $html = substr_replace($html, $newhtml, $htmlunlinked[1]+$thishtmluntagged[1], strlen($innerhtml));
        }
    }
}   
echo $html;

/**
 * Return $html with any URLs converted into links (with nofollow, and in a new window if $newwindow).
 * Closing parentheses/brackets are removed from the link if they don't have a matching opening one. This avoids creating
 * incorrect URLs from (http://www.question2answer.org) but allows URLs such as http://www.wikipedia.org/Computers_(Software)
 */
function qa_html_convert_urls($html, $newwindow = false)
{
    $uc = 'a-z\x{00a1}-\x{ffff}';
    $url_regex = '#\b((?:https?|ftp)://(?:[0-9'.$uc.'][0-9'.$uc.'-]*\.)+['.$uc.']{2,}(?::\d{2,5})?(?:/(?:[^\s<>]*[^\s<>\.])?)?)#iu';

    // get matches and their positions
    if (preg_match_all($url_regex, $html, $matches, PREG_OFFSET_CAPTURE)) {
        $brackets = array(
            ')' => '(',
            '}' => '{',
            ']' => '[',
        );

        // loop backwards so we substitute correctly
        for ($i = count($matches[1])-1; $i >= 0; $i--) {
            $match = $matches[1][$i];
            $text_url = $match[0];
            $removed = '';
            $lastch = substr($text_url, -1);

            // exclude bracket from link if no matching bracket
            while (array_key_exists($lastch, $brackets)) {
                $open_char = $brackets[$lastch];
                $num_open = substr_count($text_url, $open_char);
                $num_close = substr_count($text_url, $lastch);

                if ($num_close == $num_open + 1) {
                    $text_url = substr($text_url, 0, -1);
                    $removed = $lastch . $removed;
                    $lastch = substr($text_url, -1);
                }
                else
                    break;
            }

            $target = $newwindow ? ' target="_blank"' : '';
            $replace = '<a href="' . $text_url . '" rel="nofollow"' . $target . '>' . $text_url . '</a>' . $removed;
            $html = substr_replace($html, $replace, $match[1], strlen($match[0]));
        }
    }

    return $html;
}

It is a fair amount of code because it accepts links containing brackets and other characters, but it may help.
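The bracket handling is the least obvious part of the function above, so here it is in isolation as a minimal sketch. The helper name trim_unmatched_closers is hypothetical (it is not part of question2answer); the trimming rule is the same one used in qa_html_convert_urls.

```php
<?php
// Hypothetical helper, not part of question2answer: strips a trailing closing
// bracket from a URL when the URL has exactly one more closer than openers.
// Returns [trimmed URL, removed characters].
function trim_unmatched_closers(string $url): array
{
    $pairs = [')' => '(', '}' => '{', ']' => '['];
    $removed = '';

    while ($url !== '') {
        $last = substr($url, -1);
        if (!isset($pairs[$last])) {
            break; // the URL does not end in a closing bracket
        }
        $opens  = substr_count($url, $pairs[$last]);
        $closes = substr_count($url, $last);
        if ($closes !== $opens + 1) {
            break; // same condition as the original: only trim one surplus closer
        }
        $url     = substr($url, 0, -1);
        $removed = $last . $removed;
    }

    return [$url, $removed];
}

// The regex would capture "http://www.question2answer.org)" from "(http://www.question2answer.org)".
[$url, $tail] = trim_unmatched_closers('http://www.question2answer.org)');
echo $url, ' | ', $tail, "\n";

// Balanced brackets stay part of the URL.
[$url, $tail] = trim_unmatched_closers('http://www.wikipedia.org/Computers_(Software)');
echo $url, ' | ', $tail, "\n";
```

The removed characters are returned separately so the caller can append them after the closing </a>, as the original function does with $removed.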

1

I'd advise against doing this much on the fly. I prefer a simple editor interface like the one Stack Overflow uses. It's called Markdown.


1

Finding plain-text links in HTML

I really like this answer, but I needed a solution for plain-text links that may sit inside very simple HTML text:

<p>I found a really cool site you might like:</p>
<p>www.stackoverflow.com</p>

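This means I needed the regular expression to ignore the HTML characters < and >.

Regex adjustments

So I changed part of the pattern to use [^\s\>\<] instead of \S.

  • \S - not whitespace; matches any character that is not whitespace (tab, space, newline)
  • [^] - negated set; matches any character that is not in the set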
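To make the difference concrete, here is a small sketch that runs both character classes against the HTML sample above. The two patterns are simplified illustrations, not the ones used in the function further down.

```php
<?php
$html = '<p>www.stackoverflow.com</p>';

// \S+ treats "<" and ">" as ordinary non-whitespace, so the closing tag is swallowed.
preg_match('/www\.\S+/', $html, $greedy);

// [^\s<>]+ stops at the first angle bracket.
preg_match('/www\.[^\s<>]+/', $html, $bounded);

echo $greedy[0], "\n";  // www.stackoverflow.com</p>
echo $bounded[0], "\n"; // www.stackoverflow.com
```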

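My version of the function from this answer

I needed another format besides HTML, so I separated the regular expressions from their replacements to accommodate this.

I also added a way to return just the links/emails found as an array, so I could save them as relations on my posts (great for building meta cards later, and for analytics!).

Update: consecutive periods were being matched

I found that text like there...it was also being matched, so I wanted to make sure I never get a match that contains consecutive dots.

Note: to fix this, I added an extra pattern that undoes those matches, which avoids having to rework these otherwise reliable URL regular expressions.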
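/**
 * Based on this answer: https://dev59.com/PnI-5IYBdhLWcg3wO1rl#49689245
 *
 * @param string $string        Text to scan for links and email addresses
 * @param string $format        html (<a href=""...), short ([link:https://somewhere]), other (https://somewhere)
 * @param bool   $returnMatches Return the links found as an array instead of the formatted string
 */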
public function formatLinksInString(
    $string,
    $format = 'html', 
    $returnMatches = false
) {
    $formatProtocol = $format == 'html'
        ? '<a href="$0" target="_blank" title="$0">$0</a>'
        : ($format == 'short' || $returnMatches ? '[link:$0]' : '$0');

    $formatSansProtocol = $format == 'html'
        ? '<a href="//$0" target="_blank" title="$0">$0</a>'
        : ($format == 'short' || $returnMatches ? '[link://$0]' : '$0');

    $formatMailto = $format == 'html'
        ? '<a href="mailto:$1" target="_blank" title="$1">$1</a>'
        : ($format == 'short' || $returnMatches ? '[mailto:$1]' : '$1');

    $regProtocol = '/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,}(\/[^\<\>\s]*)?/';
    $regSansProtocol = '/(?<=\s|\A|\>)([0-9a-zA-Z\-\.]+\.[a-zA-Z0-9\/]{2,})(?=\s|$|\,|\<)/';
    $regEmail = '/([^\s\>\<]+\@[^\s\>\<]+\.[^\s\>\<]+)\b/';
    $consecutiveDotsRegex = $format == 'html'
        ? '/<a[^\>]+[\.]{2,}[^\>]*?>([^\<]*?)<\/a>/'
        : '/\[link:.*?\/\/([^\]]+[\.]{2,}[^\]]*?)\]/';

    // Protocol links
    $formatString = preg_replace($regProtocol, $formatProtocol, $string);
    // Sans Protocol Links
    $formatString = preg_replace($regSansProtocol, $formatSansProtocol, $formatString); // use formatString from above
    // Email - Mailto - Links
    $formatString = preg_replace($regEmail, $formatMailto, $formatString); // use formatString from above
    // Prevent consecutive periods from getting captured
    $formatString = preg_replace($consecutiveDotsRegex, '$1', $formatString);

    if ($returnMatches) {
        // Find all [x:link] patterns
        preg_match_all('/\[.*?:(.*?)\]/', $formatString, $matches);

        return $matches[1]; // the captured link targets (group 1)
    }

    return $formatString;
}
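The consecutive-dots fix is easier to follow when the two steps run on their own. This sketch copies the protocol-less pattern, the HTML replacement string and the HTML variant of $consecutiveDotsRegex from the function above; the sample sentence is made up for illustration.

```php
<?php
// Patterns and replacement copied from formatLinksInString; $text is an illustrative sample.
$regSansProtocol      = '/(?<=\s|\A|\>)([0-9a-zA-Z\-\.]+\.[a-zA-Z0-9\/]{2,})(?=\s|$|\,|\<)/';
$formatSansProtocol   = '<a href="//$0" target="_blank" title="$0">$0</a>';
$consecutiveDotsRegex = '/<a[^\>]+[\.]{2,}[^\>]*?>([^\<]*?)<\/a>/';

$text = 'Wait there...it works';

// Step 1: the protocol-less pattern wrongly turns "there...it" into a link.
$linked = preg_replace($regSansProtocol, $formatSansProtocol, $text);

// Step 2: any anchor whose attributes contain two or more consecutive dots
// is replaced by its inner text, which undoes the false match.
$cleaned = preg_replace($consecutiveDotsRegex, '$1', $linked);

echo $linked, "\n";
echo $cleaned, "\n";
```

Undoing the match after the fact costs one extra pass over the string, but it leaves the URL patterns themselves untouched, which is the trade-off described in the note above.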

0
$string = 'example.com
www.example.com
http://example.com
https://example.com
http://www.example.com
https://www.example.com';

preg_match_all('#(\w*://|www\.)[a-z0-9]+(-+[a-z0-9]+)*(\.[a-z0-9]+(-+[a-z0-9]+)*)+(/([^\s()<>;]+\w)?/?)?#i', $string, $matches, PREG_OFFSET_CAPTURE | PREG_SET_ORDER);
foreach (array_reverse($matches) as $match) {
  $a = '<a href="'.(strpos($match[1][0], '/') ? '' : 'http://') . $match[0][0].'">' . $match[0][0] . '</a>';
  $string = substr_replace($string, $a, $match[0][1], strlen($match[0][0]));
}

echo $string;

Result:

example.com
<a href="http://www.example.com">www.example.com</a>
<a href="http://example.com">http://example.com</a>
<a href="https://example.com">https://example.com</a>
<a href="http://www.example.com">http://www.example.com</a>
<a href="https://www.example.com">https://www.example.com</a>

What I like about this solution is that it also converts www.example.com to http://www.example.com, because <a href="www.example.com"></a> doesn't work (without the http/https protocol it points to yourdomain.com/www.example.com).
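The scheme-prefixing step can also be pulled out into a helper of its own. This is a minimal sketch; the function name with_scheme and its $default parameter are assumptions made for illustration and do not appear in the answer above.

```php
<?php
// Hypothetical helper: returns an href that the browser resolves as an absolute URL.
function with_scheme(string $url, string $default = 'http://'): string
{
    // Anything that already starts with "scheme://" is left untouched.
    if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
        return $url;
    }
    return $default . $url;
}

echo with_scheme('www.example.com'), "\n";         // http://www.example.com
echo with_scheme('https://www.example.com'), "\n"; // https://www.example.com
```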

-2

If I understand correctly, you want to turn plain text into an http link. Here is what I think could help you:

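<?php
// Assumes $con is an open mysqli connection and that the content column holds a bare domain such as example.com.
$list = mysqli_query($con, "SELECT * FROM list WHERE name = 'table content'");
while ($row2 = mysqli_fetch_array($list)) {
    // Escape the stored value before printing it into HTML.
    $content = htmlspecialchars($row2['content'], ENT_QUOTES, 'UTF-8');
    echo "<a target='_blank' href='http://www." . $content . "'>" . $content . "</a>";
}
?>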
