PHP: strip the domain name from a URL

Question

3

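I know there is a lot of information about this topic on the web, but I can't seem to work it out the way I want.

I'm trying to build a function that extracts the domain name from a URL: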

http://blabla.com    blabla
www.blabla.net       blabla
http://www.blabla.eu blabla

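I only need the plain name of the domain.

With parse_url I can filter out the domain, but that is not enough. I have three functions to strip the domain, but I still get some wrong output.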

function prepare_array($domains)
{
    $prep_domains = explode("\n", str_replace("\r", "", $domains)); 
    $domain_array = array_map('trim', $prep_domains); 

    return $domain_array;
}

function test($domain) 
{
    $domain = explode(".", $domain);
    return $domain[1];
}

function strip($url) 
{ 
   $url = trim($url);
   $url = preg_replace("/^(http:\/\/)*(www.)*/is", "", $url); 
   $url = preg_replace("/\/.*$/is" , "" ,$url); 
   return $url; 
}

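Any possible domain, URL, and extension may be entered. When the function finishes, it must return an array containing only the domain names themselves.

Update: Thanks for all the suggestions! With your help I have solved the problem.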

function test($url) 
{   
    // Check whether the URL begins with http://, https://, www. or a combination
    // If so, strip that prefix
    if (preg_match("/^(https?:\/\/|www\.)/i", $url))
    {
        $domain = preg_replace("/^(https?:\/\/)*(www\.)*/is", "", $url);
    }
    else
    {
        $domain = $url;
    }

    // Now all that's left is the domain and the extension
    // Only return the needed first part without the extension
    $domain = explode(".", $domain);

    return $domain[0];
}

- Rob

Try using the parse_url function for this. http://php.net/manual/function.parse-url.php - ChoiZ

What about subdomains? - James Dunne

5 Answers

2

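Ah, your problem is that a TLD can have one or two parts, e.g. .com vs. .co.uk.

My suggestion is to maintain a list of TLDs. After parsing with parse_url, loop over the list and look for a match. Strip the TLD, then split the rest on "." as the delimiter; the last part will be in the format you want.

This doesn't look particularly efficient, but with new TLDs being added all the time, I can't think of any other deterministic way.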
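The steps above can be sketched roughly as follows. This is only a minimal illustration, not the answerer's code: the function name `domain_name_from_url` and the tiny suffix array are assumptions made up for the example. A real list would have to be far more complete, for instance one derived from the Public Suffix List. The sketch also assumes that inputs without a scheme (such as www.blabla.net) should be treated as http URLs, because parse_url only reports a host when a scheme is present.

```php
<?php
// Hypothetical helper: returns the label directly in front of a known suffix.
// $suffixes is an illustrative sample, NOT a complete TLD list.
function domain_name_from_url(string $url, array $suffixes): ?string
{
    $url = trim($url);

    // parse_url() only fills in the host when a scheme is present,
    // so prepend one for inputs such as "www.blabla.net".
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
        $url = 'http://' . $url;
    }

    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null; // malformed URL or no host found
    }
    $host = strtolower($host);

    // Try longer suffixes first so that "co.uk" wins over "uk".
    usort($suffixes, function ($a, $b) {
        return strlen($b) - strlen($a);
    });

    foreach ($suffixes as $suffix) {
        $tail = '.' . $suffix;
        if (strlen($host) > strlen($tail) && substr($host, -strlen($tail)) === $tail) {
            // Strip the suffix, split on ".", and keep the last remaining part.
            $labels = explode('.', substr($host, 0, -strlen($tail)));
            return end($labels);
        }
    }

    return null; // no suffix from the list matched
}

$suffixes = array('com', 'net', 'eu', 'uk', 'co.uk');
echo domain_name_from_url('http://blabla.com', $suffixes), PHP_EOL;         // blabla
echo domain_name_from_url('www.blabla.net', $suffixes), PHP_EOL;            // blabla
echo domain_name_from_url('http://www.bbc.co.uk/news', $suffixes), PHP_EOL; // bbc
```

Sorting the suffixes by length is one way to make sure a two-part suffix is stripped before its one-part tail; a host whose suffix is missing from the list yields null rather than a guess.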

- James Dunne

2

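OK... this is messy, and you should spend some time optimizing it and caching previously derived domains. You should also have a friendly name server, and the final catch is that the domain must have an "A" record in its DNS.

This attempts to assemble the domain name in reverse, label by label, until it resolves to an "A" record in DNS.

Anyway, this had been bugging me, so I hope this answer helps you: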

<?php
$wsHostNames = array(
    "test.com",
    "http://www.bbc.com/news/uk-34276525",
    "google.uk.co"
);
foreach ($wsHostNames as $hostName) {
    echo "checking $hostName" . PHP_EOL;
    $wsWork = $hostName;
    //attempt to strip out full paths to just host
    $wsWork = parse_url($hostName, PHP_URL_HOST);
    if ($wsWork != "") {
        echo "Was able to cleanup $wsWork" . PHP_EOL;
        $hostName = $wsWork;
    } else {
        //Probably had no path info or malformed URL
        //Try to check it anyway
        echo "No path to strip from $hostName" . PHP_EOL;
    }

    $wsArray = explode(".", $hostName); //Break it up into an array.

    $wsHostName = "";
    //Build the domain one segment at a time, starting from the last label
    //Code should be modified not to check the first segment on its own (e.g. "com")
    while (!empty($wsArray)) {
        $newSegment = array_pop($wsArray);
        $wsHostName = $newSegment . $wsHostName;
        echo "Checking $wsHostName" . PHP_EOL;
        if (checkdnsrr($wsHostName, "A")) {
            echo "host found $wsHostName" . PHP_EOL;
            echo "Domain is $newSegment" . PHP_EOL;
            continue 2;
        } else {
            //This segment didn't resolve - keep building
            echo "No Valid A Record for $wsHostName" . PHP_EOL;
            $wsHostName = "." . $wsHostName;
        }
    }
    //if you get to here in the loop it could not resolve the host name

}
?>

- Jim_M

1

Try using preg_replace.

Something like $domain = preg_replace($regex, '$1', $url); with the linked regex
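Because the replacement string is '$1', the pattern has to contain a capture group for this call to return anything useful. The following is a minimal sketch under that assumption; the pattern and the function name `strip_to_first_label` are made up for illustration and are not the regex from the link. It captures the first label after an optional http:// or https:// and an optional www. prefix.

```php
<?php
// Hypothetical helper: keeps only the first host label of the input.
function strip_to_first_label(string $url): ?string
{
    // (?:https?://)?  optional scheme (non-capturing)
    // (?:www\.)?      optional "www." prefix (non-capturing)
    // ([^./:]+)       capture group 1: the first label
    // .*$             everything else, which is discarded
    $regex = '~^(?:https?://)?(?:www\.)?([^./:]+).*$~i';

    return preg_replace($regex, '$1', trim($url));
}

echo strip_to_first_label('http://blabla.com'), PHP_EOL;    // blabla
echo strip_to_first_label('www.blabla.net'), PHP_EOL;       // blabla
echo strip_to_first_label('http://www.blabla.eu'), PHP_EOL; // blabla
```

This sidesteps the .co.uk problem because it never looks at the extension, but it returns the subdomain for hosts such as sub.blabla.com, so it only fits inputs shaped like the three examples in the question.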

- luis martinez

This does not answer the question, because the regex provided in the link does not have any capture groups. - Zsw

1

function test($url) 
{   
    // Check whether the URL begins with http://, https://, www. or a combination
    // If so, strip that prefix
    if (preg_match("/^(https?:\/\/|www\.)/i", $url))
    {
        $domain = preg_replace("/^(https?:\/\/)*(www\.)*/is", "", $url);
    }
    else
    {
        $domain = $url;
    }

    // Now all that's left is the domain and the extension
    // Only return the needed first part without the extension
    $domain = explode(".", $domain);

    return $domain[0];
}

- Rob

- Jim_M · Accepted Answer

3

How about

$wsArray = explode(".",$domain); //Break it up into an array. 
$extension = array_pop($wsArray); //Get the Extension (last entry)
$domain = array_pop($wsArray); // Get the domain

http://php.net/manual/en/function.array-pop.php
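The snippet above expects $domain to already be a bare host name; given a full URL such as http://google.com, the scheme stays attached to the result. A minimal sketch of combining it with parse_url follows. The function name `second_level_label` is hypothetical, and the sketch assumes that inputs without a scheme should be treated as http URLs.

```php
<?php
// Hypothetical helper: parse_url() to isolate the host, then the array_pop() idea.
function second_level_label(string $url): ?string
{
    $url = trim($url);

    // Add a scheme when it is missing so that parse_url() can identify the host.
    if (strpos($url, '://') === false) {
        $url = 'http://' . $url;
    }

    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return null; // malformed URL or no host
    }

    $labels = explode('.', $host); // break the host up into an array
    if (count($labels) < 2) {
        return null; // e.g. "localhost": there is no extension to drop
    }

    array_pop($labels);        // discard the extension (last entry)
    return array_pop($labels); // the label just before it
}

echo second_level_label('http://google.com'), PHP_EOL; // google
echo second_level_label('www.blabla.net'), PHP_EOL;    // blabla
```

As with the original snippet, a two-part extension is not recognized: for http://www.bbc.co.uk this returns "co".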

- Jim_M

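Actually, ChoiZ's comment above is probably the better answer. - Jim_M

This answer does not work for .co.uk and similar domains. - Zsw

Unfortunately, this doesn't do the job. When I enter http://google.com, it returns http://google. - Rob

I posted a new solution last night. However, I'd like to clarify your comment. Originally it looked like you were trying to extract only the name "blablabla" from under any top-level domain. But now you say that if you enter "google.com" it only gives you google. Isn't that the result you are looking for? - Jim_M

Stack Overflow altered what I originally wrote, i.e. "http://google.com", and it still isn't showing what I typed. - Rob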