Handling file_get_contents when the URL does not exist

Question

83

I'm using file_get_contents() to access a URL.

file_get_contents('http://somenotrealurl.com/notrealpage');

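If the URL doesn't exist, it produces the error message below. How can I make it fail gracefully, so that I know the page doesn't exist and can act accordingly, without this error message being displayed?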

file_get_contents('http://somenotrealurl.com/notrealpage') 
[function.file-get-contents]: 
failed to open stream: HTTP request failed! HTTP/1.0 404 Not Found 
in myphppage.php on line 3

For example, in Zend you can write: if ($request->isSuccessful())

$client = new Zend_Http_Client();
$client->setUri('http://someurl.com/somepage');

$request = $client->request();

if ($request->isSuccessful()) {
 //do stuff with the result
}

- sami

Try using a stream context: https://dev59.com/83zaa4cB1Zd3GeqPRIne. file_get_contents uses fopen under the hood. - rsk82

8 Answers

77

In PHP, you can prefix a call like this with the @ operator to suppress such warnings.

@file_get_contents('http://somenotrealurl.com/notrealpage');

file_get_contents() returns FALSE when an error occurs, so you can handle the error by comparing the return value against FALSE.

$pageDocument = @file_get_contents('http://somenotrealurl.com/notrealpage');

if ($pageDocument === false) {
    // Handle error
}
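
As a possible extension (not part of the original answer), the suppressed warning can still be read back with error_get_last(), which helps when you need to log why the call failed. The helper name fetchOrExplain and the shape of its return array are assumptions made for this sketch.

```php
<?php
// Hypothetical helper: fetch a URL (or file) and report why it failed,
// without letting PHP print the warning.
function fetchOrExplain($url) {
    error_clear_last();                      // forget any earlier error (PHP 7+)
    $content = @file_get_contents($url);     // @ keeps the warning out of the output
    if ($content === false) {
        $last = error_get_last();            // the suppressed warning is still recorded
        $message = $last !== null ? $last['message'] : 'unknown error';
        return array('ok' => false, 'content' => null, 'error' => $message);
    }
    return array('ok' => true, 'content' => $content, 'error' => null);
}
```

Calling fetchOrExplain('http://somenotrealurl.com/notrealpage') and checking $result['ok'] then gives you a branch point, with $result['error'] holding the text that would otherwise have been displayed.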

- Orbling

3

I don't just want to suppress the error; I also want to know whether the URL is valid. - sami

Note that this function may block for a while if the server is down. - Alex Jasmin

Even with error suppression I still get the error reported in the original question. Strange, but it does happen. - YOMorales

2

Thank you very much, this is the perfect solution for me. - Jam

1

You really saved my day. I wasted time trying to implement other solutions until I tried yours. Thank you very much! - Vickar

34

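Every time you call file_get_contents on a URL through the http wrapper, a variable is created in the local scope: $http_response_header.

This variable contains all the HTTP headers. This method is better than the get_headers() function because only one request is made.

Note: two separate requests can end differently. For example, get_headers() might return 503 while file_get_contents() returns 200. You would then get proper output, but you would not use it because of the 503 error from the get_headers() call.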

function getUrl($url) {
    $content = file_get_contents($url);
    // you can add some code to extract/parse response number from first header. 
    // For example from "HTTP/1.1 200 OK" string.
    return array(
            'headers' => $http_response_header,
            'content' => $content
        );
}

// Handle 40x and 50x errors
$response = getUrl("http://example.com/secret-message");
if ($response['content'] === FALSE)
    echo $response['headers'][0];   // HTTP/1.1 401 Unauthorized
else
    echo $response['content'];

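This approach also lets you keep the headers of several requests in separate variables, because $http_response_header is overwritten in the local scope each time file_get_contents() is called.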
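
The comment inside getUrl() mentions extracting the response number from the first header. A minimal sketch of that step follows; the function name statusCodeFromHeaders is an assumption, and the loop keeps the last status line on the understanding that the header list holds one status line per hop when redirects are followed.

```php
<?php
// Hypothetical helper: pull the numeric status code out of a header list
// such as $http_response_header.
function statusCodeFromHeaders(array $headers) {
    $code = null;
    foreach ($headers as $line) {
        // Status lines look like "HTTP/1.1 404 Not Found"
        if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $line, $m)) {
            $code = (int) $m[1];   // a later status line replaces an earlier one
        }
    }
    return $code;                  // null if no status line was found
}
```

With the getUrl() function above, statusCodeFromHeaders($response['headers']) would give an integer you can compare against 404, 503 and so on.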

- Grzegorz

1

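This is perfect; it avoids the extra request, so I'm upvoting it. I'm generating a cache for thousands of URLs, so having to repeat every request would be absurd. - jenovachild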

16

Although file_get_contents is very concise and convenient, I tend to prefer the cURL library for better control. Here is an example.

function fetchUrl($uri) {
    $handle = curl_init();

    curl_setopt($handle, CURLOPT_URL, $uri);
    curl_setopt($handle, CURLOPT_POST, false);
    curl_setopt($handle, CURLOPT_BINARYTRANSFER, false);
    curl_setopt($handle, CURLOPT_HEADER, true);
    curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($handle, CURLOPT_CONNECTTIMEOUT, 10);

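    $response = curl_exec($handle);
    if ($response === false) {
        // No HTTP response at all (DNS failure, timeout, refused connection)
        throw new Exception('cURL error: ' . curl_error($handle));
    }
    $hlength  = curl_getinfo($handle, CURLINFO_HEADER_SIZE);
    $httpCode = curl_getinfo($handle, CURLINFO_HTTP_CODE);
    $body     = substr($response, $hlength);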

    // If HTTP response is not 200, throw exception
    if ($httpCode != 200) {
        throw new Exception($httpCode);
    }

    return $body;
}

$url = 'http://some.host.com/path/to/doc';

try {
    $response = fetchUrl($url);
} catch (Exception $e) {
    error_log('Fetch URL failed: ' . $e->getMessage() . ' for ' . $url);
}

- nikc.org

Yes, the curl library is much better - personally I never use file_get_contents() to fetch URLs. I don't like using stream wrappers that way; it feels a bit flaky. - Orbling

7

You can add 'ignore_errors' => true to the options:

$options = [
    'http' => [
        'ignore_errors' => true,
        'header' => "Content-Type: application/json\r\n",
    ],
];
$context = stream_context_create($options);
$result = file_get_contents('http://example.com', false, $context);

In this case you will be able to read the response from the server, even when it reports an error status.
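
Since 'ignore_errors' returns the body for error responses as well, you still need the status code to tell success from failure. The sketch below, which is not from the original answer, reads it from $http_response_header; the name fetchWithStatus and the 5-second default timeout are assumptions chosen for illustration.

```php
<?php
// Hypothetical helper combining 'ignore_errors' with $http_response_header.
function fetchWithStatus($url, $timeoutSeconds = 5) {
    $context = stream_context_create(array(
        'http' => array(
            'ignore_errors' => true,             // return the body even on 4xx/5xx
            'timeout'       => $timeoutSeconds,  // avoid blocking on a dead server
        ),
    ));
    $body = @file_get_contents($url, false, $context);

    // $http_response_header is only set if a server actually answered
    $headers = isset($http_response_header) ? $http_response_header : array();
    $status = null;
    foreach ($headers as $line) {
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1];               // keep the last status line
        }
    }
    return array('status' => $status, 'body' => $body);
}
```

A status of null with a body of false means no HTTP response arrived at all, while a status such as 404 with a non-empty body is the server's own error page.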

- alniks

5

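To avoid the double request that Orbling pointed out in a comment on ynh's answer, you can combine their answers. If you get a valid response the first time, use it. If not, find out what went wrong (if you need to).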

$urlToGet = 'http://somenotrealurl.com/notrealpage';
$pageDocument = @file_get_contents($urlToGet);
if ($pageDocument === false) {
     $headers = get_headers($urlToGet);
     $responseCode = substr($headers[0], 9, 3);
     // Handle errors based on response code
     if ($responseCode == '404') {
         //do something, page is missing
     }
     // Etc.
} else {
     // Use $pageDocument, echo or whatever you are doing
}

- Kuijkens

5

Simple and practical (easy to use anywhere):

function file_contents_exist($url, $response_code = 200)
{
    $headers = get_headers($url);

    if (substr($headers[0], 9, 3) == $response_code)
    {
        return TRUE;
    }
    else
    {
        return FALSE;
    }
}

Example:

$file_path = 'http://www.google.com';

if(file_contents_exist($file_path))
{
    $file = file_get_contents($file_path);
}

- tfont

3

$url = 'https://www.yourdomain.com';

Normal

function checkOnline($url) {
    $headers = get_headers($url);
    $code = substr($headers[0], 9, 3);
    if ($code == 200) {
        return true;
    }
    return false;
}

if (checkOnline($url)) {
    // URL is online, do something..
    $getURL = file_get_contents($url);     
} else {
    // URL is offline, throw an error..
}

Pro

if (substr(get_headers($url)[0], 9, 3) == 200) {
    // URL is online, do something..
}

WTF level

echo (substr(get_headers($url)[0], 9, 3) == 200) ? 'Online' : 'Offline';

- SixSense

- ynh · Accepted Answer

130

You need to check the HTTP response status code:

function get_http_response_code($url) {
    $headers = get_headers($url);
    return substr($headers[0], 9, 3);
}
if(get_http_response_code('http://somenotrealurl.com/notrealpage') != "200"){
    echo "error";
}else{
    file_get_contents('http://somenotrealurl.com/notrealpage');
}
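
One way to make the pre-check cheaper, offered as a sketch rather than as part of the original answer, is to send a HEAD request so that only headers are transferred. The function name get_http_response_code_head and the timeout value are assumptions; the context argument of get_headers() needs PHP 7.1 or later, and some servers answer HEAD differently from GET.

```php
<?php
// Hypothetical variant of get_http_response_code(): request headers only.
function get_http_response_code_head($url, $timeoutSeconds = 5) {
    $context = stream_context_create(array(
        'http' => array('method' => 'HEAD', 'timeout' => $timeoutSeconds),
    ));
    $headers = @get_headers($url, false, $context);
    if ($headers === false) {
        return null;   // no HTTP response at all (DNS failure, refused connection, ...)
    }
    return substr($headers[0], 9, 3);   // e.g. "404" from "HTTP/1.1 404 Not Found"
}
```

This still costs a second round trip before file_get_contents(), so it mainly helps when the bodies are large.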

- ynh

7

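This technique is preferable to mine if you need to know why the request failed, e.g. by checking the status code (a 404 and a 503 may need different handling). If you don't need to know the reason, it may introduce two requests, and ignoring the error is then probably the better choice. - Orbling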

1

While this is a nice solution, it doesn't account for other HTTP error codes such as 500. A simple tweak could be: $headers = get_headers($uri); if (stripos($headers[0], '40') !== false || stripos($headers[0], '50') !== false) {...handle errors...} - YOMorales

18

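I think this code is wrong. You should call get_headers only if file_get_contents returns false. Requesting every URL twice makes little sense unless you expect most of your URLs to fail. Sadly, $http_response_header is empty when a 4xx or 5xx status occurs; if it weren't, we wouldn't need get_headers at all. - mgutt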

Awesome! Thanks. - moreirapontocom

1

This code is somewhat wasteful, since it makes the same request twice. It is better to check $http_response_header - https://www.php.net/manual/en/reserved.variables.httpresponseheader.php - donatJ

I strongly agree with mgutt - file_get_contents is seriously flawed by design. It cuts efficiency in half. I suspect curl is the better choice as far as efficiency goes. - Daniel Bengtsson