file_get_contents() fails when the URL contains special characters

Question


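I need to fetch some URLs that contain Swedish letters. Taking https://en.wikipedia.org/wiki/Åland_Islands as an example, passing it directly to file_get_contents works fine. But if I run the URL through urlencode first, the call fails with the following message: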

failed to open stream: No such file or directory

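This happens even though the file_get_contents documentation says:

Note: If you're opening a URI with special characters, such as spaces, you need to encode the URI with urlencode().

So, for example, if I run the following code: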

error_reporting(E_ALL);
ini_set("display_errors", true);

$url = urlencode("https://en.wikipedia.org/wiki/Åland_Islands");

$response = file_get_contents($url);
if($response === false) {
    die('file get contents has failed');
}
echo $response;

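I get the error. If I simply remove the "urlencode" call from the code, it runs fine.

The problem is that one of the parameters in my URL comes from a submitted form. Since PHP always runs submitted values through urlencode, the Swedish characters in the URL I build cause the error.

How do I solve this?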

- Digital Ninja

2 Answers

Use this:

$usableURL = mb_convert_encoding($url,'HTML-ENTITIES');

- Alex Andrei


- Dan Belden · Accepted Answer

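The problem is most likely that urlencode is escaping your protocol (the ":" and "/" characters in "https://") along with everything else: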

https://en.wikipedia.org/wiki/Åland_Islands
https%3A%2F%2Fen.wikipedia.org%2Fwiki%2F%C3%85land_Islands
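
To make the failure mode concrete, here is a small sketch (my own illustration, not part of the original test case) that uses parse_url to compare the two strings. Once the ":" is escaped, PHP no longer sees a scheme, so file_get_contents treats the whole string as a local filename, which matches the "No such file or directory" message:

```php
<?php
// Compare how PHP sees the raw URL and the fully urlencode()d one.
$rawUrl     = 'https://en.wikipedia.org/wiki/Åland_Islands';
$encodedUrl = urlencode($rawUrl);

// The raw URL still carries a scheme, so the https:// stream wrapper applies.
$rawScheme = parse_url($rawUrl, PHP_URL_SCHEME);

// After urlencode() no ':' is left, so no scheme is detected and the
// string is nothing more than a path as far as PHP is concerned.
$encodedScheme = parse_url($encodedUrl, PHP_URL_SCHEME);

var_export($rawScheme);      // 'https'
echo PHP_EOL;
var_export($encodedScheme);  // NULL
echo PHP_EOL;
```
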

I have run into this problem too, and the only fix was to escape just the parts that actually need escaping:

https://en.wikipedia.org/wiki/Åland_Islands
https://en.wikipedia.org/wiki/%C3%85land_Islands

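This can get tricky depending on where the special characters sit in the URL. I usually go for an encoding patch job, but some people I have worked with prefer to encode only the dynamic segments of their URLs.

Here is my patch job: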

https://en.wikipedia.org/wiki/Åland_Islands
https%3A%2F%2Fen.wikipedia.org%2Fwiki%2F%C3%85land_Islands
https://en.wikipedia.org/wiki/%C3%85land_Islands

Code:

$url = 'https://en.wikipedia.org/wiki/Åland_Islands';
$encodedUrl = urlencode($url);
$fixedEncodedUrl = str_replace(['%2F', '%3A'], ['/', ':'], $encodedUrl);
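
For comparison, the "encode only the dynamic segments" approach could look roughly like the sketch below. The helper name buildWikiUrl and its parameters are made up for illustration. It assumes the base URL is fixed and trusted, and that only the page title and any query parameters vary:

```php
<?php
// Hypothetical helper: the fixed base is left alone, and only the
// dynamic parts (page title, optional query parameters) are encoded.
function buildWikiUrl(string $title, array $query = []): string
{
    $base = 'https://en.wikipedia.org/wiki/';  // fixed part, never encoded

    // rawurlencode() percent-encodes the segment per RFC 3986
    // (a space becomes %20, not "+").
    $url = $base . rawurlencode($title);

    if ($query !== []) {
        // http_build_query() encodes each key and value for us.
        $url .= '?' . http_build_query($query, '', '&', PHP_QUERY_RFC3986);
    }

    return $url;
}

echo buildWikiUrl('Åland_Islands'), PHP_EOL;
// https://en.wikipedia.org/wiki/%C3%85land_Islands
```

Note that values read from $_GET or $_POST arrive already decoded, so a form field can be passed to a helper like this as-is and gets encoded exactly once.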

Hope this helps.