PHP: remove '.html' and everything after it from a URL string

Question

3

I'm trying to remove '.html' and everything after it from a URL string. My current (failing) code is:

$input = 'http://example.com/somepage.html?foo=bar&baz=x';
$result = preg_replace("/(.html)[^.html]+$/i",'',$input);

Expected result:

value of $result is 'http://example.com/somepage'

Other example values of $input that should produce the same $result:

http://example.com/somepage
http://example.com/somepage.html
http://example.com/somepage.html?url=http://example.com/index.html

- robotrobot

3 Answers

3

Why not use parse_url instead?

- Tamás Pap

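Won't the path element of the returned array still contain '.html'? - robotrobot

It will, but a simple rtrim removes it, so you can solve this with built-in functions only. - Tamás Pap

Isn't parse_url implemented with regular expressions under the hood? - robotrobot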
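
For readers who want to try the parse_url route, here is a minimal sketch of what it might look like. The helper name stripHtmlViaParseUrl is hypothetical and not from the thread, and the sketch assumes an absolute URL with no port or credentials. It compares the suffix with substr rather than calling rtrim($path, '.html'), because rtrim treats its second argument as a set of characters and would turn "/math.html" into "/ma".

```php
<?php
// Hypothetical helper (not from the thread): keep scheme, host and path,
// then drop a trailing ".html" from the path. Query and fragment are discarded.
function stripHtmlViaParseUrl(string $url): string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return $url; // not an absolute URL; leave it unchanged
    }

    $path = $parts['path'] ?? '';

    // Remove the suffix only when the path really ends in ".html".
    if (substr($path, -5) === '.html') {
        $path = substr($path, 0, -5);
    }

    // Assumption: no port, user or password needs to be preserved.
    return $parts['scheme'] . '://' . $parts['host'] . $path;
}

$result = stripHtmlViaParseUrl('http://example.com/somepage.html?foo=bar&baz=x');
```

Note that this only strips '.html' at the very end of the path, so a URL such as http://example.com/somedir.html/somefile.html keeps its '/somedir.html' segment, unlike approaches that cut at the first '.html'.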

0

If you're having trouble with the preg_replace() syntax, you could also use explode():

$input = explode(".html", $input);
$result = $input[0];

- Robert

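I like how simple and straightforward this is. (It will run into trouble with http://example.com/somedir.html/somefile.html, and yes, that is a possible URL, haha.) - sg3s

No, ".html" acts as the delimiter, so the solution above won't fail on your example. It's the same as treating "," as the delimiter in a CSV string. - Robert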
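
To make the exchange above concrete, this small sketch shows what explode() does with the URL from the comment. The function name beforeFirstHtml is hypothetical, and the limit argument of 2 is an addition that was not in the answer; it only avoids splitting further than needed. Whether cutting at the first ".html" is the desired outcome for such a URL depends on the use case.

```php
<?php
// Hypothetical helper: return everything before the first ".html".
function beforeFirstHtml(string $input): string
{
    // A limit of 2 stops splitting after the first delimiter.
    // If ".html" does not occur, the single element is the whole input.
    $parts = explode('.html', $input, 2);
    return $parts[0];
}

$nested = beforeFirstHtml('http://example.com/somedir.html/somefile.html');
$plain  = beforeFirstHtml('http://example.com/somepage');
```

Here $nested is cut at the first ".html", giving 'http://example.com/somedir', while $plain comes back unchanged.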

- Lekensteyn · Accepted Answer

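Your regex is wrong: it only matches strings that end in <any single character> "html" followed by <one or more characters other than ., h, t, m, or l>. Since preg_replace returns the original string when nothing matches, you only need to match a literal .html plus anything that follows it.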

$result = preg_replace('/\.html.*/', '', $input);
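
As a quick check, the pattern can be run against the inputs listed in the question. This is only a sketch; the $inputs and $results variable names are introduced here for illustration.

```php
<?php
// Inputs taken from the question.
$inputs = [
    'http://example.com/somepage.html?foo=bar&baz=x',
    'http://example.com/somepage',
    'http://example.com/somepage.html',
    'http://example.com/somepage.html?url=http://example.com/index.html',
];

$results = [];
foreach ($inputs as $input) {
    // \. matches a literal dot; .* consumes the rest of the string.
    $results[] = preg_replace('/\.html.*/', '', $input);
}

// Each line printed is http://example.com/somepage
echo implode(PHP_EOL, $results), PHP_EOL;
```

Because the match starts at the first ".html", the last input is cut at somepage.html rather than at the index.html inside the query string.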