Simplifying URL sanitization

Question


I'm trying to do some basic URL sanitization so that...

www.google.com
www.google.com/
http://google.com
http://google.com/
https://google.com
https://google.com/

are replaced with http://www.google.com (or https://www.google.com when the URL starts with https://).

Basically, I'd like to check for a leading http/https and a trailing / in a single regex.

I tried something like this:

"https://google.com".match(/^(http:\/\/|https:\/\/)(.*)(\/)*$/)

In this case I get:

=> #<MatchData "https://google.com" 1:"https://" 2:"google.com" 3:nil>

which is good.

Unfortunately, for:

"https://google.com/".match(/^(http:\/\/|https:\/\/)(.*)(\/)*$/)

I get:

=> #<MatchData "https://google.com/" 1:"https://" 2:"google.com/" 3:nil>

but I want 2:"google.com" 3:"/".

Any ideas?

- Marcin Doliwa


By the way, how are you handling the last URL, the one with the extra space? - Jerry

Good question, thanks. I'll handle it. - Marcin Doliwa

1 Answer


- Tom Lord · Accepted Answer

It's obvious once you spot the mistake ;)

You tried:

^(http:\/\/|https:\/\/)(.*)(\/)*$

The answer is to use:

^(http:\/\/|https:\/\/)(.*?)(\/)*$

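The added "?" makes the ".*" quantifier "non-greedy", so the trailing forward slash is no longer swallowed by the "." and is left for the final group to capture.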
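As a minimal Ruby sketch of the difference, here are both patterns run against the URL from the question. The variable names (greedy_re, lazy_re) are illustrative only:

```ruby
# Illustrative variable names; the two patterns are the ones shown above.
greedy_re = /^(http:\/\/|https:\/\/)(.*)(\/)*$/
lazy_re   = /^(http:\/\/|https:\/\/)(.*?)(\/)*$/

url = "https://google.com/"

greedy = url.match(greedy_re)
lazy   = url.match(lazy_re)

# Greedy: (.*) consumes the trailing slash, so group 3 never participates.
p greedy.captures # => ["https://", "google.com/", nil]

# Lazy: (.*?) matches as little as possible, leaving the slash for group 3.
p lazy.captures   # => ["https://", "google.com", "/"]
```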

Edit:

Actually, you should use:

^(http:\/\/|https:\/\/)?(www\.)?(.*?)(\/)*$

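That way, you will also match the first two examples, which have no "http(s)://" prefix. You also separate out the value/presence of the "www" part. In action: http://www.rubular.com/r/VUoIUqCzzX

Edit 2:

I got a bit bored and wanted to polish it :P

Here you go: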

^(https?:\/\/)?(?:www\.)?(.*?)\/?$

Now all you need to do is replace your URL with the first match (or "http://" if it is nil), followed by "www.", followed by the second match.
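The replacement described above could be sketched in Ruby roughly as follows. The method name normalize_url is hypothetical, and the call to strip is an added assumption meant to cope with the stray whitespace raised in the comments:

```ruby
# Hypothetical helper built around the final pattern from this answer.
def normalize_url(url)
  pattern = /^(https?:\/\/)?(?:www\.)?(.*?)\/?$/

  # Assumption: strip leading/trailing whitespace before matching.
  match = url.strip.match(pattern)

  scheme = match[1] || "http://" # default when no scheme was given
  "#{scheme}www.#{match[2]}"
end

%w[
  www.google.com
  www.google.com/
  http://google.com
  http://google.com/
  https://google.com
  https://google.com/
].each { |u| puts normalize_url(u) }
```

The first four inputs come out as http://www.google.com and the last two as https://www.google.com. Because the "www." group is non-capturing, the host ends up in the second capture, which is why match[2] is used.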

Example: http://www.rubular.com/r/YLeO5cXcck

Edit, 18 months later:

Check out my awesome Ruby gem, which will help with your problem! https://github.com/tom-lord/regexp-examples

/(https?:\/\/)?(?:www\.)?google\.com\/?/.examples # => 
  ["google.com",
   "google.com/",
   "www.google.com",
   "www.google.com/",
   "http://google.com",
   "http://google.com/",
   "http://www.google.com",
   "http://www.google.com/",
   "https://google.com",
   "https://google.com/",
   "https://www.google.com",
   "https://www.google.com/"]

/(https?:\/\/)?(?:www\.)?google\.com\/?/.examples.map(&:subgroups) # =>
  [[],
   [],
   [],
   [],
   ["http://"],
   ["http://"],
   ["http://"],
   ["http://"],
   ["https://"],
   ["https://"],
   ["https://"],
   ["https://"]]