Ruby on Rails URL validation (regular expression)

Question



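I'm trying to use a regular expression to validate the format of a URL in my Rails model. I tested the regex on Rubular with the URL http://trentscott.com and got a match.

Any idea why it fails with "Name is invalid" when I test it in my Rails app?

Code: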

  url_regex = /^((http|https):\/\/)?[a-z0-9]+([-.]{1}[a-z0-9]+).[a-z]{2,5}(:[0-9]{1,5})?(\/.)?$/ix

  validates :serial, :presence => true
  validates :name, :presence => true,
                   :format    => {  :with => url_regex  }

- Trent Scott

A small note: this question is actually about checking a URL, not a domain; the domain is trentscott.com. - ian

Edited and fixed. Hopefully Google will re-index it. - Michael Chaney

5 Answers


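(I like Thomas Hupkens' answer, but for anyone else reading this, I recommend Addressable.)

Using regular expressions to validate URLs is not recommended.

Use Ruby's URI library or an alternative such as Addressable; both make URL validation easy. Unlike URI, Addressable also handles international characters and top-level domains.

Example usage: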

require 'addressable/uri'

Addressable::URI.parse("кц.рф") # Works

uri = Addressable::URI.parse("http://example.com/path/to/resource/")
uri.scheme
#=> "http"
uri.host
#=> "example.com"
uri.path
#=> "/path/to/resource/"
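For comparison, here is roughly the same idea using only the URI library that ships with Ruby, without Addressable. It is a sketch: valid_http_url? is a hypothetical helper, and treating "an absolute http/https URL with a host" as valid is an assumption. It also shows the limitation mentioned above: the bundled parser raises on non-ASCII input such as кц.рф, so such URLs are rejected.

```ruby
require 'uri'

# Hypothetical helper: accept only absolute http(s) URLs that have a host.
def valid_http_url?(value)
  uri = URI.parse(value.to_s)
  # URI::HTTPS is a subclass of URI::HTTP, so this covers both schemes.
  uri.is_a?(URI::HTTP) && !uri.host.to_s.empty?
rescue URI::Error
  # URI::InvalidURIError (raised e.g. for non-ASCII input) is a URI::Error.
  false
end

p valid_http_url?("http://trentscott.com") # => true
p valid_http_url?("trentscott.com")        # => false (no scheme)
p valid_http_url?("http://кц.рф")          # => false (non-ASCII)
```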

You can build a custom validation, for example:

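require 'active_model'
require 'addressable/uri'

class Example
  include ActiveModel::Validations

  attr_accessor :field

  ##
  # Validates a URL
  #
  # If the URI library can parse the value, and the scheme is valid
  # then we assume the url is valid
  #
  class UrlValidator < ActiveModel::EachValidator
    def validate_each(record, attribute, value)
      begin
        uri = Addressable::URI.parse(value)

        # parse(nil) returns nil, and a URL without a scheme has uri.scheme == nil
        if uri.nil? || !["http","https","ftp"].include?(uri.scheme)
          raise Addressable::URI::InvalidURIError
        end
      rescue Addressable::URI::InvalidURIError
        # errors.add is the supported API; appending with
        # errors[attribute] << msg is deprecated as of Rails 6.1
        record.errors.add(attribute, "Invalid URL")
      end
    end
  end

  validates :field, :url => true
end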

Source of the code

- danneu


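After looking at Addressable, I think it's definitely the best option, thanks. - stephenmurdoch

+1 for Addressable, but don't assume it will raise an exception, because it won't. Addressable::URI.parse does its best to parse the URI and doesn't raise even when the input is malformed. For example, suppose you want to validate an incorrect URI such as http://http://thing.com. Addressable will take http as the scheme and http as the host, because it treats the colon as a port separator. No error is raised. - onetwopunch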
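The comment is about Addressable, but Ruby's bundled URI library can be checked the same way (this check is not part of the original comment and uses only the standard library). As far as I can tell, the input parses without an exception and the host comes out as "http", so a validator has to inspect the parsed parts rather than rely on a rescue alone.

```ruby
require 'uri'

# Malformed input from the comment: two schemes in a row.
uri = URI.parse("http://http://thing.com") # no exception is raised

p uri.class # => URI::HTTP
p uri.host  # => "http" (the second "http:" is read as a host with an empty port)
```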

The URL you entered (http://trentscott.com) has no subdomain, but the regex requires one.

domain_regex = /^((http|https):\/\/)[a-z0-9]*(\.?[a-z0-9]+)\.[a-z]{2,5}(:[0-9]{1,5})?(\/.)?$/ix
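A quick check in plain Ruby (core Regexp only, no Rails) of the question's pattern and the fixed pattern above against the URL from the question. The constant names are made up for this sketch.

```ruby
# Pattern from the question: ([-.]{1}[a-z0-9]+) must match an extra
# ".label" or "-label" before the final "." and TLD, i.e. a subdomain.
QUESTION_REGEX = /^((http|https):\/\/)?[a-z0-9]+([-.]{1}[a-z0-9]+).[a-z]{2,5}(:[0-9]{1,5})?(\/.)?$/ix

# Pattern from this answer: [a-z0-9]* may be empty and the dot in
# (\.?[a-z0-9]+) is optional, so a bare domain is enough.
ANSWER_REGEX = /^((http|https):\/\/)[a-z0-9]*(\.?[a-z0-9]+)\.[a-z]{2,5}(:[0-9]{1,5})?(\/.)?$/ix

p QUESTION_REGEX.match?("http://trentscott.com")     # => false
p QUESTION_REGEX.match?("http://www.trentscott.com") # => true
p ANSWER_REGEX.match?("http://trentscott.com")       # => true
p ANSWER_REGEX.match?("http://www.trentscott.com")   # => true
```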

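Update

You don't need the question mark after ((http|https):\/\/) unless the protocol is sometimes missing. I also escaped the ., because an unescaped . matches any character. I'm not sure what the groups above were meant for, but here is a better version that supports dashes and groups the URL by part.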

domain_regex = /^((http|https):\/\/) 
(([a-z0-9-\.]*)\.)?                  
([a-z0-9-]+)\.                        
([a-z]{2,5})
(:[0-9]{1,5})?
(\/)?$/ix

- cordsen
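A plain-Ruby check of cordsen's updated pattern. The constant name is made up, and the pattern is the one above joined onto one line (the x flag already made the line breaks insignificant), with the hyphen moved to the end of the first character class, which matches the same characters. The last case is the punycode domain from David Keener's comment.

```ruby
# cordsen's updated pattern on one line; [a-z0-9.-] is the same set as [a-z0-9-\.].
DOMAIN_REGEX = /^((http|https):\/\/)(([a-z0-9.-]*)\.)?([a-z0-9-]+)\.([a-z]{2,5})(:[0-9]{1,5})?(\/)?$/ix

p DOMAIN_REGEX.match?("http://trentscott.com")              # => true
p DOMAIN_REGEX.match?("http://www.trent-scott.com")         # => true
p DOMAIN_REGEX.match?("abcd")                               # => false (scheme is required)
p DOMAIN_REGEX.match?("http://trentscott.com/about")        # => false (only a trailing "/" is allowed)
p DOMAIN_REGEX.match?("http://www.xn--b1akcweg3a.xn--p1ai") # => false (TLD is not 2-5 letters)
```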

Thanks. That fixed the error, but now entries like "abcd" are valid. Any ideas on how to fix that? - Trent Scott


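The update should work. The other thing I did was replace [-.] with \. - cordsen

This doesn't handle internationalized domain names, which can be represented in ASCII, e.g. www.xn--b1akcweg3a.xn--p1ai. Yes, you get double dashes in the domain, which is legal, as well as a top-level domain (the rightmost component) longer than 3 characters. - David Keener

@cordsen: What if I want to write a regex in Ruby for URLs that contain non-ASCII characters, such as Chinese characters? For example, http://www.詹姆斯.com/. Could you tell me how to solve this? - huzefa biyawarwala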


Try this; it works for me:

/(ftp|http|https):\/\/(\w+:{0,1}\w*@)?(\S+)(:[0-9]+)?(\/|\/([\w#!:.?+=&%@!\-\/]))?/

- Premanandh Selvakumarasamy
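Note that this pattern has no anchors, so it also matches any string that merely contains something URL-like, and (\S+) accepts any run of non-space characters as the host. A short plain-Ruby check (the constant name is made up; the pattern is copied as-is):

```ruby
# Premanandh's pattern, copied as-is (no ^/\A or $/\z anchors).
LOOSE_URL_REGEX = /(ftp|http|https):\/\/(\w+:{0,1}\w*@)?(\S+)(:[0-9]+)?(\/|\/([\w#!:.?+=&%@!\-\/]))?/

p LOOSE_URL_REGEX.match?("http://trentscott.com")                 # => true
p LOOSE_URL_REGEX.match?("see http://trentscott.com for details") # => true (no anchors)
p LOOSE_URL_REGEX.match?("http://a")                              # => true (\S+ accepts "a")
p LOOSE_URL_REGEX.match?("abcd")                                  # => false
```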

This handles international hosts such as abc.com.it, where the .it part is optional.

match '/:site', to: 'controller#action' , constraints: { site: /[a-zA-Z0-9][a-zA-Z0-9-]{1,61}[a-zA-Z0-9]\.[a-zA-Z]{2,}(.[a-zA-Z]{2,63})?/}, via: :get, :format => false

- Maged Makled
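To try this constraint outside of routing: as far as I know, Rails matches a segment constraint against the whole segment (and rejects ^ or $ inside routing constraints), so this plain-Ruby sketch wraps the pattern in \A...\z to approximate that. The constant names are made up. One consequence of the [a-zA-Z0-9][a-zA-Z0-9-]{1,61}[a-zA-Z0-9] part is that the name before the first dot needs at least three characters.

```ruby
# Maged's constraint pattern, copied as-is.
SITE_CONSTRAINT = /[a-zA-Z0-9][a-zA-Z0-9-]{1,61}[a-zA-Z0-9]\.[a-zA-Z]{2,}(.[a-zA-Z]{2,63})?/
# Wrapped in \A(?:...)\z to mimic whole-segment matching (an approximation).
SITE_SEGMENT = /\A(?:#{SITE_CONSTRAINT})\z/

p SITE_SEGMENT.match?("abc.com.it") # => true
p SITE_SEGMENT.match?("abc.com")    # => true
p SITE_SEGMENT.match?("ab.com")     # => false (needs 3+ characters before the dot)
```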

Page content provided by Stack Overflow.

- Thomas Hupkens · Accepted Answer

You don't need a regular expression here. Ruby has a more reliable way to do this:

# Use the URI module distributed with Ruby:

require 'uri'

unless (url =~ URI::regexp).nil?
    # Correct URL
end

(This answer comes from this post: )
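One caveat with url =~ URI::regexp: the pattern is not anchored, so it also succeeds when a URL is merely embedded in other text. A sketch of an anchored check using the RFC 2396 parser class bundled with Ruby's uri library, whose make_regexp method builds a URI pattern restricted to the given schemes. HTTP_URL_REGEX and http_url? are made-up names, and limiting the schemes to http/https is an assumption.

```ruby
require 'uri'

# make_regexp returns an unanchored pattern for the given schemes, so it is
# wrapped in \A...\z here to require the whole string to be a URL.
HTTP_URL_REGEX = /\A#{URI::RFC2396_Parser.new.make_regexp(%w[http https])}\z/

def http_url?(value)
  HTTP_URL_REGEX.match?(value.to_s)
end

p http_url?("http://trentscott.com")                 # => true
p http_url?("see http://trentscott.com for details") # => false
p http_url?("abcd")                                  # => false
```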