I have a URL like
https://endpoint/v1.0/album/id/photo/id/
where endpoint is a variable. I want to extract "/v1.0/album/id/photo/id/".
How can I use a Ruby regular expression to extract everything after "endpoint"?
Here we go:
2.0.0-p451 :001 > require 'uri'
=> true
2.0.0-p451 :002 > URI('https://endpoint/v1.0/album/id/photo/id/').path
=> "/v1.0/album/id/photo/id/"
2.0.0-p451 :003 >
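Since the endpoint is a variable in the question, here is a minimal sketch of the same URI approach wrapped in a method. The helper name path_after_host and the sample endpoints are made up for illustration; only URI.parse, URI#path, and URI::HTTP#request_uri come from the standard library.

```ruby
require 'uri'

# Hypothetical helper: returns the path of a URL, or nil if the URL cannot be parsed.
def path_after_host(url)
  URI.parse(url).path
rescue URI::InvalidURIError
  nil
end

# The endpoint values below are illustrative assumptions.
%w[endpoint api.example.com localhost:3000].each do |endpoint|
  puts path_after_host("https://#{endpoint}/v1.0/album/id/photo/id/")
end

# URI#path drops the query string; request_uri keeps it for http(s) URLs.
puts URI.parse('https://endpoint/v1.0/album/id/?size=large').request_uri
```

If "everything after the endpoint" should include a query string, request_uri is probably the closer fit; otherwise path is enough.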
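Read through this basic example.
A simple regex with named captures, (?<name>), and the /x flag on the end to allow whitespace in the formatting will do.
url = 'https://endpoint/v1.0/album/id/photo/id/'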
re = /
^ # beginning of string
(?<scheme> https? ) # http or https
:\/\/ # separator
(?<domain> [a-zA-Z0-9.-]+? ) # alphanumerics, hyphens, or dots
(?<path> \/.+ ) # everything from the first forward slash on is the path
/x
res = url.match re
res[:path] if res
This pales in comparison to URI, though.
domain = 'endpoint'
link = "https://#{domain}/v1.0/album/id/photo/id/"
path = link.gsub("https://#{domain}", '')
# => "/v1.0/album/id/photo/id/"
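You can adjust the domain by changing the domain variable. I used String#gsub to replace the first part of your link with an empty string, which leaves the path as the only part of the string that remains. The pattern on the third line is actually very simple: it is just the plain string https://endpoint rather than a real regular expression.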
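One caveat with gsub is that it replaces every occurrence of the prefix, not just the leading one. A minimal sketch of a stricter variant, assuming the link always starts with https:// followed by the domain, could anchor the pattern at the start of the string. The prefix variable is introduced here only for illustration.

```ruby
domain = 'endpoint'
link   = "https://#{domain}/v1.0/album/id/photo/id/"
prefix = "https://#{domain}" # illustrative name, not from the answer above

# \A anchors at the start of the string; Regexp.escape keeps dots in a
# real domain name from acting as regex wildcards.
path = link.sub(/\A#{Regexp.escape(prefix)}/, '')
puts path

# On Ruby 2.5 or newer, String#delete_prefix does the same without a regex.
puts link.delete_prefix(prefix)
```

Both calls leave the string unchanged when the link does not start with the prefix, so a caller may want to check for that case.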
The URI RFC (RFC 3986) documents a pattern for parsing URLs:
Appendix B. Parsing a URI Reference with a Regular Expression
As the "first-match-wins" algorithm is identical to the "greedy"
disambiguation method used by POSIX regular expressions, it is
natural and commonplace to use a regular expression for parsing the
potential five components of a URI reference.
The following line is the regular expression for breaking-down a
well-formed URI reference into its components.
Berners-Lee, et al. Standards Track [Page 50]
RFC 3986 URI Generic Syntax January 2005
^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
 12            3  4          5       6  7        8 9
The numbers in the second line above are only to assist readability;
they indicate the reference points for each subexpression (i.e., each
paired parenthesis). We refer to the value matched for subexpression
<n> as $<n>. For example, matching the above expression to
http://www.ics.uci.edu/pub/ietf/uri/#Related
results in the following subexpression matches:
$1 = http:
$2 = http
$3 = //www.ics.uci.edu
$4 = www.ics.uci.edu
$5 = /pub/ietf/uri/
$6 = <undefined>
$7 = <undefined>
$8 = #Related
$9 = Related
where <undefined> indicates that the component is not present, as is
the case for the query component in the above example. Therefore, we
can determine the value of the five components as
scheme = $2
authority = $4
path = $5
query = $7
fragment = $9
Based on this:
URL_REGEX = %r!^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?!
'https://endpoint/v1.0/album/id/photo/id/'.match(URL_REGEX).captures
# => ["https:",
# "https",
# "//endpoint",
# "endpoint",
# "/v1.0/album/id/photo/id/",
# nil,
# nil,
# nil,
# nil]
'https://endpoint/v1.0/album/id/photo/id/'.match(URL_REGEX).captures[4]
# => "/v1.0/album/id/photo/id/"
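Indexing captures by position is easy to get wrong, so here is a sketch of the same RFC 3986 Appendix B pattern rewritten with named groups for the five components. The constant name RFC3986_PARTS is made up for this example, the grouping-only parentheses are written as non-capturing, and \A is used in place of ^ so the match is anchored to the start of the string rather than to any line.

```ruby
# Appendix B pattern with named groups; /x allows the layout whitespace.
# A literal # outside a character class must be escaped in /x mode.
RFC3986_PARTS = %r{
  \A
  (?: (?<scheme> [^:/?#]+ ) : )?
  (?: // (?<authority> [^/?#]* ) )?
  (?<path> [^?#]* )
  (?: \? (?<query> [^#]* ) )?
  (?: \# (?<fragment> .* ) )?
}x

m = RFC3986_PARTS.match('https://endpoint/v1.0/album/id/photo/id/')
puts m[:authority] # the variable endpoint
puts m[:path]      # everything after it
```

The query and fragment groups come back as nil when those components are absent, matching the <undefined> entries in the RFC example.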