How can I use a regular expression to parse paths without a trailing slash from text?

Question

3

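I have a log that contains many paths mixed with other text. I want to extract specific paths from the log without their trailing slash.
How can I do this with a regular expression?

For example, given this text: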

some text /dir1/dir2/ some text
some text /dir1/dir3 some text

I want to get these matches:

/dir1/dir2
/dir1/dir3

I tried different approaches with a positive lookahead, for example:

\/dir1[^\s]*(?=\/)

But they did not work. I would appreciate any help.
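To make the failure concrete, here is a minimal sketch that runs the attempted pattern over the two sample lines with Python's `re` module. The variable names `log`, `attempt`, and `matches` are only for illustration. Because the lookahead demands a slash immediately after the match, the greedy `[^\s]*` backtracks to just before the last slash in each path, which cuts the second path short.

```python
import re

log = "some text /dir1/dir2/ some text\nsome text /dir1/dir3 some text"

# (?=\/) requires a '/' right after the match, so [^\s]* gives back
# characters until the next character is a slash.
attempt = re.compile(r"\/dir1[^\s]*(?=\/)")
matches = attempt.findall(log)

print(matches)  # ['/dir1/dir2', '/dir1']
```

The first path happens to come out right only because it ends in a slash; the second loses its last component.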

- tkmamedov

2

\/dir1(\/+[^\/\s]+)* ? - jhnc

@jhnc, could you please post this as an answer so that I can mark it as the accepted solution to my question? - tkmamedov

3 Answers

1

By your definition, you are looking for any whitespace-separated token that starts with a slash. So:

s = 'some text /dir1/dir2/ some text'

print([x for x in s.split() if x[0] == '/'])

Output:

['/dir1/dir2/']

This works for whatever string you pass in.
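Note that the token above still carries its trailing slash. As a rough sketch of how the same split-based idea could also drop it, assuming paths never contain spaces, `str.rstrip` can be applied to each token. The names `log` and `paths` are illustrative only.

```python
log = "some text /dir1/dir2/ some text\nsome text /dir1/dir3 some text"

# Keep whitespace-separated tokens that start with '/',
# then strip any trailing slashes from each one.
paths = [token.rstrip("/") for token in log.split() if token.startswith("/")]

print(paths)  # ['/dir1/dir2', '/dir1/dir3']
```

One caveat: a token that is just `/` would become an empty string, so it may need filtering out.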

- Synthase

-1

\/.*\/\S*

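\/ - matches a forward slash

.* - matches any character, any number of times

\/ - matches a forward slash

\S* - matches any non-whitespace character, any number of times

This works assuming there is always a space after /dir1/dir2/ or /dir1/dir2.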
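A quick check of this pattern in Python, as a sketch only: the third input line is a made-up example, not from the question, added to show what the greedy `.*` does when a line holds more than one path. The names `lines`, `pattern`, and `found` are illustrative.

```python
import re

lines = [
    "some text /dir1/dir2/ some text",
    "some text /dir1/dir3 some text",
    "see /a and /b here",  # hypothetical line with two separate paths
]

pattern = re.compile(r"\/.*\/\S*")

# .* is greedy and can cross spaces: it runs to the last '/' on the line.
found = [pattern.search(line).group() for line in lines]

print(found)  # ['/dir1/dir2/', '/dir1/dir3', '/a and /b']
```

So the trailing slash is kept in the first match, and two paths on one line are merged into a single match.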

- SimpleNiko

- Ryszard Czech · Accepted Answer

Use

\/dir1(?:\/[^\/\s]+)*

See the regex proof.

Explanation

--------------------------------------------------------------------------------
  \/                       '/'
--------------------------------------------------------------------------------
  dir1                     'dir1'
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (0 or more times
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    \/                       '/'
--------------------------------------------------------------------------------
    [^\/\s]+                 any character except: '\/', whitespace
                             (\n, \r, \t, \f, and " ") (1 or more
                             times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )*                       end of grouping
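
For reference, a minimal sketch of applying this pattern with Python's `re` module to the sample text from the question. The variable names are illustrative. Since the group is non-capturing, `findall` returns the full matches.

```python
import re

log = "some text /dir1/dir2/ some text\nsome text /dir1/dir3 some text"

# Each repetition of the group needs '/' followed by at least one
# non-slash, non-whitespace character, so a bare trailing '/' is left out.
pattern = re.compile(r"\/dir1(?:\/[^\/\s]+)*")
paths = pattern.findall(log)

print(paths)  # ['/dir1/dir2', '/dir1/dir3']
```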