Regular expression for matching paths in C#

6

I'm a beginner and need to use a regular expression to extract the path from lines like these:

XXXX       c:\mypath1\test
YYYYYYY             c:\this is other path\longer
ZZ        c:\mypath3\file.txt

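I need to implement a method that returns the path for a given line. The first column is a word of one or more characters and is never empty; the second column is the path. The separator can be one or more spaces, one or more tabs, or a mix of both.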


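Is the input a file, or is it read line by line? - Royi Namir
Yes. Lines and files are handled differently, unless you read the text file line by line, and then you also need to watch out for newlines and so on. - Royi Namir
3 Answers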

7

I think you just want:

string[] bits = line.Split(new char[] { '\t', ' ' }, 2,
                           StringSplitOptions.RemoveEmptyEntries);
// TODO: Check that bits really has two entries
string path = bits[1];

(This assumes the first column never contains spaces or tabs.)
EDIT: As a regular expression, you could probably just use:
Regex regex = new Regex(@"^[^ \t]+[ \t]+(.*)$");

Sample code:

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        string[] lines = 
        {
            @"XXXX       c:\mypath1\test",
            @"YYYYYYY             c:\this is other path\longer",
            @"ZZ        c:\mypath3\file.txt"
        };

        foreach (string line in lines)
        {
            Console.WriteLine(ExtractPathFromLine(line));
        }
    }

    static readonly Regex PathRegex = new Regex(@"^[^ \t]+[ \t]+(.*)$");

    static string ExtractPathFromLine(string line)
    {
        Match match = PathRegex.Match(line);
        if (!match.Success)
        {
            throw new ArgumentException("Invalid line");
        }
        return match.Groups[1].Value;
    }    
}
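
If the lines come from a file or a multi-line string rather than an array, the same pattern can be applied one line at a time. The following is a minimal sketch under that assumption; the names `PathLines` and `ExtractPaths` are hypothetical, and skipping non-matching lines instead of throwing is a choice made here, not something the answer prescribes. `StringReader.ReadLine` drops the line terminator, so a trailing `\r` should not end up in the captured path.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

static class PathLines
{
    // Same pattern as in the sample above: first column, separators, rest of line.
    static readonly Regex PathRegex = new Regex(@"^[^ \t]+[ \t]+(.*)$");

    // Hypothetical helper: returns the path column of every well-formed line.
    public static List<string> ExtractPaths(string text)
    {
        List<string> paths = new List<string>();
        using (StringReader reader = new StringReader(text))
        {
            string line;
            // ReadLine returns null at the end of the input.
            while ((line = reader.ReadLine()) != null)
            {
                Match match = PathRegex.Match(line);
                if (match.Success)
                {
                    paths.Add(match.Groups[1].Value);
                }
                // Blank or malformed lines are skipped (an assumption of this sketch).
            }
        }
        return paths;
    }
}
```

For a real file, `File.ReadAllText` or a `StreamReader` could supply the text in the same way.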

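Paths can contain spaces, so the second path comes out badly. - xanatos
@Jon: Sorry, I need a regular expression because I'm using .NET 1.1 and don't have access to the StringSplitOptions.RemoveEmptyEntries overload. Thanks anyway! - Daniel Peñalba
@DanielPeñalba: It would have been useful to say that to start with; requiring .NET 1.1 is very rare these days. I'll edit. - Jon Skeet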
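
For readers in the same position, the split can also be done by hand without `StringSplitOptions` or a regex. This is only a sketch: the class name `ColumnSplitter` and method `ExtractPath` are hypothetical, and while `IndexOfAny`, `Substring` and `TrimStart(params char[])` are long-standing `String` members, the code has not been compiled against .NET 1.1, so treat that compatibility as an assumption.

```csharp
using System;

class ColumnSplitter
{
    static readonly char[] Separators = { ' ', '\t' };

    // Hypothetical helper: returns everything after the first run of spaces/tabs.
    public static string ExtractPath(string line)
    {
        // Position of the first separator; -1 means none, 0 means an empty first column.
        int firstSeparator = line.IndexOfAny(Separators);
        if (firstSeparator <= 0)
        {
            throw new ArgumentException("Invalid line");
        }

        // Skip the whole run of separators; spaces inside the path are left alone.
        string path = line.Substring(firstSeparator).TrimStart(Separators);
        if (path.Length == 0)
        {
            throw new ArgumentException("Invalid line");
        }
        return path;
    }
}
```

Like the `Split` version, this assumes the first column never contains a space or tab.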

5
StringCollection resultList = new StringCollection();
try {
    Regex regexObj = new Regex(@"(([a-z]:|\\\\[a-z0-9_.$]+\\[a-z0-9_.$]+)?(\\?(?:[^\\/:*?""<>|\r\n]+\\)+)[^\\/:*?""<>|\r\n]+)");
    Match matchResult = regexObj.Match(subjectString);
    while (matchResult.Success) {
        resultList.Add(matchResult.Groups[1].Value);
        matchResult = matchResult.NextMatch();
    } 
} catch (ArgumentException ex) {
    // Syntax error in the regular expression
}

Breakdown:

@"
(                             # Match the regular expression below and capture its match into backreference number 1
   (                             # Match the regular expression below and capture its match into backreference number 2
                                    # Match either the regular expression below (attempting the next alternative only if this one fails)
         [a-z]                         # Match a single character in the range between “a” and “z”
         :                             # Match the character “:” literally
      |                             # Or match regular expression number 2 below (the entire group fails if this one fails to match)
         \\                            # Match the character “\” literally
         \\                            # Match the character “\” literally
         [a-z0-9_.$]                   # Match a single character present in the list below
                                          # A character in the range between “a” and “z”
                                          # A character in the range between “0” and “9”
                                          # One of the characters “_.$”
            +                             # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
         \\                            # Match the character “\” literally
         [a-z0-9_.$]                   # Match a single character present in the list below
                                          # A character in the range between “a” and “z”
                                          # A character in the range between “0” and “9”
                                          # One of the characters “_.$”
            +                             # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
   )?                            # Between zero and one times, as many times as possible, giving back as needed (greedy)
   (                             # Match the regular expression below and capture its match into backreference number 3
      \\                            # Match the character “\” literally
         ?                             # Between zero and one times, as many times as possible, giving back as needed (greedy)
      (?:                           # Match the regular expression below
         [^\\/:*?""<>|\r\n]             # Match a single character NOT present in the list below
                                          # A \ character
                                          # One of the characters “/:*?""<>|”
                                          # A carriage return character
                                          # A line feed character
            +                             # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
         \\                            # Match the character “\” literally
      )+                            # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
   )
   [^\\/:*?""<>|\r\n]             # Match a single character NOT present in the list below
                                    # A \ character
                                    # One of the characters “/:*?""<>|”
                                    # A carriage return character
                                    # A line feed character
      +                             # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
"

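A commented layout like the breakdown above can also be kept in the source itself by compiling the pattern with `RegexOptions.IgnorePatternWhitespace`. The sketch below is a restructured variant, not the answer's exact code: the inner groups are made non-capturing, `RegexOptions.IgnoreCase` is added so that upper-case drive letters match, and the names `WindowsPathFinder` and `FindFirstPath` are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

static class WindowsPathFinder
{
    // In this mode, whitespace outside character classes is ignored and
    // '#' starts a comment that runs to the end of the line.
    static readonly Regex PathRegex = new Regex(@"
        (?: [a-z]:                             # drive letter followed by a colon
          | \\\\[a-z0-9_.$]+\\[a-z0-9_.$]+     # or a UNC prefix: two backslashes, server, share
        )?
        \\?                                    # optional leading backslash
        (?: [^\\/:*?""<>|\r\n]+ \\ )+          # one or more segments, each ending in a backslash
        [^\\/:*?""<>|\r\n]+                    # final segment (file or folder name)
        ",
        RegexOptions.IgnorePatternWhitespace | RegexOptions.IgnoreCase);

    // Hypothetical helper: returns the first path-like substring, or null if none is found.
    public static string FindFirstPath(string subject)
    {
        Match match = PathRegex.Match(subject);
        return match.Success ? match.Value : null;
    }
}
```

Because a space is a legal character in a segment, the final segment also swallows any trailing spaces on the line.
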
1
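This looks very complicated, when basically all that's needed is everything after the first run of spaces/tabs. - Jon Skeet
@JonSkeet I agree. This is a more general regex for Windows paths. - FailedDev
@FailedDev, it doesn't work for "k:\ test \ test", for example. And if I pass a path like **\test\t><*st**, it is treated as valid. I found this regex: ^(?:[c-zC-Z]\:|\\)(\\[a-zA-Z_\-\s0-9\.]+)+. I think it validates paths correctly. Found it here (https://www.codeproject.com/Tips/216238/Regular-Expression-to-Validate-File-Path-and-Exten). - Potato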
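
One thing worth checking before relying on the pattern quoted in the last comment: it is anchored at the start with `^` but has no end anchor, so `IsMatch` can succeed on a valid prefix of an otherwise invalid string. The sketch below compares the pattern as quoted with a variant that appends `$`; the `$` and the names `PathValidation`, `AsQuoted` and `Anchored` are assumptions made here, not part of the linked tip.

```csharp
using System;
using System.Text.RegularExpressions;

static class PathValidation
{
    const string QuotedPattern = @"^(?:[c-zC-Z]\:|\\)(\\[a-zA-Z_\-\s0-9\.]+)+";

    // The pattern exactly as quoted in the comment.
    public static readonly Regex AsQuoted = new Regex(QuotedPattern);

    // Assumption: the same pattern with an end anchor, so the whole string must match.
    public static readonly Regex Anchored = new Regex(QuotedPattern + "$");
}

static class PathValidationDemo
{
    public static void Run()
    {
        string[] candidates =
        {
            @"c:\mypath1\test",
            @"c:\this is other path\longer",
            @"\test\t><*st",
            @"c:\ok\b<d"
        };

        foreach (string candidate in candidates)
        {
            Console.WriteLine("{0}: quoted={1}, anchored={2}",
                candidate,
                PathValidation.AsQuoted.IsMatch(candidate),
                PathValidation.Anchored.IsMatch(candidate));
        }
    }
}
```

Note that the character class only allows drive letters from c to z, so a path on drive a: or b: is rejected by both variants.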

0

Regex Tester is a good site for quickly testing regular expressions.

Regex.Matches(input, "([a-zA-Z]*:[\\[a-zA-Z0-9 .]*]*)");
