How do I really split a string into a string array in C# without losing parts of it?

Question

What I have

string ImageRegPattern = @"http://[\w\.\/]*\.jpg|http://[\w\.\/]*\.png|http://[\w\.\/]*\.gif";
string a ="http://www.dsa.com/asd/jpg/good.jpgThis is a good dayhttp://www.a.com/b.pngWe are the Best friendshttp://www.c.com";

What I want

string[] s;
s[0] = "http://www.dsa.com/asd/jpg/good.jpg";
s[1] = "This is a good day";
s[2] = "http://www.a.com/b.png";
s[3] = "We are the Best friendshttp://www.c.com";

Hint:
It would be even better if the last URL could also be split off as shown below, but it's fine if that isn't possible.

s[3] = "We are the Best friends";
s[4] = "http://www.c.com";

What's the problem
I tried to split the string with the code below:

string[] s= Regex.Split(sourceString, ImageRegPattern, RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);

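But the result isn't good: Split seems to remove every substring that matches ImageRegPattern, whereas I want those substrings to stay. I checked the Regex page on MSDN, and there doesn't seem to be a method that does what I need. So what should I do?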

- Albert Gao

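I don't think there's a general solution for splitting that string (of course you could write a method to do it, but it would be very specific). You don't get the matches back from Regex.Split, because it splits on them. Personally I would change the format of the string: unless there's a good reason not to, you should just add a delimiter to it. - evanmcdonnal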

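Given a comma-separated list, Regex.Split("1,2,3", ",") returns the array ["1","2","3"]. The pattern you supply defines the delimiter, not what you want to keep. Regex.Split is not the tool you want here: you're trying to keep both the text and the delimiters, and that's not what Split does. - Jim Mischel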
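A related detail of .NET's Regex.Split: its documentation states that when the pattern contains capturing parentheses, the captured text is included in the returned array. A minimal sketch relying on that documented behavior, wrapping the question's ImageRegPattern in one capturing group (the class and method names are hypothetical):

```csharp
using System.Text.RegularExpressions;

public static class CapturingSplitDemo
{
    // The question's image-URL pattern, unchanged.
    public const string ImageRegPattern =
        @"http://[\w\.\/]*\.jpg|http://[\w\.\/]*\.png|http://[\w\.\/]*\.gif";

    // Wrapping the whole alternation in one capturing group makes Regex.Split
    // insert each matched URL into the result instead of discarding it.
    public static string[] SplitKeepingUrls(string source)
    {
        return Regex.Split(source, "(" + ImageRegPattern + ")",
            RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);
    }
}
```

With the sample string this should return five elements: an empty string first (the documentation notes that a match at the very start of the input produces a leading empty element), followed by the four strings listed under "What I want". Filtering out empty strings, for example with LINQ's Where(p => p.Length > 0), would remove it.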

4 Answers

Don't underestimate the power of regex:

(.*?)([A-Z][\w\s]+(?=http|$))

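Explanation:

(.*?): match and capture everything before the capital letter; this is where you'll find the URL
(: start the second capture group
- [A-Z]: match one uppercase letter
- [\w\s]+: match one or more word characters (letters, digits, _) or whitespace characters (space, \t, \n, \r, \f, \v)
- (?=http|$): lookahead; check that what follows is http or the end of the input
- ): close the group (this is where you'll find the text)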

Note: this solution matches the string rather than splitting it.
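HamZa's pattern is meant to be used with Regex.Matches rather than Regex.Split. A minimal sketch of reading the two groups out of each match (the class and method names are hypothetical). The pattern has to be matched case-sensitively for [A-Z] to mean "capital letter":

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MatchInsteadOfSplit
{
    // HamZa's pattern: group 1 = everything before the capital letter (the URL),
    // group 2 = a run of word/space characters that starts with a capital letter
    // and ends right before "http" or at the end of the input.
    private const string Pattern = @"(.*?)([A-Z][\w\s]+(?=http|$))";

    public static List<string> Extract(string source)
    {
        var parts = new List<string>();
        // No RegexOptions.IgnoreCase: [A-Z] has to stay case-sensitive.
        foreach (Match m in Regex.Matches(source, Pattern))
        {
            // Group 1 can be empty, e.g. when the input starts with a capital letter.
            if (m.Groups[1].Length > 0)
                parts.Add(m.Groups[1].Value);
            parts.Add(m.Groups[2].Value);
        }
        return parts;
    }
}
```

Tracing the sample input through this pattern, it should yield the two image URLs and the two sentences, but not the trailing http://www.c.com: no capital letter follows that URL, so the pattern has nothing to match there.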

- HamZa

I think you need a multi-step process: insert a delimiter, which String.Split can then use:

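string resultString = Regex.Replace(rawString, @"(http://.*?/\w+\.(jpg|png|gif))", "|$1|", RegexOptions.IgnoreCase);
// Split on the inserted '|'; RemoveEmptyEntries drops the empty strings left by a
// leading or trailing '|' (an image URL at the start or end of the input).
string[] a = resultString.Split(new[] { '|' }, StringSplitOptions.RemoveEmptyEntries);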
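One weakness of inserting | is that it collides with any | already present in the text. A sketch of the same two steps, assuming a control character (U+001F, the ASCII unit separator) never occurs in the input; the sentinel choice and the class name are assumptions, not part of the answer:

```csharp
using System;
using System.Text.RegularExpressions;

public static class SentinelSplit
{
    // Assumed sentinel: U+001F (unit separator), taken never to occur in the input.
    private const char Sentinel = '\u001F';

    public static string[] Split(string rawString)
    {
        // Step 1: wrap every image URL in sentinels (same pattern as the answer).
        string marked = Regex.Replace(
            rawString,
            @"(http://.*?/\w+\.(jpg|png|gif))",
            Sentinel + "$1" + Sentinel,
            RegexOptions.IgnoreCase);

        // Step 2: split on the sentinel. RemoveEmptyEntries drops the empty
        // strings produced by a URL at the start or end, or by adjacent URLs.
        return marked.Split(new[] { Sentinel }, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

For the sample string this should give the four elements listed under "What I want".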

- Dave Michener

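The obvious answer, of course, is not to use Split but to match the image pattern and retrieve the matches. That said, it isn't impossible to do it with Split.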

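// Non-capturing (?:...) groups: Regex.Split would otherwise add each captured URL or extension to the result.
string ImageRegPattern = @"(?=(?:http://[\w./]*?\.jpg|http://[\w./]*?\.png|http://[\w./]*?\.gif))|(?<=(?:\.jpg|\.png|\.gif))";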

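This matches every (zero-width) position in the string that is either followed by an image URL or preceded by .jpg, .gif or .png, so Split cuts there without removing any characters.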

I really don't recommend doing this; I'm just saying you can.
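A runnable sketch of this zero-width split. All groups are non-capturing, because Regex.Split adds the text of any capturing group to its result; the alternation is condensed into the equivalent \.(?:jpg|png|gif) form, and the class name is hypothetical:

```csharp
using System.Text.RegularExpressions;

public static class LookaroundSplit
{
    // Zero-width pattern: a position followed by an image URL, or a position
    // preceded by an image extension. Every group is non-capturing, so
    // Regex.Split returns only the pieces between those positions.
    private const string SplitPattern =
        @"(?=http://[\w./]*?\.(?:jpg|png|gif))|(?<=\.(?:jpg|png|gif))";

    public static string[] Split(string source)
    {
        return Regex.Split(source, SplitPattern, RegexOptions.IgnoreCase);
    }
}
```

For the sample string this should give five elements, the first of which is empty because a split position falls at index 0.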

- melwil

Page content provided by Stack Overflow.

- FishBasketGordo · Accepted Answer

You need something like this method: first find all the matches, then collect them into a list together with the unmatched strings between them.

Update: added a condition to handle the case where no matches are found.

// Requires: using System.Collections.Generic; using System.Text.RegularExpressions;
private static IEnumerable<string> InclusiveSplit
(
    string source, 
    string pattern
)
{
  List<string> parts = new List<string>();
  int currIndex = 0;

  // First, find all the matches. These are your separators.
  MatchCollection matches = 
      Regex.Matches(source, pattern, 
      RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);

  // If there are no matches, there's nothing to split, so just return a
  // collection with just the source string in it.
  if (matches.Count < 1)
  {
    parts.Add(source);
  }
  else
  {
    foreach (Match match in matches)
    {
      // If the match begins after our current index, we need to add the
      // portion of the source string between the last match and the 
      // current match.
      if (match.Index > currIndex)
      {
        parts.Add(source.Substring(currIndex, match.Index - currIndex));
      }

      // Add the matched value, of course, to make the split inclusive.
      parts.Add(match.Value);

      // Update the current index so we know if the next match has an
      // unmatched substring before it.
      currIndex = match.Index + match.Length;
    }

    // Finally, check whether there is a bit of unmatched string at the end of
    // the source string.
    if (currIndex < source.Length)
      parts.Add(source.Substring(currIndex));
  }

  return parts;
}

The output for your sample input would be:

[0] "http://www.dsa.com/asd/jpg/good.jpg"
[1] "This is a good day"
[2] "http://www.a.com/b.png"
[3] "We are the Best friendshttp://www.c.com"
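For the optional split in the hint (s[3] and s[4]), one possible sketch appends a catch-all http://[\w\.\/]* alternative after the three image alternatives, so a URL that is not an image is still kept as its own element. This goes beyond the question and rests on an assumption: it only works when such a URL is followed by a character outside [\w./] or by the end of the string; a non-image URL followed directly by letters would swallow them. The method is a condensed restatement of the match-and-collect idea in the accepted answer, under a hypothetical name:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HintSplit
{
    // The question's three image alternatives plus a hypothetical catch-all
    // for other URLs. Alternatives are tried left to right, so image URLs win.
    private const string Pattern =
        @"http://[\w\.\/]*\.jpg|http://[\w\.\/]*\.png|http://[\w\.\/]*\.gif|http://[\w\.\/]*";

    public static List<string> Split(string source)
    {
        var parts = new List<string>();
        int currIndex = 0;
        foreach (Match match in Regex.Matches(source, Pattern, RegexOptions.IgnoreCase))
        {
            // Unmatched text between the previous match and this one.
            if (match.Index > currIndex)
                parts.Add(source.Substring(currIndex, match.Index - currIndex));
            parts.Add(match.Value);
            currIndex = match.Index + match.Length;
        }
        // Trailing unmatched text (or the whole string if nothing matched).
        if (currIndex < source.Length)
            parts.Add(source.Substring(currIndex));
        return parts;
    }
}
```

With the sample input this should return the five strings from the hint: the two image URLs, the two sentences, and http://www.c.com.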