.indexOf for multiple results

Question

6

Say I have a piece of text and I want to find the position of every comma. A shortened version of the string looks like this:

string s = "A lot, of text, with commas, here and,there";

Ideally, I would use something like this:

int[] i = s.indexOf(',');

However, since IndexOf only returns the first comma, I do the following instead:

List<int> list = new List<int>();
for (int i = 0; i < s.Length; i++)
{
   if (s[i] == ',')
      list.Add(i);
}

Is there a more optimized way of doing this?

- GuruMeditation

2

What are you actually trying to do here? - Schroedingers Cat

2

If you want to separate the string later on, you are better off using Split. - Hyperboreus

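@Schroedingers Cat - I am sanitizing strings to pass to an NLP library I am working on. The idea is to locate the commas, determine whether they are used correctly, and determine whether they are used numerically, as in "3,14". - GuruMeditation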

@Hyperboreus - I don't need to split them. - GuruMeditation

6 Answers

8

You can use the Regex.Matches(string, string) method. It returns a MatchCollection, and you can then read Match.Index for each match. MSDN has a good example:

```csharp
using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = @"\b\w+es\b";
      string sentence = "Who writes these notes?";

      foreach (Match match in Regex.Matches(sentence, pattern))
         Console.WriteLine("Found '{0}' at position {1}", 
                           match.Value, match.Index);
   }
}
// The example displays the following output:
//       Found 'writes' at position 4
//       Found 'notes' at position 17
```
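
Applied to the string from the question, the same idea might look like the sketch below. The class name `RegexCommaExample` and the helper `MatchIndexes` are made up for illustration and are not part of the MSDN sample.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RegexCommaExample
{
    // Hypothetical helper: collects the index of every match of `pattern` in `input`.
    public static List<int> MatchIndexes(string input, string pattern)
    {
        var indexes = new List<int>();
        foreach (Match match in Regex.Matches(input, pattern))
            indexes.Add(match.Index);
        return indexes;
    }

    public static void Main()
    {
        string s = "A lot, of text, with commas, here and,there";
        // "," has no special meaning in a regex, so it needs no escaping.
        Console.WriteLine(string.Join(" ", MatchIndexes(s, ",")));
        // prints: 5 14 27 37
    }
}
```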

- user195488

I looked into regular expressions briefly once, but with no prior experience I could not figure out how to use them. Let me try this. - GuruMeditation

Yes, this works. I still need to benchmark it, but I am marking it as the answer in the meantime. If nothing else, it is a cleaner approach. - GuruMeditation

Make sure you keep your Regex in a static context. The initial cost of creating a regex is much higher than the cost of running it. - Kyle W
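
A minimal sketch of what keeping the Regex in a static context could look like. The names `CommaRegexCache` and `FindCommas` are hypothetical, and `RegexOptions.Compiled` is an optional assumption on my part rather than something the comment asks for.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class CommaRegexCache
{
    // Constructed once, when the class is first used, and reused by every call.
    // RegexOptions.Compiled is optional: it trades a slower start for faster matching.
    private static readonly Regex CommaRegex = new Regex(",", RegexOptions.Compiled);

    // Hypothetical helper: returns the index of every comma in `input`.
    public static int[] FindCommas(string input)
    {
        return CommaRegex.Matches(input)
                         .Cast<Match>()
                         .Select(m => m.Index)
                         .ToArray();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(" ", FindCommas("A lot, of text, with commas, here and,there")));
        // prints: 5 14 27 37
    }
}
```

Whether this beats the plain for loop from the question still has to be measured; the static field only removes the repeated construction cost.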

@Kyle W - Thanks for the tip, I will do that. - GuruMeditation

6

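IndexOf also has an overload that takes the position to start searching from. You can set that parameter to the last known comma position + 1. For example: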

string s = "A lot, of text, with commas, here and, there";
int loc = s.IndexOf(',');
while (loc != -1) {
    Console.WriteLine(loc);
    loc = s.IndexOf(',', loc + 1);
}

- Corey Ogburn

1

Yes, I tried this approach earlier, but I read somewhere that it is actually slower than just using a for loop. - GuruMeditation

2

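You can use the overload of IndexOf that takes a starting index to get the next comma, but you still have to do that in a loop, and it will perform almost the same as the code you have now.

You can use a regular expression to find all the commas, but that carries considerable overhead, so it is not more optimized than what you have.

You can write a LINQ query to do it in a different way, but that also has some overhead, so it is not more optimized than what you have.

So there are many alternative ways, but none that is more optimized.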
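
For reference, the LINQ variant mentioned above could be sketched like this. It is one possible formulation, not necessarily the query the answer had in mind, and `LinqCommaExample` and `IndexesOf` are illustrative names.

```csharp
using System;
using System.Linq;

public static class LinqCommaExample
{
    // Hypothetical helper: every position i where input[i] equals target.
    public static int[] IndexesOf(string input, char target)
    {
        return Enumerable.Range(0, input.Length)
                         .Where(i => input[i] == target)
                         .ToArray();
    }

    public static void Main()
    {
        string s = "A lot, of text, with commas, here and,there";
        Console.WriteLine(string.Join(" ", IndexesOf(s, ',')));
        // prints: 5 14 27 37
    }
}
```

It visits every character just like the for loop does, with delegate calls on top, which fits the point that it is shorter to write but not faster.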

- Guffa

0

A bit unorthodox, but why not use Split? It might be gentler than iterating over the whole string.

string longString = "Some, string, with, commas.";
string[] splitString = longString.Split(','); // the char overload is available on every .NET version
int numSplits = splitString.Length - 1;
Debug.Log("number of commas "+numSplits);
Debug.Log("first comma index = "+GetIndex(splitString, 0)+" second comma index = "+GetIndex(splitString, 1));

public int GetIndex(string[] stringArray, int num)
{
    int charIndex = 0;
    for (int n = num; n >= 0; n--)
    {
        charIndex+=stringArray[n].Length;
    }
    return charIndex + num;
}
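
If every comma position is needed rather than one at a time, the same Split idea can be done in a single pass with a running total, so the earlier parts are not summed again for each index. This is a sketch only: `SplitIndexExample` and `SeparatorIndexes` are hypothetical names, and Console.WriteLine stands in for Debug.Log.

```csharp
using System;
using System.Collections.Generic;

public static class SplitIndexExample
{
    // Hypothetical helper: derives separator positions from the lengths of the split parts.
    public static List<int> SeparatorIndexes(string input, char separator)
    {
        string[] parts = input.Split(separator);
        var indexes = new List<int>();
        int position = 0;
        // Every part except the last one is followed by exactly one separator.
        for (int n = 0; n < parts.Length - 1; n++)
        {
            position += parts[n].Length; // now pointing at the separator
            indexes.Add(position);
            position++;                  // step over the separator itself
        }
        return indexes;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(" ", SeparatorIndexes("Some, string, with, commas.", ',')));
        // prints: 4 12 18
    }
}
```

Split still scans the whole string and allocates one substring per part, so this is unlikely to be cheaper than the plain loop in the question.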

- Adriano Mancino

0

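Here is another approach: a method that gets the string value following a particular substring. In my case that is id= (a string ID value terminated by a semicolon), and the larger string contains several ids. I wanted it to keep going until the end. Here is my method: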

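public static List<string> GetAllIDsFromString(string largerString, string findThis)
{
    List<string> listOfIDs = new List<string>();
    char lastChar = findThis.Last(); // requires "using System.Linq;"
    // Loop only while the marker is present, so input without a match returns an empty list
    // instead of throwing from Substring(-1).
    while (largerString.IndexOf(findThis) > -1)
    {
        // Cut the string so that it starts at the next marker (e.g. "id=").
        string idSearch = largerString.Substring(largerString.IndexOf(findThis));
        // Take everything up to the first ';', then drop the marker itself.
        string foundID = idSearch.Split(';')[0].Substring(idSearch.IndexOf(lastChar) + 1);
        listOfIDs.Add(foundID);
        // Continue searching after the marker that was just consumed.
        largerString = idSearch.Substring(idSearch.IndexOf(lastChar) + 1);
    }
    return listOfIDs;
}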

- pkucas

- fubo · Accepted Answer

Here is an extension method I have for this, used the same way as IndexOf:

public static IEnumerable<int> AllIndexesOf(this string str, string searchstring)
{
    int minIndex = str.IndexOf(searchstring);
    while (minIndex != -1)
    {
        yield return minIndex;
        minIndex = str.IndexOf(searchstring, minIndex + searchstring.Length);
    }
}

So you can use

s.AllIndexesOf(","); // 5    14    27    37

https://dotnetfiddle.net/DZdQ0L
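
A self-contained version of the extension method, for trying it out. The guard against an empty search string and the private helper `Iterate` are my own additions, not part of the answer. Without the guard an empty search string would loop forever, because IndexOf("") keeps returning the start position.

```csharp
using System;
using System.Collections.Generic;

public static class StringExtensions
{
    public static IEnumerable<int> AllIndexesOf(this string str, string searchstring)
    {
        // Added assumption: reject empty input up front, before any enumeration starts.
        if (string.IsNullOrEmpty(searchstring))
            throw new ArgumentException("Search string must not be empty.", nameof(searchstring));
        return Iterate(str, searchstring);
    }

    // Same loop as in the answer, moved to a helper so the check above runs eagerly.
    private static IEnumerable<int> Iterate(string str, string searchstring)
    {
        int minIndex = str.IndexOf(searchstring);
        while (minIndex != -1)
        {
            yield return minIndex;
            minIndex = str.IndexOf(searchstring, minIndex + searchstring.Length);
        }
    }
}

public static class Program
{
    public static void Main()
    {
        string s = "A lot, of text, with commas, here and,there";
        Console.WriteLine(string.Join(" ", s.AllIndexesOf(",")));
        // prints: 5 14 27 37

        // The search resumes after each whole match, so overlapping matches are skipped.
        Console.WriteLine(string.Join(" ", "aaaa".AllIndexesOf("aa")));
        // prints: 0 2
    }
}
```

Because the method is an iterator, the positions are produced lazily; call ToList() (from System.Linq) if the results are needed more than once.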