如何在C#中比较两个字符串,忽略大小写、空格和任何换行符。我还需要检查如果两个字符串都为null,则将其标记为相同。
谢谢!
你应该通过删除不需要比较的字符来规范化每个字符串,然后可以使用忽略大小写的StringComparison
执行String.Equals
。
像这样:
string s1 = "HeLLo wOrld!";
string s2 = "Hello\n WORLd!";
string normalized1 = Regex.Replace(s1, @"\s", "");
string normalized2 = Regex.Replace(s2, @"\s", "");
bool stringEquals = String.Equals(
normalized1,
normalized2,
StringComparison.OrdinalIgnoreCase);
Console.WriteLine(stringEquals);
这里首先使用Regex.Replace
删除所有的空格字符。对于两个字符串都为空的特殊情况,此处未作处理,但您可以在执行字符串规范化之前轻松处理该情况。
Regex.Replace
会对性能产生影响吗? - JDandChips这可能也起作用。
String.Compare(s1, s2, CultureInfo.CurrentCulture, CompareOptions.IgnoreCase | CompareOptions.IgnoreSymbols) == 0
编辑:
IgnoreSymbols:表示字符串比较必须忽略符号,例如空格字符、标点符号、货币符号、百分号、数学符号、&符号等。
删除所有不需要的字符,然后使用ToLower()方法忽略大小写。
编辑:虽然以上方法可以工作,但最好使用StringComparison.OrdinalIgnoreCase。只需将其作为第二个参数传递给Equals方法即可。
StringComparison.OrdinalIgnoreCase
方法不能解决忽略换行符的问题。 - cederlof首先,通过正则表达式从两个字符串中替换所有的空格,然后使用参数 ignoreCase = true 的 String.Compare
方法进行比较。
string a = System.Text.RegularExpressions.Regex.Replace("void foo", @"\s", "");
string b = System.Text.RegularExpressions.Regex.Replace("voidFoo", @"\s", "");
bool isTheSame = String.Compare(a, b, true) == 0;
char.IsWhiteSpace(ch)
来确定空格,并使用char.ToLowerInvariant(ch)
进行大小写不敏感(如果需要)。在我的测试中,我的解决方案比基于正则表达式的解决方案快5倍到8倍。我的类还实现了IEqualityComparer的GetHashCode(obj)
方法,使用另一个SO答案中的此代码。此GetHashCode(obj)
也忽略空格,并可选地忽略大小写。private class StringCompIgnoreWhiteSpace : IEqualityComparer<string>
{
public bool Equals(string strx, string stry)
{
if (strx == null) //stry may contain only whitespace
return string.IsNullOrWhiteSpace(stry);
else if (stry == null) //strx may contain only whitespace
return string.IsNullOrWhiteSpace(strx);
int ix = 0, iy = 0;
for (; ix < strx.Length && iy < stry.Length; ix++, iy++)
{
char chx = strx[ix];
char chy = stry[iy];
//ignore whitespace in strx
while (char.IsWhiteSpace(chx) && ix < strx.Length)
{
ix++;
chx = strx[ix];
}
//ignore whitespace in stry
while (char.IsWhiteSpace(chy) && iy < stry.Length)
{
iy++;
chy = stry[iy];
}
if (ix == strx.Length && iy != stry.Length)
{ //end of strx, so check if the rest of stry is whitespace
for (int iiy = iy + 1; iiy < stry.Length; iiy++)
{
if (!char.IsWhiteSpace(stry[iiy]))
return false;
}
return true;
}
if (ix != strx.Length && iy == stry.Length)
{ //end of stry, so check if the rest of strx is whitespace
for (int iix = ix + 1; iix < strx.Length; iix++)
{
if (!char.IsWhiteSpace(strx[iix]))
return false;
}
return true;
}
//The current chars are not whitespace, so check that they're equal (case-insensitive)
//Remove the following two lines to make the comparison case-sensitive.
chx = char.ToLowerInvariant(chx);
chy = char.ToLowerInvariant(chy);
if (chx != chy)
return false;
}
//If strx has more chars than stry
for (; ix < strx.Length; ix++)
{
if (!char.IsWhiteSpace(strx[ix]))
return false;
}
//If stry has more chars than strx
for (; iy < stry.Length; iy++)
{
if (!char.IsWhiteSpace(stry[iy]))
return false;
}
return true;
}
public int GetHashCode(string obj)
{
if (obj == null)
return 0;
int hash = 17;
unchecked // Overflow is fine, just wrap
{
for (int i = 0; i < obj.Length; i++)
{
char ch = obj[i];
if(!char.IsWhiteSpace(ch))
//use this line for case-insensitivity
hash = hash * 23 + char.ToLowerInvariant(ch).GetHashCode();
//use this line for case-sensitivity
//hash = hash * 23 + ch.GetHashCode();
}
}
return hash;
}
}
private static void TestComp()
{
var comp = new StringCompIgnoreWhiteSpace();
Console.WriteLine(comp.Equals("abcd", "abcd")); //true
Console.WriteLine(comp.Equals("abCd", "Abcd")); //true
Console.WriteLine(comp.Equals("ab Cd", "Ab\n\r\tcd ")); //true
Console.WriteLine(comp.Equals(" ab Cd", " A b" + Environment.NewLine + "cd ")); //true
Console.WriteLine(comp.Equals(null, " \t\n\r ")); //true
Console.WriteLine(comp.Equals(" \t\n\r ", null)); //true
Console.WriteLine(comp.Equals("abcd", "abcd h")); //false
Console.WriteLine(comp.GetHashCode(" a b c d")); //-699568861
//This is -699568861 if you #define StringCompIgnoreWhiteSpace_CASE_INSENSITIVE
// Otherwise it's -1555613149
Console.WriteLine(comp.GetHashCode("A B c \t d"));
}
private static void SpeedTest()
{
const int loop = 100000;
string first = "a bc d";
string second = "ABC D";
var compChar = new StringCompIgnoreWhiteSpace();
Stopwatch sw1 = Stopwatch.StartNew();
for (int i = 0; i < loop; i++)
{
bool equals = compChar.Equals(first, second);
}
sw1.Stop();
Console.WriteLine(string.Format("char time = {0}", sw1.Elapsed)); //char time = 00:00:00.0361159
var compRegex = new StringCompIgnoreWhiteSpaceRegex();
Stopwatch sw2 = Stopwatch.StartNew();
for (int i = 0; i < loop; i++)
{
bool equals = compRegex.Equals(first, second);
}
sw2.Stop();
Console.WriteLine(string.Format("regex time = {0}", sw2.Elapsed)); //regex time = 00:00:00.2773072
}
private class StringCompIgnoreWhiteSpaceRegex : IEqualityComparer<string>
{
public bool Equals(string strx, string stry)
{
if (strx == null)
return string.IsNullOrWhiteSpace(stry);
else if (stry == null)
return string.IsNullOrWhiteSpace(strx);
string a = System.Text.RegularExpressions.Regex.Replace(strx, @"\s", "");
string b = System.Text.RegularExpressions.Regex.Replace(stry, @"\s", "");
return String.Compare(a, b, true) == 0;
}
public int GetHashCode(string obj)
{
if (obj == null)
return 0;
string a = System.Text.RegularExpressions.Regex.Replace(obj, @"\s", "");
return a.GetHashCode();
}
}
ix < strx.Length
(以及iy < stry.Length
)应该改为ix < strx.Length - 1
。 - Itai Bar-Haim我会建议先在比较之前从字符串中删除您不想比较的字符。如果性能是一个问题,您可以考虑存储已经删除了这些字符的每个字符串版本。
或者,您可以编写一个比较程序,跳过您想忽略的字符。但对我来说,那似乎只是更多的工作。
public static string ExceptChars(this string str, IEnumerable<char> toExclude)
{
StringBuilder sb = new StringBuilder();
for (int i = 0; i < str.Length; i++)
{
char c = str[i];
if (!toExclude.Contains(c))
sb.Append(c);
}
return sb.ToString();
}
public static bool SpaceCaseInsenstiveComparision(this string stringa, string stringb)
{
return (stringa==null&&stringb==null)||stringa.ToLower().ExceptChars(new[] { ' ', '\t', '\n', '\r' }).Equals(stringb.ToLower().ExceptChars(new[] { ' ', '\t', '\n', '\r' }));
}
"Te st".SpaceCaseInsenstiveComparision("Te st");
.ToLower()
方法(因为它会创建一个新的字符串),而是使用StringComparison.OrdinalIgnoreCase
(它也更快)。 - Ivaylo Slavov另一个选择是使用LINQ SequenceEquals
方法,根据我的测试,它比其他答案中使用的正则表达式方法快两倍以上,并且非常易于阅读和维护。
public static bool Equals_Linq(string s1, string s2)
{
return Enumerable.SequenceEqual(
s1.Where(c => !char.IsWhiteSpace(c)).Select(char.ToUpperInvariant),
s2.Where(c => !char.IsWhiteSpace(c)).Select(char.ToUpperInvariant));
}
public static bool Equals_Regex(string s1, string s2)
{
return string.Equals(
Regex.Replace(s1, @"\s", ""),
Regex.Replace(s2, @"\s", ""),
StringComparison.OrdinalIgnoreCase);
}
以下是我使用的简单性能测试代码:
var s1 = "HeLLo wOrld!";
var s2 = "Hello\n WORLd!";
var watch = Stopwatch.StartNew();
for (var i = 0; i < 1000000; i++)
{
Equals_Linq(s1, s2);
}
Console.WriteLine(watch.Elapsed); // ~1.7 seconds
watch = Stopwatch.StartNew();
for (var i = 0; i < 1000000; i++)
{
Equals_Regex(s1, s2);
}
Console.WriteLine(watch.Elapsed); // ~4.6 seconds
这种方法并不是为了性能而优化的,而是为了完整性。
null
代码片段:
public static class StringHelper
{
public static bool AreEquivalent(string source, string target)
{
if (source == null) return target == null;
if (target == null) return false;
var normForm1 = Normalize(source);
var normForm2 = Normalize(target);
return string.Equals(normForm1, normForm2);
}
private static string Normalize(string value)
{
Debug.Assert(value != null);
// normalize unicode, combining characters, diacritics
value = value.Normalize(NormalizationForm.FormC);
// normalize new lines to white space
value = value.Replace("\r\n", "\n").Replace("\r", "\n");
// normalize white space
value = Regex.Replace(value, @"\s", string.Empty);
// normalize casing
return value.ToLowerInvariant();
}
}
Trim()
去除所有的空格。StringComparison.OrdinalIgnoreCase
忽略大小写敏感性,例如:stringA.Equals(stringB, StringComparison.OrdinalIgnoreCase)