我该如何从字符串中删除字符?比如:"My name @is ,Wan.;'; Wan"
。
我想要从该字符串中删除字符'@',',','.',';','\''
,使其变为"My name is Wan Wan"
var str = "My name @is ,Wan.;'; Wan";
var charsToRemove = new string[] { "@", ",", ".", ";", "'" };
foreach (var c in charsToRemove)
{
str = str.Replace(c, string.Empty);
}
但是如果您想删除所有非字母字符,我可以建议另一种方法
var str = "My name @is ,Wan.;'; Wan";
str = new string((from c in str
where char.IsWhiteSpace(c) || char.IsLetterOrDigit(c)
select c
).ToArray());
简单:
String.Join("", "My name @is ,Wan.;'; Wan".Split('@', ',' ,'.' ,';', '\''));
Join
替换为 Concat
:string.Concat("My name @is ,Wan.;'; Wan".Split('@', ',' ,'.' ,';', '\''));
- zcoop98听起来这似乎是一个正则表达式的理想应用,因为它是专门设计用于快速文本操作的引擎。在这种情况下:
Regex.Replace("He\"ll,o Wo'r.ld", "[@,\\.\";'\\\\]", string.Empty)
比较不同建议(以及在目标具有不同大小和位置的单字符替换上下文中进行比较)。
在这种特定情况下,将字符串拆分为目标并在替换(在此情况下为空字符串)上进行连接是最快的,至少快了3倍。最终,性能取决于替换的数量,在源中替换的位置以及源的大小。#ymmv
(完整结果请点击此处)
| Test | Compare | Elapsed |
|---------------------------|---------|--------------------------------------------------------------------|
| SplitJoin | 1.00x | 29023 ticks elapsed (2.9023 ms) [in 10K reps, 0.00029023 ms per] |
| Replace | 2.77x | 80295 ticks elapsed (8.0295 ms) [in 10K reps, 0.00080295 ms per] |
| RegexCompiled | 5.27x | 152869 ticks elapsed (15.2869 ms) [in 10K reps, 0.00152869 ms per] |
| LinqSplit | 5.43x | 157580 ticks elapsed (15.758 ms) [in 10K reps, 0.0015758 ms per] |
| Regex, Uncompiled | 5.85x | 169667 ticks elapsed (16.9667 ms) [in 10K reps, 0.00169667 ms per] |
| Regex | 6.81x | 197551 ticks elapsed (19.7551 ms) [in 10K reps, 0.00197551 ms per] |
| RegexCompiled Insensitive | 7.33x | 212789 ticks elapsed (21.2789 ms) [in 10K reps, 0.00212789 ms per] |
| Regex Insensitive | 7.52x | 218164 ticks elapsed (21.8164 ms) [in 10K reps, 0.00218164 ms per] |
(注意: Perf
和 Vs
是我编写的计时扩展)
void test(string title, string sample, string target, string replacement) {
var targets = target.ToCharArray();
var tox = "[" + target + "]";
var x = new Regex(tox);
var xc = new Regex(tox, RegexOptions.Compiled);
var xci = new Regex(tox, RegexOptions.Compiled | RegexOptions.IgnoreCase);
// no, don't dump the results
var p = new Perf/*<string>*/();
p.Add(string.Join(" ", title, "Replace"), n => targets.Aggregate(sample, (res, curr) => res.Replace(new string(curr, 1), replacement)));
p.Add(string.Join(" ", title, "SplitJoin"), n => String.Join(replacement, sample.Split(targets)));
p.Add(string.Join(" ", title, "LinqSplit"), n => String.Concat(sample.Select(c => targets.Contains(c) ? replacement : new string(c, 1))));
p.Add(string.Join(" ", title, "Regex"), n => Regex.Replace(sample, tox, replacement));
p.Add(string.Join(" ", title, "Regex Insentive"), n => Regex.Replace(sample, tox, replacement, RegexOptions.IgnoreCase));
p.Add(string.Join(" ", title, "Regex, Uncompiled"), n => x.Replace(sample, replacement));
p.Add(string.Join(" ", title, "RegexCompiled"), n => xc.Replace(sample, replacement));
p.Add(string.Join(" ", title, "RegexCompiled Insensitive"), n => xci.Replace(sample, replacement));
var trunc = 40;
var header = sample.Length > trunc ? sample.Substring(0, trunc) + "..." : sample;
p.Vs(header);
}
void Main()
{
// also see https://dev59.com/sms05IYBdhLWcg3wJ-uQ
"Control".Perf(n => { var s = "*"; });
var text = "My name @is ,Wan.;'; Wan";
var clean = new[] { '@', ',', '.', ';', '\'' };
test("stackoverflow", text, string.Concat(clean), string.Empty);
var target = "o";
var f = "x";
var replacement = "1";
var fillers = new Dictionary<string, string> {
{ "short", new String(f[0], 10) },
{ "med", new String(f[0], 300) },
{ "long", new String(f[0], 1000) },
{ "huge", new String(f[0], 10000) }
};
var formats = new Dictionary<string, string> {
{ "start", "{0}{1}{1}" },
{ "middle", "{1}{0}{1}" },
{ "end", "{1}{1}{0}" }
};
foreach(var filler in fillers)
foreach(var format in formats) {
var title = string.Join("-", filler.Key, format.Key);
var sample = string.Format(format.Value, target, filler.Value);
test(title, sample, target, replacement);
}
}
针对您的问题不是那么具体,但可以通过在正则表达式中列出可接受的字符来将一个字符串中的所有标点符号(除空格外)全部删除:
string dirty = "My name @is ,Wan.;'; Wan";
// only space, capital A-Z, lowercase a-z, and digits 0-9 are allowed in the string
string clean = Regex.Replace(dirty, "[^A-Za-z0-9 ]", "");
请注意在数字9后面有一个空格,以免从您的句子中删除空格。第三个参数是一个空字符串,用于替换不属于正则表达式的任何子字符串。
string x = "My name @is ,Wan.;'; Wan";
string modifiedString = x.Replace("@", "").Replace(",", "").Replace(".", "").Replace(";", "").Replace("'", "");
这是我编写的一种方法,它采用了略微不同的方法。与指定要删除的字符不同,我告诉我的方法我想保留哪些字符 - 它将删除所有其他字符。
在OP的示例中,他只想保留字母字符和空格。以下是调用我的方法的示例 (C#演示):
var str = "My name @is ,Wan.;'; Wan";
// "My name is Wan Wan"
var result = RemoveExcept(str, alphas: true, spaces: true);
这是我的方法:
/// <summary>
/// Returns a copy of the original string containing only the set of whitelisted characters.
/// </summary>
/// <param name="value">The string that will be copied and scrubbed.</param>
/// <param name="alphas">If true, all alphabetical characters (a-zA-Z) will be preserved; otherwise, they will be removed.</param>
/// <param name="numerics">If true, all numeric characters (0-9) will be preserved; otherwise, they will be removed.</param>
/// <param name="dashes">If true, all dash characters (-) will be preserved; otherwise, they will be removed.</param>
/// <param name="underlines">If true, all underscore characters (_) will be preserved; otherwise, they will be removed.</param>
/// <param name="spaces">If true, all whitespace (e.g. spaces, tabs) will be preserved; otherwise, they will be removed.</param>
/// <param name="periods">If true, all dot characters (".") will be preserved; otherwise, they will be removed.</param>
public static string RemoveExcept(string value, bool alphas = false, bool numerics = false, bool dashes = false, bool underlines = false, bool spaces = false, bool periods = false) {
if (string.IsNullOrWhiteSpace(value)) return value;
if (new[] { alphas, numerics, dashes, underlines, spaces, periods }.All(x => x == false)) return value;
var whitelistChars = new HashSet<char>(string.Concat(
alphas ? "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ" : "",
numerics ? "0123456789" : "",
dashes ? "-" : "",
underlines ? "_" : "",
periods ? "." : "",
spaces ? " " : ""
).ToCharArray());
var scrubbedValue = value.Aggregate(new StringBuilder(), (sb, @char) => {
if (whitelistChars.Contains(@char)) sb.Append(@char);
return sb;
}).ToString();
return scrubbedValue;
}
public static class StringEx
{
public static string RemoveCharacters(this string s, params char[] unwantedCharacters)
=> s == null ? null : string.Join(string.Empty, s.Split(unwantedCharacters));
}
使用方法
var name = "edward woodward!";
var removeDs = name.RemoveCharacters('d', '!');
Assert.Equal("ewar woowar", removeDs); // old joke
var forbiddenChars = @"@,.;'".ToCharArray();
var dirty = "My name @is ,Wan.;'; Wan";
var clean = new string(dirty.Where(c => !forbiddenChars.Contains(c)).ToArray());