Convert a string to title case

5

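I need to convert the following to title case:

  1. The first word of a phrase;

  2. Any other word in the same phrase whose length is greater than minLength.

I looked at ToTitleCase, but the result is not what I expected.

So, with minLength = 2, the phrase "the car is very fast" should become "The Car is Very Fast".

I have been able to uppercase the first word using: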

Char[] letters = source.ToCharArray();
letters[0] = Char.ToUpper(letters[0]);

And to get the words I use:

Regex.Matches(source, @"\b(\w|['-])+\b");

But I am not sure how to put all of this together.

Thanks, Miguel


I have just updated my question to show what I have tried. - Miguel Moura
If minLength is 2, why is "is" too short to be title-cased? - Tim Schmelter
@TimSchmelter Because the OP stated "length greater than minLength". - Ulugbek Umirov
2
@UlugbekUmirov: So minLength is not really a min then ;) - Tim Schmelter
Possible duplicate of "Convert a string to title case". - StayOnTarget
4 Answers

6

Sample code:

string input = "i have the car which is very fast";
int minLength = 2;
string regexPattern = string.Format(@"^\w|\b\w(?=\w{{{0}}})", minLength);
string output = Regex.Replace(input, regexPattern, m => m.Value.ToUpperInvariant());
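
A note on the `{{{0}}}` part of the pattern, since it is easy to misread: in string.Format, `{{` and `}}` are escaped literal braces, so the placeholder ends up wrapped in a regex quantifier. A minimal sketch of just that step (the class and method names are made up for illustration):

```csharp
using System;

public static class PatternDemo
{
    // Builds the first pattern from this answer for a given minLength.
    // "{{" -> "{", "{0}" -> the number, "}}" -> "}".
    public static string BuildPattern(int minLength)
    {
        return string.Format(@"^\w|\b\w(?=\w{{{0}}})", minLength);
    }
}
```

With minLength = 2 the lookahead becomes `(?=\w{2})`, so a word-initial letter only matches when at least two more word characters follow, i.e. when the word is longer than minLength.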

Update (for the case of multiple sentences in a single string).

string input = "i have the car which is very fast. me is slow.";
int minLength = 2;
string regexPattern = string.Format(@"(?<=(^|\.)\s*)\w|\b\w(?=\w{{{0}}})", minLength);
string output = Regex.Replace(input, regexPattern, m => m.Value.ToUpperInvariant());

Output:

I Have The Car Which is Very Fast. Me is Slow.

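You may also want to handle "!", "?" and other symbols; in that case you can use the following. You can add as many sentence-terminating symbols as you like.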
string input = "i have the car which is very fast! me is slow.";
int minLength = 2;
string regexPattern = string.Format(@"(?<=(^|[.!?])\s*)\w|\b\w(?=\w{{{0}}})", minLength);
string output = Regex.Replace(input, regexPattern, m => m.Value.ToUpperInvariant());

Update (2) - convert e-marketing to E-Marketing (treating - as a valid word character):

string input = "i have the car which is very fast! me is slow. it is very nice to learn e-marketing these days.";
int minLength = 2;
string regexPattern = string.Format(@"(?<=(^|[.!?])\s*)\w|\b\w(?=[-\w]{{{0}}})", minLength);
string output = Regex.Replace(input, regexPattern, m => m.Value.ToUpperInvariant());
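
If this is needed in more than one place, the last pattern could be wrapped in a small helper. This is only a sketch: the class name RegexTitleCase and the method name Capitalize are hypothetical, and the pattern is the one from Update (2) unchanged.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexTitleCase
{
    // Uppercases the first letter of every sentence and of every word
    // longer than minLength ('-' counts as a word character).
    public static string Capitalize(string input, int minLength)
    {
        if (string.IsNullOrEmpty(input)) return input;

        string pattern = string.Format(
            @"(?<=(^|[.!?])\s*)\w|\b\w(?=[-\w]{{{0}}})", minLength);

        return Regex.Replace(input, pattern, m => m.Value.ToUpperInvariant());
    }
}
```

For example, RegexTitleCase.Capitalize("the car is very fast", 2) gives "The Car is Very Fast".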

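How could your example be extended to cover all words longer than 2, plus the first word and any word that follows a period, even if those are shorter than 2? - Miguel Moura
I tested it and it seems to work fine, with one exception. If I have a word like "e-marketing" I would like it to become "E-Marketing", but at the moment I get "e-Marketing". Could your example handle that? Thanks. - Miguel Moura
@MDMoura Added handling for "-". - Ulugbek Umirov
@MDMoura, shouldn't the correct title case be "E-marketing" (similar to "E-commerce")? - user2819245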
1
@elgonzo There is no strict rule: http://laurenedwardssv.blogspot.com.tr/2010/03/e-commerce-e-commerce-or-not-so-easy-to.html - Ulugbek Umirov
@UlugbekUmirov Thanks for your help... I have just marked it as the answer. - Miguel Moura

1

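English title capitalization is very complicated and cannot be computed. That is a fact.

The best you can get is a routine that changes the short words according to a word list of your own choosing. That will still be wrong for all verbal expressions. An extended list of variants can catch many of them, but some cannot be decided without semantic analysis. Here are two examples: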

  • Running on/On Empty
  • Working on/On a Building

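The latter is indeed clear from its context; the former is not. There is an obvious difference in meaning, but a computer cannot decide which one is correct.

(Sometimes even humans can't. I asked about the first example on a StackExchange forum and did not get an acceptable answer.)

Here is my preferred replacement list; some of the four-letter words (no offense intended) are a personal choice. One could also argue that all kinds of numerals, such as any, all and few, should be capitalized.

This is not elegant; in fact it is a little embarrassing. But it works well enough for me, so I use it regularly and have processed more than 100,000 titles with it.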
public string ETC(string title)
{  // english title capitalization
    if (title == null) return "";

    string s = title.Trim().Replace('`', '\'');      // replace backticks with apostrophes

    TextInfo UsaTextInfo = new CultureInfo("en-US", false).TextInfo;
    s = UsaTextInfo.ToTitleCase(s);              // caps for all words

    // a list of exceptions one way or the other..
    s = s.Replace(" A ", " a ");
    s = s.Replace(" also ", " Also ");
    s = s.Replace(" An ", " an ");
    s = s.Replace(" And ", " and ");
    s = s.Replace(" as ", " As ");
    s = s.Replace(" At ", " at ");
    s = s.Replace(" be ", " Be ");
    s = s.Replace(" But ", " But ");           // no-op as written; " but " may have been intended
    s = s.Replace(" By ", " by ");
    s = s.Replace(" For ", " for ");
    s = s.Replace(" From ", " from ");
    s = s.Replace(" if ", " If ");
    s = s.Replace(" In ", " in ");
    s = s.Replace(" Into ", " into ");
    s = s.Replace(" he ", " He ");
    s = s.Replace(" has ", " Has ");
    s = s.Replace(" had ", " Had ");
    s = s.Replace(" is ", " Is ");
    s = s.Replace(" my ", " My ");
    s = s.Replace("   ", "  ");                // no triple spaces
    s = s.Replace("'N'", "'n'");          // Rock 'n' Roll
    s = s.Replace("'N ", "'n ");          // Rock 'n Roll
    s = s.Replace(" no ", " No ");
    s = s.Replace(" Nor ", " nor ");
    s = s.Replace(" Not ", " not ");
    s = s.Replace(" Of ", " of ");
    s = s.Replace(" Off ", " off ");
    s = s.Replace(" On ", " on ");
    s = s.Replace(" Onto ", " onto ");
    s = s.Replace(" Or ", " or ");
    s = s.Replace(" O'c ", " O'C ");
    s = s.Replace(" Over ", " over ");
    s = s.Replace(" so ", " So ");
    s = s.Replace(" To ", " to ");
    s = s.Replace(" that ", " That ");
    s = s.Replace(" this ", " This ");
    s = s.Replace(" thus ", " Thus ");
    s = s.Replace(" The ", " the ");
    s = s.Replace(" Too ", " too ");
    s = s.Replace(" when ", " When ");
    s = s.Replace(" With ", " with ");
    s = s.Replace(" Up ", " up ");
    s = s.Replace(" Yet ", " yet ");
    // a few(!) verbal expressions
    s = s.Replace(" Get up ", " Get Up ");
    s = s.Replace(" Give up ", " Give Up ");
    s = s.Replace(" Givin' up ", " Givin' Up ");
    s = s.Replace(" Grow up ", " Grow Up ");
    s = s.Replace(" Hung up ", " Hung Up ");
    s = s.Replace(" Make up ", " Make Up ");
    s = s.Replace(" Wake Me up ", " Wake Me Up ");
    s = s.Replace(" Mixed up ", " Mixed Up ");
    s = s.Replace(" Shut up ", " Shut Up ");
    s = s.Replace(" Stand up ", " Stand Up ");            
    s = s.Replace(" Wind up ", " Wind Up ");
    s = s.Replace(" Wake up ", " Wake Up ");
    s = s.Replace(" Come up ", " Come Up ");
    s = s.Replace(" Working on ", " Working On ");
    s = s.Replace(" Waiting on ", " Waiting On ");
    s = s.Replace(" Turn on ", " Turn On ");
    s = s.Replace(" Move on ", " Move On ");
    s = s.Replace(" Keep on ", " Keep On ");
    s = s.Replace(" Bring It on ", " Bring It On ");
    s = s.Replace(" Hold on ", " Hold On ");
    s = s.Replace(" Hang on ", " Hang On ");
    s = s.Replace(" Go on ", " Go On ");
    s = s.Replace(" Coming on ", " Coming On ");
    s = s.Replace(" Come on ", " Come On ");
    s = s.Replace(" Call on ", " Call On ");
    s = s.Replace(" Trust in ", " Trust In ");
    s = s.Replace(" Fell in ", " Fell In ");
    s = s.Replace(" Falling in ", " Falling In ");
    s = s.Replace(" Fall in ", " Fall In ");
    s = s.Replace(" Faith in ", " Faith In ");
    s = s.Replace(" Come in ", " Come In ");
    s = s.Replace(" Believe in ", " Believe In ");



    return s.Trim();
}

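Note that there are still many rules that cannot be implemented this way.
Some basic rules are not hard: capitalize the first and the last word, and capitalize all verbs (Is), adjectives (Red), pronouns (He), nouns (Ace) and numbers (One), even if they have fewer than 3 (or 4) letters.
But the exceptions are hard, for example: do not capitalize prepositions, unless they are part of a verbal expression...
Example 1: "Working on/On a Building" - you have to know that this is a gospel song to decide that it is "On".
Example 2: "Running On/on Empty". It could mean "Running On" or "Running (with the gas indicator) 'on Empty'".
So in the end you will have to compromise.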
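
The basic rules (first and last word always capitalized, a list of short words kept lowercase in between) can also be sketched in a data-driven form instead of a long chain of Replace calls. This is only an illustration under assumptions: the class name, the method name and the exact word list are mine, the list is a subset of the one above, and it does nothing about the verbal expressions discussed earlier.

```csharp
using System;
using System.Collections.Generic;

public static class SimpleTitleCaser
{
    // Hypothetical list of words kept lowercase unless they are first or last.
    private static readonly HashSet<string> SmallWords =
        new HashSet<string>(StringComparer.Ordinal)
        {
            "a", "an", "and", "at", "but", "by", "for", "from", "in", "into",
            "nor", "of", "off", "on", "onto", "or", "over", "the", "to",
            "up", "with", "yet"
        };

    public static string Apply(string title)
    {
        if (string.IsNullOrWhiteSpace(title)) return "";

        string[] words = title.Trim().Split(
            new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        for (int i = 0; i < words.Length; i++)
        {
            // Note: lowercasing the whole word also flattens acronyms.
            string lower = words[i].ToLowerInvariant();
            bool firstOrLast = i == 0 || i == words.Length - 1;

            words[i] = !firstOrLast && SmallWords.Contains(lower)
                ? lower
                : char.ToUpperInvariant(lower[0]) + lower.Substring(1);
        }
        return string.Join(" ", words);
    }
}
```

SimpleTitleCaser.Apply("the lord of the rings") gives "The Lord of the Rings". It also gives "Running on Empty", which is exactly the kind of case that cannot be settled without knowing the intended meaning.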

0

An alternative that needs no regex (and is rather naive) is to use the String.Split method and LINQ's Select to map the more complex conditions:

var text = @"i have the car which is very fast. me is slow.";
var length = 2;
var first = true; // first word in the sentence
var containsDot = false; // previous word contains a dot
var result = text
                .Split(' ')
                .ToList()
                .Select (p => 
                    {
                        if (first)
                        {
                            p = FirstCharToUpper(p);
                            first = false;
                        }
                        if (containsDot)
                        {
                            p = FirstCharToUpper(p);
                            containsDot = false;
                        }
                        containsDot = p.Contains(".");
                        if (p.Length > length)
                        {
                            return FirstCharToUpper(p);
                        }
                        return p;
                    })
                .Aggregate ((h, t) => h + " " + t);
Console.WriteLine(result);

The output is:

I Have The Car Which is Very Fast. Me is Slow.

The FirstCharToUpper method comes from this Stack Overflow post:

public static string FirstCharToUpper(string input)
{
    if (String.IsNullOrEmpty(input))
        throw new ArgumentException("ARGH!");
    return input.First().ToString().ToUpper() + String.Join("", input.Skip(1));
}

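The drawback of this solution: the more complex the conditions get, the more complex and harder to read the Select statement becomes. Still, it is an alternative to regex.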
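
One way to keep the conditions readable is a plain loop instead of a stateful lambda. A minimal sketch of the same split-based idea, assuming (like the code above) that words are separated by single spaces and that a sentence ends with a word ending in "."; the names SplitTitleCase and Capitalize are hypothetical:

```csharp
using System;

public static class SplitTitleCase
{
    public static string Capitalize(string text, int minLength)
    {
        if (string.IsNullOrEmpty(text)) return text;

        string[] words = text.Split(' ');
        bool sentenceStart = true; // the very first word starts a sentence

        for (int i = 0; i < words.Length; i++)
        {
            string w = words[i];
            if (w.Length == 0) continue; // produced by consecutive spaces

            // As in the original, trailing punctuation counts toward the length.
            if (sentenceStart || w.Length > minLength)
                words[i] = char.ToUpperInvariant(w[0]) + w.Substring(1);

            sentenceStart = w.EndsWith(".", StringComparison.Ordinal);
        }
        return string.Join(" ", words);
    }
}
```

Called with the same text and a length of 2, this should produce the same output as shown above.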

0
Here is an approach that uses a StringBuilder and plain string methods rather than regex, so it should be quite efficient:
public static string ToTitleCase(string input, int minLength = 0)
{
    TextInfo ti = CultureInfo.CurrentCulture.TextInfo;
    string titleCaseDefault = ti.ToTitleCase(input);
    if (minLength == 0)
        return titleCaseDefault;
    StringBuilder sb = new StringBuilder(titleCaseDefault.Length);
    int wordCount = 0;
    char[] wordSeparatorChars = " \t\n.,;-:".ToCharArray();

    for (int i = 0; i < titleCaseDefault.Length; i++)
    {
        char c = titleCaseDefault[i];
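        // A separator such as '.' or '-' must not start a word: it would yield an
        // empty word and reset i to the same index, so the loop would never advance.
        bool nonSpace = !char.IsWhiteSpace(c) && Array.IndexOf(wordSeparatorChars, c) < 0;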
        if (nonSpace)
        {
            wordCount++;
            int firstSpace = titleCaseDefault.IndexOfAny(wordSeparatorChars, i);
            int endIndex = firstSpace >= 0 ? firstSpace : titleCaseDefault.Length;
            string word = titleCaseDefault.Substring(i, endIndex - i);
            if (wordCount == 1) // first word upper
                sb.Append(word);
            else
                sb.Append(word.Length < minLength ? word.ToLower() : ti.ToTitleCase(word));
            i = endIndex - 1;
        }
        else
            sb.Append(c);
    }
    return sb.ToString();
}

With your sample data:

string text =  "the car is very fast";
string output = ToTitleCase(text, 3);
