Get the first 140 characters of a string, with a special case

3
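I have a string that is limited to 140 characters. In my code it usually ends up longer than that. The string is a set of values in the format Mxxxx, where x can be any digit, and the values have no strict length. So I can have M1 as well as M281.
If the string is longer than 140 characters, I want to take the first 140 characters, but if the last value is cut in half I do not want to include it in my string.
Even so, I still need to keep the second part in some local variable.
For example, suppose this is the string: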
"M5903, M6169, M6753, M619, M6169, M6753, M6919, M6169, M6753, M919, M6169, M6753, M6919, M6169, M6753, M6919, M6169, M6753, M919, M6169, M6753, M6919, M669, M6753, M6919, M69, M6753, M6919, M6169, M63, M6919, M6169, M6753, M6919, M619, M653, M6919, M66, M6753, M19, M6169, M6753, M6919, M6169, M6753, M6919, M6169, M6753, M6919, M6169, M6753, M619"

And suppose these are the first 140 characters:

"M5903, M6169, M6753, M619, M6169, M6753, M6919, M6169, M6753, M919, M6169, M6753, M6919, M6169, M6753, M6919, M6169, M6753, M919, M6169, M6753, M6919, M669, M6753, M6919, M69, M6753, M6919, M6169, M63, M69"

The last value was M6919, but it was cut down to M69.

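What is the most efficient way to split the string if it is longer than 140 characters and, if the last value in the new string was split in two, to remove that value from the first string and put it into another string together with the rest of the original string?

There are probably many ways to do this. I could use an if or switch/case and check whether the first letter of the second string is 'M'; if it is not, I know the value was split and should be removed from the first string. But is there a cleaner solution than that?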

private static string CreateSettlmentStringsForUnstructuredField(string settlementsString)
{
    string returnSettlementsString = settlementsString.Replace(", ", " ");

    if (returnSettlementsString.Length > 140)
    {
        // Substring returns a new string, so the result has to be assigned.
        returnSettlementsString = returnSettlementsString.Substring(0, 140);
        /*if returnSettlementsString was split in two in a way 
          that last value was broken in two parts, take that value 
          out of returnSettlementStrings and put it in some new 
          string value with the other half of the string.*/
    }
    return returnSettlementsString;
} 

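Why don't you use a list of strings? - Dmitriy Kovalenko
@DmitriyKovalenko, this is the string I get. I could convert it to a list of strings, but then I would have to convert it back to a string and still do all the length checking and so on. So I feel it might make things more complicated. - nemo_87
Is every Mx followed by a comma, except the last one? - Hamid Pourjam
8 Answers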

2

A solution could look something like this:

string result;
if (input.Length > 140)
{
    result = new string(input.Take(140).ToArray());
    if (input[140] != ',') // ensures we don't drop the last complete value when the character right after the cut is a comma
        result = result.Substring(0, result.LastIndexOf(','));
} 
else result = input;

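If the total length is greater than 140 characters, it simply takes the first 140 characters. Unless the next character is a comma (meaning the cut fell exactly at the end of a value), it then finds the index of the last comma and keeps everything before it.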
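The question also asks to keep the cut-off part. A minimal sketch of the same last-comma idea that hands back the remainder as well follows; the names `SettlementCutter` and `CutAtLastComma` are hypothetical, and the code assumes `limit` is at least 1 and that values are separated by a comma followed by optional whitespace.

```csharp
using System;

public static class SettlementCutter
{
    // Hypothetical helper (not from the answer above): returns the part that
    // fits within `limit` characters and outputs everything else in `remainder`.
    public static string CutAtLastComma(string input, int limit, out string remainder)
    {
        if (input.Length <= limit)
        {
            remainder = string.Empty;
            return input;
        }

        // If the character right after the limit is a comma, the first `limit`
        // characters already end on a complete value; otherwise back up to the
        // last comma inside the limit.
        int cut = input[limit] == ','
            ? limit
            : input.LastIndexOf(',', limit - 1);

        if (cut < 0)
        {
            // No comma inside the limit: no complete value fits.
            remainder = input;
            return string.Empty;
        }

        // Skip the comma itself and any whitespace that follows it.
        remainder = input.Substring(cut + 1).TrimStart();
        return input.Substring(0, cut);
    }
}
```

Calling it as `CutAtLastComma(settlementsString, 140, out rest)` would leave the overflow in `rest`, which can then be cut again if it is still too long.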


1
Your best option is to split the string into "words" and then reassemble them with a StringBuilder. Raw, untested code would look like this:
public IEnumerable<string> SplitSettlementStrings(string settlementsString) 
{
    var sb = new StringBuilder();
    foreach (var word in WordsFrom(settlementsString))
    {
        var extraFragment = $"{word}, ";
        if (sb.Length > 0 && sb.Length + extraFragment.Length > 140)
        {
            // we'd overflow the 140 char limit, so return this fragment and start a new one
            yield return sb.ToString();
            sb = new StringBuilder();
        }
        // the current word always goes into the (possibly new) fragment
        sb.Append(extraFragment);
    }

    if (sb.Length > 0)
    {
        // we may have content left in the string builder
        yield return sb.ToString();
    }
}

You will need something like this to split out the words:
 public IEnumerable<string> WordsFrom(string settlementsString) 
 {
    // split on commas, then trim to remove whitespace;
    return settlementsString.Split(',').Select(x => x.Trim()).Where(x => x.Length > 0);
 }

You would use the whole thing like this:
 var settlementStringsIn140CharLenghts = SplitSettlementStrings("M234, M456, M452 ...").ToArray();

Edit

A version for older .NET versions looks like this:

public ICollection<string> SplitSettlementStrings(string settlementsString) 
{
    List<string> results = new List<string>();
    StringBuilder sb = new StringBuilder();
    foreach (string word in WordsFrom(settlementsString))
    {
        string extraFragment = word + ", ";
        if (sb.Length > 0 && sb.Length + extraFragment.Length > 140)
        {
            // we'd overflow the 140 char limit, so store this fragment and start a new one
            results.Add(sb.ToString());
            sb = new StringBuilder();
        }
        // the current word always goes into the (possibly new) fragment
        sb.Append(extraFragment);
    }

    if (sb.Length > 0)
    {
        // we may have content left in the string builder
        results.Add(sb.ToString());
    }
    return results;
}

 public ICollection<string> WordsFrom(string settlementsString) 
 {
    // split on commas, then trim to remove whitespace;
    string[] fragments = settlementsString.Split(',');
    List<string> result = new List<string>();
    foreach(string fragment in fragments) 
    {
        var candidate = fragment.Trim();
        if (candidate.Length > 0) 
        {
            result.Add(candidate);
        }
    } 
    return result;
 }

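I think the $"{word}, " syntax is a C# 6.0 feature, and I am on a very old version, probably 3.5, but I like the idea :) @SteveCooper - nemo_87
Ah. Then you can sort that out by returning an ICollection<string> and simply collecting the results in a List<string>. The $"{word}, " part can be replaced with word + ", ". - Steve Cooper
Old-school version posted ;) - Steve Cooper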

0
I use this:
static string FirstN(string s, int n = 140)
{
    if (string.IsNullOrEmpty(s) || s.Length <= n) return s;
    while (n > 0 && s[n] != ' ' && s[n] != ',') n--;
    return s.Substring(0, n);
}

Working, tested sample code (with the output in comments):

using System;
namespace ConsoleApplication1
{
    class Program
    {
        static string FirstN(string s, int n = 140)
        {
            if (string.IsNullOrEmpty(s) || s.Length <= n) return s;
            while (n > 0 && s[n] != ' ' && s[n] != ',') n--;
            return s.Substring(0, n);
        }
        static void Main(string[] args)
        {
            var s = FirstN("M5903, M6169, M6753, M619, M6169, M6753, M6919, M6169, M6753, M919, M6169, M6753, M6919, M6169, M6753, M6919, M6169, M6753, M919, M6169, M6753, M6919, M669, M6753, M6919, M69, M6753, M6919, M6169, M63, M6919, M6169, M6753, M6919, M619, M653, M6919, M66, M6753, M19, M6169, M6753, M6919, M6169, M6753, M6919, M6169, M6753, M6919, M6169, M6753, M619");

            Console.WriteLine(s.Length); // 136
            Console.WriteLine(s);  //M5903, M6169, M6753, M619, M6169, M6753, M6919, M6169, M6753, M919, M6169, M6753, M6919, M6169, M6753, M6919, M6169, M6753, M919, M6169,
        }
    }
}

Hope this helps.


0

Something like this should work:

string test = "M5903, M6169, M6753, M619, M6169, M6753, M6919, M6169, M6753, M919, M6169, M6753, M6919, M6169, M6753, M6919, M6169, M6753, M919, M6169, M6753, M6919, M669, M6753, M6919, M69, M6753, M6919, M6169, M63, M6919, M6169, M6753, M6919, M619, M653, M6919, M66, M6753, M19, M6169, M6753, M6919, M6169, M6753, M6919, M6169, M6753, M6919, M6169, M6753, M619";

if (test.Length > 140)
    if (test[140] != ',' && test[140] != ' ') // Last entry was split?
        test = test.Substring(0, test.LastIndexOf(',', 139)); // Take up to but not including the last ','
    else
        test = test.Substring(0, 139);

Console.WriteLine(test);

2
test = test.Substring(0, test.LastIndexOf(',')) will take everything up to the last comma in the whole string, not up to the (140-x)th character. - MakePeaceGreatAgain
@HimBromBeere Oops, I forgot to pass the , 139 argument to LastIndexOf(). Fixed. - Matthew Watson

0

Just for fun, here is my take:

var ssplit = theString.Replace(", ", "#").Split('#');       
var sb = new StringBuilder();
for(int i = 0; i < ssplit.Length; i++)
{
    if(sb.Length + ssplit[i].Length > 138) // 140 minus the ", "
        break;
    if(sb.Length > 0) sb.Append(", ");
    sb.Append(ssplit[i]);
}

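Here I split the string into its Mxxx parts. Then I iterate over those parts until the next one would overflow 140 (or 138, because the ", " separator has to be included in the count).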
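The split-and-rejoin idea can also be extended to return the values that did not fit, which the question asks for. The following is only a sketch under stated assumptions: `SettlementJoiner` and `TakeWholeValues` are hypothetical names, the separator is assumed to always be exactly ", ", and the length check counts the separator only when one is actually appended.

```csharp
using System;
using System.Text;

public static class SettlementJoiner
{
    // Hypothetical helper: joins whole values until the next one would push
    // the result past `limit`, and outputs the unused values in `remainder`.
    public static string TakeWholeValues(string input, int limit, out string remainder)
    {
        string[] parts = input.Split(new[] { ", " }, StringSplitOptions.RemoveEmptyEntries);
        var head = new StringBuilder();

        int i = 0;
        for (; i < parts.Length; i++)
        {
            // A separator is only needed once the head already holds a value.
            int separatorLength = head.Length > 0 ? 2 : 0;
            if (head.Length + separatorLength + parts[i].Length > limit)
                break;

            if (head.Length > 0) head.Append(", ");
            head.Append(parts[i]);
        }

        // Everything from index i onward did not fit.
        remainder = string.Join(", ", parts, i, parts.Length - i);
        return head.ToString();
    }
}
```

Compared with scanning backward for a comma, this allocates an array of parts, but it never has to reason about where inside a value the cut landed.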



I mean I only answered for the fun of it, as an interesting challenge :-) - Jcl

0
If you do not want to split the string into a list, you could try something like this:
string myString = "M19, M42........";
string result;
int index = 141;

do
{
    //Decrement index to reduce the substring size
    index--;

    //Make the result the new length substring
    result = myString.Substring(0, index);

}while (myString[index] != ','); //Check if our result contains a comma as the next char to check if we're at the end of an entry

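So you basically take the substring of the original string up to 140 characters and check whether the character at position 141 is a comma, which indicates a "clean" cut. If it is not, it takes the substring up to 139 and checks whether character 140 is a comma, and so on.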
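The loop above assumes the string is longer than 140 characters and contains a comma somewhere before the cut. A guarded variant of the same backward walk might look like the sketch below; `BackwardScan` and `TrimToWholeValue` are hypothetical names, and returning an empty string when no comma is found is an assumption about the desired behavior.

```csharp
using System;

public static class BackwardScan
{
    // Hypothetical helper: walks back from the character just past the limit
    // until it finds a comma, then returns everything before that comma.
    public static string TrimToWholeValue(string input, int limit)
    {
        // Short (or missing) input needs no trimming.
        if (string.IsNullOrEmpty(input) || input.Length <= limit)
            return input;

        int index = limit;
        while (index > 0 && input[index] != ',')
            index--;

        // index == 0 means no comma was found, so no complete value fits.
        return input.Substring(0, index);
    }
}
```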


0

Here is a solution. It processes the string backward, starting from the 141st character.

public static string Normalize(string input, int length)
{
    var terminators = new[] { ',', ' ' };
    if (input.Length <= length)
        return input;

    // Start at the character just past the limit and walk back to a terminator.
    int i = length;
    while (i > 0 && !terminators.Contains(input[i]))
        i = i - 1;

    return input.Substring(0, i).TrimEnd(' ', ',');
}

Normalize(settlementsString, 140);

0

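Because of the repeated memory allocation for new strings, this may not be the most performance-conscious solution, but this sounds like some kind of one-off raw data input. One option is to simply remove "tokens" from the end while the input is longer than 140 characters: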

const string separator = ", ";

while (input.Length > 140)
{
     int delStartIndex = input.LastIndexOf(separator);
     int delLength = input.Length - delStartIndex;

     input = input.Remove(delStartIndex, delLength);
}

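A more performance-oriented approach would be to create an IEnumerable<string> or string[] of the substrings and count their total length before joining them. Roughly like this: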
const string separator = ", ";
var splitInput = input.Split(separator.ToCharArray(), StringSplitOptions.RemoveEmptyEntries);

var length = splitInput[0].Length;
var targetIndex = 1;

for (targetIndex = 1; targetIndex < splitInput.Length && length <= 140; targetIndex++) // bounds check avoids running past the array on short input
    length += separator.Length + splitInput[targetIndex].Length;

if (length > 140)
    targetIndex--;

var splitOutput = new string[targetIndex];
Array.Copy(splitInput, 0, splitOutput, 0, targetIndex);

var output = string.Join(separator, splitOutput);

We could even turn this into a nice extension method:
public static class StringUtils
{
    public static string TrimToLength(this string input, string separator, int targetLength)
    {
        var splitInput = input.Split(separator.ToCharArray(), StringSplitOptions.RemoveEmptyEntries);

        var length = splitInput[0].Length;
        var targetIndex = 1;

        for (targetIndex = 1; targetIndex < splitInput.Length && length <= targetLength; targetIndex++) // bounds check avoids running past the array on short input
            length += separator.Length + splitInput[targetIndex].Length;

        if (length > targetLength)
            targetIndex--;

        var splitOutput = new string[targetIndex];
        Array.Copy(splitInput, 0, splitOutput, 0, targetIndex);

        return string.Join(separator, splitOutput);
    }
}

And call it like this:

input.TrimToLength(", ", 140);

Or:

input.TrimToLength(separator: ", ", targetLength:140);
