Merge lines that contain only text into the previous line


The input is:

<p>1:4 And David said unto him, How went the matter? I pray thee, tell me.</p>

<p>And he answered, That the people are fled from the battle, and many of the people also are fallen and dead; and Saul and Jonathan his son are dead also.</p>

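The first line contains a number (1:4); the second line contains only text.

I want to find the <p> tags that contain only text and merge their content into the preceding <p> tag in the HTML file.

That is: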

1:4 And David said unto him, How went the matter? I pray thee, tell me. And he answered, That the people are fled from the battle, and many of the people also are fallen and dead; and Saul and Jonathan his son are dead also.

Can I do it like this:
Regex.IsMatch(html, @"^[a-zA-Z]+$");

How should I do this?
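
A quick check of the pattern in question may help here. `^[a-zA-Z]+$` only matches a string made up entirely of ASCII letters, so any paragraph containing spaces, punctuation, or the `<p>` tags themselves fails it. A minimal sketch, assuming each paragraph sits on its own line and a verse reference has the form `chapter:verse`; the class and method names are hypothetical:

```csharp
using System;
using System.Text.RegularExpressions;

public static class PatternCheck
{
    // The pattern from the question: true only if the whole
    // string consists of ASCII letters and nothing else.
    public static bool LettersOnly(string text)
    {
        return Regex.IsMatch(text, @"^[a-zA-Z]+$");
    }

    // Hypothetical alternative: true when a <p> line starts with
    // a chapter:verse reference such as "1:4".
    public static bool StartsWithReference(string line)
    {
        return Regex.IsMatch(line.Trim(), @"^<p>\s*[0-9]+:[0-9]+\b");
    }
}
```

Testing for the reference at the start of a paragraph, and treating everything else as a continuation, is likely more reliable than trying to describe "text only" with a character class.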

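Do you mean you want to merge the human-readable paragraphs within a verse, so that each HTML "paragraph" contains the whole verse, starting with the Bible reference? - azhrei
1 Answer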


I think I understand what you are trying to achieve:

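// Requires: using System; using System.Text; using System.Text.RegularExpressions;
StringBuilder sb = new StringBuilder();
foreach (string rawLine in input.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries))
{
    string line = rawLine.Trim();

    // Notice the different regex, i.e.:
    // a new paragraph starts with `<p>x:y` and ends with `</p>`.
    // Each such paragraph begins a new output line; any other line
    // is a continuation and is appended to the paragraph before it.

    if (sb.Length > 0 && Regex.IsMatch(line, @"^<p>[0-9]+:[0-9]+.+</p>$"))
    {
         sb.AppendLine(); // insert line break before the next verse
    }

    sb.Append(line);
}
string result = sb.ToString();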

Works for me; see the sandboxes: one, two
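
The snippet above joins lines but leaves both pairs of `<p>` tags in place. If the goal is a single `<p>` element per verse, as in the expected output in the question, one possible approach is sketched below. It assumes plain `<p>` tags without attributes or nested markup and references of the form `chapter:verse`; `ParagraphMerger` and `Merge` are hypothetical names, and a real HTML parser would be more robust than a regex for anything less regular.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ParagraphMerger
{
    // Matches one <p>...</p> element and captures its inner text.
    private static readonly Regex Paragraph =
        new Regex(@"<p>(.*?)</p>", RegexOptions.Singleline);

    // Matches inner text that begins with a reference such as "1:4".
    private static readonly Regex Reference =
        new Regex(@"^[0-9]+:[0-9]+\b");

    public static string Merge(string html)
    {
        var verses = new List<string>();
        foreach (Match m in Paragraph.Matches(html))
        {
            string text = m.Groups[1].Value.Trim();
            if (text.Length == 0) continue; // skip empty paragraphs

            if (verses.Count > 0 && !Reference.IsMatch(text))
            {
                // Continuation paragraph: join it to the previous verse.
                verses[verses.Count - 1] += " " + text;
            }
            else
            {
                // Paragraph with a reference (or the very first one).
                verses.Add(text);
            }
        }

        // Re-wrap each merged verse in its own <p> element, one per line.
        var lines = new List<string>();
        foreach (string verse in verses)
        {
            lines.Add("<p>" + verse + "</p>");
        }
        return string.Join("\n", lines);
    }
}
```

Calling `ParagraphMerger.Merge(input)` on the two paragraphs from the question yields one `<p>` element that starts with `1:4` and ends with `dead also.`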


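It's not working for me. Please help me get only the text-only <p> tags, not the ones with numbers, i.e. only <p>And he answered, That the people are fled from the battle, and many of the people also are fallen and dead; and Saul and Jonathan his son are dead also.</p> - TinKerBell
@TinKerBell: line = line.Replace("<p>", "").Replace("</p>", "") will remove both tags from anywhere in the text. - abatishchev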
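
To collect only the text-only paragraphs asked about in the comments, something along these lines might work. This is a rough sketch that assumes one `<p>...</p>` element per line and references of the form `chapter:verse`; `TextOnlyParagraphs` and `Extract` are hypothetical names. It strips the tags with `string.Replace`, since the `string.Remove` overloads take an index rather than a substring.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TextOnlyParagraphs
{
    // Returns the inner text of every <p> line that does not
    // start with a chapter:verse reference such as "1:4".
    public static List<string> Extract(string html)
    {
        var result = new List<string>();
        string[] lines = html.Split(
            new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string rawLine in lines)
        {
            string line = rawLine.Trim();

            // Only consider lines that are a complete <p>...</p> element.
            if (!line.StartsWith("<p>", StringComparison.Ordinal) ||
                !line.EndsWith("</p>", StringComparison.Ordinal))
            {
                continue;
            }

            // Strip the tags, then keep the text if it has no leading reference.
            string text = line.Replace("<p>", "").Replace("</p>", "").Trim();
            if (text.Length > 0 && !Regex.IsMatch(text, @"^[0-9]+:[0-9]+\b"))
            {
                result.Add(text);
            }
        }
        return result;
    }
}
```

For the input in the question, `Extract` returns a single entry: the "And he answered, ..." paragraph without its tags.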

Content provided by Stack Overflow.