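I have some blocks of text that I want to split into separate entries. The problem is that the text is already wrapped, so a single definition spans several lines, and I can't split it on line breaks the way I usually would: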
```csharp
// String.Split already returns string[], so no .ToArray() is needed.
_text = text.Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries);
```
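To show why the plain split falls short, here is a small sketch on a shortened excerpt of the sample below (the excerpt string is made up for illustration): splitting on `'\n'` yields one element per physical line, so a definition that wraps ends up spread over several array elements.

```csharp
using System;

// Hypothetical two-entry excerpt; each entry wraps onto a second line.
string sample = "adj 1: around the middle of a scale of evaluation of physical\nmeasures; \"an orange of average size\"\n2: (of meat) cooked until there is just a little pink meat\ninside";

// Split on line breaks only, as in the one-liner above.
string[] lines = sample.Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries);

// Prints four lines, although the excerpt contains only two numbered entries.
foreach (string line in lines)
    Console.WriteLine(line);
```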
Here is the sample text:
```
adj 1: around the middle of a scale of evaluation of physical
measures; "an orange of average size"; "intermediate
capacity"; "a plane with intermediate range"; "medium
bombers" [syn: {average}, {intermediate}]
2: (of meat) cooked until there is just a little pink meat
inside
n 1: a means or instrumentality for storing or communicating
information
2: the surrounding environment; "fish require an aqueous
medium"
3: an intervening substance through which signals can travel as
a means for communication
4: (bacteriology) a nutrient substance (solid or liquid) that
is used to cultivate micro-organisms [syn: {culture medium}]
5: an intervening substance through which something is
achieved; "the dissolving medium is called a solvent"
6: a liquid with which pigment is mixed by a painter
7: (biology) a substance in which specimens are preserved or
displayed
8: a state that is intermediate between extremes; a middle
position; "a happy medium"
```
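The format is always the same:
- optionally, 1-3 letters (e.g. `adj` or `n`)
- a number from 1 to 10 (the longer test text further down goes up to 11)
- a colon
- a space
- the text, which may span several lines.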
Can anyone suggest how I could do this with `Split` or some other approach?
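One way to apply the format described above without depending on indentation is to read the text line by line and start a new entry whenever a line begins with the marker (optional 1-3 letters and a space, a number, a colon, a space); every other line is appended to the current entry. This is a minimal sketch under assumptions: `SplitEntries` is a hypothetical helper name, the marker pattern accepts numbers up to 99, and a wrapped line that happens to begin with something like `is 2: ` would be misread as a new entry.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: groups physical lines into one string per numbered entry.
static List<string> SplitEntries(string text)
{
    // An entry starts with: optional 1-3 letters and a space, a number, a colon, a space.
    var entryStart = new Regex(@"^(?:[A-Za-z]{1,3} )?[1-9]\d?: ");
    var entries = new List<string>();

    foreach (string raw in text.Split('\n'))
    {
        string line = raw.Trim();   // drops indentation and a trailing '\r'
        if (line.Length == 0)
            continue;

        if (entries.Count == 0 || entryStart.IsMatch(line))
            entries.Add(line);      // new entry; the very first line always starts one
        else
            entries[entries.Count - 1] += " " + line;  // continuation of the current entry
    }
    return entries;
}

string sample =
    "adj 1: around the middle of a scale of evaluation of physical\n" +
    "measures; \"an orange of average size\"\n" +
    "2: (of meat) cooked until there is just a little pink meat\n" +
    "inside\n" +
    "n 1: a means or instrumentality for storing or communicating\n" +
    "information";

foreach (string item in SplitEntries(sample))
    Console.WriteLine(item);
```

Because each line is trimmed before the marker test, the same sketch should also cope with text whose lines are indented.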
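Update: Steven posted an answer, but I'm not sure how to adapt it to my function. Here is my original code with Steven's suggestion worked in; there is one part I'm not sure about.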
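```csharp
// Requires: using System; using System.Linq; using System.Text.RegularExpressions;
public parser(string text)
{
    //_text = text.Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries)
    //    .ToArray();

    // Steven's pattern: optional 1-3 word characters and a space, a number, ": ",
    // the first line of the entry, then any following lines that start with whitespace.
    string pattern = @"(\w{1,3} )?1?\d: (?<line>[^\r\n]+)(\r?\n\s+(?<line>[^\r\n]+))*";
    foreach (Match m in Regex.Matches(text, pattern))
    {
        if (m.Success)
        {
            // Join every captured line of this entry back together.
            string entry = string.Join(Environment.NewLine,
                m.Groups["line"].Captures.Cast<Capture>().Select(x => x.Value));
            // ...
        }
    }
}
```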
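Steven's pattern only treats a following line as part of the same entry when that line starts with whitespace (`\r?\n\s+`), so it assumes wrapped lines are indented and lines that start an entry are not. The sketch below wraps the same pattern in a hypothetical `ParseEntries` helper that fills the `// ...` step by collecting each entry into a `List<string>`. Joining the captured lines with a space rather than `Environment.NewLine` is a choice made here so that each entry ends up on one line, and the `if (m.Success)` check is left out because `Regex.Matches` only returns successful matches.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper built around Steven's pattern.
static List<string> ParseEntries(string text)
{
    // Optional 1-3 word characters and a space, a number (0-19), ": ", the first line,
    // then each following line that starts with whitespace, all captured into "line".
    string pattern = @"(\w{1,3} )?1?\d: (?<line>[^\r\n]+)(\r?\n\s+(?<line>[^\r\n]+))*";
    var entries = new List<string>();

    foreach (Match m in Regex.Matches(text, pattern))
    {
        // All captures of the "line" group, in order: first line plus indented continuations.
        IEnumerable<string> lines = m.Groups["line"].Captures.Cast<Capture>().Select(c => c.Value);
        entries.Add(string.Join(" ", lines));
    }
    return entries;
}

// Assumed layout: entry lines start at column 0, wrapped lines are indented.
string sample =
    "adj 1: around the middle of a scale\n" +
    "   of evaluation of physical measures\n" +
    "2: (of meat) cooked until there is just a little pink meat\n" +
    "   inside\n" +
    "n 1: a means or instrumentality for storing\n" +
    "   or communicating information";

foreach (string item in ParseEntries(sample))
    Console.WriteLine(item);
```

Note that only the `line` group is joined, so the `adj 1:` style marker is not part of each stored entry.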
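For testing purposes, here is the same kind of text in a different format, all on one line:

```
medium adj 1: around the middle of a scale of evaluation of physical measures; "an orange of average size"; "intermediate capacity"; "a plane with intermediate range"; "medium bombers" [syn: {average}, {intermediate}] 2: (of meat) cooked until there is just a little pink meat inside n 1: a means or instrumentality for storing or communicating information 2: the surrounding environment; "fish require an aqueous medium" 3: an intervening substance through which signals can travel as a means for communication 4: (bacteriology) a nutrient substance (solid or liquid) that is used to cultivate micro-organisms [syn: {culture medium}] 5: an intervening substance through which something is achieved; "the dissolving medium is called a solvent" 6: a liquid with which pigment is mixed by a painter 7: (biology) a substance in which specimens are preserved or displayed 8: a state that is intermediate between extremes; a middle position; "a happy medium" 9: someone who serves as an intermediary between the living and the dead; "he consulted several mediums" [syn: {spiritualist}] 10: transmissions that are disseminated widely to the public [syn: {mass medium}] 11: an occupation for which you are especially well suited; "in law he found his true metier" [syn: {metier}] [also: {media} (pl)]
```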
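The one-line version has no line breaks, so any line-based approach sees a single line; with Steven's pattern, for example, `[^\r\n]+` runs to the end of the line, so everything after the first marker ends up in one entry. A sketch that ignores line breaks altogether is to find every marker with `Regex.Matches` and cut the text between consecutive marker positions. `SplitAtMarkers` is a hypothetical name, and the limitations are assumptions worth checking against real data: a short word directly before a number (e.g. `it 3: `) would be taken as the letter prefix, and a number followed by `: ` inside a definition would start a new entry.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: cuts the text at every entry marker, whether or not there are line breaks.
static List<string> SplitAtMarkers(string text)
{
    // \b stops the optional 1-3 letter prefix from starting in the middle of a word.
    var marker = new Regex(@"\b(?:[A-Za-z]{1,3} )?[1-9]\d?: ");
    MatchCollection found = marker.Matches(text);
    var entries = new List<string>();

    for (int i = 0; i < found.Count; i++)
    {
        // Each entry runs from its marker up to the next marker (or the end of the text).
        int start = found[i].Index;
        int end = i + 1 < found.Count ? found[i + 1].Index : text.Length;
        string chunk = text.Substring(start, end - start);

        // Collapse spaces and line breaks into single spaces.
        entries.Add(Regex.Replace(chunk, @"\s+", " ").Trim());
    }
    // Text before the first marker (the headword "medium" here) is not returned.
    return entries;
}

// Shortened excerpt of the one-line text above.
string oneLine =
    "medium adj 1: around the middle of a scale of evaluation of physical measures " +
    "2: (of meat) cooked until there is just a little pink meat inside " +
    "n 1: a means or instrumentality for storing or communicating information " +
    "10: transmissions that are disseminated widely to the public [syn: {mass medium}] " +
    "11: an occupation for which you are especially well suited";

foreach (string item in SplitAtMarkers(oneLine))
    Console.WriteLine(item);
```

Because whitespace inside each chunk is collapsed, the same helper should also handle the multi-line sample, wrapped or not.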