Double line break problem in Notepad++

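I'm preparing some WhatsApp chat logs so I can generate statistics and word clouds. However, every now and then my data contains stray double line breaks that break the format of the log, and I'd like to know how to fix this automatically.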

13 Mar 18:51 - nicola: mainly he's crap
13 Mar 18:52 - Sebastian K: ... you didn't really dress it up
13 Mar 18:52 - nicola: and he has no natural grace like most cats 

well no i didn't lol
13 Mar 18:52 - nicola: you saw the last video
13 Mar 18:53 - Sebastian K: Stilton jumped onto that wall effortlessly while Ched almost killed himself yea...

Searching for the empty lines and deleting them is an easy fix. But that still leaves lines that break the date-and-time format:

13 Mar 18:51 - nicola: mainly he's crap
13 Mar 18:52 - Sebastian K: ... you didn't really dress it up
13 Mar 18:52 - nicola: and he has no natural grace like most cats 
well no i didn't lol
13 Mar 18:52 - nicola: you saw the last video
13 Mar 18:53 - Sebastian K: Stilton jumped onto that wall effortlessly while Ched almost killed himself yea...

Target format:

13 Mar 18:51 - nicola: mainly he's crap
13 Mar 18:52 - Sebastian K: ... you didn't really dress it up
13 Mar 18:52 - nicola: and he has no natural grace like most cats well no i didn't lol
13 Mar 18:52 - nicola: you saw the last video
13 Mar 18:53 - Sebastian K: Stilton jumped onto that wall effortlessly while Ched almost killed himself yea...

Maybe the solution lies in exploiting this rule: the line breaks I need to keep follow this pattern:
TEXT *linebreak*
NUMBER (beginning of the date column)

The unwanted ones follow this pattern:
TEXT *linebreak*
TEXT

How can I fix this using Notepad++?

1 Answer

In the search-and-replace dialog, you can search for this pattern:
\r\n(?!\d)

Enable regular expressions and replace with nothing (an empty string).

\r\n matches a line break made up of CR and LF. Enable the display of control characters in Notepad++ to see which kind of line break your file uses.
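
If you would rather check the line endings outside the editor, a minimal Python sketch along these lines could do it. The function name count_line_endings and the sample string are made up for illustration; when reading a real file you would likely want open(..., newline="") so that Python does not translate the line endings before you count them.

```python
def count_line_endings(text: str) -> dict[str, int]:
    """Count CRLF pairs, bare LF and bare CR characters in text."""
    crlf = text.count("\r\n")
    # Every CRLF pair contributes one "\n" and one "\r", so subtract the
    # pairs to get the bare occurrences of each character.
    lf = text.count("\n") - crlf
    cr = text.count("\r") - crlf
    return {"CRLF": crlf, "LF": lf, "CR": cr}


# Illustrative sample: two Windows-style breaks and one Unix-style break.
sample = "a\r\nb\nc\r\n"
counts = count_line_endings(sample)
print(counts)
```

If the counts show only LF, the \r\n in the pattern above would never match and would need to be changed to \n.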

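(?!\d) is a negative lookahead that succeeds when the line break is not followed by a digit. This works for your example but can fail in some corner cases, for instance when a continuation line itself starts with a digit. You can extend the lookahead with a more specific pattern, e.g. (?!\d{2}\s) if the day is always written with two digits.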
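
To see how the two lookaheads behave, here is a rough sketch that applies the same patterns with Python's re module instead of Notepad++. It assumes CRLF line endings, as the pattern does. The log sample is taken from the question, while the corner string is an invented line used only to show the difference; the variable names are arbitrary.

```python
import re

# Sample from the question, written with CRLF line endings.
log = (
    "13 Mar 18:51 - nicola: mainly he's crap\r\n"
    "13 Mar 18:52 - Sebastian K: ... you didn't really dress it up\r\n"
    "13 Mar 18:52 - nicola: and he has no natural grace like most cats \r\n"
    "\r\n"
    "well no i didn't lol\r\n"
    "13 Mar 18:52 - nicola: you saw the last video\r\n"
)

# Pattern from the answer: a CRLF that is not followed by a digit.
loose = re.compile(r"\r\n(?!\d)")
# Stricter variant: a CRLF that is not followed by two digits and whitespace.
strict = re.compile(r"\r\n(?!\d{2}\s)")

# Replace with the empty string, like "replace with nothing" in the dialog.
# The final CRLF is removed too, because nothing (so no digit) follows it.
fixed = loose.sub("", log)
print(fixed)

# Invented corner case: the continuation line starts with a single digit.
corner = (
    "13 Mar 18:52 - nicola: i counted \r\n"
    "2 cats on the wall\r\n"
    "13 Mar 18:53 - Sebastian K: ok"
)
print(loose.sub("", corner) == corner)  # loose pattern leaves the break in place
print(strict.sub("", corner))           # strict pattern joins the two lines
```

The stricter pattern would still misfire on a continuation line that happens to start with two digits followed by a space, so an even more specific lookahead (for example one that also checks the month name) may be needed for messy logs.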

