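I'm preparing some WhatsApp chat logs so I can generate statistics and word clouds from them. However, stray double line breaks show up in my data from time to time and break the log format, and I'd like to know how to fix this automatically.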
13 Mar 18:51 - nicola: mainly he's crap
13 Mar 18:52 - Sebastian K: ... you didn't really dress it up
13 Mar 18:52 - nicola: and he has no natural grace like most cats
well no i didn't lol
13 Mar 18:52 - nicola: you saw the last video
13 Mar 18:53 - Sebastian K: Stilton jumped onto that wall effortlessly while Ched almost killed himself yea...
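Searching for and deleting the empty lines is the simple part of the fix. But that still leaves lines that break the date-and-time format: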
13 Mar 18:51 - nicola: mainly he's crap
13 Mar 18:52 - Sebastian K: ... you didn't really dress it up
13 Mar 18:52 - nicola: and he has no natural grace like most cats
well no i didn't lol
13 Mar 18:52 - nicola: you saw the last video
13 Mar 18:53 - Sebastian K: Stilton jumped onto that wall effortlessly while Ched almost killed himself yea...
Target format:
13 Mar 18:51 - nicola: mainly he's crap
13 Mar 18:52 - Sebastian K: ... you didn't really dress it up
13 Mar 18:52 - nicola: and he has no natural grace like most cats well no i didn't lol
13 Mar 18:52 - nicola: you saw the last video
13 Mar 18:53 - Sebastian K: Stilton jumped onto that wall effortlessly while Ched almost killed himself yea...
Perhaps the solution lies in exploiting this rule. The line breaks I need to keep follow this pattern:
TEXT *linebreak*
NUMBER (beginning of date column)
The unwanted ones follow this pattern:
TEXT *linebreak*
TEXT
How can I fix this using Notepad++?
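
To make the rule concrete, here is a rough Python sketch of the behaviour I'm after. It is not a Notepad++ solution, only an illustration. The `TIMESTAMP` pattern and the `merge_continuations` name are my own assumptions, based on the sample lines looking like `13 Mar 18:51 - `. Any line that does not start with that shape is treated as a continuation and joined onto the previous message with a space.

```python
import re

# Assumed timestamp shape, inferred from the sample lines: "13 Mar 18:51 - "
TIMESTAMP = re.compile(r"^\d{1,2} [A-Za-z]{3} \d{1,2}:\d{2} - ")


def merge_continuations(text):
    """Join lines that do not start with a timestamp onto the previous message."""
    merged = []
    for line in text.splitlines():
        if not line.strip():
            continue  # drop empty lines left behind by double line breaks
        if TIMESTAMP.match(line) or not merged:
            merged.append(line)  # a new message (or the very first line)
        else:
            # continuation line: append to the previous message with one space
            merged[-1] = merged[-1].rstrip() + " " + line.strip()
    return "\n".join(merged)


sample = """13 Mar 18:52 - nicola: and he has no natural grace like most cats

well no i didn't lol
13 Mar 18:52 - nicola: you saw the last video"""

print(merge_continuations(sample))
```

One caveat with this approach: a continuation line that happens to begin with something that looks like a timestamp would be mistaken for a new message.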