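I have a file that is the result of extracting values from Microsoft Lync conversations that carry RTF formatting markup. For example, a file might look like this: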
{\rtf1\fbidis\ansi\ansicpg1252\deff0\nouicompat\deflang1033{\fonttbl{\f0\fnil\fcharset0 Segoe UI;}{\f1\fnil Segoe UI;}} {\colortbl ;\red0\green0\blue0;} {*\generator Riched20 15.0.4420}{*\mmathPr\mwrapIndent1440 }\viewkind4\uc1 \pard\cf1\embo\f0\fs20 Craig...\embo0 \embo please\embo0 \embo close\embo0 \embo out\embo0 \embo of\embo0 \embo your\embo0 \embo old\embo0 \embo client\embo0 \embo and\embo0 \embo re-open\embo0\f1\par {*\lyncflags rtf=1}}
Using a Lua script, I am trying to strip the RTF markup and extract only the conversation text. So the result of my function should be:
Craig... please close out of your old client and re-open
I tried using string.gsub with a regular expression to match the markup and replace it with spaces so that only the text remains, but it did not work. Here is what I currently have for string.gsub:
result = string.gsub(s, "\{\*?\\[^{}]+}|[{}]|\\\n?[A-Za-z]+\n?(?:-?\d+)?[ ]?", " ")
Any suggestions would be greatly appreciated!
ADDED:
user1@capital.com @ 2013-01-18 17:48:03Z(TO:user2@capital.com)
{\rtf1\fbidis\ansi\ansicpg1252\deff0\nouicompat\deflang1033{\fonttbl{\f0\fnil\fcharset0 Segoe UI;}{\f1\fnil Segoe UI;}} {\colortbl ;\red0\green0\blue0;} {*\generator Riched20 15.0.4420}{*\mmathPr\mwrapIndent1440 }\viewkind4\uc1 \pard\cf1\embo\f0\fs20 It worked.\embo0 How do you embed an image?\f1\par {*\lyncflags rtf=1}}
user1@capital.com @ 2013-01-18 17:48:57Z(TO:user2@capital.com)
{\rtf1\fbidis\ansi\ansicpg1252\deff0\nouicompat\deflang1033{\fonttbl{\f0\fnil\fcharset0 Segoe UI;}{\f1\fnil Segoe UI;}} {\colortbl ;\red0\green0\blue0;} {*\generator Riched20 15.0.4420}{*\mmathPr\mwrapIndent1440 }\viewkind4\uc1 \pard\cf1\embo\f0\fs20 I see.\embo0\f1\par {*\lyncflags rtf=1}}
user1@capital.com @ 2013-01-18 17:49:27Z(TO:user2@capital.com)
{\rtf1\fbidis\ansi\ansicpg1252\deff0\nouicompat\deflang1033{\fonttbl{\f0\fnil\fcharset0 Segoe UI;}{\f1\fnil Segoe UI;}} {\colortbl ;\red0\green0\blue0;} {*\generator Riched20 15.0.4420}{*\mmathPr\mwrapIndent1440 }\viewkind4\uc1 \pard\cf1\embo\f0\fs20 Let's try a meeting.\embo0\f1\par {*\lyncflags rtf=1}}
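A possible direction, offered as a sketch rather than a definitive fix: string.gsub takes Lua patterns, not regular expressions, so the alternation (|), the non-capturing group ((?:...)) and \d in the call above do not have their regex meaning there. Lua patterns use % as the escape character (for example %d for a digit). One way around this might be to split the single expression into several gsub passes. The helper name strip_rtf, and the choice to replace each match with a space and collapse whitespace afterwards, are assumptions made for illustration. The sketch only targets the constructs visible in the samples here (groups, braces, plain control words) and does not attempt to handle RTF in general, such as \'xx escapes or \uN characters.

```lua
-- Hypothetical helper (the name strip_rtf is an assumption, not from any library).
-- Strips the RTF constructs seen in the Lync samples using Lua patterns only.
local function strip_rtf(s)
  -- Pass 1: drop innermost groups that open with a control word or "*",
  -- e.g. {\f0\fnil Segoe UI;} or {*\generator Riched20 15.0.4420}.
  -- The outer {\rtf1 ...} group is not matched because it contains braces.
  s = s:gsub("{%*?\\[^{}]+}", " ")
  -- Pass 2: drop whatever braces are left.
  s = s:gsub("[{}]", " ")
  -- Pass 3: drop control words: backslash, letters, optional signed number,
  -- and the single optional space that delimits the control word.
  s = s:gsub("\\%a+%-?%d* ?", " ")
  -- Collapse the runs of spaces left behind, then trim both ends.
  s = s:gsub("%s+", " ")
  return (s:match("^%s*(.-)%s*$"))
end

-- Long-bracket string so the backslashes need no escaping.
local sample = [[{\rtf1\fbidis\ansi\ansicpg1252\deff0\nouicompat\deflang1033{\fonttbl{\f0\fnil\fcharset0 Segoe UI;}{\f1\fnil Segoe UI;}} {\colortbl ;\red0\green0\blue0;} {*\generator Riched20 15.0.4420}{*\mmathPr\mwrapIndent1440 }\viewkind4\uc1 \pard\cf1\embo\f0\fs20 Craig...\embo0 \embo please\embo0 \embo close\embo0 \embo out\embo0 \embo of\embo0 \embo your\embo0 \embo old\embo0 \embo client\embo0 \embo and\embo0 \embo re-open\embo0\f1\par {*\lyncflags rtf=1}}]]

print(strip_rtf(sample))
--> Craig... please close out of your old client and re-open
```

Replacing each match with a space (as the original call does) keeps words apart when the only whitespace between them is a control-word delimiter. The final whitespace collapse then removes the extra spaces this introduces.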