How do I convert HTML text to plain text on Android?

17
I need to convert HTML text, which I have as a String, into plain text.
String mHtmlString = "<p class=\"MsoNormal\" style=\"margin-bottom:10.5pt;text-align:justify;line-height: 10.5pt\"><b><span style=\"font-size: 8.5pt; font-family: Arial, sans-serif;\">Lorem Ipsum</span></b><span style=\"font-size: 8.5pt; font-family: Arial, sans-serif;\"> is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.<o:p></o:p></span></p> <p class=\"MsoNormal\" style=\"margin-bottom: 0.0001pt;\"><span style=\"font-size: 8.5pt; font-family: Arial, sans-serif;\"> </span><span style=\"font-family: Arial, sans-serif; font-size: 8.5pt; line-height: 10.5pt; text-align: justify;\">Contrary to popular belief, Lorem Ipsum is not simply random text. It has roots in a piece of classical Latin literature from 45 BC, making it over 2000 years old. Richard McClintock, a Latin professor at Hampden-Sydney College in Virginia, looked up one of the more obscure Latin words, consectetur, from a Lorem Ipsum passage, and going through the cites of the word in classical literature, discovered the undoubtable source. Lorem Ipsum comes from sections 1.10.32 and 1.10.33 of \"de Finibus Bonorum et Malorum\" (The Extremes of Good and Evil) by Cicero, written in 45 BC. This book is a treatise on the theory of ethics, very popular during the Renaissance. The first line of Lorem Ipsum, \"Lorem ipsum dolor sit amet..\", comes from a line in section 1.10.32.</span></p>";

What I have tried so far:

TextView textView = (TextView) findViewById(R.id.textView);
String plainText = Html.fromHtml(mHtmlString).toString();
textView.setText(plainText);

Unfortunately, this does not handle nested HTML.

Any help would be greatly appreciated.


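Desc.setText(desc);... well, I guess that is not what you meant. - hichris123
What problem are you running into? - PankajAndroid
@PankajAndroid I could not convert the HTML text to plain text. But I have found a solution; please see my answer. - Hiren Patel
I have an HTML string coming from an API and I want to convert it to a plain string. Html.fromHtml().toString is deprecated, and it no longer shows the tags either. - jlively
4 Answers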

51

I am posting my own answer.

String mHtmlString = "<p class=\"MsoNormal\" style=\"margin-bottom:10.5pt;text-align:justify;line-height: 10.5pt\"><b><span style=\"font-size: 8.5pt; font-family: Arial, sans-serif;\">Lorem Ipsum</span></b><span style=\"font-size: 8.5pt; font-family: Arial, sans-serif;\"> is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.<o:p></o:p></span></p> <p class=\"MsoNormal\" style=\"margin-bottom: 0.0001pt;\"><span style=\"font-size: 8.5pt; font-family: Arial, sans-serif;\"> </span><span style=\"font-family: Arial, sans-serif; font-size: 8.5pt; line-height: 10.5pt; text-align: justify;\">Contrary to popular belief, Lorem Ipsum is not simply random text. It has roots in a piece of classical Latin literature from 45 BC, making it over 2000 years old. Richard McClintock, a Latin professor at Hampden-Sydney College in Virginia, looked up one of the more obscure Latin words, consectetur, from a Lorem Ipsum passage, and going through the cites of the word in classical literature, discovered the undoubtable source. Lorem Ipsum comes from sections 1.10.32 and 1.10.33 of \"de Finibus Bonorum et Malorum\" (The Extremes of Good and Evil) by Cicero, written in 45 BC. This book is a treatise on the theory of ethics, very popular during the Renaissance. The first line of Lorem Ipsum, \"Lorem ipsum dolor sit amet..\", comes from a line in section 1.10.32.</span></p>";

Set the HTML string on the TextView:

TextView textView = (TextView) findViewById(R.id.textView);
textView.setText(Html.fromHtml(Html.fromHtml(mHtmlString).toString()));

Hope this helps.


1
Why are you nesting Html.fromHtml()? - Taylor Kline
1
@TaylorKline, actually there are nested HTML tags. - Hiren Patel
Using it once, I got: "<div><div><div><div><div><div></div></div></div></div></div></div>", and using it nested, I got: ".............", so we need to nest it. - Lalit Jadav
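
One plausible reading of the comment above is that the string arrives with its markup entity-escaped (for example &lt;div&gt; instead of <div>). In that case the first pass only decodes the entities and leaves literal tags behind, and a second pass is needed to remove them. The sketch below illustrates that idea in plain Java. It is an assumption about the cause, not a confirmed diagnosis, and decodeEntities, stripTags and pass are hypothetical stand-ins, not how Html.fromHtml is actually implemented.

```java
class DoublePassDemo {
    // Hypothetical stand-in: decodes only a handful of common entities.
    static String decodeEntities(String s) {
        return s.replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&amp;", "&"); // decode &amp; last to avoid double-decoding
    }

    // Hypothetical stand-in: drops anything that looks like a tag.
    static String stripTags(String s) {
        return s.replaceAll("<[^>]*>", "");
    }

    // One "pass": remove real tags, then decode entities in the remaining text.
    static String pass(String s) {
        return decodeEntities(stripTags(s));
    }

    public static void main(String[] args) {
        String escaped = "&lt;div&gt;&lt;b&gt;Lorem Ipsum&lt;/b&gt; is dummy text&lt;/div&gt;";
        String once = pass(escaped);   // entities decoded, tags now literal
        String twice = pass(once);     // literal tags removed
        System.out.println(once);
        System.out.println(twice);
    }
}
```

If the input is ordinary, unescaped HTML, a single pass should already be enough and the second one does nothing useful.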

8
If you want to strip the HTML tags from the HTML, use Jsoup (http://jsoup.org).
 String textFromHtml = Jsoup.parse(MY_HTML_STRING_HERE).text();
 TextView desc = (TextView) dialog.findViewById(R.id.description);
 desc.setText(textFromHtml);

2
If you don't mind your app growing by half a megabyte, go ahead. - Anarchofascist
This works much better than Html.fromHtml for more complex HTML. - live-love

6
This worked for me:

    Spanned spanned = Html.fromHtml(textWithMarkup);
    char[] chars = new char[spanned.length()];
    TextUtils.getChars(spanned, 0, spanned.length(), chars, 0);
    String plainText = new String(chars);

I use simple tags such as <b> and <i>. I have not tested it with more complex HTML.


A very nice, efficient solution that needs no library. - user3471194
Brilliant solution... thank you very much - Mohd Sakib Syed
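
A caveat that applies however the Spanned is turned into a String: the text produced by Html.fromHtml typically ends with paragraph newlines and holds a U+FFFC object-replacement character wherever an <img> tag was. A small plain-Java cleanup sketch follows; cleanPlainText is a hypothetical helper name, and which characters to drop is an assumption to adjust to your data.

```java
class PlainTextCleanup {
    // Hypothetical helper: tidies text that came out of an HTML-to-text conversion.
    static String cleanPlainText(CharSequence converted) {
        return converted.toString()
                .replace("\uFFFC", "")          // object-replacement char left where images were
                .replace('\u00A0', ' ')         // non-breaking space (from &nbsp;) to a normal space
                .replaceAll("\\n{3,}", "\n\n")  // collapse runs of three or more newlines
                .trim();                        // drop leading/trailing whitespace and newlines
    }
}
```

On Android this could be called as cleanPlainText(spanned), since Spanned is a CharSequence.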

3
Nobody has mentioned a really nice Kotlin extension function that does this.
Just use it like this:
"yourHtmlString".parseAsHtml()

For more info:

parseAsHtml()

