Replace HTML links with text

Question

4

How can I replace links in HTML with their anchor text, using Python?

For example, given this input:

 <p> Hello <a href="http://example.com">link text1</a> and <a href="http://example.com">link text2</a> ! </p>

I want the result to keep the p tag (only the a tags should be removed):

<p>
Hello link text1 and link text2 ! 
</p>

- Evg

I don't know the answer, but I'd guess it involves BeautifulSoup :-) - mgilson

@mgilson, wouldn't a simple regular expression handle the case of non-nested anchors? - Maciej Gol

https://dev59.com/4U3Sa4cB1Zd3GeqPrQHf - mccakici

3 Answers

3

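This looks like a good fit for BeautifulSoup's unwrap() method:

from bs4 import BeautifulSoup

data = '''<p> Hello <a href="http://example.com">link text1</a> and <a href="http://example.com">link text2</a> ! </p>'''
# Name the parser explicitly; otherwise bs4 guesses one and emits a warning
soup = BeautifulSoup(data, 'html.parser')
p_tag = soup.find('p')
# unwrap() replaces each <a> tag with its own contents
for a_tag in p_tag.find_all('a'):
    a_tag.unwrap()
print(p_tag)

This produces: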

<p> Hello link text1 and link text2 ! </p>

- shaktimaan

0

You can use a parsing library such as BeautifulSoup, among others. I can't be sure, but you may find something useful here.

- Nitin
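
As one illustration of the parser-based route that needs no third-party package, here is a minimal sketch built on Python's standard-library html.parser module. The names AnchorStripper and strip_anchors are hypothetical and do not come from any answer on this page. The sketch assumes reasonably well-formed input and writes end tags back in lowercase.

```python
from html.parser import HTMLParser


class AnchorStripper(HTMLParser):
    """Re-emit the input HTML, leaving out <a> start and end tags."""

    def __init__(self):
        # convert_charrefs=False keeps entities such as &amp; as written
        super().__init__(convert_charrefs=False)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            # get_starttag_text() returns the start tag exactly as it appeared
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied through once
        if tag != "a":
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag != "a":
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")

    def handle_comment(self, data):
        self.parts.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.parts.append(f"<!{decl}>")


def strip_anchors(html):
    """Return html with every <a> tag removed but its contents kept."""
    parser = AnchorStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


print(strip_anchors('<p> Hello <a href="http://example.com">link text1</a> and '
                    '<a href="http://example.com">link text2</a> ! </p>'))
```

Because the parser tokenizes tags by name, an element such as abbr is left alone, which is the main advantage over matching on text patterns.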


- miindlek · Accepted Answer

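You can do this with a simple regular expression and the sub function:

import re

text = '<p> Hello <a href="http://example.com">link text1</a> and <a href="http://example.com">link text2</a> ! </p>'
# Matches opening <a ...> and closing </a> tags, non-greedily up to the next '>'
pattern = r'<(a|/a).*?>'

result = re.sub(pattern, "", text)

print(result)
# <p> Hello link text1 and link text2 ! </p>

This code replaces every occurrence of the <a ...> and </a> tags with an empty string.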
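
One caveat worth sketching: the pattern `<(a|/a).*?>` accepts any tag whose name merely starts with "a", so elements such as abbr or article are stripped as well. The example below compares it with a tighter variant that uses a word boundary. The tighter pattern and the sample input are assumptions added for illustration, not part of the original answer, and any regex approach remains fragile for unusual markup (for example, attribute values that contain `>`).

```python
import re

# Sample input (an assumption for illustration) mixing <abbr> with a link
html = '<p><abbr title="x">HTML</abbr> <a href="http://example.com">link</a></p>'

# The pattern from the answer: also removes <abbr ...> and </abbr>
loose = re.sub(r'<(a|/a).*?>', '', html)

# Hypothetical tighter variant: \b requires the tag name to end right after "a"
tight = re.sub(r'</?a\b[^>]*>', '', html, flags=re.IGNORECASE)

print(loose)  # <p>HTML link</p>
print(tight)  # <p><abbr title="x">HTML</abbr> link</p>
```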