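I am loading the HTML of a website (for example https://en.wikipedia.org/wiki/Elephant) with Beautiful Soup and Requests. I want to reproduce the page, but with the sentences inside the "p" tags (paragraphs) colored.
To do this, I use spaCy to split the text into sentences. I pick a color for each sentence (for those interested, the color is based on the probability returned by a binary deep learning classifier).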
def get_colorized_p(p):
    doc = nlp(p.text)  # p is the Beautiful Soup p tag
    string = '<p>'
    for sentence in doc.sents:
        # The prediction is a value between 0 and 1.
        prediction = classify(sentence.text, model=model, pred_values=True)[1][1].numpy()
        # I am using a custom function to map the prediction to a hex colour.
        color = get_hexcolor(prediction)
        string += f'<mark style="background: {color};">{sentence.text} </mark> '
    string += '</p>'
    return string  # I create a new long string with the markup
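For readers wondering what a mapping like get_hexcolor could look like, here is a minimal sketch. It is not the custom function used above, which is not shown. The name prob_to_hex and the two endpoint colors are assumptions chosen for illustration. It linearly interpolates each RGB channel between a light and a dark color.

```python
def prob_to_hex(p, low=(0xED, 0xF8, 0xFB), high=(0x00, 0x6D, 0x2C)):
    """Map a probability in [0, 1] to a hex colour string.

    Hypothetical helper: `low` and `high` are assumed endpoint colours,
    given as (red, green, blue) tuples.
    """
    # Clamp so that slightly out-of-range predictions still give a valid colour.
    p = min(max(float(p), 0.0), 1.0)
    # Interpolate each channel separately and round to the nearest integer.
    channels = (round(lo + (hi - lo) * p) for lo, hi in zip(low, high))
    return "#" + "".join(f"{c:02x}" for c in channels)


light = prob_to_hex(0.0)  # the `low` colour
dark = prob_to_hex(1.0)   # the `high` colour
```

Clamping the input is a design choice, so that a prediction slightly outside the range cannot produce an invalid color string.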
I build a new long string that contains the p tag with the HTML markup. Now I want to replace the "old" element in the Beautiful Soup object. I do this with a simple loop:
for element in tqdm_notebook(soup.findAll()):
    if element.name == 'p':
        if len(element.text.split()) > 2:
            element = get_colorized_p(element)
This does not raise any error, but when I render the HTML file, it is displayed without the markup.
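A likely explanation is that assigning to the loop variable only rebinds a local name and leaves the container untouched. The sketch below shows this with a plain list standing in for the soup. The list and its contents are made up for illustration.

```python
# A plain list stands in for the parsed document (illustrative data only).
items = ["first paragraph", "second paragraph"]

# Rebinding the loop variable: `items` is left unchanged.
for element in items:
    element = element.upper()
unchanged = list(items)  # snapshot taken after the first loop

# Writing back into the container: `items` is modified.
for index, element in enumerate(items):
    items[index] = element.upper()
```

After the first loop the list still holds the original strings. Only the second loop, which writes back into the container, changes it.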
I am using Jupyter to display the HTML file quickly:
from IPython.display import display, HTML
display(HTML(html_file))
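However, this does not work. I verified the string returned by get_colorized_p: when I use it on a single p element and render it, it works fine. But I want to insert that string into the Beautiful Soup object.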
I hope someone can help with this. Something goes wrong when replacing the elements inside the loop, but I do not know how to fix it.
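One possible direction, sketched under assumptions: parse the returned string into a tag and swap it into the tree with Beautiful Soup's replace_with, instead of assigning to the loop variable. The function colorize_stub is a hypothetical stand-in for get_colorized_p (no spaCy or classifier involved), and the sample markup is made up.

```python
from html import escape

from bs4 import BeautifulSoup


def colorize_stub(p):
    # Hypothetical stand-in for get_colorized_p: wraps the whole
    # paragraph text in a single <mark> with a fixed colour.
    return f'<p><mark style="background: #edf8fb;">{escape(p.text)}</mark></p>'


page = "<div><p>Elephants are the largest existing land animals.</p><p>Short.</p></div>"
soup = BeautifulSoup(page, "html.parser")

# find_all returns a list-like result, so replacing tags while looping is safe.
for element in soup.find_all("p"):
    if len(element.text.split()) > 2:
        # Parse the markup string into a tag, then swap it into the tree.
        fragment = BeautifulSoup(colorize_stub(element), "html.parser")
        element.replace_with(fragment.p)

# Serialise the modified tree; this string is what would be passed to HTML(...).
html_file = str(soup)
```

Parsing the string first matters here: passing a plain string to replace_with inserts it as text, so the angle brackets would be escaped when the tree is serialised.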
Here is an example of a rendered string:
<p><mark style="background: #edf8fb;">Elephants are the largest existing land animals.</mark><mark style="background: #f1fafc;">Three living species are currently recognised: the African bush elephant, the African forest elephant, and the Asian elephant.</mark><mark style="background: #f3fafc;">They are an informal grouping within the proboscidean family Elephantidae.</mark><mark style="background: #f3fafc;">Elephantidae is the only surviving family of proboscideans; extinct members include the mastodons.</mark><mark style="background: #eff9fb;">Elephantidae also contains several extinct groups, including the mammoths and straight-tusked elephants.</mark><mark style="background: #68c3a6;">African elephants have larger ears and concave backs, whereas Asian elephants have smaller ears, and convex or level backs.</mark><mark style="background: #56ba91;">The distinctive features of all elephants include a long proboscis called a trunk, tusks, large ear flaps, massive legs, and tough but sensitive skin.</mark><mark style="background: #d4efec;">The trunk is used for breathing, bringing food and water to the mouth, and grasping objects.</mark><mark style="background: #e7f6f9;">Tusks, which are derived from the incisor teeth, serve both as weapons and as tools for moving objects and digging.</mark><mark style="background: #d9f1f0;">The large ear flaps assist in maintaining a constant body temperature as well as in communication.</mark><mark style="background: #e5f5f9;">The pillar-like legs carry their great weight.</mark><mark style="background: #72c7ad;"> </mark></p>