BeautifulSoup: Extracting the alt attribute data from an img tag

Question

4

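I have the following image HTML and I am trying to parse the information in the alt attribute. So far I have been able to extract the images successfully.

Raw HTML (what I am currently parsing):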

<img class="rslp-p" alt="Sony Cyber-shot DSC-W570 16.1 MP Digital Camera - Silver" src="http://i.ebayimg.com/00/$(KGrHqZ,!j!E5dyh0jTpBO(3yE7Wg!~~_26.JPG?set_id=89040003C1" itemprop="image" />

I build the image name from the parsed content:

Current code:

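# Imports assumed from the names used below (Python 2, BeautifulSoup 3)
import os
import urlparse
from urllib import urlretrieve
from urllib2 import urlopen

from BeautifulSoup import BeautifulSoup as bs


def main(url, output_folder="~/images"):
    """Download the images at url"""
    soup = bs(urlopen(url))
    parsed = list(urlparse.urlparse(url))
    count = 0
    for image in soup.findAll("img"):
        print image
        count += 1
        print count
        print "Image: %(src)s" % image
        # Resolve relative src values against the page URL
        image_url = urlparse.urljoin(url, image['src'])
        # Derive a filename from the last path segment of src
        filename = image["src"].split("/")[-1].split("?")[0].replace("$",'').replace(".JPG",".jpg").replace("~~_26",str(count)).lstrip("(")
        parsed[2] = image["src"]
        outpath = os.path.join(output_folder, filename)
        urlretrieve(image_url, outpath)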

What I would like to extract is:

alt="Sony Cyber-shot DSC-W570 16.1 MP Digital Camera - Silver"

When I extract the images, I would like to use the alt data as the filename.

- add-semi-colons

2

You are using image['src'] to get the source. Can't you just use image['alt'] to get the alt? Or am I misunderstanding your question? - BrtH

1 Answer

- Gonzalo · Accepted Answer

11

Inside your for loop, you can get it simply with:

image.get('alt', '')

This is explained in the BeautifulSoup documentation ("The attributes of Tags").
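
To illustrate the difference between the two lookups, here is a minimal sketch. It assumes Python 3 and the newer `bs4` package rather than the setup in the question, and the second `<img>` tag is invented for the example:

```python
from bs4 import BeautifulSoup

# Two sample tags: the first is shortened from the question, the second
# (hypothetical) one has no alt attribute.
html = """
<img class="rslp-p" alt="Sony Cyber-shot DSC-W570 16.1 MP Digital Camera - Silver" src="a.jpg" />
<img src="b.jpg" />
"""

soup = BeautifulSoup(html, "html.parser")
with_alt, without_alt = soup.find_all("img")

# .get() returns the default instead of raising when the attribute is missing.
alt_present = with_alt.get("alt", "")
alt_missing = without_alt.get("alt", "")

# Indexing with [] raises KeyError for a missing attribute.
try:
    without_alt["alt"]
    raised = False
except KeyError:
    raised = True
```

With `.get('alt', '')` the loop keeps going for images that have no alt text, and you can decide afterwards what to do with the empty string.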

- Gonzalo

2

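A KeyError means that particular img tag has no alt attribute. Are you sure every image on the page has alt text associated with it? - larissa

Edited the answer; it should now handle the case @anyaMairead mentioned. - Gonzalo

Actually, some of them don't, and I am trying to skip the ones that don't. - add-semi-colons

@GonzaloDelgado Thanks. How do I use the alt information as the filename? - add-semi-colons

Depending on what you want the filename to look like, you can work it into the filename construction in your example code. There is a lot of room for improvement there, though, so I suggest asking about it on Code Review (http://codereview.stackexchange.com/). - Gonzalo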
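
One possible way to skip images without alt text is to filter on the attribute when searching. This is a sketch only, assuming Python 3 and `bs4` (not the setup in the question), with made-up sample tags:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: the middle tag has no alt attribute.
html = '<img alt="Camera" src="a.jpg" /><img src="b.jpg" /><img alt="Lens" src="c.jpg" />'
soup = BeautifulSoup(html, "html.parser")

# alt=True matches only tags that carry an alt attribute, whatever its value.
images_with_alt = soup.find_all("img", alt=True)
alts = [image["alt"] for image in images_with_alt]
```

Because every tag in `images_with_alt` has the attribute, `image["alt"]` is safe inside that loop.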
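
As a rough sketch of one way to turn the alt text into a filename: the helper name `filename_from_alt`, the replacement rule, and the counter suffix are all assumptions for illustration, not something from the original code, and the example is written for Python 3.

```python
import os
import re


def filename_from_alt(alt_text, count, extension=".jpg"):
    """Build a filesystem-friendly name from alt text (hypothetical helper)."""
    # Replace every run of characters other than letters, digits, dot,
    # or hyphen with a single underscore.
    safe = re.sub(r"[^A-Za-z0-9.-]+", "_", alt_text).strip("_")
    # Fall back to a generic name when there is no usable alt text.
    if not safe:
        safe = "image"
    # The counter keeps names unique when two images share the same alt text.
    return "%s_%d%s" % (safe, count, extension)


name = filename_from_alt("Sony Cyber-shot DSC-W570 16.1 MP Digital Camera - Silver", 1)
# expanduser turns "~/images" into a real path under the home directory.
outpath = os.path.join(os.path.expanduser("~/images"), name)
```

In the loop from the question, the result would replace the `filename` built from `image["src"]`, for example `filename_from_alt(image.get('alt', ''), count)`.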