How do I save an image from a known URL to a local file with Python?

Question


python web-scraping

204

I know the URL of an image on the Internet, for example http://www.digimouth.com/news/media/2011/09/google-logo.jpg, which contains the Google logo.

How can I download this image with Python, without actually opening the URL in a browser and saving the file by hand?

- Pankaj Vatsa

1

Possible duplicate of How do I download a file over HTTP using Python? - Jaydev

18 answers

29

import urllib
resource = urllib.urlopen("http://www.digimouth.com/news/media/2011/09/google-logo.jpg")
output = open("file01.jpg","wb")
output.write(resource.read())
output.close()

file01.jpg will contain your image.

- Noufal Ibrahim

3

You should open the file in binary mode, open("file01.jpg", "wb"), otherwise the image may be corrupted. - Liquid_Fire

2

urllib.urlretrieve can save the image directly. - heltonbiker

1

This is the Python 2 version. Perhaps you are using a newer version of Python? - Noufal Ibrahim
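On Python 3, `urllib.urlopen` no longer exists; the same idea lives in `urllib.request.urlopen`. A minimal sketch, assuming Python 3 and a hypothetical helper name `download_image`, that uses `with` so the connection and the file are both closed even if an error occurs:

```python
import shutil
import urllib.request


def download_image(url: str, path: str) -> None:
    """Save the resource at `url` to the local file `path` (hypothetical helper)."""
    # urlopen returns a file-like response; "wb" writes the bytes unchanged.
    with urllib.request.urlopen(url) as resource, open(path, "wb") as output:
        # copyfileobj copies in chunks instead of reading everything at once.
        shutil.copyfileobj(resource, output)
```

Usage: `download_image("http://www.digimouth.com/news/media/2011/09/google-logo.jpg", "file01.jpg")`.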

21

I wrote a script that does this; you can use it from my GitHub.

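I used BeautifulSoup to parse the images on any website. If you do a lot of web scraping (or intend to use my tool), I suggest installing it with sudo pip install beautifulsoup4 (the package that provides the bs4 module imported below). See here for more information about BeautifulSoup.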

For convenience, here is my code:

from bs4 import BeautifulSoup
from urllib2 import urlopen
import urllib

# use this image scraper from the location that 
#you want to save scraped images to

def make_soup(url):
    html = urlopen(url).read()
    # name the parser explicitly; bs4 warns when it has to guess one
    return BeautifulSoup(html, 'html.parser')

def get_images(url):
    soup = make_soup(url)
    #this makes a list of bs4 element tags
    images = soup.findAll('img')
    print(str(len(images)) + " images found.")
    print('Downloading images to current working directory.')
    #compile our unicode list of image links (skip <img> tags without a src)
    image_links = [each.get('src') for each in images if each.get('src')]
    for each in image_links:
        filename=each.split('/')[-1]
        urllib.urlretrieve(each, filename)
    return image_links

#a standard call looks like this
#get_images('http://www.wookmark.com')

- Yup.
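The code above passes each `src` straight to `urlretrieve`, which fails for relative links such as `/images/logo.png`. A dependency-free Python 3 sketch of the extraction step, swapping BeautifulSoup for the standard library's `html.parser` (the class and function names are hypothetical), collects `<img>` sources and resolves them against the page URL with `urljoin`:

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.sources: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so "IMG SRC" matches too.
        if tag == "img":
            src = dict(attrs).get("src")  # attrs is a list of (name, value) pairs
            if src:  # skip <img> tags without a usable src
                self.sources.append(src.strip())


def image_links(page_url: str, html: str) -> List[str]:
    """Return absolute URLs for every <img> source in `html`."""
    parser = ImgSrcCollector()
    parser.feed(html)
    parser.close()
    # urljoin resolves "/a.png" or "b/c.jpg" relative to the page URL.
    return [urljoin(page_url, src) for src in parser.sources]
```

The returned absolute URLs can then be handed to any of the download approaches in this thread.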

19

This can be done with requests: fetch the page and dump its binary content to a file.

import os
import requests

url = 'https://apod.nasa.gov/apod/image/1701/potw1636aN159_HST_2048.jpg'
page = requests.get(url)

f_ext = os.path.splitext(url)[-1]
f_name = 'img{}'.format(f_ext)
with open(f_name, 'wb') as f:
    f.write(page.content)

- Alex

1

If you get a bad request error, add a User-Agent header to the request :) - 1UC1F3R616

Also, you may want to check page.status_code == 200 before writing the file. - idbrii
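`os.path.splitext(url)` works on the whole URL, so a query string ends up in the "extension": for `http://photogallery.sandesh.com/Picture.aspx?AlubumId=422040` it yields `.aspx?AlubumId=422040`. A small sketch, with hypothetical helper names, that looks only at the path component via `urllib.parse.urlparse`:

```python
import posixpath
from urllib.parse import unquote, urlparse


def url_extension(url: str) -> str:
    """Return the extension of the URL's path, ignoring query and fragment."""
    # urlparse separates the path from "?query" and "#fragment".
    path = urlparse(url).path
    # URL paths always use "/", so posixpath is correct on every OS.
    return posixpath.splitext(path)[1]


def local_filename(url: str, default: str = "img") -> str:
    """Derive a file name from the last path segment (hypothetical helper)."""
    # unquote turns "%20" back into a space before taking the last segment.
    name = posixpath.basename(unquote(urlparse(url).path))
    # Fall back to a default when the URL ends in "/" or has an empty path.
    return name or default
```

Note that a server-side script such as `.aspx` can still return a JPEG, so the path extension is only a hint about the image format.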

13

Python 3

urllib.request - Extensible library for opening URLs

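from urllib.error import HTTPError, URLError
from urllib.request import urlretrieve

image_url = "http://www.digimouth.com/news/media/2011/09/google-logo.jpg"
image_local_path = "local-filename.jpg"

try:
    urlretrieve(image_url, image_local_path)
except FileNotFoundError as err:
    print(err)  # something wrong with local path
except HTTPError as err:
    print(err)  # server answered with an error status (4xx/5xx)
except URLError as err:
    print(err)  # network problem, e.g. host not found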

- SergO
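`urlretrieve` has no `headers` parameter, so it cannot directly send the User-Agent header that several people in this thread needed to avoid 403 errors. One way around this, sketched below under the assumption that a process-wide setting is acceptable, is to install a global opener whose `addheaders` contains a User-Agent (the value `"Mozilla/5.0"` and the helper names are placeholders):

```python
import urllib.request


def install_user_agent(user_agent: str = "Mozilla/5.0") -> None:
    """Make urlretrieve/urlopen send `user_agent` (affects the whole process)."""
    opener = urllib.request.build_opener()
    # addheaders replaces the default [("User-agent", "Python-urllib/3.x")].
    opener.addheaders = [("User-Agent", user_agent)]
    # urlopen, and therefore urlretrieve, uses the installed opener from now on.
    urllib.request.install_opener(opener)


def save_image(image_url: str, image_local_path: str) -> str:
    """Download with urlretrieve; returns the local file name."""
    filename, _headers = urllib.request.urlretrieve(image_url, image_local_path)
    return filename
```

Call `install_user_agent()` once at startup, then use `save_image` (or plain `urlretrieve`) as usual.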

7

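I wrote a script that extends Yup's and fixes several things. It now gets around the 403: Forbidden problem, it doesn't crash when an image can't be retrieved, it tries to avoid corrupted previews, it builds correct absolute URLs, it prints more information, and it can be run from the command line with an argument.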

# getem.py
# python2 script to download all images in a given url
# use: python getem.py http://url.where.images.are

from bs4 import BeautifulSoup
import urllib2
import shutil
import requests
from urlparse import urljoin
import sys
import time

def make_soup(url):
    req = urllib2.Request(url, headers={'User-Agent' : "Magic Browser"}) 
    html = urllib2.urlopen(req)
    return BeautifulSoup(html, 'html.parser')

def get_images(url):
    soup = make_soup(url)
    images = [img for img in soup.findAll('img')]
    print (str(len(images)) + " images found.")
    print 'Downloading images to current working directory.'
    image_links = [each.get('src') for each in images]
    for each in image_links:
        try:
            filename = each.strip().split('/')[-1].strip()
            src = urljoin(url, each)
            print 'Getting: ' + filename
            response = requests.get(src, stream=True)
            # delay to avoid corrupted previews
            time.sleep(1)
            with open(filename, 'wb') as out_file:
                # undo any gzip/deflate transfer encoding before copying
                response.raw.decode_content = True
                shutil.copyfileobj(response.raw, out_file)
        except Exception as e:
            print '  An error occurred (%s). Continuing.' % e
    print 'Done.'

if __name__ == '__main__':
    url = sys.argv[1]
    get_images(url)

- madprops
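The script above keeps going when one image fails but only reports that "an error occurred". A Python 3 sketch of the same "skip and continue" loop using only the standard library (the function name and the `delay` parameter are hypothetical) that records which URLs failed and why:

```python
import time
import urllib.request
from typing import Dict, Iterable, Tuple


def download_all(pairs: Iterable[Tuple[str, str]], delay: float = 0.0) -> Dict[str, str]:
    """Download each (url, local_path) pair; return {url: error text} for failures."""
    failures: Dict[str, str] = {}
    for url, path in pairs:
        try:
            urllib.request.urlretrieve(url, path)
        except (OSError, ValueError) as err:
            # URLError/HTTPError are OSError subclasses; ValueError is raised
            # for malformed URLs. Record the problem and move on.
            failures[url] = str(err)
        if delay:
            time.sleep(delay)  # optional pause between requests
    return failures
```

Returning the failures instead of printing them lets the caller retry or log them as it sees fit.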

6

A solution that works with both Python 2 and Python 3:

try:
    from urllib.request import urlretrieve  # Python 3
except ImportError:
    from urllib import urlretrieve  # Python 2

url = "http://www.digimouth.com/news/media/2011/09/google-logo.jpg"
urlretrieve(url, "local-filename.jpg")

Or, if depending on requests is acceptable and the URL is an http(s) URL:

def load_requests(source_url, sink_path):
    """
    Load a file from a URL (e.g. http).

    Parameters
    ----------
    source_url : str
        Where to load the file from.
    sink_path : str
        Where the loaded file is stored.
    """
    import requests
    r = requests.get(source_url, stream=True)
    if r.status_code == 200:
        with open(sink_path, 'wb') as f:
            for chunk in r:
                f.write(chunk)

- Martin Thoma
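The `requests` version above streams the body instead of loading it all into memory. The same idea with only the standard library, as a sketch (the helper name is hypothetical and the 64 KiB default `chunk_size` is an arbitrary choice), reads the response in fixed-size pieces and returns the number of bytes written. For HTTP URLs, `urlopen` raises `HTTPError` on non-2xx statuses, which plays the role of the `status_code == 200` check:

```python
import urllib.request


def stream_to_file(source_url: str, sink_path: str, chunk_size: int = 64 * 1024) -> int:
    """Copy `source_url` to `sink_path` chunk by chunk; return bytes written."""
    total = 0
    with urllib.request.urlopen(source_url) as response, open(sink_path, "wb") as sink:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:  # an empty bytes object means end of stream
                break
            sink.write(chunk)
            total += len(chunk)
    return total
```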

6

Using the requests library:

import requests
import shutil,os

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36'
}
currentDir = os.getcwd()
path = os.path.join(currentDir, 'Images')  # saving images to Images folder
os.makedirs(path, exist_ok=True)  # create the folder if it does not exist yet

def ImageDl(url):
    attempts = 0
    while attempts < 5:  # retry up to 5 times
        try:
            filename = url.split('/')[-1]
            r = requests.get(url, headers=headers, stream=True, timeout=5)
            if r.status_code == 200:
                with open(os.path.join(path, filename), 'wb') as f:
                    r.raw.decode_content = True
                    shutil.copyfileobj(r.raw, f)
                print(filename)  # report only files that were actually saved
            break
        except Exception as e:
            attempts += 1
            print(e)


url = 'http://www.digimouth.com/news/media/2011/09/google-logo.jpg'
ImageDl(url)

- Sohan Das

In my case the headers turned out to be essential; I kept getting 403 errors until I added them. Now it works. - Ishtiyaq Husain
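The loop above retries immediately after a failure. A common refinement, shown here as a standard-library sketch with hypothetical names and a placeholder User-Agent, waits progressively longer between attempts (exponential backoff) and returns success or failure to the caller:

```python
import time
import urllib.request


def download_with_retry(url: str, path: str, attempts: int = 5,
                        base_delay: float = 1.0, timeout: float = 5.0) -> bool:
    """Try up to `attempts` times, sleeping base_delay * 2**n between tries."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    for n in range(attempts):
        try:
            # If urlopen raises, the local file is never opened or truncated.
            with urllib.request.urlopen(request, timeout=timeout) as response, \
                    open(path, "wb") as f:
                f.write(response.read())
            return True
        except (OSError, ValueError) as err:
            # URLError/HTTPError and socket timeouts are OSError subclasses.
            print(f"attempt {n + 1} failed: {err}")
            if n + 1 < attempts:
                time.sleep(base_delay * 2 ** n)
    return False
```

With the defaults, the waits between the five attempts are 1, 2, 4 and 8 seconds.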

4

Use the simple wget module for Python (a third-party package, installed with pip install wget) to download the link:

import wget
wget.download('http://www.digimouth.com/news/media/2011/09/google-logo.jpg')

- Gaurav Shrivastava

3

Here is a very short answer:

import urllib
urllib.urlretrieve("http://photogallery.sandesh.com/Picture.aspx?AlubumId=422040", "Abc.jpg")

- OO7
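The URL in this answer (`Picture.aspx?AlubumId=422040`) says nothing about the image format, so the `.jpg` in `Abc.jpg` is a guess. A sketch, assuming the server sends an accurate `Content-Type` header, that picks the extension from that header with the standard `mimetypes` module (`mimetypes.guess_extension` is real; the wrapper name, the `PREFERRED` table and the `.bin` fallback are hypothetical):

```python
import mimetypes
from typing import Optional

# Pin a few common image types, because guess_extension may return an
# unusual alias (for example ".jpe" for image/jpeg on older Pythons).
PREFERRED = {"image/jpeg": ".jpg", "image/png": ".png", "image/gif": ".gif"}


def extension_for(content_type: Optional[str], fallback: str = ".bin") -> str:
    """Map a header value such as 'image/jpeg; charset=binary' to an extension."""
    if not content_type:
        return fallback
    # Drop parameters like "; charset=..." and normalise case.
    mime = content_type.split(";", 1)[0].strip().lower()
    return PREFERRED.get(mime) or mimetypes.guess_extension(mime) or fallback
```

Usage: inside `with urllib.request.urlopen(url) as r:`, compute `ext = extension_for(r.headers.get("Content-Type"))` and name the file `"Abc" + ext`.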

Content provided by Stack Overflow.

- Liquid_Fire · Accepted Answer

Python 2

If you just want to save the content to a file, there is a simpler way:

import urllib

urllib.urlretrieve("http://www.digimouth.com/news/media/2011/09/google-logo.jpg", "local-filename.jpg")

The second argument is the local path where the file should be saved.

Python 3

As SergO suggested, the code below should work in Python 3:

import urllib.request

urllib.request.urlretrieve("http://www.digimouth.com/news/media/2011/09/google-logo.jpg", "local-filename.jpg")
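In Python 3, `urlretrieve` returns a `(filename, headers)` tuple, and the headers can be used to check what the server actually sent before trusting the file. (The Python documentation lists `urlretrieve` under its legacy interface and notes it might be deprecated in the future.) A sketch with a hypothetical helper that rejects non-image responses; treating any `image/*` type as acceptable is an assumption:

```python
import os
import urllib.request


def retrieve_image(url: str, local_path: str) -> str:
    """Download `url` to `local_path`; raise ValueError if it is not an image."""
    filename, headers = urllib.request.urlretrieve(url, local_path)
    content_type = headers.get_content_type()  # e.g. "image/jpeg"
    if not content_type.startswith("image/"):
        os.remove(filename)  # discard the non-image download
        raise ValueError(f"expected an image, got {content_type}")
    return filename
```

This catches the common case where a broken link returns an HTML error page that would otherwise be saved as `local-filename.jpg`.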