Downloading a binary file from GitHub using Java

7
I am trying to download this file (http://github.com/downloads/TheHolyWaffle/ChampionHelper/ChampionHelper-4.jar) with the following code, but it does not seem to work. I get an empty/corrupted file.
String link = "http://github.com/downloads/TheHolyWaffle/ChampionHelper/ChampionHelper-4.jar";
String fileName = "ChampionHelper-4.jar";

URL url = new URL(link);
URLConnection c = url.openConnection();
c.setRequestProperty("User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705; .NET CLR 1.1.4322; .NET CLR 1.2.30703)");

InputStream input;
input = c.getInputStream();
byte[] buffer = new byte[4096];
int n = -1;

OutputStream output = new FileOutputStream(new File(fileName));
while ((n = input.read(buffer)) != -1) {
    if (n > 0) {
        output.write(buffer, 0, n);
    }
}
output.close();

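However, the same code successfully downloads this file from my Dropbox (http://dl.dropbox.com/u/13226123/ChampionHelper-4.jar). So GitHub somehow knows that I am not a regular user trying to download the file. I have already tried changing the user agent, but that did not help either. How should I download a file hosted on my GitHub account using Java?

Edit: I tried doing this with Apache commons-io, but I got the same result: an empty/corrupted file.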

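I can download the file from GitHub without any problem. My browser is Chrome v23 on Windows 7. - Chris Snow
@Chris The question is not about whether the file can be downloaded through a browser. Please re-read the question. - Mukul Goel
5 Answers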

3
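It looks like GitHub sends you through several levels of redirects when you request this file, and this StackOverflow post notes that URLConnection does not automatically follow redirects that change protocol. Here is what I see with curl:
First request: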
curl -v http://github.com/downloads/TheHolyWaffle/ChampionHelper/ChampionHelper-4.jar
* About to connect() to github.com port 80 (#0)
*   Trying 207.97.227.239... connected
* Connected to github.com (207.97.227.239) port 80 (#0)
> GET /downloads/TheHolyWaffle/ChampionHelper/ChampionHelper-4.jar HTTP/1.1
> User-Agent: curl/7.21.4 (universal-apple-darwin11.0) libcurl/7.21.4 OpenSSL/0.9.8r zlib/1.2.5
> Host: github.com
> Accept: */*
> 
< HTTP/1.1 301 Moved Permanently
< Server: nginx
< Date: Sun, 18 Nov 2012 15:56:36 GMT
< Content-Type: text/html
< Content-Length: 178
< Connection: close
< Location: https://github.com/downloads/TheHolyWaffle/ChampionHelper/ChampionHelper-4.jar
< 
<html>
<head><title>301 Moved Permanently</title></head>
<body bgcolor="white">
<center><h1>301 Moved Permanently</h1></center>
<hr><center>nginx</center>
</body>
</html>
* Closing connection #0

curl against the URL from that Location header:
curl -v https://github.com/downloads/TheHolyWaffle/ChampionHelper/ChampionHelper-4.jar
* About to connect() to github.com port 443 (#0)
*   Trying 207.97.227.239... connected
* Connected to github.com (207.97.227.239) port 443 (#0)
* SSLv3, TLS handshake, Client hello (1):
* SSLv3, TLS handshake, Server hello (2):
* SSLv3, TLS handshake, CERT (11):
* SSLv3, TLS handshake, Server finished (14):
* SSLv3, TLS handshake, Client key exchange (16):
* SSLv3, TLS change cipher, Client hello (1):
* SSLv3, TLS handshake, Finished (20):
* SSLv3, TLS change cipher, Client hello (1):
* SSLv3, TLS handshake, Finished (20):
* SSL connection using RC4-SHA
* Server certificate:
*    subject: businessCategory=Private Organization; 1.3.6.1.4.1.311.60.2.1.3=US; 1.3.6.1.4.1.311.60.2.1.2=California; serialNumber=C3268102; C=US; ST=California; L=San Francisco; O=GitHub, Inc.; CN=github.com
*    start date: 2011-05-27 00:00:00 GMT
*    expire date: 2013-07-29 12:00:00 GMT
*    subjectAltName: github.com matched
*    issuer: C=US; O=DigiCert Inc; OU=www.digicert.com; CN=DigiCert High Assurance EV CA-1
*    SSL certificate verify ok.
> GET /downloads/TheHolyWaffle/ChampionHelper/ChampionHelper-4.jar HTTP/1.1
> User-Agent: curl/7.21.4 (universal-apple-darwin11.0) libcurl/7.21.4 OpenSSL/0.9.8r zlib/1.2.5
> Host: github.com
> Accept: */*
> 
< HTTP/1.1 302 Found
< Server: nginx
< Date: Sun, 18 Nov 2012 15:58:56 GMT
< Content-Type: text/html; charset=utf-8
< Connection: keep-alive
< Status: 302 Found
< Strict-Transport-Security: max-age=2592000
< Cache-Control: no-cache
< X-Runtime: 48
< Location: http://cloud.github.com/downloads/TheHolyWaffle/ChampionHelper/ChampionHelper-4.jar
< X-Frame-Options: deny
< Content-Length: 149
< 
* Connection #0 to host github.com left intact
* Closing connection #0
* SSLv3, TLS alert, Client hello (1):
<html><body>You are being <a href="http://cloud.github.com/downloads/TheHolyWaffle/ChampionHelper/ChampionHelper-4.jar">redirected</a>.</body></html>

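The Location header in this response points to the actual file. You may want to use Apache HTTP Client for the download. You can configure it to follow these 301 and 302 redirects during the GET.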
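
If you would rather stay with the JDK only, the redirect chain shown above can also be followed by hand. The following is a minimal sketch, not a definitive implementation: the class name RedirectingDownloader, the MAX_REDIRECTS limit, and the helper methods isRedirect and resolve are illustrative assumptions. It turns off automatic redirects, reads each status code, and re-requests the Location target even when the scheme switches between http and https.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class RedirectingDownloader {
    // Hypothetical safety limit so a redirect loop cannot run forever.
    private static final int MAX_REDIRECTS = 5;

    // True for the redirect status codes that carry a Location header.
    public static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303
            || status == 307 || status == 308;
    }

    // Resolves a Location value, which may be relative, against the current URL.
    public static URL resolve(URL current, String location) throws IOException {
        return new URL(current, location);
    }

    public static void download(String link, Path target) throws IOException {
        URL url = new URL(link);
        for (int hop = 0; hop <= MAX_REDIRECTS; hop++) {
            HttpURLConnection http = (HttpURLConnection) url.openConnection();
            http.setInstanceFollowRedirects(false); // handle every hop ourselves
            int status = http.getResponseCode();
            if (isRedirect(status)) {
                String location = http.getHeaderField("Location");
                http.disconnect();
                if (location == null) {
                    throw new IOException("Redirect without Location header from " + url);
                }
                url = resolve(url, location);
                continue;
            }
            if (status != HttpURLConnection.HTTP_OK) {
                http.disconnect();
                throw new IOException("Unexpected HTTP status " + status + " from " + url);
            }
            // Final response: stream the body straight into the target file.
            try (InputStream in = http.getInputStream()) {
                Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            }
            return;
        }
        throw new IOException("Too many redirects for " + link);
    }

    public static void main(String[] args) throws IOException {
        // Usage: java RedirectingDownloader <url> <target-file>
        download(args[0], Paths.get(args[1]));
    }
}
```

Failing on any non-200 final status is a deliberate choice here: it keeps an HTML error or redirect page from being silently saved under the .jar name.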


You can call setInstanceFollowRedirects on the HttpURLConnection instance to follow these redirects automatically. - robert
The StackOverflow post I linked says that URLConnection will not follow redirects that change protocol. Have you written any code to test whether setInstanceFollowRedirects works? - Christian Trimble
You're right... I had to add the following: while (c.getResponseCode() > 300 && c.getResponseCode() < 400) c = (HttpURLConnection) (new URL(c.getHeaderField("Location"))).openConnection(); - robert

2
This does the job:
import java.io.File;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.List;
import java.util.Map;

public class Download {
   // The HTTP status line is stored under the null key of the header map.
   private static boolean isRedirected( Map<String, List<String>> header ) {
      for( String hv : header.get( null )) {
         if(   hv.contains( " 301 " )
            || hv.contains( " 302 " )) return true;
      }
      return false;
   }
   public static void main( String[] args ) throws Throwable
   {
      String link =
         "http://github.com/downloads/TheHolyWaffle/ChampionHelper/" +
         "ChampionHelper-4.jar";
      String            fileName = "ChampionHelper-4.jar";
      URL               url  = new URL( link );
      HttpURLConnection http = (HttpURLConnection)url.openConnection();
      Map< String, List< String >> header = http.getHeaderFields();
      // Follow each redirect by hand, including ones that change protocol.
      while( isRedirected( header )) {
         link = header.get( "Location" ).get( 0 );
         url    = new URL( link );
         http   = (HttpURLConnection)url.openConnection();
         header = http.getHeaderFields();
      }
      byte[] buffer = new byte[4096];
      int    n      = -1;
      // try-with-resources closes both streams even if the copy fails.
      try( InputStream  input  = http.getInputStream();
           OutputStream output = new FileOutputStream( new File( fileName ))) {
         while ((n = input.read(buffer)) != -1) {
            output.write( buffer, 0, n );
         }
      }
   }
}

2
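Get the direct download link to the raw binary, for example https://github.com/xerial/sqlite-jdbc/blob/master/src/main/resources/org/sqlite/native/Windows/x86_64/sqlitejdbc.dll?raw=true, by copying the "View Raw" link on the file's page.

Then use the following code snippet to download the file: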
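import java.io.IOException;
import java.io.InputStream;
import java.io.UnsupportedEncodingException;
import java.net.URL;
import java.net.URLDecoder;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import org.apache.commons.io.FilenameUtils;

public static void download(String downloadURL) throws IOException
{
    URL website = new URL(downloadURL);
    String fileName = getFileName(downloadURL);

    // Stream the response body straight into the target file.
    try (InputStream inputStream = website.openStream())
    {
        Files.copy(inputStream, Paths.get(fileName), StandardCopyOption.REPLACE_EXISTING);
    }
}

// URLDecoder.decode(String, String) throws a checked exception, so it must be declared.
public static String getFileName(String downloadURL) throws UnsupportedEncodingException
{
    String baseName = FilenameUtils.getBaseName(downloadURL);
    String extension = FilenameUtils.getExtension(downloadURL);
    String fileName = baseName + "." + extension;

    // Cut off the query string, e.g. "?raw=true".
    int questionMarkIndex = fileName.indexOf("?");
    if (questionMarkIndex != -1)
    {
        fileName = fileName.substring(0, questionMarkIndex);
    }

    // Note: this also strips hyphens from the file name.
    fileName = fileName.replaceAll("-", "");
    return URLDecoder.decode(fileName, "UTF-8");
}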

You also need the Apache Commons IO Maven dependency for the FilenameUtils class:
<dependency>
    <groupId>commons-io</groupId>
    <artifactId>commons-io</artifactId>
    <version>LATEST</version>
</dependency>
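
If pulling in a dependency only for the file name is undesirable, the name can probably be derived with java.net.URI alone. This is a sketch under that assumption, not the answer's own method: the class name UrlFileName and the method fromUrl are made up for illustration, and unlike getFileName above it keeps hyphens in the name.

```java
import java.net.URI;

public class UrlFileName {
    // Returns the last path segment of the URL, ignoring any query string or fragment.
    public static String fromUrl(String downloadURL) {
        // getPath() returns the percent-decoded path without the "?raw=true" part.
        String path = URI.create(downloadURL).getPath();
        if (path == null || path.isEmpty() || path.endsWith("/")) {
            throw new IllegalArgumentException("URL has no file name: " + downloadURL);
        }
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        System.out.println(fromUrl(
            "https://github.com/xerial/sqlite-jdbc/blob/master/src/main/resources/"
            + "org/sqlite/native/Windows/x86_64/sqlitejdbc.dll?raw=true"));
    }
}
```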

1

I found the solution.

Apparently http://github.com/downloads/TheHolyWaffle/ChampionHelper/ChampionHelper-4.jar does not link directly to my file.

When I opened the resulting jar file in a text editor, I found this:

<html><body>You are being <a href="http://cloud.github.com/downloads/TheHolyWaffle/ChampionHelper/ChampionHelper-4.jar">redirected</a>.</body></html>

This means the direct link is: http://cloud.github.com/downloads/TheHolyWaffle/ChampionHelper/ChampionHelper-4.jar

With this link I can download the file without any trouble.
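
The same check can be automated instead of opening the file in a text editor. The sketch below relies on the fact that a JAR is a ZIP archive and that a non-empty ZIP starts with the bytes 'P', 'K', 3, 4; the class name JarSanityCheck and the method looksLikeZip are hypothetical names chosen for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class JarSanityCheck {
    // A non-empty ZIP (and therefore a JAR) starts with 'P', 'K', 3, 4.
    public static boolean looksLikeZip(byte[] head) {
        return head.length >= 4
            && head[0] == 'P' && head[1] == 'K'
            && head[2] == 3 && head[3] == 4;
    }

    // Reads the first four bytes of the file and checks them.
    public static boolean looksLikeZip(Path file) throws IOException {
        byte[] head = new byte[4];
        try (InputStream in = Files.newInputStream(file)) {
            int read = 0;
            while (read < head.length) {
                int n = in.read(head, read, head.length - read);
                if (n == -1) break; // file is shorter than four bytes
                read += n;
            }
            return read == head.length && looksLikeZip(head);
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java JarSanityCheck <downloaded-file>
        Path file = Paths.get(args[0]);
        System.out.println(looksLikeZip(file)
            ? "looks like a ZIP/JAR"
            : "not a ZIP/JAR (possibly an HTML redirect page)");
    }
}
```

Running it on the downloaded file right after the transfer would flag the HTML redirect page shown above before anything tries to load it as a JAR.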


Please check my post; it handles the redirects without using Apache or any other third-party library. - Aubin

