HttpClient 4 - how to capture the last redirect URL

53

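I have fairly simple HttpClient 4 code that calls HttpGet to fetch HTML output. The HTML comes back with script and image locations all set to local paths (e.g. <img src="/images/foo.jpg"/>), so I need the calling URL to turn them into absolute ones (<img src="http://foo.com/images/foo.jpg"/>). Here is the problem: one or two 302 redirects may happen during the call, so the original URL no longer reflects the location of the HTML.

How do I get the latest URL of the returned content, taking into account all the redirects that may (or may not) have happened?

I looked at HttpGet#getAllHeaders() and HttpResponse#getAllHeaders() but couldn't find anything useful.

Edit: HttpGet#getURI() returns the original calling address.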
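
For context, the conversion step itself is simple once the final URL is known: java.net.URI#resolve turns a root-relative path into an absolute one. A minimal sketch, assuming the final URL has already been obtained somehow; the class and method names are hypothetical, and http://foo.com/landing/page.html is a made-up post-redirect address.

```java
import java.net.URI;

// Hypothetical helper for illustration; not part of HttpClient.
final class RelativeLinks {
    private RelativeLinks() {}

    /**
     * Resolves a link found in the HTML against the URL the HTML was
     * actually served from (the URL after all redirects).
     * Throws IllegalArgumentException if the link is not a valid URI
     * (for example, if it contains unescaped spaces).
     */
    static String toAbsolute(URI finalUrl, String link) {
        return finalUrl.resolve(link).toString();
    }
}
```

With the assumed final URL above, toAbsolute(finalUrl, "/images/foo.jpg") yields http://foo.com/images/foo.jpg, which is why resolving against the original, pre-redirect URL can give the wrong host or path.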

8 Answers

63

That would be the current URL, which you can get by calling

  HttpGet#getURI();

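EDIT: You didn't mention how you are doing the redirects. That works for us because we handle the 302 ourselves.

It sounds like you are using DefaultRedirectHandler. We used to do that too. Getting the current URL is a bit tricky: you need to use your own context. Here are the relevant code snippets: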

        HttpGet httpget = new HttpGet(url);
        HttpContext context = new BasicHttpContext(); 
        HttpResponse response = httpClient.execute(httpget, context); 
        if (response.getStatusLine().getStatusCode() != HttpStatus.SC_OK)
            throw new IOException(response.getStatusLine().toString());
        HttpUriRequest currentReq = (HttpUriRequest) context.getAttribute( 
                ExecutionContext.HTTP_REQUEST);
        HttpHost currentHost = (HttpHost)  context.getAttribute( 
                ExecutionContext.HTTP_TARGET_HOST);
        String currentUrl = (currentReq.getURI().isAbsolute()) ? currentReq.getURI().toString() : (currentHost.toURI() + currentReq.getURI());

The default redirect handling didn't work for us, so we changed it, but I have forgotten what the problem was.


1
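Oops, no - getURI() gives me the URL of the original call. - Bostone
1
I'm not doing anything special - very basic HttpGet code. I've been googling my problem, and I think I need to disable automatic redirects and "follow the trail" until I get a 200. - Bostone
1
It seems very silly that this became so complicated in HttpClient 4. In v3 there was a getPath() method that did the trick. - stevevls
6
"ExecutionContext" is now deprecated; use "HttpCoreContext" instead. - Mark McLaren
1
Unfortunately, the attributes ExecutionContext.HTTP_TARGET_HOST and ExecutionContext.HTTP_REQUEST are deprecated. - Jakob Alexander Eichler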

44
In HttpClient 4, if you are using LaxRedirectStrategy or any subclass of DefaultRedirectStrategy, this is the recommended way (see the source code of DefaultRedirectStrategy):
HttpContext context = new BasicHttpContext();
HttpResult<T> result = client.execute(request, handler, context);
URI finalUrl = request.getURI();
RedirectLocations locations = (RedirectLocations) context.getAttribute(DefaultRedirectStrategy.REDIRECT_LOCATIONS);
if (locations != null) {
    finalUrl = locations.getAll().get(locations.getAll().size() - 1);
}

Since HttpClient 4.3.x, the above code can be simplified to:
HttpClientContext context = HttpClientContext.create();
HttpResult<T> result = client.execute(request, handler, context);
URI finalUrl = request.getURI();
List<URI> locations = context.getRedirectLocations();
if (locations != null) {
    finalUrl = locations.get(locations.size() - 1);
}
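
The fallback logic shared by both snippets (use the last redirect location if there is one, otherwise the request URI) can be isolated into a small helper that also guards against an empty list. This is only a sketch using java.net.URI and java.util.List; FinalUrl.pick is a hypothetical name, and the list argument stands in for whatever context.getRedirectLocations() returned.

```java
import java.net.URI;
import java.util.List;

// Hypothetical helper for illustration; not part of HttpClient.
final class FinalUrl {
    private FinalUrl() {}

    /**
     * Returns the last entry of redirectLocations, or requestUri when the
     * list is null or empty (meaning no redirect was followed).
     */
    static URI pick(URI requestUri, List<URI> redirectLocations) {
        if (redirectLocations == null || redirectLocations.isEmpty()) {
            return requestUri;
        }
        return redirectLocations.get(redirectLocations.size() - 1);
    }
}
```

Usage would look like URI finalUrl = FinalUrl.pick(request.getURI(), context.getRedirectLocations());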

3
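Your answer should be the accepted one. This is exactly what Apache intended! Well done! - Martijn
1
Simple and clear. And this solution is better than all the others mentioned here! - korpe
1
Thank you very much! In the latest version, DefaultRedirectStrategy.REDIRECT_LOCATIONS is deprecated; HttpClientContext.REDIRECT_LOCATIONS should be used instead. - dav1d
Is there a way to get the redirect status of the first redirect, i.e. 301 or 302? - srchulo
1
If you are doing a POST query, you should set the redirect strategy to "LaxRedirectStrategy", otherwise "getRedirectLocations" will return null. - Hugodby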

15
    HttpGet httpGet = new HttpGet("<put your URL here>");
    HttpClient httpClient = HttpClients.createDefault();
    HttpClientContext context = HttpClientContext.create();
    httpClient.execute(httpGet, context);
    List<URI> redirectURIs = context.getRedirectLocations();
    if (redirectURIs != null && !redirectURIs.isEmpty()) {
        for (URI redirectURI : redirectURIs) {
            System.out.println("Redirect URI: " + redirectURI);
        }
        URI finalURI = redirectURIs.get(redirectURIs.size() - 1);
    }

1
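Another thing to be aware of (in all of these answers) is the concept of "atomic HTTP redirect handling", which suggests that clients (at least certain kinds of web applications) should not, for security reasons, be able to see anything other than the last redirect URL. (However, it may be hard to fully prevent that in Java.) - Martin Pain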

8
I found this in the HttpComponents Client documentation.
CloseableHttpClient httpclient = HttpClients.createDefault();
HttpClientContext context = HttpClientContext.create();
HttpGet httpget = new HttpGet("http://localhost:8080/");
CloseableHttpResponse response = httpclient.execute(httpget, context);
try {
    HttpHost target = context.getTargetHost();
    List<URI> redirectLocations = context.getRedirectLocations();
    URI location = URIUtils.resolve(httpget.getURI(), target, redirectLocations);
    System.out.println("Final HTTP location: " + location.toASCIIString());
    // Expected to be an absolute URI
} finally {
    response.close();
}

6

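In my humble opinion, an improvement on ZZ Coder's solution is to use a ResponseInterceptor that simply tracks the last redirect location. That way no information is lost, for example whatever follows a hash mark (#). Without the response interceptor you lose the hash fragment. Example: http://j.mp/OxbI23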

private static HttpClient createHttpClient() throws NoSuchAlgorithmException, KeyManagementException {
    SSLContext sslContext = SSLContext.getInstance("SSL");
    TrustManager[] trustAllCerts = new TrustManager[] { new TrustAllTrustManager() };
    sslContext.init(null, trustAllCerts, new java.security.SecureRandom());

    SSLSocketFactory sslSocketFactory = new SSLSocketFactory(sslContext);
    SchemeRegistry schemeRegistry = new SchemeRegistry();
    schemeRegistry.register(new Scheme("https", 443, sslSocketFactory));
    schemeRegistry.register(new Scheme("http", 80, new PlainSocketFactory()));

    HttpParams params = new BasicHttpParams();
    ClientConnectionManager cm = new org.apache.http.impl.conn.SingleClientConnManager(schemeRegistry);

    // some pages require a user agent
    AbstractHttpClient httpClient = new DefaultHttpClient(cm, params);
    HttpProtocolParams.setUserAgent(httpClient.getParams(), "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:13.0) Gecko/20100101 Firefox/13.0.1");

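    // RedirectStrategy is an interface in HttpClient and cannot be instantiated;
    // DefaultRedirectStrategy is assumed here as the concrete implementation.
    httpClient.setRedirectStrategy(new DefaultRedirectStrategy());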

    httpClient.addResponseInterceptor(new HttpResponseInterceptor() {
        @Override
        public void process(HttpResponse response, HttpContext context)
                throws HttpException, IOException {
            if (response.containsHeader("Location")) {
                Header[] locations = response.getHeaders("Location");
                if (locations.length > 0)
                    context.setAttribute(LAST_REDIRECT_URL, locations[0].getValue());
            }
        }
    });

    return httpClient;
}

private String getUrlAfterRedirects(HttpContext context) {
    String lastRedirectUrl = (String) context.getAttribute(LAST_REDIRECT_URL);
    if (lastRedirectUrl != null)
        return lastRedirectUrl;
    else {
        HttpUriRequest currentReq = (HttpUriRequest) context.getAttribute(ExecutionContext.HTTP_REQUEST);
        HttpHost currentHost = (HttpHost)  context.getAttribute(ExecutionContext.HTTP_TARGET_HOST);
        String currentUrl = (currentReq.getURI().isAbsolute()) ? currentReq.getURI().toString() : (currentHost.toURI() + currentReq.getURI());
        return currentUrl;
    }
}

public static final String LAST_REDIRECT_URL = "last_redirect_url";

You can use it just like ZZ Coder's solution:

HttpResponse response = httpClient.execute(httpGet, context);
String url = getUrlAfterRedirects(context);

4

I think an easier way to find the last URL is to use DefaultRedirectHandler.

package ru.test.test;

import java.net.URI;

import org.apache.http.HttpResponse;
import org.apache.http.ProtocolException;
import org.apache.http.impl.client.DefaultRedirectHandler;
import org.apache.http.protocol.HttpContext;

public class MyRedirectHandler extends DefaultRedirectHandler {

    public URI lastRedirectedUri;

    @Override
    public boolean isRedirectRequested(HttpResponse response, HttpContext context) {

        return super.isRedirectRequested(response, context);
    }

    @Override
    public URI getLocationURI(HttpResponse response, HttpContext context)
            throws ProtocolException {

        lastRedirectedUri = super.getLocationURI(response, context);

        return lastRedirectedUri;
    }

}

Code that uses this handler:

  DefaultHttpClient httpclient = new DefaultHttpClient();
  MyRedirectHandler handler = new MyRedirectHandler();
  httpclient.setRedirectHandler(handler);

  HttpGet get = new HttpGet(url);

  HttpResponse response = httpclient.execute(get);

  HttpEntity entity = response.getEntity();
  String lastUrl = url; // falls back to the original URL when no redirect happened
  if(handler.lastRedirectedUri != null){
      lastUrl = handler.lastRedirectedUri.toString();
  }

The HttpClient#setRedirectHandler() method is deprecated in the latest version of HttpClient. - James Selvakumar
Does anyone know how to handle this in the latest version? - Jakob Alexander Eichler

2

As of version 2.3, Android still does not support following redirects (HTTP code 302). I just read the Location header and download again:

if (statusCode != HttpStatus.SC_OK) {
    Header[] headers = response.getHeaders("Location");

    if (headers != null && headers.length != 0) {
        String newUrl = headers[headers.length - 1].getValue();
        // call again the same downloading method with new URL
        return downloadBitmap(newUrl);
    } else {
        return null;
    }
}

There is no circular-redirect protection here, so be careful. There is more on my blog about following 302 redirects with AndroidHttpClient.
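
One possible way to add that protection is to cap the number of hops and remember every URL already visited. The sketch below is not Android or HttpClient code: RedirectFollower, MAX_HOPS, and the locationOf hook are all hypothetical names, and the hook stands in for "perform one request without automatic redirects and return the Location header value, if any".

```java
import java.net.URI;
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;
import java.util.function.Function;

// Hypothetical helper for illustration; the HTTP call itself is abstracted away.
final class RedirectFollower {
    static final int MAX_HOPS = 10; // assumed limit, tune as needed

    private RedirectFollower() {}

    /**
     * Follows Location values starting from start until a response carries none.
     * locationOf returns the Location header for a URL, or empty when the
     * response is not a redirect. Relative Location values are resolved
     * against the URL that produced them.
     */
    static URI follow(URI start, Function<URI, Optional<String>> locationOf) {
        Set<URI> seen = new HashSet<>();
        URI current = start;
        for (int hop = 0; hop <= MAX_HOPS; hop++) {
            if (!seen.add(current)) {
                throw new IllegalStateException("Circular redirect at " + current);
            }
            Optional<String> location = locationOf.apply(current);
            if (!location.isPresent()) {
                return current; // no further redirect: this is the final URL
            }
            current = current.resolve(location.get());
        }
        throw new IllegalStateException("More than " + MAX_HOPS + " redirects from " + start);
    }
}
```

In the download method above, the hook would wrap the request and return the value of the Location header whenever the status code is a redirect.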

0

This is how I managed to get the redirect URL:

Header[] arr = httpResponse.getHeaders("Location");
for (Header head : arr){
    String whatever = head.getValue();
}

Or, if you are sure there is only one redirect location, do this:

httpResponse.getFirstHeader("Location").getValue();

2
This doesn't work for me. It only returns the headers of the last request. - Amir Raminfar

Page content provided by Stack Overflow.