How to send UTF-8 characters as a string over HTTP

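I'm using MultipartEntityBuilder to send a file over HTTP. We pass the filename as a String argument to addBinaryBody, as shown below. The problem is that the filename contains special characters, for example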

"gültig/Kapitel/00/00/SOPs/SOP/sop123.pdf"

but after it goes over HTTP, it becomes

"g?ltig/Kapitel/00/00/SOPs/SOP/sop003986.pdf"

I tried URLDecoder and new String(bytes, StandardCharsets.UTF_8). Neither worked. Please suggest a solution.

Expected result:

The special character should appear as "gültig", not "g?ltig".

MultipartEntityBuilder builder = MultipartEntityBuilder.create();
builder.addTextBody("index", docbase_name.toLowerCase() + "_content_index");
builder.addBinaryBody("file", fileContent, ContentType.MULTIPART_FORM_DATA, filename);
HttpEntity multipart = builder.build();
HttpPost request = new HttpPost(
        "http://" + utility.getIp()
            + ":" + utility.getPort() + "/fscrawler/_upload");
request.setEntity(multipart);

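Impressive that an encoding problem even manages to change the trailing digits ;-) - Joachim Sauer
Oh yes!! I used a different example in the second line :) But the problem is that I get "g?ltig" instead of "gültig". - sadhiya usama
I'm sure you can solve this by specifying the encoding to use; how to do that depends on the API you're using, and I'm not very familiar with this one. Trying to construct your own String here is almost certainly the wrong approach. - Joachim Sauer
1 Answer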
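
To illustrate why rebuilding the String after the fact cannot help, here is a minimal sketch using only the JDK. It assumes the client serializes the part headers as US-ASCII (which appears to be what httpmime does in its default strict mode, but treat that as an assumption); the class name LossyEncodingDemo and the helper toAsciiBytes are made up for this example.

```java
import java.nio.charset.StandardCharsets;

public class LossyEncodingDemo {

    /** Simulates a layer that can only emit US-ASCII bytes (assumption, see lead-in). */
    static byte[] toAsciiBytes(String s) {
        // String.getBytes(Charset) replaces every unmappable character with '?'.
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    /** What a receiver gets if it decodes those bytes as UTF-8. */
    static String roundTrip(String s) {
        return new String(toAsciiBytes(s), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String filename = "g\u00fcltig/file.txt"; // "gültig/file.txt", escaped to stay source-encoding safe
        // The 'ü' has already been replaced by the byte 0x3F ('?') before decoding,
        // so no choice of charset on the receiving side can bring it back.
        System.out.println(roundTrip(filename)); // prints g?ltig/file.txt
    }
}
```

Once the character has been replaced by '?', the original information is gone, so the fix has to happen before the header is written, not on the received string.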


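The filename should be passed to addBinaryBody as an RFC 2047 encoded header value. You can encode it with MimeUtility from the JavaMail API (see, for example, the similar question "How to put and get UTF-8 string from HTTP header with Jersey?"):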

builder.addBinaryBody("file", fileContent, ContentType.MULTIPART_FORM_DATA, MimeUtility.encodeText(filename));

The setEntity call becomes:

        request.setEntity(
                MultipartEntityBuilder.create()
                        .addTextBody("index", docbase_name.toLowerCase() + "_content_index")
                        .addBinaryBody("file",
                                fileContent,
                                ContentType.MULTIPART_FORM_DATA,
                                MimeUtility.encodeText(filename))
                        .build());
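
For readers unfamiliar with RFC 2047, the following JDK-only sketch shows the general shape of an encoded-word: the text is turned into bytes in a named charset, those bytes are Base64- or Q-encoded, and the result is wrapped as =?charset?encoding?payload?= so that the header stays pure ASCII. This is an illustration only, not MimeUtility's implementation: the class EncodedWordSketch is hypothetical, it always uses the "B" (Base64) form, and it ignores the 75-character limit per encoded-word. MimeUtility.encodeText may well choose the "Q" form instead, so its actual output can look different.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class EncodedWordSketch {
    private static final String PREFIX = "=?UTF-8?B?";
    private static final String SUFFIX = "?=";

    /** Wraps the UTF-8 bytes of the text in a single Base64 encoded-word. */
    static String encode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return PREFIX + Base64.getEncoder().encodeToString(utf8) + SUFFIX;
    }

    /** Reverses encode(); anything not in that exact form is returned unchanged. */
    static String decode(String word) {
        boolean wrapped = word.length() >= PREFIX.length() + SUFFIX.length()
                && word.startsWith(PREFIX)
                && word.endsWith(SUFFIX);
        if (!wrapped) {
            return word;
        }
        String payload = word.substring(PREFIX.length(), word.length() - SUFFIX.length());
        return new String(Base64.getDecoder().decode(payload), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String filename = "g\u00fcltig/file.txt"; // "gültig/file.txt"
        String encoded = encode(filename);
        System.out.println(encoded);         // ASCII-only header value
        System.out.println(decode(encoded)); // the original filename again
    }
}
```

The important consequence is that the receiver must decode the value again (MimeUtility.decodeText in the test case below); a server that does not do so will see the raw encoded-word as the filename.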

Full test case:

package com.github.vtitov.test;

import com.google.common.base.Charsets;
import com.google.common.io.CharStreams;
import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.entity.ContentType;
import org.apache.http.entity.mime.MultipartEntityBuilder;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;
import org.glassfish.jersey.logging.LoggingFeature;
import org.glassfish.jersey.media.multipart.FormDataBodyPart;
import org.glassfish.jersey.media.multipart.FormDataMultiPart;
import org.glassfish.jersey.server.ResourceConfig;
import org.glassfish.jersey.test.JerseyTest;
import org.glassfish.jersey.test.TestProperties;
import org.junit.Test;

import javax.mail.internet.MimeUtility;
import javax.ws.rs.Consumes;
import javax.ws.rs.GET;
import javax.ws.rs.POST;
import javax.ws.rs.Path;
import javax.ws.rs.core.Application;

import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.LinkedList;
import java.util.List;
import java.util.UUID;
import java.util.logging.Formatter;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;

import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.Matchers.equalTo;
import static org.hamcrest.Matchers.notNullValue;

public class RestTest extends JerseyTest {
    private final static Logger log = Logger.getLogger(RestTest.class.getName());

    @Path("fscrawler")
    public static class FscrawlerResource {
        @POST
        @Consumes("multipart/form-data")
        @Path("_upload")
        public String postToString(final FormDataMultiPart multiPart) throws Exception {
            List<String> fileNames = new LinkedList<>();
            try {
                for(FormDataBodyPart f:multiPart.getFields().get("file")) {
                    fileNames.add(MimeUtility.decodeText(f.getContentDisposition().getFileName()));
                }
            } catch (Exception e) {
                log.log(Level.SEVERE, "server error: ", e);
                throw e;
            }
            return String.join(",", fileNames);
        }
    }

    @Override
    protected Application configure() {
        forceSet(TestProperties.CONTAINER_PORT, "0");
        set(TestProperties.RECORD_LOG_LEVEL, Level.FINE.intValue());
        return new ResourceConfig(FscrawlerResource.class)
                .register(LoggingFeature.class)
                .register(org.glassfish.jersey.media.multipart.MultiPartFeature.class)
                ;
    }

    @Test
    public void multipart() throws IOException {
        String baseUri = target().getUri().toString();
        String docbase_name = UUID.randomUUID().toString();
        byte[] fileContent = UUID.randomUUID().toString().getBytes(StandardCharsets.UTF_8);
        String  filename = "gültig/file.txt";

        HttpPost request = new HttpPost(baseUri + "fscrawler/_upload");
        request.setEntity(
                MultipartEntityBuilder.create()
                        .addTextBody("index", docbase_name.toLowerCase() + "_content_index")
                        .addBinaryBody("file",
                                fileContent,
                                ContentType.MULTIPART_FORM_DATA,
                                MimeUtility.encodeText(filename))
                        .build());

        log.info("executing request " + request.getRequestLine());
        try(CloseableHttpClient httpclient = HttpClients.createDefault();
            CloseableHttpResponse response = httpclient.execute(request)
        ) {
            log.info(String.format("response: %s", response.toString()));
            HttpEntity resEntity = response.getEntity();
            assertThat(resEntity, notNullValue());
            if (resEntity != null) {
                log.info("Response content length: " + resEntity.getContentLength());
                String resContent = CharStreams.toString(new InputStreamReader(resEntity.getContent(), Charsets.UTF_8));
                log.info(String.format("response content: %s", resContent));
                assertThat("filename matches", resContent, equalTo(filename));
            }
            EntityUtils.consume(resEntity);
        } catch (IOException e) {
            dumpServerLogRecords();
            throw e;
        }
        dumpServerLogRecords();
    }

    void dumpServerLogRecords() {
        log.info(String.format("total server log records: %s", getLoggedRecords().size()));
        Formatter sf = new SimpleFormatter();
        getLoggedRecords().forEach(r -> {
            log.info(String.format("server log record\n%s", sf.format(r)));
        });

    }
}

You can enable logging to see the request, the response, and the processing:

mvn test \
  -Dorg.apache.commons.logging.Log=org.apache.commons.logging.impl.SimpleLog \
  -Dorg.apache.commons.logging.simplelog.showdatetime=true \
  -Dorg.apache.commons.logging.simplelog.log.org.apache.http=DEBUG \
  -Dorg.apache.commons.logging.simplelog.log.org.apache.http.wire=DEBUG

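I can't use MimeUtility encoding, because the filename we put into fscrawler is consumed by another server and we have no way to decode it there. I tried new String(filename.getBytes(), StandardCharsets.UTF_8), but that doesn't work either. - sadhiya usama
I understand you are using fscrawler. It is not clear, though, whether you are able to patch your fscrawler installation. The original fscrawler uses ISO_8859_1 (see here). You could either submit a patch on GitHub or patch it yourself: add MimeUtility.decodeText to the filename handling. - y_ug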
