Decoding a Base64-encoded UTF-8 string in Java

Question

16

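I am using org.apache.commons.codec.binary.Base64 to decode a UTF-8 string. Sometimes I receive a Base64-encoded string that, after decoding, looks like ^@kďż˝ďż˝@@. How can I check whether the Base64 is valid, or whether the decoded bytes form a valid UTF-8 string?

To clarify, I am using: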

public static String base64Decode(String str) {
    try {
        return new String(base64Decode(str.getBytes(Constants.UTF_8)), Constants.UTF_8);
    } catch (UnsupportedEncodingException e) {
         ...
    }
}

public static byte[] base64Decode(byte[] byteArray) {
    return Base64.decodeBase64(byteArray);
}

- terry207

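What does it mean for a string to be "UTF-8"? A String object knows nothing about encodings or character sets. - Michael Konietzka

1

@Michael Konietzka: I think that is unnecessary nitpicking. Base64 encodes a sequence of bytes. I think the OP made it clear that the byte sequence is assumed to be the UTF-8 encoding of a Unicode string, rather than a java.lang.String encoded directly as Base64 (which, as you say, would make no sense). - finnw

@finnw Sorry, I don't know how to explain it clearly. I receive a Base64-encoded string and want to check that it is correct. I want to catch the case where the string I receive decodes to something that looks like garbage, because what I receive should be something like a name. - terry207

Maybe I just need to check whether the Base64 contains spaces or other disallowed characters? - terry207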

3 Answers

1

Try this:

var B64 = {
    alphabet: 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=',
    lookup: null,
    ie: /MSIE /.test(navigator.userAgent),
    ieo: /MSIE [67]/.test(navigator.userAgent),
    encode: function (s) {
        var buffer = B64.toUtf8(s),
            position = -1,
            len = buffer.length,
            nan1, nan2, enc = [, , , ];
        if (B64.ie) {
            var result = [];
            while (++position < len) {
                nan1 = buffer[position + 1], nan2 = buffer[position + 2];
                enc[0] = buffer[position] >> 2;
                enc[1] = ((buffer[position] & 3) << 4) | (buffer[++position] >> 4);
                if (isNaN(nan1)) enc[2] = enc[3] = 64;
                else {
                    enc[2] = ((buffer[position] & 15) << 2) | (buffer[++position] >> 6);
                    enc[3] = (isNaN(nan2)) ? 64 : buffer[position] & 63;
                }
                result.push(B64.alphabet[enc[0]], B64.alphabet[enc[1]], B64.alphabet[enc[2]], B64.alphabet[enc[3]]);
            }
            return result.join('');
        } else {
            result = '';
            while (++position < len) {
                nan1 = buffer[position + 1], nan2 = buffer[position + 2];
                enc[0] = buffer[position] >> 2;
                enc[1] = ((buffer[position] & 3) << 4) | (buffer[++position] >> 4);
                if (isNaN(nan1)) enc[2] = enc[3] = 64;
                else {
                    enc[2] = ((buffer[position] & 15) << 2) | (buffer[++position] >> 6);
                    enc[3] = (isNaN(nan2)) ? 64 : buffer[position] & 63;
                }
                result += B64.alphabet[enc[0]] + B64.alphabet[enc[1]] + B64.alphabet[enc[2]] + B64.alphabet[enc[3]];
            }
            return result;
        }
    },
    decode: function (s) {
        var buffer = B64.fromUtf8(s),
            position = 0,
            len = buffer.length;
        if (B64.ieo) {
            result = [];
            while (position < len) {
                if (buffer[position] < 128) result.push(String.fromCharCode(buffer[position++]));
                else if (buffer[position] > 191 && buffer[position] < 224) result.push(String.fromCharCode(((buffer[position++] & 31) << 6) | (buffer[position++] & 63)));
                else result.push(String.fromCharCode(((buffer[position++] & 15) << 12) | ((buffer[position++] & 63) << 6) | (buffer[position++] & 63)));
            }
            return result.join('');
        } else {
            result = '';
            while (position < len) {
                if (buffer[position] < 128) result += String.fromCharCode(buffer[position++]);
                else if (buffer[position] > 191 && buffer[position] < 224) result += String.fromCharCode(((buffer[position++] & 31) << 6) | (buffer[position++] & 63));
                else result += String.fromCharCode(((buffer[position++] & 15) << 12) | ((buffer[position++] & 63) << 6) | (buffer[position++] & 63));
            }
            return result;
        }
    },
    toUtf8: function (s) {
        var position = -1,
            len = s.length,
            chr, buffer = [];
        if (/^[\x00-\x7f]*$/.test(s)) while (++position < len)
        buffer.push(s.charCodeAt(position));
        else while (++position < len) {
            chr = s.charCodeAt(position);
            if (chr < 128) buffer.push(chr);
            else if (chr < 2048) buffer.push((chr >> 6) | 192, (chr & 63) | 128);
            else buffer.push((chr >> 12) | 224, ((chr >> 6) & 63) | 128, (chr & 63) | 128);
        }
        return buffer;
    },
    fromUtf8: function (s) {
        var position = -1,
            len, buffer = [],
            enc = [, , , ];
        if (!B64.lookup) {
            len = B64.alphabet.length;
            B64.lookup = {};
            while (++position < len)
            B64.lookup[B64.alphabet[position]] = position;
            position = -1;
        }
        len = s.length;
        while (position < len - 1) { // stop once every character is consumed; avoids trailing zero bytes on unpadded input
            enc[0] = B64.lookup[s.charAt(++position)];
            enc[1] = B64.lookup[s.charAt(++position)];
            buffer.push((enc[0] << 2) | (enc[1] >> 4));
            enc[2] = B64.lookup[s.charAt(++position)];
            if (enc[2] == 64) break;
            buffer.push(((enc[1] & 15) << 4) | (enc[2] >> 2));
            enc[3] = B64.lookup[s.charAt(++position)];
            if (enc[3] == 64) break;
            buffer.push(((enc[2] & 3) << 6) | enc[3]);
        }
        return buffer;
    }
};

Click here to see it.

- atiruz

1

This worked perfectly for me. I realize it was downvoted because it is a JavaScript answer to a Java question. - Keeper Hood

0

I wrote this method:

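// Note: Base64.decode(byte[], int) and Base64.DEFAULT appear to come from android.util.Base64.
public static String descodificarDeBase64(String stringCondificado){
    try {
        // Name the charset when building the String as well; otherwise the platform default is used.
        return new String(Base64.decode(stringCondificado.getBytes("UTF-8"), Base64.DEFAULT), "UTF-8");
    } catch (UnsupportedEncodingException e) {
        e.printStackTrace();
        return "";
    }
}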

With it I can decode Base64 that contains Spanish special characters such as á, ñ, í, ü.

Example:

descodificarDeBase64("wr9xdcOpIHRhbD8=");

returns: ¿qué tal?

- bheatcoker

Base64.DEFAULT is undefined. - Philip Rego
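
For readers who hit the same error outside Android: the method above seems to rely on android.util.Base64, which would explain why Base64.DEFAULT is not found elsewhere. A minimal sketch of the same decoding with java.util.Base64 (Java 8 or later) follows. The class and method names are made up for illustration and are not from the answer.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper; not part of the original answer.
final class PlainJavaBase64Decode {

    /** Decodes Base64 text to bytes, then reads those bytes as UTF-8. */
    static String decodeUtf8(String base64) {
        byte[] bytes = Base64.getDecoder().decode(base64);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Same input as the example in the answer above.
        System.out.println(decodeUtf8("wr9xdcOpIHRhbD8="));
    }
}
```

Using StandardCharsets.UTF_8 instead of the "UTF-8" string also removes the need to catch UnsupportedEncodingException.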

- BalusC · Accepted Answer

30

You should specify the charset when converting a String to a byte[] and back.

byte[] bytes = string.getBytes("UTF-8");
// feed bytes to Base64

and

// get bytes from Base64
String string = new String(bytes, "UTF-8");

Otherwise the platform default encoding is used, which is not necessarily UTF-8.
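
To illustrate what a mismatch looks like, here is a small sketch that encodes a string as UTF-8 and then decodes the bytes once with UTF-8 and once with ISO-8859-1. ISO-8859-1 only stands in for "some other platform default"; the class and method names are invented for this example.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the original answer.
final class CharsetMismatchDemo {

    /** Encodes and decodes with the same charset, so the text survives. */
    static String decodeWithSameCharset(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    /** Encodes as UTF-8 but decodes as ISO-8859-1, simulating a wrong default. */
    static String decodeWithWrongCharset(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "\u00e9"; // the single character é
        // Same charset both ways: the round trip is lossless.
        System.out.println(decodeWithSameCharset(original).equals(original)); // true
        // Wrong charset: the two UTF-8 bytes C3 A9 become two characters, Ã and ©.
        System.out.println(decodeWithWrongCharset(original).equals("\u00c3\u00a9")); // true
    }
}
```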

- BalusC

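This string doesn't look like UTF-8 misinterpreted as a single-byte encoding. Could it be GB18030 misinterpreted as UTF-8? - finnw

@finnw: The answer does assume that the original string is UTF-8, as the OP explicitly stated. If that is not actually the case, the problem needs to be solved elsewhere. - BalusC

@BalusC: What does it mean for a string to be UTF-8? UTF-8 is an encoding. - Michael Konietzka

@Michael: The string must have been constructed somehow. For example, if you build the string from data returned by a Reader, you also need to make sure the Reader reads the source data as UTF-8. I do understand the nitpick, though; I should have worded my previous comment better, for example by saying "source data" instead of "string". - BalusC

"国家标准" is neither UTF-8 nor GB18030; it is just a String object. It can, however, be encoded with UTF-8 or GB18030, because both encodings can represent every Unicode code point. The decoding system must of course use the same character encoding as the encoding system when it processes the bytes. And yes, I am being picky here, because the question says "a string is utf-8", which needs clarifying: there is no such thing as a "UTF-8 string". You can encode a String into a byte array using UTF-8, but then all you have is a byte[]. - Michael Konietzka

@Michael (and OP): In short, when converting a String to or from a byte[], you need to specify the same character encoding in both directions. That is also what my answer covers. If that does not solve the problem, then either the original source is simply not in that encoding, or the console where you print or display the characters is not using it. In any case, Base64 itself has nothing to do with character encoding. - BalusC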
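
On the original question of detecting bad input: new String(bytes, charset) silently replaces malformed bytes with U+FFFD, so it never reports a problem. One possible approach, sketched below with java.util.Base64 and a CharsetDecoder set to REPORT, is to reject input that is not valid Base64 or whose bytes are not well-formed UTF-8. The class and method names are hypothetical. This only checks well-formedness: bytes that are valid UTF-8 but meaningless (for example NUL characters) still pass.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Optional;

// Hypothetical helper; a sketch, not the approach used in the thread.
final class StrictBase64Utf8 {

    /** Returns the decoded text, or empty if the input is not Base64 or not well-formed UTF-8. */
    static Optional<String> decodeStrict(String base64) {
        byte[] bytes;
        try {
            // The basic decoder throws on characters outside the Base64 alphabet.
            bytes = Base64.getDecoder().decode(base64);
        } catch (IllegalArgumentException e) {
            return Optional.empty();
        }
        // REPORT makes the decoder throw instead of substituting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return Optional.of(decoder.decode(ByteBuffer.wrap(bytes)).toString());
        } catch (CharacterCodingException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(decodeStrict("wr9xdcOpIHRhbD8=").isPresent()); // valid Base64, valid UTF-8
        System.out.println(decodeStrict("not base64!").isPresent());      // illegal Base64 characters
        System.out.println(decodeStrict("/w==").isPresent());             // decodes to byte 0xFF, not UTF-8
    }
}
```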