String-based data encoding: Base64 vs Base64url


What is the difference between Base64 and Base64url? I see them used in things like JSON Web Tokens.


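Base64 and Base64url are both ways of encoding binary data as a string. You can read about the theory of base64 here. The problem with Base64 is that its alphabet includes the characters +, /, and =, which have reserved meanings in some filesystem names and in URLs. Base64url solves this by replacing + with - and / with _. The trailing padding character = can be omitted when it is not needed; if it is kept in a URL, it will most likely be percent-encoded. The encoded data can then be included in a URL without problems. Here is a chart of the differences: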
Index  Base64  Base64Url

0      A       A 
1      B       B 
2      C       C 
3      D       D 
4      E       E 
5      F       F 
6      G       G 
7      H       H 
8      I       I 
9      J       J 
10     K       K 
11     L       L 
12     M       M 
13     N       N 
14     O       O 
15     P       P 
16     Q       Q 
17     R       R 
18     S       S 
19     T       T 
20     U       U 
21     V       V 
22     W       W 
23     X       X 
24     Y       Y 
25     Z       Z 
26     a       a 
27     b       b 
28     c       c 
29     d       d 
30     e       e 
31     f       f 
32     g       g 
33     h       h 
34     i       i 
35     j       j 
36     k       k 
37     l       l 
38     m       m 
39     n       n 
40     o       o 
41     p       p 
42     q       q 
43     r       r 
44     s       s 
45     t       t 
46     u       u 
47     v       v 
48     w       w
49     x       x
50     y       y
51     z       z
52     0       0
53     1       1
54     2       2
55     3       3
56     4       4
57     5       5
58     6       6
59     7       7
60     8       8
61     9       9
62     +       -
63     /       _
       =       (optional)
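
As a quick illustration of the chart, here is a minimal Python sketch using the standard library's `base64` module. The two input bytes are an arbitrary choice, picked only because their encoding happens to use values 62 and 63. Note that Python's URL-safe encoder keeps the `=` padding, so stripping it is left to the caller.

```python
import base64

# Arbitrary sample input whose encoding uses alphabet values 62 and 63.
data = bytes([0xFB, 0xFF])

standard = base64.b64encode(data).decode("ascii")
url_safe = base64.urlsafe_b64encode(data).decode("ascii")

print(standard)  # +/8=
print(url_safe)  # -_8=
```

Only the characters for values 62 and 63 differ; everything else, including the padding, is identical.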


RFC 4648 specification

4. Base 64 Encoding

The following description of base 64 is derived from [3], [4], [5], and [6]. This encoding may be referred to as "base64".

The Base 64 encoding is designed to represent arbitrary sequences of octets in a form that allows the use of both upper- and lowercase letters but that need not be human readable.

A 65-character subset of US-ASCII is used, enabling 6 bits to be
represented per printable character. (The extra 65th character, "=", is used to signify a special processing function.)

The encoding process represents 24-bit groups of input bits as output strings of 4 encoded characters. Proceeding from left to right, a 24-bit input group is formed by concatenating 3 8-bit input groups. These 24 bits are then treated as 4 concatenated 6-bit groups, each of which is translated into a single character in the base 64 alphabet.

Each 6-bit group is used as an index into an array of 64 printable characters. The character referenced by the index is placed in the
output string.

                  Table 1: The Base 64 Alphabet

 Value Encoding  Value Encoding  Value Encoding  Value Encoding
     0 A            17 R            34 i            51 z
     1 B            18 S            35 j            52 0
     2 C            19 T            36 k            53 1
     3 D            20 U            37 l            54 2
     4 E            21 V            38 m            55 3
     5 F            22 W            39 n            56 4
     6 G            23 X            40 o            57 5
     7 H            24 Y            41 p            58 6
     8 I            25 Z            42 q            59 7
     9 J            26 a            43 r            60 8
    10 K            27 b            44 s            61 9
    11 L            28 c            45 t            62 +
    12 M            29 d            46 u            63 /
    13 N            30 e            47 v
    14 O            31 f            48 w         (pad) =
    15 P            32 g            49 x
    16 Q            33 h            50 y
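
(Not part of the RFC text.) The 24-bit grouping described above can be sketched in a few lines of Python. The helper name `encode_group` is hypothetical, and the sketch only handles one complete 3-octet group, with no padding.

```python
import base64
import string

# Table 1 alphabet: A-Z, a-z, 0-9, then "+" and "/".
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def encode_group(three_bytes: bytes) -> str:
    """Encode exactly 3 octets (one 24-bit group) as 4 base64 characters."""
    if len(three_bytes) != 3:
        raise ValueError("expected exactly 3 bytes")
    n = int.from_bytes(three_bytes, "big")  # the 24-bit input group
    # Split into four 6-bit indices, most significant first.
    indices = [(n >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    return "".join(ALPHABET[i] for i in indices)

print(encode_group(b"Man"))  # TWFu
```

For `b"Man"` the four indices are 19, 22, 5, and 46, which Table 1 maps to `T`, `W`, `F`, and `u`.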

Special processing is performed if fewer than 24 bits are available at the end of the data being encoded. A full encoding quantum is
always completed at the end of a quantity. When fewer than 24 input
bits are available in an input group, bits with value zero are added
(on the right) to form an integral number of 6-bit groups. Padding
at the end of the data is performed using the '=' character. Since
all base 64 input is an integral number of octets, only the following cases can arise:

(1) The final quantum of encoding input is an integral multiple of 24 bits; here, the final unit of encoded output will be an integral multiple of 4 characters with no "=" padding.

(2) The final quantum of encoding input is exactly 8 bits; here, the final unit of encoded output will be two characters followed by two "=" padding characters.

(3) The final quantum of encoding input is exactly 16 bits; here, the final unit of encoded output will be three characters followed by one "=" padding character.
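
(Not part of the RFC text.) The three padding cases can be observed directly with Python's `base64` module. The helper `padding_count` is a hypothetical name introduced only for this sketch.

```python
import base64

def padding_count(data: bytes) -> int:
    """Return how many '=' characters standard base64 appends for this input."""
    encoded = base64.b64encode(data).decode("ascii")
    return len(encoded) - len(encoded.rstrip("="))

# Final quantum of 24, 8, and 16 bits respectively.
for sample in (b"abc", b"a", b"ab"):
    print(sample, base64.b64encode(sample).decode("ascii"), padding_count(sample))
```

The inputs `b"abc"`, `b"a"`, and `b"ab"` encode to `YWJj`, `YQ==`, and `YWI=`, matching cases (1), (2), and (3).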

5. Base 64 Encoding with URL and Filename Safe Alphabet

The Base 64 encoding with an URL and filename safe alphabet has been used in [12].

An alternative alphabet has been suggested that would use "~" as the 63rd character. Since the "~" character has special meaning in some file system environments, the encoding described in this section is recommended instead. The remaining unreserved URI character is ".", but some file system environments do not permit multiple "." in a filename, thus making the "." character unattractive as well.

The pad character "=" is typically percent-encoded when used in an URI [9], but if the data length is known implicitly, this can be
avoided by skipping the padding; see section 3.2.

This encoding may be referred to as "base64url". This encoding
should not be regarded as the same as the "base64" encoding and
should not be referred to as only "base64". Unless clarified
otherwise, "base64" refers to the base 64 in the previous section.

This encoding is technically identical to the previous one, except for the 62:nd and 63:rd alphabet character, as indicated in Table 2.

     Table 2: The "URL and Filename safe" Base 64 Alphabet

 Value Encoding  Value Encoding  Value Encoding  Value Encoding
     0 A            17 R            34 i            51 z
     1 B            18 S            35 j            52 0
     2 C            19 T            36 k            53 1
     3 D            20 U            37 l            54 2
     4 E            21 V            38 m            55 3
     5 F            22 W            39 n            56 4
     6 G            23 X            40 o            57 5
     7 H            24 Y            41 p            58 6
     8 I            25 Z            42 q            59 7
     9 J            26 a            43 r            60 8
    10 K            27 b            44 s            61 9
    11 L            28 c            45 t            62 - (minus)
    12 M            29 d            46 u            63 _
    13 N            30 e            47 v           (underline)
    14 O            31 f            48 w
    15 P            32 g            49 x
    16 Q            33 h            50 y         (pad) =
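
To illustrate the remark about skipping the padding, here is a small Python sketch of unpadded base64url encoding and decoding. The function names `b64url_encode` and `b64url_decode` are hypothetical; the approach assumes the decoder can restore the padding from the string length alone.

```python
import base64

def b64url_encode(data: bytes) -> str:
    """Encode with the URL-safe alphabet and drop trailing '=' padding."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def b64url_decode(text: str) -> bytes:
    """Decode unpadded base64url by restoring the padding first."""
    if len(text) % 4 == 1:
        # No whole number of octets produces this length.
        raise ValueError("illegal base64url string")
    padding = "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(text + padding)

token = b64url_encode(bytes([0xFB, 0xFF]))
print(token)                 # -_8
print(b64url_decode(token))  # b'\xfb\xff'
```

Because the encoded length modulo 4 determines how many `=` characters were removed, the padding carries no extra information and can be dropped safely.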

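I might be missing something here. But 1. regarding "here is a chart of the differences": apart from + and /, everything else is the same, right? 2. If "-" masks "+", how do you tell a "-" that stands for itself from a "-" that is masking "+"? - mfaani
@Honey, 1: That's right, Base64 and Base64url differ only at values 62 and 63. 2: I wouldn't say that "-" masks "+". It is just a different ASCII symbol used to represent 62. If you actually want to represent the "-" character, it would be encoded as "LQ==". See this - Suragch
@Suragch - From what I have seen in the field, base64url seems to be more common than base64. Do you know whether that is true? - Zephaniah Grunschlag
@ZephaniahGrunschlag, that seems right to me too, but I am not an expert in this area. - Suragch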


Although this post is fairly old, in case it helps anyone: if you are using .NET, you can do Base64Url encoding/decoding with the System.IdentityModel.Base64Url class provided by Brock Allen.

    using System;

    /// <summary>
    /// Base64Url encoder/decoder
    /// </summary>
    public static class Base64Url
    {
        /// <summary>
        /// Encodes the specified byte array.
        /// </summary>
        /// <param name="arg">The argument.</param>
        /// <returns>The base64url string, without trailing padding.</returns>
        public static string Encode(byte[] arg)
        {
            var s = Convert.ToBase64String(arg); // Standard base64 encoder
            s = s.Split('=')[0]; // Remove any trailing '='s
            s = s.Replace('+', '-'); // 62nd char of encoding
            s = s.Replace('/', '_'); // 63rd char of encoding
            return s;
        }

        /// <summary>
        /// Decodes the specified string.
        /// </summary>
        /// <param name="arg">The argument.</param>
        /// <returns>The decoded bytes.</returns>
        /// <exception cref="System.Exception">Illegal base64url string!</exception>
        public static byte[] Decode(string arg)
        {
            var s = arg;
            s = s.Replace('-', '+'); // 62nd char of encoding
            s = s.Replace('_', '/'); // 63rd char of encoding
            switch (s.Length % 4) // Pad with trailing '='s
            {
                case 0: break; // No pad chars in this case
                case 2: s += "=="; break; // Two pad chars
                case 3: s += "="; break; // One pad char
                default: throw new Exception("Illegal base64url string!");
            }
            return Convert.FromBase64String(s); // Standard base64 decoder
        }
    }

Or WebEncoders.Base64UrlEncode (.NET Core 1.0+) - Hans Kesting

