summaryrefslogtreecommitdiffstats
path: root/doc/rfc/rfc4648.txt
diff options
context:
space:
mode:
Diffstat (limited to 'doc/rfc/rfc4648.txt')
-rw-r--r--doc/rfc/rfc4648.txt1011
1 files changed, 1011 insertions, 0 deletions
diff --git a/doc/rfc/rfc4648.txt b/doc/rfc/rfc4648.txt
new file mode 100644
index 0000000..c7599b4
--- /dev/null
+++ b/doc/rfc/rfc4648.txt
@@ -0,0 +1,1011 @@
+
+
+
+
+
+
+Network Working Group S. Josefsson
+Request for Comments: 4648 SJD
+Obsoletes: 3548 October 2006
+Category: Standards Track
+
+
+ The Base16, Base32, and Base64 Data Encodings
+
+Status of This Memo
+
+ This document specifies an Internet standards track protocol for the
+ Internet community, and requests discussion and suggestions for
+ improvements. Please refer to the current edition of the "Internet
+ Official Protocol Standards" (STD 1) for the standardization state
+ and status of this protocol. Distribution of this memo is unlimited.
+
+Copyright Notice
+
+ Copyright (C) The Internet Society (2006).
+
+Abstract
+
+ This document describes the commonly used base 64, base 32, and base
+ 16 encoding schemes. It also discusses the use of line-feeds in
+ encoded data, use of padding in encoded data, use of non-alphabet
+ characters in encoded data, use of different encoding alphabets, and
+ canonical encodings.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Josefsson Standards Track [Page 1]
+
+RFC 4648 Base-N Encodings October 2006
+
+
+Table of Contents
+
+ 1. Introduction ....................................................3
+ 2. Conventions Used in This Document ...............................3
+ 3. Implementation Discrepancies ....................................3
+ 3.1. Line Feeds in Encoded Data .................................3
+ 3.2. Padding of Encoded Data ....................................4
+ 3.3. Interpretation of Non-Alphabet Characters in Encoded Data ..4
+ 3.4. Choosing the Alphabet ......................................4
+ 3.5. Canonical Encoding .........................................5
+ 4. Base 64 Encoding ................................................5
+ 5. Base 64 Encoding with URL and Filename Safe Alphabet ............7
+ 6. Base 32 Encoding ................................................8
+ 7. Base 32 Encoding with Extended Hex Alphabet ....................10
+ 8. Base 16 Encoding ...............................................10
+ 9. Illustrations and Examples .....................................11
+ 10. Test Vectors ..................................................12
+ 11. ISO C99 Implementation of Base64 ..............................14
+ 12. Security Considerations .......................................14
+ 13. Changes Since RFC 3548 ........................................15
+ 14. Acknowledgements ..............................................15
+ 15. Copying Conditions ............................................15
+ 16. References ....................................................16
+ 16.1. Normative References .....................................16
+ 16.2. Informative References ...................................16
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Josefsson Standards Track [Page 2]
+
+RFC 4648 Base-N Encodings October 2006
+
+
+1. Introduction
+
+ Base encoding of data is used in many situations to store or transfer
+ data in environments that, perhaps for legacy reasons, are restricted
+ to US-ASCII [1] data. Base encoding can also be used in new
+ applications that do not have legacy restrictions, simply because it
+ makes it possible to manipulate objects with text editors.
+
+ In the past, different applications have had different requirements
+ and thus sometimes implemented base encodings in slightly different
+ ways. Today, protocol specifications sometimes use base encodings in
+ general, and "base64" in particular, without a precise description or
+ reference. Multipurpose Internet Mail Extensions (MIME) [4] is often
+ used as a reference for base64 without considering the consequences
+ for line-wrapping or non-alphabet characters. The purpose of this
+ specification is to establish common alphabet and encoding
+ considerations. This will hopefully reduce ambiguity in other
+ documents, leading to better interoperability.
+
+2. Conventions Used in This Document
+
+ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+ "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
+ document are to be interpreted as described in [2].
+
+3. Implementation Discrepancies
+
+ Here we discuss the discrepancies between base encoding
+ implementations in the past and, where appropriate, mandate a
+ specific recommended behavior for the future.
+
+3.1. Line Feeds in Encoded Data
+
+ MIME [4] is often used as a reference for base 64 encoding. However,
+ MIME does not define "base 64" per se, but rather a "base 64 Content-
+ Transfer-Encoding" for use within MIME. As such, MIME enforces a
+ limit on line length of base 64-encoded data to 76 characters. MIME
+ inherits the encoding from Privacy Enhanced Mail (PEM) [3], stating
+ that it is "virtually identical"; however, PEM uses a line length of
+ 64 characters. The MIME and PEM limits are both due to limits within
+ SMTP.
+
+ Implementations MUST NOT add line feeds to base-encoded data unless
+ the specification referring to this document explicitly directs base
+ encoders to add line feeds after a specific number of characters.
+
+
+
+
+
+
+Josefsson Standards Track [Page 3]
+
+RFC 4648 Base-N Encodings October 2006
+
+
+3.2. Padding of Encoded Data
+
+ In some circumstances, the use of padding ("=") in base-encoded data
+ is not required or used. In the general case, when assumptions about
+ the size of transported data cannot be made, padding is required to
+ yield correct decoded data.
+
+ Implementations MUST include appropriate pad characters at the end of
+ encoded data unless the specification referring to this document
+ explicitly states otherwise.
+
+ The base64 and base32 alphabets use padding, as described below in
+ sections 4 and 6, but the base16 alphabet does not need it; see
+ section 8.
+
+3.3. Interpretation of Non-Alphabet Characters in Encoded Data
+
+ Base encodings use a specific, reduced alphabet to encode binary
+ data. Non-alphabet characters could exist within base-encoded data,
+ caused by data corruption or by design. Non-alphabet characters may
+ be exploited as a "covert channel", where non-protocol data can be
+ sent for nefarious purposes. Non-alphabet characters might also be
+ sent in order to exploit implementation errors leading to, e.g.,
+ buffer overflow attacks.
+
+ Implementations MUST reject the encoded data if it contains
+ characters outside the base alphabet when interpreting base-encoded
+ data, unless the specification referring to this document explicitly
+ states otherwise. Such specifications may instead state, as MIME
+ does, that characters outside the base encoding alphabet should
+ simply be ignored when interpreting data ("be liberal in what you
+ accept"). Note that this means that any adjacent carriage return/
+ line feed (CRLF) characters constitute "non-alphabet characters" and
+ are ignored. Furthermore, such specifications MAY ignore the pad
+ character, "=", treating it as non-alphabet data, if it is present
+ before the end of the encoded data. If more than the allowed number
+ of pad characters is found at the end of the string (e.g., a base 64
+ string terminated with "==="), the excess pad characters MAY also be
+ ignored.
+
+3.4. Choosing the Alphabet
+
+ Different applications have different requirements on the characters
+ in the alphabet. Here are a few requirements that determine which
+ alphabet should be used:
+
+
+
+
+
+
+Josefsson Standards Track [Page 4]
+
+RFC 4648 Base-N Encodings October 2006
+
+
+ o Handled by humans. The characters "0" and "O" are easily
+ confused, as are "1", "l", and "I". In the base32 alphabet below,
+ where 0 (zero) and 1 (one) are not present, a decoder may
+ interpret 0 as O, and 1 as I or L depending on case. (However, by
+ default it should not; see previous section.)
+
+ o Encoded into structures that mandate other requirements. For base
+ 16 and base 32, this determines the use of upper- or lowercase
+ alphabets. For base 64, the non-alphanumeric characters (in
+ particular, "/") may be problematic in file names and URLs.
+
+ o Used as identifiers. Certain characters, notably "+" and "/" in
+ the base 64 alphabet, are treated as word-breaks by legacy text
+ search/index tools.
+
+ There is no universally accepted alphabet that fulfills all the
+ requirements. For an example of a highly specialized variant, see
+ IMAP [8]. In this document, we document and name some currently used
+ alphabets.
+
+3.5. Canonical Encoding
+
+ The padding step in base 64 and base 32 encoding can, if improperly
+ implemented, lead to non-significant alterations of the encoded data.
+ For example, if the input is only one octet for a base 64 encoding,
+ then all six bits of the first symbol are used, but only the first
+ two bits of the next symbol are used. These pad bits MUST be set to
+ zero by conforming encoders, which is described in the descriptions
+ on padding below. If this property do not hold, there is no
+ canonical representation of base-encoded data, and multiple base-
+ encoded strings can be decoded to the same binary data. If this
+ property (and others discussed in this document) holds, a canonical
+ encoding is guaranteed.
+
+ In some environments, the alteration is critical and therefore
+ decoders MAY chose to reject an encoding if the pad bits have not
+ been set to zero. The specification referring to this may mandate a
+ specific behaviour.
+
+4. Base 64 Encoding
+
+ The following description of base 64 is derived from [3], [4], [5],
+ and [6]. This encoding may be referred to as "base64".
+
+ The Base 64 encoding is designed to represent arbitrary sequences of
+ octets in a form that allows the use of both upper- and lowercase
+ letters but that need not be human readable.
+
+
+
+
+Josefsson Standards Track [Page 5]
+
+RFC 4648 Base-N Encodings October 2006
+
+
+ A 65-character subset of US-ASCII is used, enabling 6 bits to be
+ represented per printable character. (The extra 65th character, "=",
+ is used to signify a special processing function.)
+
+ The encoding process represents 24-bit groups of input bits as output
+ strings of 4 encoded characters. Proceeding from left to right, a
+ 24-bit input group is formed by concatenating 3 8-bit input groups.
+ These 24 bits are then treated as 4 concatenated 6-bit groups, each
+ of which is translated into a single character in the base 64
+ alphabet.
+
+ Each 6-bit group is used as an index into an array of 64 printable
+ characters. The character referenced by the index is placed in the
+ output string.
+
+ Table 1: The Base 64 Alphabet
+
+ Value Encoding Value Encoding Value Encoding Value Encoding
+ 0 A 17 R 34 i 51 z
+ 1 B 18 S 35 j 52 0
+ 2 C 19 T 36 k 53 1
+ 3 D 20 U 37 l 54 2
+ 4 E 21 V 38 m 55 3
+ 5 F 22 W 39 n 56 4
+ 6 G 23 X 40 o 57 5
+ 7 H 24 Y 41 p 58 6
+ 8 I 25 Z 42 q 59 7
+ 9 J 26 a 43 r 60 8
+ 10 K 27 b 44 s 61 9
+ 11 L 28 c 45 t 62 +
+ 12 M 29 d 46 u 63 /
+ 13 N 30 e 47 v
+ 14 O 31 f 48 w (pad) =
+ 15 P 32 g 49 x
+ 16 Q 33 h 50 y
+
+ Special processing is performed if fewer than 24 bits are available
+ at the end of the data being encoded. A full encoding quantum is
+ always completed at the end of a quantity. When fewer than 24 input
+ bits are available in an input group, bits with value zero are added
+ (on the right) to form an integral number of 6-bit groups. Padding
+ at the end of the data is performed using the '=' character. Since
+ all base 64 input is an integral number of octets, only the following
+ cases can arise:
+
+ (1) The final quantum of encoding input is an integral multiple of 24
+ bits; here, the final unit of encoded output will be an integral
+ multiple of 4 characters with no "=" padding.
+
+
+
+Josefsson Standards Track [Page 6]
+
+RFC 4648 Base-N Encodings October 2006
+
+
+ (2) The final quantum of encoding input is exactly 8 bits; here, the
+ final unit of encoded output will be two characters followed by
+ two "=" padding characters.
+
+ (3) The final quantum of encoding input is exactly 16 bits; here, the
+ final unit of encoded output will be three characters followed by
+ one "=" padding character.
+
+5. Base 64 Encoding with URL and Filename Safe Alphabet
+
+ The Base 64 encoding with an URL and filename safe alphabet has been
+ used in [12].
+
+ An alternative alphabet has been suggested that would use "~" as the
+ 63rd character. Since the "~" character has special meaning in some
+ file system environments, the encoding described in this section is
+ recommended instead. The remaining unreserved URI character is ".",
+ but some file system environments do not permit multiple "." in a
+ filename, thus making the "." character unattractive as well.
+
+ The pad character "=" is typically percent-encoded when used in an
+ URI [9], but if the data length is known implicitly, this can be
+ avoided by skipping the padding; see section 3.2.
+
+ This encoding may be referred to as "base64url". This encoding
+ should not be regarded as the same as the "base64" encoding and
+ should not be referred to as only "base64". Unless clarified
+ otherwise, "base64" refers to the base 64 in the previous section.
+
+ This encoding is technically identical to the previous one, except
+ for the 62:nd and 63:rd alphabet character, as indicated in Table 2.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Josefsson Standards Track [Page 7]
+
+RFC 4648 Base-N Encodings October 2006
+
+
+ Table 2: The "URL and Filename safe" Base 64 Alphabet
+
+ Value Encoding Value Encoding Value Encoding Value Encoding
+ 0 A 17 R 34 i 51 z
+ 1 B 18 S 35 j 52 0
+ 2 C 19 T 36 k 53 1
+ 3 D 20 U 37 l 54 2
+ 4 E 21 V 38 m 55 3
+ 5 F 22 W 39 n 56 4
+ 6 G 23 X 40 o 57 5
+ 7 H 24 Y 41 p 58 6
+ 8 I 25 Z 42 q 59 7
+ 9 J 26 a 43 r 60 8
+ 10 K 27 b 44 s 61 9
+ 11 L 28 c 45 t 62 - (minus)
+ 12 M 29 d 46 u 63 _
+ 13 N 30 e 47 v (underline)
+ 14 O 31 f 48 w
+ 15 P 32 g 49 x
+ 16 Q 33 h 50 y (pad) =
+
+6. Base 32 Encoding
+
+ The following description of base 32 is derived from [11] (with
+ corrections). This encoding may be referred to as "base32".
+
+ The Base 32 encoding is designed to represent arbitrary sequences of
+ octets in a form that needs to be case insensitive but that need not
+ be human readable.
+
+ A 33-character subset of US-ASCII is used, enabling 5 bits to be
+ represented per printable character. (The extra 33rd character, "=",
+ is used to signify a special processing function.)
+
+ The encoding process represents 40-bit groups of input bits as output
+ strings of 8 encoded characters. Proceeding from left to right, a
+ 40-bit input group is formed by concatenating 5 8bit input groups.
+ These 40 bits are then treated as 8 concatenated 5-bit groups, each
+ of which is translated into a single character in the base 32
+ alphabet. When a bit stream is encoded via the base 32 encoding, the
+ bit stream must be presumed to be ordered with the most-significant-
+ bit first. That is, the first bit in the stream will be the high-
+ order bit in the first 8bit byte, the eighth bit will be the low-
+ order bit in the first 8bit byte, and so on.
+
+
+
+
+
+
+
+Josefsson Standards Track [Page 8]
+
+RFC 4648 Base-N Encodings October 2006
+
+
+ Each 5-bit group is used as an index into an array of 32 printable
+ characters. The character referenced by the index is placed in the
+ output string. These characters, identified in Table 3, below, are
+ selected from US-ASCII digits and uppercase letters.
+
+ Table 3: The Base 32 Alphabet
+
+ Value Encoding Value Encoding Value Encoding Value Encoding
+ 0 A 9 J 18 S 27 3
+ 1 B 10 K 19 T 28 4
+ 2 C 11 L 20 U 29 5
+ 3 D 12 M 21 V 30 6
+ 4 E 13 N 22 W 31 7
+ 5 F 14 O 23 X
+ 6 G 15 P 24 Y (pad) =
+ 7 H 16 Q 25 Z
+ 8 I 17 R 26 2
+
+ Special processing is performed if fewer than 40 bits are available
+ at the end of the data being encoded. A full encoding quantum is
+ always completed at the end of a body. When fewer than 40 input bits
+ are available in an input group, bits with value zero are added (on
+ the right) to form an integral number of 5-bit groups. Padding at
+ the end of the data is performed using the "=" character. Since all
+ base 32 input is an integral number of octets, only the following
+ cases can arise:
+
+ (1) The final quantum of encoding input is an integral multiple of 40
+ bits; here, the final unit of encoded output will be an integral
+ multiple of 8 characters with no "=" padding.
+
+ (2) The final quantum of encoding input is exactly 8 bits; here, the
+ final unit of encoded output will be two characters followed by
+ six "=" padding characters.
+
+ (3) The final quantum of encoding input is exactly 16 bits; here, the
+ final unit of encoded output will be four characters followed by
+ four "=" padding characters.
+
+ (4) The final quantum of encoding input is exactly 24 bits; here, the
+ final unit of encoded output will be five characters followed by
+ three "=" padding characters.
+
+ (5) The final quantum of encoding input is exactly 32 bits; here, the
+ final unit of encoded output will be seven characters followed by
+ one "=" padding character.
+
+
+
+
+
+Josefsson Standards Track [Page 9]
+
+RFC 4648 Base-N Encodings October 2006
+
+
+7. Base 32 Encoding with Extended Hex Alphabet
+
+ The following description of base 32 is derived from [7]. This
+ encoding may be referred to as "base32hex". This encoding should not
+ be regarded as the same as the "base32" encoding and should not be
+ referred to as only "base32". This encoding is used by, e.g.,
+ NextSECure3 (NSEC3) [10].
+
+ One property with this alphabet, which the base64 and base32
+ alphabets lack, is that encoded data maintains its sort order when
+ the encoded data is compared bit-wise.
+
+ This encoding is identical to the previous one, except for the
+ alphabet. The new alphabet is found in Table 4.
+
+ Table 4: The "Extended Hex" Base 32 Alphabet
+
+ Value Encoding Value Encoding Value Encoding Value Encoding
+ 0 0 9 9 18 I 27 R
+ 1 1 10 A 19 J 28 S
+ 2 2 11 B 20 K 29 T
+ 3 3 12 C 21 L 30 U
+ 4 4 13 D 22 M 31 V
+ 5 5 14 E 23 N
+ 6 6 15 F 24 O (pad) =
+ 7 7 16 G 25 P
+ 8 8 17 H 26 Q
+
+8. Base 16 Encoding
+
+ The following description is original but analogous to previous
+ descriptions. Essentially, Base 16 encoding is the standard case-
+ insensitive hex encoding and may be referred to as "base16" or "hex".
+
+ A 16-character subset of US-ASCII is used, enabling 4 bits to be
+ represented per printable character.
+
+ The encoding process represents 8-bit groups (octets) of input bits
+ as output strings of 2 encoded characters. Proceeding from left to
+ right, an 8-bit input is taken from the input data. These 8 bits are
+ then treated as 2 concatenated 4-bit groups, each of which is
+ translated into a single character in the base 16 alphabet.
+
+ Each 4-bit group is used as an index into an array of 16 printable
+ characters. The character referenced by the index is placed in the
+ output string.
+
+
+
+
+
+Josefsson Standards Track [Page 10]
+
+RFC 4648 Base-N Encodings October 2006
+
+
+ Table 5: The Base 16 Alphabet
+
+ Value Encoding Value Encoding Value Encoding Value Encoding
+ 0 0 4 4 8 8 12 C
+ 1 1 5 5 9 9 13 D
+ 2 2 6 6 10 A 14 E
+ 3 3 7 7 11 B 15 F
+
+ Unlike base 32 and base 64, no special padding is necessary since a
+ full code word is always available.
+
+9. Illustrations and Examples
+
+ To translate between binary and a base encoding, the input is stored
+ in a structure, and the output is extracted. The case for base 64 is
+ displayed in the following figure, borrowed from [5].
+
+ +--first octet--+-second octet--+--third octet--+
+ |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
+ +-----------+---+-------+-------+---+-----------+
+ |5 4 3 2 1 0|5 4 3 2 1 0|5 4 3 2 1 0|5 4 3 2 1 0|
+ +--1.index--+--2.index--+--3.index--+--4.index--+
+
+ The case for base 32 is shown in the following figure, borrowed from
+ [7]. Each successive character in a base-32 value represents 5
+ successive bits of the underlying octet sequence. Thus, each group
+ of 8 characters represents a sequence of 5 octets (40 bits).
+
+ 1 2 3
+ 01234567 89012345 67890123 45678901 23456789
+ +--------+--------+--------+--------+--------+
+ |< 1 >< 2| >< 3 ><|.4 >< 5.|>< 6 ><.|7 >< 8 >|
+ +--------+--------+--------+--------+--------+
+ <===> 8th character
+ <====> 7th character
+ <===> 6th character
+ <====> 5th character
+ <====> 4th character
+ <===> 3rd character
+ <====> 2nd character
+ <===> 1st character
+
+
+
+
+
+
+
+
+
+
+Josefsson Standards Track [Page 11]
+
+RFC 4648 Base-N Encodings October 2006
+
+
+ The following example of Base64 data is from [5], with corrections.
+
+ Input data: 0x14fb9c03d97e
+ Hex: 1 4 f b 9 c | 0 3 d 9 7 e
+ 8-bit: 00010100 11111011 10011100 | 00000011 11011001 01111110
+ 6-bit: 000101 001111 101110 011100 | 000000 111101 100101 111110
+ Decimal: 5 15 46 28 0 61 37 62
+ Output: F P u c A 9 l +
+
+ Input data: 0x14fb9c03d9
+ Hex: 1 4 f b 9 c | 0 3 d 9
+ 8-bit: 00010100 11111011 10011100 | 00000011 11011001
+ pad with 00
+ 6-bit: 000101 001111 101110 011100 | 000000 111101 100100
+ Decimal: 5 15 46 28 0 61 36
+ pad with =
+ Output: F P u c A 9 k =
+
+ Input data: 0x14fb9c03
+ Hex: 1 4 f b 9 c | 0 3
+ 8-bit: 00010100 11111011 10011100 | 00000011
+ pad with 0000
+ 6-bit: 000101 001111 101110 011100 | 000000 110000
+ Decimal: 5 15 46 28 0 48
+ pad with = =
+ Output: F P u c A w = =
+
+10. Test Vectors
+
+ BASE64("") = ""
+
+ BASE64("f") = "Zg=="
+
+ BASE64("fo") = "Zm8="
+
+ BASE64("foo") = "Zm9v"
+
+ BASE64("foob") = "Zm9vYg=="
+
+ BASE64("fooba") = "Zm9vYmE="
+
+ BASE64("foobar") = "Zm9vYmFy"
+
+ BASE32("") = ""
+
+ BASE32("f") = "MY======"
+
+ BASE32("fo") = "MZXQ===="
+
+
+
+Josefsson Standards Track [Page 12]
+
+RFC 4648 Base-N Encodings October 2006
+
+
+ BASE32("foo") = "MZXW6==="
+
+ BASE32("foob") = "MZXW6YQ="
+
+ BASE32("fooba") = "MZXW6YTB"
+
+ BASE32("foobar") = "MZXW6YTBOI======"
+
+ BASE32-HEX("") = ""
+
+ BASE32-HEX("f") = "CO======"
+
+ BASE32-HEX("fo") = "CPNG===="
+
+ BASE32-HEX("foo") = "CPNMU==="
+
+ BASE32-HEX("foob") = "CPNMUOG="
+
+ BASE32-HEX("fooba") = "CPNMUOJ1"
+
+ BASE32-HEX("foobar") = "CPNMUOJ1E8======"
+
+ BASE16("") = ""
+
+ BASE16("f") = "66"
+
+ BASE16("fo") = "666F"
+
+ BASE16("foo") = "666F6F"
+
+ BASE16("foob") = "666F6F62"
+
+ BASE16("fooba") = "666F6F6261"
+
+ BASE16("foobar") = "666F6F626172"
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Josefsson Standards Track [Page 13]
+
+RFC 4648 Base-N Encodings October 2006
+
+
+11. ISO C99 Implementation of Base64
+
+ An ISO C99 implementation of Base64 encoding and decoding that is
+ believed to follow all recommendations in this RFC is available from:
+
+ http://josefsson.org/base-encoding/
+
+ This code is not normative.
+
+ The code could not be included in this RFC for procedural reasons
+ (RFC 3978 section 5.4).
+
+12. Security Considerations
+
+ When base encoding and decoding is implemented, care should be taken
+ not to introduce vulnerabilities to buffer overflow attacks, or other
+ attacks on the implementation. A decoder should not break on invalid
+ input including, e.g., embedded NUL characters (ASCII 0).
+
+ If non-alphabet characters are ignored, instead of causing rejection
+ of the entire encoding (as recommended), a covert channel that can be
+ used to "leak" information is made possible. The ignored characters
+ could also be used for other nefarious purposes, such as to avoid a
+ string equality comparison or to trigger implementation bugs. The
+ implications of ignoring non-alphabet characters should be understood
+ in applications that do not follow the recommended practice.
+ Similarly, when the base 16 and base 32 alphabets are handled case
+ insensitively, alteration of case can be used to leak information or
+ make string equality comparisons fail.
+
+ When padding is used, there are some non-significant bits that
+ warrant security concerns, as they may be abused to leak information
+ or used to bypass string equality comparisons or to trigger
+ implementation problems.
+
+ Base encoding visually hides otherwise easily recognized information,
+ such as passwords, but does not provide any computational
+ confidentiality. This has been known to cause security incidents
+ when, e.g., a user reports details of a network protocol exchange
+ (perhaps to illustrate some other problem) and accidentally reveals
+ the password because she is unaware that the base encoding does not
+ protect the password.
+
+ Base encoding adds no entropy to the plaintext, but it does increase
+ the amount of plaintext available and provide a signature for
+ cryptanalysis in the form of a characteristic probability
+ distribution.
+
+
+
+
+Josefsson Standards Track [Page 14]
+
+RFC 4648 Base-N Encodings October 2006
+
+
+13. Changes Since RFC 3548
+
+ Added the "base32 extended hex alphabet", needed to preserve sort
+ order of encoded data.
+
+ Referenced IMAP for the special Base64 encoding used there.
+
+ Fixed the example copied from RFC 2440.
+
+ Added security consideration about providing a signature for
+ cryptoanalysis.
+
+ Added test vectors.
+
+ Fixed typos.
+
+14. Acknowledgements
+
+ Several people offered comments and/or suggestions, including John E.
+ Hadstate, Tony Hansen, Gordon Mohr, John Myers, Chris Newman, and
+ Andrew Sieber. Text used in this document are based on earlier RFCs
+ describing specific uses of various base encodings. The author
+ acknowledges the RSA Laboratories for supporting the work that led to
+ this document.
+
+ This revised version is based in parts on comments and/or suggestions
+ made by Roy Arends, Eric Blake, Brian E Carpenter, Elwyn Davies, Bill
+ Fenner, Sam Hartman, Ted Hardie, Per Hygum, Jelte Jansen, Clement
+ Kent, Tero Kivinen, Paul Kwiatkowski, and Ben Laurie.
+
+15. Copying Conditions
+
+ Copyright (c) 2000-2006 Simon Josefsson
+
+ Regarding the abstract and sections 1, 3, 8, 10, 12, 13, and 14 of
+ this document, that were written by Simon Josefsson ("the author",
+ for the remainder of this section), the author makes no guarantees
+ and is not responsible for any damage resulting from its use. The
+ author grants irrevocable permission to anyone to use, modify, and
+ distribute it in any way that does not diminish the rights of anyone
+ else to use, modify, and distribute it, provided that redistributed
+ derivative works do not contain misleading author or version
+ information and do not falsely purport to be IETF RFC documents.
+ Derivative works need not be licensed under similar terms.
+
+
+
+
+
+
+
+Josefsson Standards Track [Page 15]
+
+RFC 4648 Base-N Encodings October 2006
+
+
+16. References
+
+16.1. Normative References
+
+ [1] Cerf, V., "ASCII format for network interchange", RFC 20,
+ October 1969.
+
+ [2] Bradner, S., "Key words for use in RFCs to Indicate Requirement
+ Levels", BCP 14, RFC 2119, March 1997.
+
+16.2. Informative References
+
+ [3] Linn, J., "Privacy Enhancement for Internet Electronic Mail:
+ Part I: Message Encryption and Authentication Procedures", RFC
+ 1421, February 1993.
+
+ [4] Freed, N. and N. Borenstein, "Multipurpose Internet Mail
+ Extensions (MIME) Part One: Format of Internet Message Bodies",
+ RFC 2045, November 1996.
+
+ [5] Callas, J., Donnerhacke, L., Finney, H., and R. Thayer,
+ "OpenPGP Message Format", RFC 2440, November 1998.
+
+ [6] Arends, R., Austein, R., Larson, M., Massey, D., and S. Rose,
+ "DNS Security Introduction and Requirements", RFC 4033, March
+ 2005.
+
+ [7] Klyne, G. and L. Masinter, "Identifying Composite Media
+ Features", RFC 2938, September 2000.
+
+ [8] Crispin, M., "INTERNET MESSAGE ACCESS PROTOCOL - VERSION
+ 4rev1", RFC 3501, March 2003.
+
+ [9] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
+ Resource Identifier (URI): Generic Syntax", STD 66, RFC 3986,
+ January 2005.
+
+ [10] Laurie, B., Sisson, G., Arends, R., and D. Blacka, "DNSSEC Hash
+ Authenticated Denial of Existence", Work in Progress, June
+ 2006.
+
+ [11] Myers, J., "SASL GSSAPI mechanisms", Work in Progress, May
+ 2000.
+
+ [12] Wilcox-O'Hearn, B., "Post to P2P-hackers mailing list",
+ http://zgp.org/pipermail/p2p-hackers/2001-September/
+ 000315.html, September 2001.
+
+
+
+
+Josefsson Standards Track [Page 16]
+
+RFC 4648 Base-N Encodings October 2006
+
+
+Author's Address
+
+ Simon Josefsson
+ SJD
+ EMail: simon@josefsson.org
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Josefsson Standards Track [Page 17]
+
+RFC 4648 Base-N Encodings October 2006
+
+
+Full Copyright Statement
+
+ Copyright (C) The Internet Society (2006).
+
+ This document is subject to the rights, licenses and restrictions
+ contained in BCP 78, and except as set forth therein, the authors
+ retain all their rights.
+
+ This document and the information contained herein are provided on an
+ "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
+ OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
+ ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
+ INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
+ INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
+ WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+
+Intellectual Property
+
+ The IETF takes no position regarding the validity or scope of any
+ Intellectual Property Rights or other rights that might be claimed to
+ pertain to the implementation or use of the technology described in
+ this document or the extent to which any license under such rights
+ might or might not be available; nor does it represent that it has
+ made any independent effort to identify any such rights. Information
+ on the procedures with respect to rights in RFC documents can be
+ found in BCP 78 and BCP 79.
+
+ Copies of IPR disclosures made to the IETF Secretariat and any
+ assurances of licenses to be made available, or the result of an
+ attempt made to obtain a general license or permission for the use of
+ such proprietary rights by implementers or users of this
+ specification can be obtained from the IETF on-line IPR repository at
+ http://www.ietf.org/ipr.
+
+ The IETF invites any interested party to bring to its attention any
+ copyrights, patents or patent applications, or other proprietary
+ rights that may cover technology that may be required to implement
+ this standard. Please address the information to the IETF at
+ ietf-ipr@ietf.org.
+
+Acknowledgement
+
+ Funding for the RFC Editor function is provided by the IETF
+ Administrative Support Activity (IASA).
+
+
+
+
+
+
+
+Josefsson Standards Track [Page 18]
+