A.4.11 String Encoding
Facilities for encoding, decoding, and converting strings in various character encoding schemes are provided by packages Strings.UTF_Encoding, Strings.UTF_Encoding.Conversions, Strings.UTF_Encoding.Strings, Strings.UTF_Encoding.Wide_Strings, and Strings.UTF_Encoding.Wide_Wide_Strings.
Static Semantics
The encoding library packages have the following declarations:
package Ada.Strings.UTF_Encoding is
   pragma Pure (UTF_Encoding);

   -- Declarations common to the string encoding packages
   type Encoding_Scheme is (UTF_8, UTF_16BE, UTF_16LE);

   subtype UTF_String is String;

   subtype UTF_8_String is String;

   subtype UTF_16_Wide_String is Wide_String;

   Encoding_Error : exception;

   BOM_8    : constant UTF_8_String :=
                Character'Val(16#EF#) &
                Character'Val(16#BB#) &
                Character'Val(16#BF#);

   BOM_16BE : constant UTF_String :=
                Character'Val(16#FE#) &
                Character'Val(16#FF#);

   BOM_16LE : constant UTF_String :=
                Character'Val(16#FF#) &
                Character'Val(16#FE#);

   BOM_16   : constant UTF_16_Wide_String :=
               (1 => Wide_Character'Val(16#FEFF#));

   function Encoding (Item    : UTF_String;
                      Default : Encoding_Scheme := UTF_8)
      return Encoding_Scheme;

end Ada.Strings.UTF_Encoding;
package Ada.Strings.UTF_Encoding.Conversions is
   pragma Pure (Conversions);

   -- Conversions between various encoding schemes
   function Convert (Item          : UTF_String;
                     Input_Scheme  : Encoding_Scheme;
                     Output_Scheme : Encoding_Scheme;
                     Output_BOM    : Boolean := False) return UTF_String;

   function Convert (Item          : UTF_String;
                     Input_Scheme  : Encoding_Scheme;
                     Output_BOM    : Boolean := False)
      return UTF_16_Wide_String;

   function Convert (Item          : UTF_8_String;
                     Output_BOM    : Boolean := False)
      return UTF_16_Wide_String;

   function Convert (Item          : UTF_16_Wide_String;
                     Output_Scheme : Encoding_Scheme;
                     Output_BOM    : Boolean := False) return UTF_String;

   function Convert (Item          : UTF_16_Wide_String;
                     Output_BOM    : Boolean := False) return UTF_8_String;

end Ada.Strings.UTF_Encoding.Conversions;
package Ada.Strings.UTF_Encoding.Strings is
   pragma Pure (Strings);

   -- Encoding / decoding between String and various encoding schemes
   function Encode (Item          : String;
                    Output_Scheme : Encoding_Scheme;
                    Output_BOM    : Boolean  := False) return UTF_String;

   function Encode (Item       : String;
                    Output_BOM : Boolean  := False) return UTF_8_String;

   function Encode (Item       : String;
                    Output_BOM : Boolean  := False)
      return UTF_16_Wide_String;

   function Decode (Item         : UTF_String;
                    Input_Scheme : Encoding_Scheme) return String;

   function Decode (Item : UTF_8_String) return String;

   function Decode (Item : UTF_16_Wide_String) return String;

end Ada.Strings.UTF_Encoding.Strings;
package Ada.Strings.UTF_Encoding.Wide_Strings is
   pragma Pure (Wide_Strings);

   -- Encoding / decoding between Wide_String and various encoding schemes
   function Encode (Item          : Wide_String;
                    Output_Scheme : Encoding_Scheme;
                    Output_BOM    : Boolean  := False) return UTF_String;

   function Encode (Item       : Wide_String;
                    Output_BOM : Boolean  := False) return UTF_8_String;

   function Encode (Item       : Wide_String;
                    Output_BOM : Boolean  := False)
      return UTF_16_Wide_String;

   function Decode (Item         : UTF_String;
                    Input_Scheme : Encoding_Scheme) return Wide_String;

   function Decode (Item : UTF_8_String) return Wide_String;

   function Decode (Item : UTF_16_Wide_String) return Wide_String;

end Ada.Strings.UTF_Encoding.Wide_Strings;
package Ada.Strings.UTF_Encoding.Wide_Wide_Strings is
   pragma Pure (Wide_Wide_Strings);

   -- Encoding / decoding between Wide_Wide_String and various encoding schemes
   function Encode (Item          : Wide_Wide_String;
                    Output_Scheme : Encoding_Scheme;
                    Output_BOM    : Boolean  := False) return UTF_String;

   function Encode (Item       : Wide_Wide_String;
                    Output_BOM : Boolean  := False) return UTF_8_String;

   function Encode (Item       : Wide_Wide_String;
                    Output_BOM : Boolean  := False)
      return UTF_16_Wide_String;

   function Decode (Item         : UTF_String;
                    Input_Scheme : Encoding_Scheme) return Wide_Wide_String;

   function Decode (Item : UTF_8_String) return Wide_Wide_String;

   function Decode (Item : UTF_16_Wide_String) return Wide_Wide_String;

end Ada.Strings.UTF_Encoding.Wide_Wide_Strings;
The type Encoding_Scheme defines encoding schemes. UTF_8 corresponds to the UTF-8 encoding scheme defined by Annex D of ISO/IEC 10646. UTF_16BE corresponds to the UTF-16 encoding scheme defined by Annex C of ISO/IEC 10646 in 8-bit, big-endian order; and UTF_16LE corresponds to the UTF-16 encoding scheme in 8-bit, little-endian order.
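As a minimal sketch (assuming an Ada 2012 compiler such as GNAT; the procedure name Show_Byte_Order and its Dump helper are illustrative, not part of the library), the following program encodes the one-character String "A" in each Encoding_Scheme and prints the resulting 8-bit position values in base 16. UTF_8 yields the single value 16#41#, while UTF_16BE and UTF_16LE yield a zero byte and 16#41# in opposite orders.

```ada
with Ada.Text_IO;                      use Ada.Text_IO;
with Ada.Integer_Text_IO;
with Ada.Strings.UTF_Encoding;         use Ada.Strings.UTF_Encoding;
with Ada.Strings.UTF_Encoding.Strings;

procedure Show_Byte_Order is
   --  Print the position value of each 8-bit element in base 16.
   procedure Dump (Label : String; Item : UTF_String) is
   begin
      Put (Label & ":");
      for C of Item loop
         Put (" ");
         Ada.Integer_Text_IO.Put (Character'Pos (C), Width => 0, Base => 16);
      end loop;
      New_Line;
   end Dump;
begin
   --  Encode the same String once per scheme, without a BOM.
   for Scheme in Encoding_Scheme loop
      Dump (Encoding_Scheme'Image (Scheme),
            Ada.Strings.UTF_Encoding.Strings.Encode ("A", Scheme));
   end loop;
end Show_Byte_Order;
```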
 
The subtype UTF_String is used to represent a String of 8-bit values containing a sequence of values encoded in one of three ways (UTF-8, UTF-16BE, or UTF-16LE). The subtype UTF_8_String is used to represent a String of 8-bit values containing a sequence of values encoded in UTF-8. The subtype UTF_16_Wide_String is used to represent a Wide_String of 16-bit values containing a sequence of values encoded in UTF-16.
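Because these are subtypes of String and Wide_String without constraints, the compiler does not check that a value actually holds a valid encoding; a mislabeled value is only detected when it is decoded. The sketch below (assuming Ada 2012; the procedure name Subtypes_Are_Unchecked is illustrative) stores the Latin-1 byte 16#E9# in a UTF_8_String, which is legal, and then shows Decode rejecting it, since 16#E9# on its own is an incomplete UTF-8 sequence.

```ada
with Ada.Text_IO;
with Ada.Strings.UTF_Encoding;          use Ada.Strings.UTF_Encoding;
with Ada.Strings.UTF_Encoding.Strings;

procedure Subtypes_Are_Unchecked is
   --  Latin-1 "e with acute" as a plain String: one element, 16#E9#.
   Latin_1 : constant String := (1 => Character'Val (16#E9#));

   --  Legal: UTF_8_String is just String, so no check happens here,
   --  even though 16#E9# alone is not valid UTF-8.
   Mislabeled : constant UTF_8_String := Latin_1;
begin
   --  The mistake surfaces only when the value is decoded.
   Ada.Text_IO.Put_Line (Ada.Strings.UTF_Encoding.Strings.Decode (Mislabeled));
exception
   when Encoding_Error =>
      Ada.Text_IO.Put_Line ("16#E9# is not a complete UTF-8 sequence");
end Subtypes_Are_Unchecked;
```

For comparison, the correctly encoded sequence 16#C3# 16#A9# decodes to that same single Character'Val (16#E9#).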
The BOM_8, BOM_16BE, BOM_16LE, and BOM_16 constants correspond to values used at the start of a string to indicate the encoding.
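All four constants represent the same code position, 16#FEFF#. The following sketch (assuming Ada 2012; the procedure name Check_BOMs is illustrative) derives the two 8-bit UTF-16 forms from BOM_16 by splitting its single code unit into high and low bytes, and checks them against BOM_16BE and BOM_16LE. It also prints the length of BOM_8, the three-byte UTF-8 form.

```ada
with Ada.Text_IO;
with Ada.Strings.UTF_Encoding; use Ada.Strings.UTF_Encoding;

procedure Check_BOMs is
   --  Position value of the single UTF-16 BOM code unit.
   Code : constant Natural := Wide_Character'Pos (BOM_16 (BOM_16'First));

   --  High and low bytes of that code unit.
   Hi : constant Character := Character'Val (Code / 256);
   Lo : constant Character := Character'Val (Code mod 256);
begin
   --  Big-endian puts the high byte first; little-endian puts it last.
   Ada.Text_IO.Put_Line ("BE matches: " & Boolean'Image (BOM_16BE = Hi & Lo));
   Ada.Text_IO.Put_Line ("LE matches: " & Boolean'Image (BOM_16LE = Lo & Hi));
   --  BOM_8 is the UTF-8 encoding of the same code position.
   Ada.Text_IO.Put_Line ("BOM_8 length:" & Integer'Image (BOM_8'Length));
end Check_BOMs;
```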
Each of the Encode functions takes a String, Wide_String, or Wide_Wide_String Item parameter that is assumed to be an array of unencoded characters. Each of the Convert functions takes a UTF_String, UTF_8_String, or UTF_16_Wide_String Item parameter that is assumed to contain characters whose position values correspond to a valid encoding sequence according to the encoding scheme required by the function or specified by its Input_Scheme parameter.
Each of the Convert and Encode functions returns a UTF_String, UTF_8_String, or UTF_16_Wide_String value whose characters have position values that correspond to the encoding of the Item parameter according to the encoding scheme required by the function or specified by its Output_Scheme parameter. For UTF_8, no overlong encoding is returned. A BOM is included at the start of the returned string if the Output_BOM parameter is set to True. The lower bound of the returned string is 1.
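The two guarantees in the last two sentences can be observed directly. In this sketch (assuming Ada 2012; the procedure name Show_Output_BOM and the input text are illustrative), a Wide_String slice whose lower bound is 3 is encoded as UTF-16LE with Output_BOM => True. The result is indexed from 1 and begins with BOM_16LE, so its length is 2 bytes of BOM plus 2 bytes for each of the five characters.

```ada
with Ada.Text_IO;
with Ada.Strings.UTF_Encoding;              use Ada.Strings.UTF_Encoding;
with Ada.Strings.UTF_Encoding.Wide_Strings;

procedure Show_Output_BOM is
   package WS renames Ada.Strings.UTF_Encoding.Wide_Strings;

   Source : constant Wide_String := "xxHello";

   --  Encode the slice "Hello" (bounds 3 .. 7), requesting a leading BOM.
   Encoded : constant UTF_String :=
     WS.Encode (Source (3 .. Source'Last), UTF_16LE, Output_BOM => True);
begin
   --  The result is indexed from 1, whatever the bounds of Item were.
   Ada.Text_IO.Put_Line ("First index:" & Integer'Image (Encoded'First));
   --  The first two bytes are the BOM for the requested scheme.
   Ada.Text_IO.Put_Line
     ("Starts with BOM_16LE: " & Boolean'Image (Encoded (1 .. 2) = BOM_16LE));
   --  2 bytes of BOM + 2 bytes per character of "Hello" = 12.
   Ada.Text_IO.Put_Line ("Length:" & Integer'Image (Encoded'Length));
end Show_Output_BOM;
```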
Each of the Decode functions takes a UTF_String, UTF_8_String, or UTF_16_Wide_String Item parameter that is assumed to contain characters whose position values correspond to a valid encoding sequence according to the encoding scheme required by the function or specified by its Input_Scheme parameter, and returns the corresponding String, Wide_String, or Wide_Wide_String value. The lower bound of the returned string is 1.
For each of the Convert and Decode functions, an initial BOM in the input that matches the expected encoding scheme is ignored, and a different initial BOM causes Encoding_Error to be propagated.
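A minimal sketch of both cases, assuming Ada 2012 (the procedure name Show_BOM_Decoding is illustrative): a UTF-8 string that starts with BOM_8 decodes to just "Hi", while the same text prefixed with BOM_16BE and declared as UTF_8 propagates Encoding_Error.

```ada
with Ada.Text_IO;
with Ada.Strings.UTF_Encoding;          use Ada.Strings.UTF_Encoding;
with Ada.Strings.UTF_Encoding.Strings;

procedure Show_BOM_Decoding is
   package S renames Ada.Strings.UTF_Encoding.Strings;
begin
   --  A BOM matching the expected scheme is skipped: prints "Hi".
   Ada.Text_IO.Put_Line (S.Decode (BOM_8 & "Hi"));

   --  A UTF-16BE BOM on input declared as UTF-8 is rejected.
   begin
      Ada.Text_IO.Put_Line (S.Decode (BOM_16BE & "Hi", UTF_8));
   exception
      when Encoding_Error =>
         Ada.Text_IO.Put_Line ("Encoding_Error: BOM does not match UTF_8");
   end;
end Show_BOM_Decoding;
```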
The exception Encoding_Error is also propagated in the following situations:
• By a Convert or Decode function when a UTF-encoded string contains an invalid encoding sequence.
• By a Convert or Decode function when the expected encoding is UTF-16BE or UTF-16LE and the input string has an odd length.
• By a Decode function yielding a String when the decoding of a sequence results in a code point whose value exceeds 16#FF#.
• By a Decode function yielding a Wide_String when the decoding of a sequence results in a code point whose value exceeds 16#FFFF#.
• By an Encode function taking a Wide_String as input when an invalid character appears in the input. In particular, the characters whose position is in the range 16#D800# .. 16#DFFF# are invalid because they conflict with UTF-16 surrogate encodings, and the characters whose position is 16#FFFE# or 16#FFFF# are also invalid because they conflict with BOM codes.
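The sketch below (assuming Ada 2012; the procedure and constant names are illustrative) triggers three of these situations in turn: a three-byte string decoded as UTF-16LE (odd length), the UTF-8 encoding of U+0100 (16#C4# 16#80#) decoded into a String (code point above 16#FF#), and a lone surrogate 16#D800# passed to a Wide_String Encode function. Each call is wrapped in its own handler so that all three cases run.

```ada
with Ada.Text_IO;
with Ada.Strings.UTF_Encoding;              use Ada.Strings.UTF_Encoding;
with Ada.Strings.UTF_Encoding.Strings;
with Ada.Strings.UTF_Encoding.Wide_Strings;

procedure Show_Encoding_Errors is
   package S  renames Ada.Strings.UTF_Encoding.Strings;
   package WS renames Ada.Strings.UTF_Encoding.Wide_Strings;

   --  Three bytes cannot form whole 16-bit UTF-16 code units.
   Odd_Length : constant UTF_String := "abc";
   --  U+0100 in UTF-8: a valid sequence, but too large for Character.
   Beyond_Latin_1 : constant UTF_8_String :=
     Character'Val (16#C4#) & Character'Val (16#80#);
   --  A lone high surrogate is not a valid character by itself.
   Lone_Surrogate : constant Wide_String :=
     (1 => Wide_Character'Val (16#D800#));
begin
   begin
      Ada.Text_IO.Put_Line (S.Decode (Odd_Length, UTF_16LE));
   exception
      when Encoding_Error =>
         Ada.Text_IO.Put_Line ("odd-length UTF-16 input rejected");
   end;

   begin
      Ada.Text_IO.Put_Line (S.Decode (Beyond_Latin_1));
   exception
      when Encoding_Error =>
         Ada.Text_IO.Put_Line ("code point above 16#FF# rejected");
   end;

   begin
      Ada.Text_IO.Put_Line (WS.Encode (Lone_Surrogate));
   exception
      when Encoding_Error =>
         Ada.Text_IO.Put_Line ("surrogate in Wide_String input rejected");
   end;
end Show_Encoding_Errors;
```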
function Encoding (Item    : UTF_String;
                   Default : Encoding_Scheme := UTF_8)
   return Encoding_Scheme;
Inspects a UTF_String value to determine whether it starts with a BOM for UTF-8, UTF-16BE, or UTF-16LE. If so, returns the scheme corresponding to the BOM; otherwise, returns the value of Default.
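A minimal sketch of the two outcomes, assuming Ada 2012 (the procedure name Detect_BOM is illustrative): a string beginning with BOM_16LE is reported as UTF_16LE, while a string with no BOM yields whatever Default is supplied.

```ada
with Ada.Text_IO;
with Ada.Strings.UTF_Encoding; use Ada.Strings.UTF_Encoding;

procedure Detect_BOM is
   --  UTF-16LE data: BOM_16LE followed by "A" (16#41# 16#00#).
   LE_Text : constant UTF_String :=
     BOM_16LE & Character'Val (16#41#) & Character'Val (16#00#);

   --  ASCII text without a BOM, so Encoding falls back to Default.
   Plain : constant UTF_String := "hello";
begin
   --  Prints UTF_16LE, taken from the BOM.
   Ada.Text_IO.Put_Line (Encoding_Scheme'Image (Encoding (LE_Text)));
   --  Prints UTF_16BE, the Default argument.
   Ada.Text_IO.Put_Line
     (Encoding_Scheme'Image (Encoding (Plain, Default => UTF_16BE)));
end Detect_BOM;
```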
function Convert (Item          : UTF_String;
                  Input_Scheme  : Encoding_Scheme;
                  Output_Scheme : Encoding_Scheme;
                  Output_BOM    : Boolean := False) return UTF_String;
Returns the value of Item (originally encoded in UTF-8, UTF-16LE, or UTF-16BE as specified by Input_Scheme) encoded in one of these three schemes as specified by Output_Scheme.
function Convert (Item          : UTF_String;
                  Input_Scheme  : Encoding_Scheme;
                  Output_BOM    : Boolean := False)
   return UTF_16_Wide_String;
Returns the value of Item (originally encoded in UTF-8, UTF-16LE, or UTF-16BE as specified by Input_Scheme) encoded in UTF-16.
function Convert (Item          : UTF_8_String;
                  Output_BOM    : Boolean := False)
   return UTF_16_Wide_String;
Returns the value of Item (originally encoded in UTF-8) encoded in UTF-16.
function Convert (Item          : UTF_16_Wide_String;
                  Output_Scheme : Encoding_Scheme;
                  Output_BOM    : Boolean := False) return UTF_String;
Returns the value of Item (originally encoded in UTF-16) encoded in UTF-8, UTF-16LE, or UTF-16BE as specified by Output_Scheme.
function Convert (Item          : UTF_16_Wide_String;
                  Output_BOM    : Boolean := False) return UTF_8_String;
Returns the value of Item (originally encoded in UTF-16) encoded in UTF-8.
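The following sketch (assuming Ada 2012; the procedure name Show_Convert and its constants are illustrative) converts the UTF-8 bytes for U+00E9 followed by "!" into a UTF_16_Wide_String, converts that back to UTF-8, and separately converts the original to 8-bit UTF-16BE with a leading BOM. Overload resolution picks each Convert from the types of Item and of the declared result.

```ada
with Ada.Text_IO;
with Ada.Strings.UTF_Encoding;              use Ada.Strings.UTF_Encoding;
with Ada.Strings.UTF_Encoding.Conversions;

procedure Show_Convert is
   package C renames Ada.Strings.UTF_Encoding.Conversions;

   --  U+00E9 then "!" in UTF-8: 16#C3# 16#A9# 16#21#.
   UTF8 : constant UTF_8_String :=
     Character'Val (16#C3#) & Character'Val (16#A9#) & '!';

   --  UTF-8 to UTF-16 held in a Wide_String (one code unit per character here).
   UTF16 : constant UTF_16_Wide_String := C.Convert (UTF8);

   --  UTF-16 back to UTF-8 reproduces the original bytes.
   Round_Trip : constant UTF_8_String := C.Convert (UTF16);

   --  UTF-8 to 8-bit UTF-16BE, with BOM_16BE in front.
   UTF16BE : constant UTF_String :=
     C.Convert (UTF8, Input_Scheme => UTF_8, Output_Scheme => UTF_16BE,
                Output_BOM => True);
begin
   Ada.Text_IO.Put_Line ("UTF-16 code units:" & Integer'Image (UTF16'Length));
   Ada.Text_IO.Put_Line ("Round trip equal: " & Boolean'Image (Round_Trip = UTF8));
   Ada.Text_IO.Put_Line
     ("Starts with BOM_16BE: " & Boolean'Image (UTF16BE (1 .. 2) = BOM_16BE));
end Show_Convert;
```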
function Encode (Item          : String;
                 Output_Scheme : Encoding_Scheme;
                 Output_BOM    : Boolean  := False) return UTF_String;
Returns the value of Item encoded in UTF-8, UTF-16LE, or UTF-16BE as specified by Output_Scheme.
function Encode (Item       : String;
                 Output_BOM : Boolean  := False) return UTF_8_String;
Returns the value of Item encoded in UTF-8.
function Encode (Item       : String;
                 Output_BOM : Boolean  := False) return UTF_16_Wide_String;
Returns the value of Item encoded in UTF-16.
function Decode (Item         : UTF_String;
                 Input_Scheme : Encoding_Scheme) return String;
Returns the result of decoding Item, which is encoded in UTF-8, UTF-16LE, or UTF-16BE as specified by Input_Scheme.
function Decode (Item : UTF_8_String) return String;
Returns the result of decoding Item, which is encoded in UTF-8.
function Decode (Item : UTF_16_Wide_String) return String;
Returns the result of decoding Item, which is encoded in UTF-16.
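As a sketch of all six String subprograms together (assuming Ada 2012; the procedure name Latin_1_Round_Trip is illustrative), the String "caf" followed by Character'Val (16#E9#) is encoded three ways and each result is decoded back. The two Encode functions that differ only in result type are selected by the declared subtype of the constant receiving the result.

```ada
with Ada.Text_IO;
with Ada.Strings.UTF_Encoding;          use Ada.Strings.UTF_Encoding;
with Ada.Strings.UTF_Encoding.Strings;

procedure Latin_1_Round_Trip is
   package S renames Ada.Strings.UTF_Encoding.Strings;

   --  Unencoded String; the last character has position 16#E9#.
   Text : constant String := "caf" & Character'Val (16#E9#);

   As_UTF_8  : constant UTF_8_String       := S.Encode (Text);
   As_UTF_16 : constant UTF_16_Wide_String := S.Encode (Text);
   As_LE     : constant UTF_String         := S.Encode (Text, UTF_16LE);
begin
   Ada.Text_IO.Put_Line ("UTF-8 bytes:" & Integer'Image (As_UTF_8'Length));     --  5
   Ada.Text_IO.Put_Line ("UTF-16 units:" & Integer'Image (As_UTF_16'Length));   --  4
   Ada.Text_IO.Put_Line ("UTF-16LE bytes:" & Integer'Image (As_LE'Length));     --  8

   --  Every Decode overload recovers the original String.
   Ada.Text_IO.Put_Line
     (Boolean'Image (S.Decode (As_UTF_8) = Text
                     and then S.Decode (As_UTF_16) = Text
                     and then S.Decode (As_LE, UTF_16LE) = Text));
end Latin_1_Round_Trip;
```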
function Encode (Item          : Wide_String;
                 Output_Scheme : Encoding_Scheme;
                 Output_BOM    : Boolean  := False) return UTF_String;
Returns the value of Item encoded in UTF-8, UTF-16LE, or UTF-16BE as specified by Output_Scheme.
function Encode (Item       : Wide_String;
                 Output_BOM : Boolean  := False) return UTF_8_String;
Returns the value of Item encoded in UTF-8.
function Encode (Item       : Wide_String;
                 Output_BOM : Boolean  := False) return UTF_16_Wide_String;
Returns the value of Item encoded in UTF-16.
function Decode (Item         : UTF_String;
                 Input_Scheme : Encoding_Scheme) return Wide_String;
Returns the result of decoding Item, which is encoded in UTF-8, UTF-16LE, or UTF-16BE as specified by Input_Scheme.
function Decode (Item : UTF_8_String) return Wide_String;
Returns the result of decoding Item, which is encoded in UTF-8.
function Decode (Item : UTF_16_Wide_String) return Wide_String;
Returns the result of decoding Item, which is encoded in UTF-16.
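A Wide_String covers only code points up to 16#FFFF#. The sketch below (assuming Ada 2012; the procedure name Wide_String_Limits is illustrative) encodes U+20AC EURO SIGN, which needs three UTF-8 bytes, and then tries to decode the four-byte UTF-8 form of U+1F600 into a Wide_String, which propagates Encoding_Error; Ada.Strings.UTF_Encoding.Wide_Wide_Strings is the package to use for such characters.

```ada
with Ada.Text_IO;
with Ada.Strings.UTF_Encoding;              use Ada.Strings.UTF_Encoding;
with Ada.Strings.UTF_Encoding.Wide_Strings;

procedure Wide_String_Limits is
   package WS renames Ada.Strings.UTF_Encoding.Wide_Strings;

   --  U+20AC fits in a single Wide_Character.
   Euro : constant Wide_String := (1 => Wide_Character'Val (16#20AC#));

   --  U+1F600 in UTF-8 (16#F0# 16#9F# 16#98# 16#80#): beyond 16#FFFF#.
   Emoji_UTF_8 : constant UTF_8_String :=
     Character'Val (16#F0#) & Character'Val (16#9F#) &
     Character'Val (16#98#) & Character'Val (16#80#);
begin
   --  The euro sign takes three UTF-8 bytes (16#E2# 16#82# 16#AC#).
   Ada.Text_IO.Put_Line
     ("Euro in UTF-8:" & Integer'Image (String'(WS.Encode (Euro))'Length));

   --  U+1F600 cannot be represented in a Wide_String.
   begin
      Ada.Text_IO.Put_Line (Integer'Image (WS.Decode (Emoji_UTF_8)'Length));
   exception
      when Encoding_Error =>
         Ada.Text_IO.Put_Line ("U+1F600 does not fit in a Wide_String");
   end;
end Wide_String_Limits;
```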
function Encode (Item          : Wide_Wide_String;
                 Output_Scheme : Encoding_Scheme;
                 Output_BOM    : Boolean  := False) return UTF_String;
Returns the value of Item encoded in UTF-8, UTF-16LE, or UTF-16BE as specified by Output_Scheme.
function Encode (Item       : Wide_Wide_String;
                 Output_BOM : Boolean  := False) return UTF_8_String;
Returns the value of Item encoded in UTF-8.
function Encode (Item       : Wide_Wide_String;
                 Output_BOM : Boolean  := False) return UTF_16_Wide_String;
Returns the value of Item encoded in UTF-16.
function Decode (Item         : UTF_String;
                 Input_Scheme : Encoding_Scheme) return Wide_Wide_String;
Returns the result of decoding Item, which is encoded in UTF-8, UTF-16LE, or UTF-16BE as specified by Input_Scheme.
function Decode (Item : UTF_8_String) return Wide_Wide_String;
Returns the result of decoding Item, which is encoded in UTF-8.
function Decode (Item : UTF_16_Wide_String) return Wide_Wide_String;
Returns the result of decoding Item, which is encoded in UTF-16.
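With Wide_Wide_String, characters outside the Basic Multilingual Plane round-trip. In this sketch (assuming Ada 2012; the procedure name Full_Range_Round_Trip is illustrative), "A" followed by U+1F600 needs 1 + 4 bytes in UTF-8 and 1 + 2 code units in UTF-16, the latter being a surrogate pair, and both encodings decode back to the original two characters.

```ada
with Ada.Text_IO;
with Ada.Strings.UTF_Encoding;                  use Ada.Strings.UTF_Encoding;
with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;

procedure Full_Range_Round_Trip is
   package WWS renames Ada.Strings.UTF_Encoding.Wide_Wide_Strings;

   Text : constant Wide_Wide_String :=
     "A" & Wide_Wide_Character'Val (16#1F600#);

   As_UTF_8  : constant UTF_8_String       := WWS.Encode (Text);
   As_UTF_16 : constant UTF_16_Wide_String := WWS.Encode (Text);
begin
   --  1 byte for "A" + 4 bytes for U+1F600.
   Ada.Text_IO.Put_Line ("UTF-8 bytes:" & Integer'Image (As_UTF_8'Length));
   --  1 code unit for "A" + a surrogate pair for U+1F600.
   Ada.Text_IO.Put_Line ("UTF-16 units:" & Integer'Image (As_UTF_16'Length));
   --  Both encodings decode back to the original Wide_Wide_String.
   Ada.Text_IO.Put_Line
     (Boolean'Image (WWS.Decode (As_UTF_8) = Text
                     and then WWS.Decode (As_UTF_16) = Text));
end Full_Range_Round_Trip;
```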
Implementation Advice
If an implementation supports other encoding schemes, another similar child of Ada.Strings should be defined.
18  A BOM (Byte-Order Mark, code position 16#FEFF#) can be included in a file or other entity to indicate the encoding; it is skipped when decoding. Typically, only the first line of a file or other entity contains a BOM. When decoding, the Encoding function can be called on the first line to determine the encoding; this encoding will then be used in subsequent calls to Decode to convert all of the lines to an internal format.
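The approach in this note can be sketched as follows, assuming Ada 2012. To keep the example self-contained, the raw lines are held in memory rather than read from a file; the type Line_Array, the constant Lines, and the procedure name Decode_Lines are hypothetical. Encoding is applied only to the first line, and the resulting scheme is reused for every Decode call; the matching BOM on the first line is skipped, so the program prints "first" and "second".

```ada
with Ada.Wide_Wide_Text_IO;
with Ada.Strings.Unbounded;                     use Ada.Strings.Unbounded;
with Ada.Strings.UTF_Encoding;                  use Ada.Strings.UTF_Encoding;
with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;

procedure Decode_Lines is
   package WWS renames Ada.Strings.UTF_Encoding.Wide_Wide_Strings;

   --  Hypothetical input: raw UTF-8 lines as they might be read from a
   --  file, with a BOM at the start of the first line only.
   type Line_Array is array (Positive range <>) of Unbounded_String;
   Lines : constant Line_Array :=
     (To_Unbounded_String (BOM_8 & "first"),
      To_Unbounded_String ("second"));

   --  Inspect only the first line; assume UTF-8 if it carries no BOM.
   Scheme : constant Encoding_Scheme :=
     Encoding (To_String (Lines (Lines'First)), Default => UTF_8);
begin
   for Line of Lines loop
      --  Same scheme for every line; the first line's BOM is skipped.
      Ada.Wide_Wide_Text_IO.Put_Line (WWS.Decode (To_String (Line), Scheme));
   end loop;
end Decode_Lines;
```

Reading real UTF-16 files line by line with Text_IO is a separate concern, since a line terminator byte can occur inside a UTF-16 code unit; this sketch sidesteps that by assuming UTF-8 input.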
Ada 2005 and 2012 Editions sponsored in part by Ada-Europe