A.3.5 The Package Wide_Characters.Handling
{AI05-0185-1} The package Wide_Characters.Handling provides operations for
classifying Wide_Characters and case folding for Wide_Characters.
Static Semantics
{AI05-0185-1} The library package Wide_Characters.Handling has the following
declaration:
package Ada.Wide_Characters.Handling
   with Pure is
{AI05-0266-1}
   function Character_Set_Version return String;
   function Is_Control (Item : Wide_Character) return Boolean;
   function Is_Letter (Item : Wide_Character) return Boolean;
   function Is_Lower (Item : Wide_Character) return Boolean;
   function Is_Upper (Item : Wide_Character) return Boolean;
{AI12-0260-1}
   function Is_Basic (Item : Wide_Character) return Boolean;
   function Is_Digit (Item : Wide_Character) return Boolean;
   function Is_Decimal_Digit (Item : Wide_Character) return Boolean
      renames Is_Digit;
   function Is_Hexadecimal_Digit (Item : Wide_Character) return Boolean;
   function Is_Alphanumeric (Item : Wide_Character) return Boolean;
   function Is_Special (Item : Wide_Character) return Boolean;
   function Is_Line_Terminator (Item : Wide_Character) return Boolean;
   function Is_Mark (Item : Wide_Character) return Boolean;
   function Is_Other_Format (Item : Wide_Character) return Boolean;
   function Is_Punctuation_Connector (Item : Wide_Character) return Boolean;
   function Is_Space (Item : Wide_Character) return Boolean;
{AI12-0004-1}
   function Is_NFKC (Item : Wide_Character) return Boolean;
   function Is_Graphic (Item : Wide_Character) return Boolean;
   function To_Lower (Item : Wide_Character) return Wide_Character;
   function To_Upper (Item : Wide_Character) return Wide_Character;
{AI12-0260-1}
   function To_Basic (Item : Wide_Character) return Wide_Character;
   function To_Lower (Item : Wide_String) return Wide_String;
   function To_Upper (Item : Wide_String) return Wide_String;
{AI12-0260-1}
   function To_Basic (Item : Wide_String) return Wide_String;
end Ada.Wide_Characters.Handling;
{AI05-0185-1} The subprograms defined in Wide_Characters.Handling are locale
independent.
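As an illustration only, the following sketch shows how the classification functions declared above might be combined. The procedure name Classify_Demo, the helper Class_Of, and the sample characters are assumptions made for this example; they are not part of the package.

```ada
with Ada.Text_IO;
with Ada.Wide_Characters.Handling;

procedure Classify_Demo is
   use Ada.Wide_Characters.Handling;

   --  Hypothetical helper: names one coarse class for a character.
   --  The order of the tests is a choice made for this sketch.
   function Class_Of (C : Wide_Character) return String is
   begin
      if Is_Control (C) then
         return "control";
      elsif Is_Digit (C) then
         return "digit";
      elsif Is_Upper (C) then
         return "uppercase letter";
      elsif Is_Lower (C) then
         return "lowercase letter";
      elsif Is_Letter (C) then
         return "other letter";
      elsif Is_Space (C) then
         return "space";
      elsif Is_Graphic (C) then
         return "other graphic";
      else
         return "other";
      end if;
   end Class_Of;

   --  Sample input: a few ASCII characters, U+00E9, and a horizontal tab.
   Samples : constant Wide_String :=
     ('A', 'z', '7', ' ', '_',
      Wide_Character'Val (16#E9#),
      Wide_Character'Val (9));
begin
   --  Print the code point of each sample next to its class.
   for C of Samples loop
      Ada.Text_IO.Put_Line
        (Integer'Image (Wide_Character'Pos (C)) & " => " & Class_Of (C));
   end loop;
end Classify_Demo;
```

Because the subprograms are locale independent, a program like this should classify each character the same way whatever the locale of the environment; the result can still vary with the character set version supported by the implementation.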
function Character_Set_Version return String;
{AI05-0266-1} Returns an implementation-defined identifier that identifies
the version of the character set standard that is used for categorizing
characters by the implementation.
function Is_Control (Item : Wide_Character) return Boolean;
{AI05-0185-1} Returns True if the Wide_Character designated by Item is
categorized as other_control; otherwise returns False.
function Is_Letter (Item : Wide_Character) return Boolean;
{AI05-0185-1} Returns True if the Wide_Character designated by Item is
categorized as letter_uppercase, letter_lowercase, letter_titlecase,
letter_modifier, letter_other, or number_letter; otherwise returns False.
function Is_Lower (Item : Wide_Character) return Boolean;
{AI05-0185-1} Returns True if the Wide_Character designated by Item is
categorized as letter_lowercase; otherwise returns False.
function Is_Upper (Item : Wide_Character) return Boolean;
{AI05-0185-1} Returns True if the Wide_Character designated by Item is
categorized as letter_uppercase; otherwise returns False.
function Is_Basic (Item : Wide_Character) return Boolean;
{AI12-0260-1} {AI12-0450-1} Returns True if the Wide_Character designated
by Item has no Decomposition Mapping in the code charts of ISO/IEC
10646:2020; otherwise returns False.
Implementation Note: Decomposition Mapping is defined in Clause 33 of
ISO/IEC 10646:2020. Machine-readable (and normative!) versions of this can
be found as Character Decomposition Mapping, described in file
http://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt,
field 5 (which is the 6th item, since Unicode counts fields from zero).
function Is_Digit (Item : Wide_Character) return Boolean;
{AI05-0185-1} Returns True if the Wide_Character designated by Item is
categorized as number_decimal; otherwise returns False.
function Is_Hexadecimal_Digit (Item : Wide_Character) return Boolean;
{AI05-0185-1} Returns True if the Wide_Character designated by Item is
categorized as number_decimal, or is in the range 'A' .. 'F' or 'a' .. 'f';
otherwise returns False.
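One consequence of this definition is that any character categorized as number_decimal qualifies, not only the ASCII digits. The sketch below illustrates the difference. The helper Is_ASCII_Hex_Digit and the procedure name are assumptions for the example; the assertions are checked only if assertion checking is enabled.

```ada
with Ada.Wide_Characters.Handling;

procedure Hex_Digit_Demo is
   use Ada.Wide_Characters.Handling;

   --  Hypothetical helper, not part of the package: accepts only the
   --  22 ASCII hexadecimal digit characters.
   function Is_ASCII_Hex_Digit (C : Wide_Character) return Boolean is
     (C in '0' .. '9' | 'A' .. 'F' | 'a' .. 'f');

   --  U+0660 ARABIC-INDIC DIGIT ZERO, categorized as number_decimal.
   Arabic_Indic_Zero : constant Wide_Character :=
     Wide_Character'Val (16#0660#);
begin
   pragma Assert (Is_Hexadecimal_Digit ('f'));
   pragma Assert (not Is_Hexadecimal_Digit ('g'));

   --  number_decimal characters outside ASCII also qualify.
   pragma Assert (Is_Hexadecimal_Digit (Arabic_Indic_Zero));
   pragma Assert (not Is_ASCII_Hex_Digit (Arabic_Indic_Zero));
   null;
end Hex_Digit_Demo;
```

A parser that accepts only ASCII hexadecimal literals would therefore probably want a stricter test such as the hypothetical helper shown here.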
function Is_Alphanumeric (Item : Wide_Character) return Boolean;
{AI05-0185-1} Returns True if the Wide_Character designated by Item is
categorized as letter_uppercase, letter_lowercase, letter_titlecase,
letter_modifier, letter_other, number_letter, or number_decimal; otherwise
returns False.
function Is_Special (Item : Wide_Character) return Boolean;
{AI05-0185-1} Returns True if the Wide_Character designated by Item is
categorized as graphic_character, but not categorized as letter_uppercase,
letter_lowercase, letter_titlecase, letter_modifier, letter_other,
number_letter, or number_decimal; otherwise returns False.
function Is_Line_Terminator (Item : Wide_Character) return Boolean;
{AI05-0185-1} Returns True if the Wide_Character designated by Item is
categorized as separator_line or separator_paragraph, or if Item is a
conventional line terminator character (Line_Feed, Line_Tabulation,
Form_Feed, Carriage_Return, Next_Line); otherwise returns False.
function Is_Mark (Item : Wide_Character) return Boolean;
{AI05-0185-1} Returns True if the Wide_Character designated by Item is
categorized as mark_non_spacing or mark_spacing_combining; otherwise
returns False.
function Is_Other_Format (Item : Wide_Character) return Boolean;
{AI05-0185-1} Returns True if the Wide_Character designated by Item is
categorized as other_format; otherwise returns False.
function Is_Punctuation_Connector (Item : Wide_Character) return Boolean;
{AI05-0185-1} Returns True if the Wide_Character designated by Item is
categorized as punctuation_connector; otherwise returns False.
function Is_Space (Item : Wide_Character) return Boolean;
{AI05-0185-1} Returns True if the Wide_Character designated by Item is
categorized as separator_space; otherwise returns False.
function Is_NFKC (Item : Wide_Character) return Boolean;
{AI12-0004-1} {AI12-0263-1} {AI12-0439-1} {AI12-0450-1} Returns True if the
Wide_Character designated by Item can be present in a string normalized to
Normalization Form KC (as defined by Clause 22 of ISO/IEC 10646:2020);
otherwise returns False.
Reason: Wide_Characters for which this function returns False are not
allowed in identifiers (see 2.3) even if they are categorized as letters or
digits.
Implementation Note: This function returns False if the Unicode property
NFKC Quick Check (NFKC_QC in the files) has the value No. See the
Implementation Notes in 2.3 for the source of this property.
Discussion: A string for which Is_NFKC is True for every character is not
necessarily in Normalization Form KC, because Is_NFKC also returns True for
characters whose removal by normalization depends on the characters around
them. Ada does not provide a full normalization operation (it is complex
and expensive).
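The per-character nature of this check can be sketched as follows. The function All_NFKC is a hypothetical helper written for this illustration, and it assumes a library that already provides the Ada 2022 function Is_NFKC.

```ada
with Ada.Wide_Characters.Handling;

--  Hypothetical helper: True if no character of S is excluded from
--  Normalization Form KC. As the discussion above explains, a True
--  result is necessary but not sufficient for S to be normalized.
function All_NFKC (S : Wide_String) return Boolean is
begin
   return (for all C of S => Ada.Wide_Characters.Handling.Is_NFKC (C));
end All_NFKC;
```

A False result is enough to reject a candidate identifier; a True result only means that this particular test found no objection.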
function Is_Graphic (Item : Wide_Character) return Boolean;
{AI05-0185-1} Returns True if the Wide_Character designated by Item is
categorized as graphic_character; otherwise returns False.
function To_Lower (Item : Wide_Character) return Wide_Character;
{AI05-0185-1} {AI05-0266-1} {AI05-0299-1} {AI12-0263-1} {AI12-0450-1}
Returns the Simple Lowercase Mapping as defined by documents referenced
in Clause 2 of ISO/IEC 10646:2020 of the Wide_Character designated by
Item. If the Simple Lowercase Mapping does not exist for the Wide_Character
designated by Item, then the value of Item is returned.
Discussion: {AI12-0263-1} {AI12-0450-1}
The “documents referenced” means Unicode, Chapter 4 (specifically,
section 4.2 — Case). The case mappings come from Unicode as ISO/IEC
10646:2020 does not include complete case mappings. See the Implementation
Notes in subclause 1.1.4 for machine-readable versions of both Uppercase
and Lowercase mappings.
function To_Lower (Item : Wide_String) return Wide_String;
{AI05-0185-1} Returns the result of applying the To_Lower conversion to
each Wide_Character element of the Wide_String designated by Item. The
result is the null Wide_String if the value of the formal parameter is the
null Wide_String. The lower bound of the result Wide_String is 1.
function To_Upper (Item : Wide_Character) return Wide_Character;
{AI05-0185-1} {AI05-0266-1} {AI05-0299-1} {AI12-0263-1} {AI12-0450-1}
Returns the Simple Uppercase Mapping as defined by documents referenced
in Clause 2 of ISO/IEC 10646:2020 of the Wide_Character designated by
Item. If the Simple Uppercase Mapping does not exist for the Wide_Character
designated by Item, then the value of Item is returned.
function To_Upper (Item : Wide_String) return Wide_String;
{AI05-0185-1} Returns the result of applying the To_Upper conversion to
each Wide_Character element of the Wide_String designated by Item. The
result is the null Wide_String if the value of the formal parameter is the
null Wide_String. The lower bound of the result Wide_String is 1.
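A minimal sketch of the character and string conversions described above follows. The procedure name and the sample text are assumptions for the example, and the assertions are checked only if assertion checking is enabled.

```ada
with Ada.Wide_Characters.Handling;

procedure Case_Demo is
   use Ada.Wide_Characters.Handling;

   Text  : constant Wide_String := "Hello, World";

   --  Convert a slice; array equality below compares element values,
   --  so the checks do not depend on the bounds of the slice.
   Upper : constant Wide_String := To_Upper (Text (8 .. 12));
begin
   pragma Assert (Upper = "WORLD");
   pragma Assert (To_Lower (Text) = "hello, world");
   pragma Assert (To_Lower (Wide_Character'('Q')) = 'q');

   --  A character with no simple case mapping is returned unchanged.
   pragma Assert (To_Upper (Wide_Character'('7')) = '7');
   null;
end Case_Demo;
```

Only the simple (one-to-one) mappings are applied, so the result of a string conversion always has the same length as its argument.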
function To_Basic (Item : Wide_Character) return Wide_Character;
{AI12-0260-1} {AI12-0450-1} Returns the Wide_Character whose code point is
given by the first value of its Decomposition Mapping in the code charts of
ISO/IEC 10646:2020 if any; returns Item otherwise.
function To_Basic (Item : Wide_String) return Wide_String;
{AI12-0260-1} Returns the result of applying the To_Basic conversion to
each Wide_Character element of the Wide_String designated by Item. The
result is the null Wide_String if the value of the formal parameter is the
null Wide_String. The lower bound of the result Wide_String is 1.
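One possible use of To_Basic is a loose comparison that ignores case and, for characters that have a Decomposition Mapping, the decoration added to the basic character. The function Loosely_Equal below is a hypothetical helper for illustration, and it assumes a library that already provides the Ada 2022 function To_Basic.

```ada
with Ada.Wide_Characters.Handling;

--  Hypothetical helper: compares two strings after folding case and
--  replacing each character by its basic form.
function Loosely_Equal (Left, Right : Wide_String) return Boolean is
   use Ada.Wide_Characters.Handling;
begin
   return To_Basic (To_Lower (Left)) = To_Basic (To_Lower (Right));
end Loosely_Equal;
```

This is a character-by-character approximation, not a substitute for full normalization or collation.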
Implementation Advice
{AI05-0266-1} The string returned by Character_Set_Version should include
either “10646:” or “Unicode”.
Implementation Advice: The string returned
by Wide_Characters.Handling.Character_Set_Version should include either
“10646:” or “Unicode”.
Discussion: {AI12-0263-1} {AI12-0450-1} The intent is that the returned
string include the year for 10646 (as in "10646:2020"), and the version
number for Unicode (as in "Unicode 13.0"). We don't try to specify that
further so we don't need to decide how to represent Corrigenda for 10646,
nor which of these is preferred. (Giving a Unicode version is more
accurate, as the case folding and mapping rules always come from a Unicode
version [10646 just tells one to look at Unicode to get those], and the
character classifications ought to be the same for equivalent versions, but
we don't want to talk about non-ISO standards in an ISO standard.)
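A program can inspect the returned string for the two markers named in the advice. The sketch below is only an illustration; the procedure name, the helper Mentions, and the message text are assumptions for the example.

```ada
with Ada.Strings.Fixed;
with Ada.Text_IO;
with Ada.Wide_Characters.Handling;

procedure Version_Demo is
   Version : constant String :=
     Ada.Wide_Characters.Handling.Character_Set_Version;

   --  Hypothetical helper: True if Pattern occurs anywhere in Version.
   function Mentions (Pattern : String) return Boolean is
     (Ada.Strings.Fixed.Index (Version, Pattern) > 0);
begin
   Ada.Text_IO.Put_Line ("Character set version: " & Version);

   --  The advice is not a requirement, so treat a miss as information
   --  only rather than as an error.
   if not (Mentions ("10646:") or else Mentions ("Unicode")) then
      Ada.Text_IO.Put_Line ("Version string names neither standard");
   end if;
end Version_Demo;
```

Since the exact format of the string is implementation defined, a check like this is best limited to logging or diagnostics.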
NOTE 1 {AI05-0266-1} {AI12-0440-1} {AI12-0450-1} The results returned by
these functions can depend on which particular version of ISO/IEC 10646 is
supported by the implementation (see 2.1).
Extensions to Ada 2005
The package Wide_Characters.Handling is new.
Incompatibilities With Ada 2012
{AI12-0004-1} {AI12-0260-1} Added additional classification routines
Is_Basic and Is_NFKC, and additional conversion routine To_Basic.
Therefore, a use clause conflict is possible; see the introduction of
Annex A for more on this topic.
Ada 2005 and 2012 Editions sponsored in part by Ada-Europe