2.1 Character Set
The character repertoire for the text of an Ada program consists of the
entire coding space described by the ISO/IEC 10646:2020 Universal Coded
Character Set. This coding space is organized in planes, each plane
comprising 65536 characters.
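As an illustration of the plane organization, the following Python sketch splits a code point into its plane number and its relative code point within that plane. The function name plane_and_offset and the upper bound 16#10FFFF# (17 planes) are assumptions of this sketch, not text of this Reference Manual.

```python
PLANE_SIZE = 0x10000  # 65536 cells per plane


def plane_and_offset(code_point):
    """Return (plane number, relative code point within that plane)."""
    # Assumption of this sketch: the coding space ends at 16#10FFFF#.
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point outside the coding space")
    return divmod(code_point, PLANE_SIZE)


print(plane_and_offset(ord("A")))  # (0, 65)
print(plane_and_offset(0x1F600))   # (1, 62976)
```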
Syntax
Paragraphs 2 and 3 were deleted.
A character is defined by this Reference Manual for each cell in the
coding space described by ISO/IEC 10646:2020, regardless of whether or
not ISO/IEC 10646:2020 allocates a character to that cell.
Static Semantics
The coded representation for characters is implementation
defined (it can be a representation that is not defined within ISO/IEC
10646:2020). A character whose relative code point in its plane is 16#FFFE#
or 16#FFFF# is not allowed anywhere in the text of a program. The only
characters allowed outside of comments are those in categories other_format,
format_effector, and graphic_character.
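The rule on relative code points 16#FFFE# and 16#FFFF# applies in every plane, which a check can express by masking off the plane number. The sketch below is one possible way to locate such a character in program text; the names is_forbidden_cell and first_forbidden are hypothetical.

```python
def is_forbidden_cell(code_point):
    """True if the relative code point in its plane is 16#FFFE# or 16#FFFF#."""
    return (code_point & 0xFFFF) in (0xFFFE, 0xFFFF)


def first_forbidden(text):
    """Return the index of the first forbidden character, or None."""
    for index, ch in enumerate(text):
        if is_forbidden_cell(ord(ch)):
            return index
    return None
```

For example, first_forbidden("X := 1;") yields None, while a text containing the character at 16#1FFFE# is rejected just like one containing 16#FFFE#.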
The semantics of an Ada program whose text is not
in Normalization Form C (as defined by Clause 22 of ISO/IEC 10646:2020)
is implementation defined.
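A tool can test whether program text is already in Normalization Form C before handing it to a compiler. The following Python sketch uses the standard unicodedata module (is_normalized requires Python 3.8 or later); note that the module follows the Unicode version bundled with the Python interpreter, which is not necessarily the version an Ada implementation uses.

```python
import unicodedata


def is_nfc(text):
    """True if text is already in Normalization Form C."""
    return unicodedata.is_normalized("NFC", text)


composed = "caf\u00e9"     # e with acute as a single code point
decomposed = "cafe\u0301"  # e followed by COMBINING ACUTE ACCENT

print(is_nfc(composed))    # True
print(is_nfc(decomposed))  # False
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```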
The description of the language definition in this
document uses the character properties General Category, Simple Uppercase
Mapping, Uppercase Mapping, and Special Case Condition of the documents
referenced by Clause 2 of ISO/IEC 10646:2020. The actual set of graphic
symbols used by an implementation for the visual representation of the
text of an Ada program is not specified.
Characters are categorized as follows:
This paragraph was deleted.
letter_uppercase: Any character whose General Category is defined to be
“Letter, Uppercase”.
letter_lowercase: Any character whose General Category is defined to be
“Letter, Lowercase”.
letter_titlecase: Any character whose General Category is defined to be
“Letter, Titlecase”.
letter_modifier: Any character whose General Category is defined to be
“Letter, Modifier”.
letter_other: Any character whose General Category is defined to be
“Letter, Other”.
mark_non_spacing: Any character whose General Category is defined to be
“Mark, Non-Spacing”.
mark_spacing_combining: Any character whose General Category is defined
to be “Mark, Spacing Combining”.
number_decimal: Any character whose General Category is defined to be
“Number, Decimal”.
number_letter: Any character whose General Category is defined to be
“Number, Letter”.
punctuation_connector: Any character whose General Category is defined
to be “Punctuation, Connector”.
other_format: Any character whose General Category is defined to be
“Other, Format”.
separator_space: Any character whose General Category is defined to be
“Separator, Space”.
separator_line: Any character whose General Category is defined to be
“Separator, Line”.
separator_paragraph: Any character whose General Category is defined to
be “Separator, Paragraph”.
format_effector: The characters whose code points are 16#09# (CHARACTER
TABULATION), 16#0A# (LINE FEED), 16#0B# (LINE TABULATION), 16#0C# (FORM
FEED), 16#0D# (CARRIAGE RETURN), 16#85# (NEXT LINE), and the characters
in categories separator_line and separator_paragraph.
other_control: Any character whose General Category is defined to be
“Other, Control”, and which is not defined to be a format_effector.
other_private_use: Any character whose General Category is defined to be
“Other, Private Use”.
other_surrogate: Any character whose General Category is defined to be
“Other, Surrogate”.
graphic_character: Any character that is not in the categories
other_control, other_private_use, other_surrogate, format_effector, and
whose relative code point in its plane is neither 16#FFFE# nor 16#FFFF#.
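The categorization above can be sketched in Python using the General Category values reported by the standard unicodedata module. This is an illustration only: the function name ada_categories is hypothetical, the category labels such as letter_uppercase are the names assumed here for the definitions above, and the result depends on the Unicode version bundled with the Python interpreter. A character can belong to more than one category (for instance separator_line and format_effector), so the function returns a set.

```python
import unicodedata

# Two-letter General Category codes mapped to the category names above.
GENERAL_CATEGORY_NAMES = {
    "Lu": "letter_uppercase", "Ll": "letter_lowercase",
    "Lt": "letter_titlecase", "Lm": "letter_modifier",
    "Lo": "letter_other", "Mn": "mark_non_spacing",
    "Mc": "mark_spacing_combining", "Nd": "number_decimal",
    "Nl": "number_letter", "Pc": "punctuation_connector",
    "Cf": "other_format", "Zs": "separator_space",
    "Zl": "separator_line", "Zp": "separator_paragraph",
    "Co": "other_private_use", "Cs": "other_surrogate",
}

# Code points listed explicitly in the definition of format_effector.
FORMAT_EFFECTOR_CODES = {0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x85}

NOT_GRAPHIC = {"other_control", "other_private_use",
               "other_surrogate", "format_effector"}


def ada_categories(ch):
    """Return the set of category names that apply to one character."""
    code_point = ord(ch)
    general = unicodedata.category(ch)
    categories = set()
    if general in GENERAL_CATEGORY_NAMES:
        categories.add(GENERAL_CATEGORY_NAMES[general])
    if code_point in FORMAT_EFFECTOR_CODES or general in ("Zl", "Zp"):
        categories.add("format_effector")
    if general == "Cc" and "format_effector" not in categories:
        categories.add("other_control")
    # graphic_character: none of the excluded categories, and the relative
    # code point in its plane is neither 16#FFFE# nor 16#FFFF#.
    if not (categories & NOT_GRAPHIC) and \
            (code_point & 0xFFFF) not in (0xFFFE, 0xFFFF):
        categories.add("graphic_character")
    return categories
```

Under these assumptions, ada_categories("A") contains letter_uppercase and graphic_character, while ada_categories("\t") contains only format_effector.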
The following names are used when referring to certain characters (the
first name is that given in ISO/IEC 10646:2020):
graphic symbol   name                       graphic symbol   name
"                quotation mark             :                colon
#                number sign                ;                semicolon
&                ampersand                  <                less-than sign
'                apostrophe, tick           =                equals sign
(                left parenthesis           >                greater-than sign
)                right parenthesis          _                low line, underline
*                asterisk, multiply         |                vertical line
+                plus sign                  /                solidus, divide
,                comma                      !                exclamation point
-                hyphen-minus, minus        %                percent sign
.                full stop, dot, point      [                left square bracket
@                commercial at, at sign     ]                right square bracket
Implementation Requirements
An Ada implementation shall accept Ada source code in UTF-8 encoding,
with or without a BOM (see A.4.11), where every character is represented
by its code point. The character pair CARRIAGE RETURN/LINE FEED (code
points 16#0D# 16#0A#) signifies a single end of line (see 2.2); every
other occurrence of a format_effector other than the character whose code
point position is 16#09# (CHARACTER TABULATION) also signifies a single
end of line.
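The decoding and end-of-line rules can be sketched as follows in Python. This is a minimal illustration, not a description of how any Ada implementation reads source text; the names decode_source and split_lines are hypothetical. Python's "utf-8-sig" codec accepts input with or without a leading BOM.

```python
# Format effectors other than CHARACTER TABULATION (16#09#):
# LF, VT, FF, CR, NEL, LINE SEPARATOR, PARAGRAPH SEPARATOR.
END_OF_LINE_CODES = {0x0A, 0x0B, 0x0C, 0x0D, 0x85, 0x2028, 0x2029}


def decode_source(data):
    """Decode UTF-8 bytes, dropping a leading BOM if one is present."""
    return data.decode("utf-8-sig")


def split_lines(text):
    """Split text into lines; CR followed by LF counts as one end of line."""
    lines = []
    current = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\r" and text[i + 1:i + 2] == "\n":
            lines.append("".join(current))
            current = []
            i += 2  # consume the CR/LF pair as a single end of line
        elif ord(ch) in END_OF_LINE_CODES:
            lines.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)  # includes CHARACTER TABULATION
            i += 1
    if current:
        lines.append("".join(current))
    return lines
```

A hand-written loop is used here rather than str.splitlines, because that method also treats some characters outside the format_effector category as line boundaries.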
Implementation Permissions
The categories defined above, as well as case mapping
and folding, may be based on an implementation-defined version of ISO/IEC
10646 (2003 edition or later).
NOTE The characters in categories other_control, other_private_use, and
other_surrogate are only allowed in comments.
Ada 2005 and 2012 Editions sponsored in part by Ada-Europe