NAME
utf8 - UTF-8, a transformation format of ISO 10646
SYNOPSIS
ENCODING "UTF-8"
DESCRIPTION
The UTF-8 encoding represents UCS-4 characters as a sequence of octets,
using between 1 and 6 octets for each character.
It is backwards compatible with ASCII, so the octets 0x00-0x7f represent
the same characters as in the ASCII character set.
The multibyte encoding of a non-ASCII character consists entirely of
bytes whose high-order bit is set.
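
Both properties can be checked with a short Python sketch. It relies on
Python's built-in "utf-8" codec, which implements the later definition of
UTF-8 limited to 4 octets, so it only covers values up to 0x10FFFF. The
helper name high_bits_set and the sample characters are illustrative
choices, not part of this manual page.

```python
def high_bits_set(ch: str) -> bool:
    """Return True if every octet of ch's UTF-8 encoding has bit 7 set."""
    return all(octet & 0x80 for octet in ch.encode("utf-8"))


# ASCII text keeps exactly the byte values it has in ASCII.
text = "plain ASCII"
assert text.encode("utf-8") == text.encode("ascii")

# Non-ASCII characters use only octets with the high-order bit set.
for ch in "\u00e9\u20ac\U0001d11e":
    octets = ch.encode("utf-8")
    assert high_bits_set(ch)
    print(f"U+{ord(ch):04X}:", " ".join(f"{o:02x}" for o in octets))
```

Because no octet of a multibyte sequence falls in the range 0x00-0x7f, a
byte-oriented search for an ASCII character such as "/" cannot match in
the middle of another character.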
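The actual encoding is represented by the following table, in which each
"b" stands for one bit of the character value:

    [0x00000000 - 0x0000007f]  0bbbbbbb
    [0x00000080 - 0x000007ff]  110bbbbb 10bbbbbb
    [0x00000800 - 0x0000ffff]  1110bbbb 10bbbbbb 10bbbbbb
    [0x00010000 - 0x001fffff]  11110bbb 10bbbbbb 10bbbbbb 10bbbbbb
    [0x00200000 - 0x03ffffff]  111110bb 10bbbbbb 10bbbbbb 10bbbbbb 10bbbbbb
    [0x04000000 - 0x7fffffff]  1111110b 10bbbbbb 10bbbbbb 10bbbbbb 10bbbbbb 10bbbbbb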
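
As a rough illustration of how the layout is applied, the following
Python sketch encodes a single UCS-4 value using the 1-to-6-octet scheme
defined in RFC 2279. The function name utf8_encode is hypothetical, and
the code is not taken from any library implementing this encoding.

```python
def utf8_encode(value: int) -> bytes:
    """Encode one UCS-4 value (0 .. 0x7fffffff) as 1 to 6 octets."""
    if not 0 <= value <= 0x7FFFFFFF:
        raise ValueError("value outside the UCS-4 range")
    if value < 0x80:
        return bytes([value])  # ASCII: a single octet, 0bbbbbbb

    # (upper limit of the range, sequence length, lead-octet prefix)
    ranges = (
        (0x7FF, 2, 0xC0),       # 110bbbbb
        (0xFFFF, 3, 0xE0),      # 1110bbbb
        (0x1FFFFF, 4, 0xF0),    # 11110bbb
        (0x3FFFFFF, 5, 0xF8),   # 111110bb
        (0x7FFFFFFF, 6, 0xFC),  # 1111110b
    )
    for limit, length, prefix in ranges:
        if value <= limit:
            break

    # Peel off 6 bits at a time for the trailing octets (10bbbbbb),
    # lowest bits first; the remaining bits go into the lead octet.
    octets = []
    for _ in range(length - 1):
        octets.append(0x80 | (value & 0x3F))
        value >>= 6
    octets.append(prefix | value)
    return bytes(reversed(octets))


print(utf8_encode(0x20AC).hex())  # the EURO SIGN, U+20AC
```

Choosing the first range that contains the value means the sketch always
produces the shortest possible sequence.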
If more than a single representation of a value exists (for example,
0x00; 0xC0 0x80; 0xE0 0x80 0x80), the shortest representation is always
used.
Longer representations are detected as errors because they pose a
potential security risk and destroy the 1:1 mapping between characters
and octet sequences.
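
One way to detect such longer representations is to compare the decoded
value against the smallest value that needs a sequence of that length.
The Python sketch below does this for a single character. The names
MIN_VALUE and utf8_decode_one are hypothetical, and the code illustrates
the check rather than reproducing any particular implementation.

```python
# Smallest value that legitimately needs a sequence of each length.
MIN_VALUE = {1: 0x00, 2: 0x80, 3: 0x800, 4: 0x10000,
             5: 0x200000, 6: 0x4000000}


def utf8_decode_one(octets: bytes) -> int:
    """Decode exactly one character; raise ValueError if malformed."""
    if not octets:
        raise ValueError("empty input")
    lead = octets[0]
    if lead < 0x80:
        length, value = 1, lead
    elif lead < 0xC0:
        raise ValueError("unexpected continuation octet")
    elif lead < 0xE0:
        length, value = 2, lead & 0x1F
    elif lead < 0xF0:
        length, value = 3, lead & 0x0F
    elif lead < 0xF8:
        length, value = 4, lead & 0x07
    elif lead < 0xFC:
        length, value = 5, lead & 0x03
    elif lead < 0xFE:
        length, value = 6, lead & 0x01
    else:
        raise ValueError("invalid lead octet")

    if len(octets) != length:
        raise ValueError("wrong number of octets")
    for octet in octets[1:]:
        if (octet & 0xC0) != 0x80:
            raise ValueError("invalid continuation octet")
        value = (value << 6) | (octet & 0x3F)

    # Reject any value that would have fit in a shorter sequence.
    if value < MIN_VALUE[length]:
        raise ValueError("overlong (non-shortest) encoding")
    return value
```

With this check, 0x00 decodes to the value 0, while 0xC0 0x80 and
0xE0 0x80 0x80 are rejected. Without it, an overlong sequence could
carry a character such as NUL or "/" past a filter that inspects only
the raw octets.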
"Rob Pike"
"Ken Thompson"
"Hello World""Proceedings of the Winter 1993 USENIX Technical Conference"
"USENIX Association"
"January 1993"
"F. Yergeau"
"UTF-8, a transformation format of ISO 10646"
"RFC 2279"
"January 1998"
"The Unicode Consortium"
"The Unicode Standard, Version 3.0"
"2000"
"as amended by the Unicode Standard Annex #27: Unicode 3.1 and by the Unicode Standard Annex #28: Unicode 3.2"
STANDARDS
The UTF-8 encoding is compatible with RFC 2279 and Unicode 3.2.