The
utility reads a
LC_CTYPE
source file from standard input and produces a
LC_CTYPE
binary file on standard output suitable for placement in
/usr/share/locale/ language /LC_CTYPE
The format of
src-file
is quite simple.
It consists of a series of lines which start with a keyword and have
associated data following.
C style comments are used
to place comments in the file.
Following options are available:
-d
Turns on debugging messages.
-o
Specify output file.
Besides the keywords which will be listed below,
the following are valid tokens in
src-file
RUNE
A
RUNE
may be any of the following:
'x'
The ASCII character
x
'\x'
The ANSI C character
\x
where
\x
is one of
\a\b\f\n\r\t
or
\v
0x[0-9a-z]*
A hexadecimal number representing a rune code.
0[0-7]*
An octal number representing a rune code.
[1-9][0-9]*
A decimal number representing a rune code.
STRING
A string enclosed in double quotes (").
THRU
Either
...
or
-
Used to indicate ranges.
literal
The follow characters are taken literally:
<([
Used to start a mapping.
All are equivalent.
>)]
Used to end a mapping.
All are equivalent.
:
Used as a delimiter in mappings.
Key words which should only appear once are:
ENCODING
Followed by a
STRING
which indicates the encoding mechanism to be used for this locale.
The current encodings are:
BIG5
The
``Big5''
encoding of Chinese.
EUC
EUC
encoding as used by several
vendors of
UNIX
systems.
GB18030
PRC national standard for encoding of Chinese text.
GB2312
Older PRC national standard for encoding Chinese text.
GBK
A widely used encoding method for Chinese text,
backwards compatible with GB 2312-1980.
MSKanji
The method of encoding Japanese used by Microsoft,
loosely based on JIS.
Also known as
``Shift JIS''
and
``SJIS''
NONE
No translation and the default.
UTF-8
The
UTF-8
transformation format of
ISO
10646
as defined by RFC 2279.
VARIABLE
This keyword must be followed by a single tab or space character,
after which encoding specific data is placed.
Currently only the
EUC
encoding requires variable data.
See
euc(5)
for further details.
INVALID
(obsolete)
A single
RUNE
follows and is used as the invalid rune for this locale.
The following keywords may appear multiple times and have the following
format for data:
<RUNE1 RUNE2>
RUNE1
is mapped to
RUNE2
<RUNE1 THRU RUNEn : RUNE2>
Runes
RUNE1
through
RUNEn are mapped to
RUNE2
through
RUNE2
+ n-1.
MAPLOWER
Defines the tolower mappings.
RUNE2
is the lower case representation of
RUNE1
MAPUPPER
Defines the toupper mappings.
RUNE2
is the upper case representation of
RUNE1
TODIGIT
Defines a map from runes to their digit value.
RUNE2
is the integer value represented by
RUNE1
For example, the ASCII character
`0'
would map to the decimal value 0.
Only values up to 255
are allowed.
The following keywords may appear multiple times and have the following
format for data:
RUNE
This rune has the property defined by the keyword.
RUNE1 THRU RUNEn
All the runes between and including
RUNE1
and
RUNEn have the property defined by the keyword.
ALPHA
Defines runes which are alphabetic, printable and graphic.
CONTROL
Defines runes which are control characters.
DIGIT
Defines runes which are decimal digits, printable and graphic.
GRAPH
Defines runes which are graphic and printable.
LOWER
Defines runes which are lower case, printable and graphic.
PUNCT
Defines runes which are punctuation, printable and graphic.
SPACE
Defines runes which are spaces.
UPPER
Defines runes which are upper case, printable and graphic.
XDIGIT
Defines runes which are hexadecimal digits, printable and graphic.
BLANK
Defines runes which are blank.
PRINT
Defines runes which are printable.
IDEOGRAM
Defines runes which are ideograms, printable and graphic.
SPECIAL
Defines runes which are special characters, printable and graphic.
PHONOGRAM
Defines runes which are phonograms, printable and graphic.
SWIDTH0
Defines runes with display width 0.
SWIDTH1
Defines runes with display width 1.
SWIDTH2
Defines runes with display width 2.
SWIDTH3
Defines runes with display width 3.
If no display width explicitly defined, width 1 assumed
for printable runes by default.