NAME
     cs00 - Kana-Kanji conversion server (CS)

SYNOPSIS
     /usr/sbin/cs00

AVAILABILITY
     SUNWjc0u

DESCRIPTION
     The cs00 daemon receives strings from a program such as XCI
     (xci(7)), performs Kana-Kanji conversion on each string, and
     returns the result to the program.

     cs00 uses the "n-bunsetsu saidai itchi hou" version of the
     "ren-bunsetsu housiki" (joined-morpheme method) for conversion,
     allowing a string of up to 512 characters to be converted at
     one time.

     More than 50,000 words are registered in the main dictionary,
     which can be modified using mdicm(1). Users can add words to
     their own "user dictionary". See udicmtool(1) and udicm(1) for
     a detailed description of the main dictionary and the user
     dictionary.

  Code conversion
     cs00 converts received character codes according to the rules
     defined in the code conversion definition files. ROMAJI to KANA
     conversion and character-type interconversion also use these
     rules.

  Code conversion definition files
     The code conversion rules are defined in the following files.

  Filenames
     hiragana.ccv      for HIRAGANA mode
     katakana.ccv      for full-size KATAKANA mode
     h_katakana.ccv    for half-size KATAKANA mode
     eisuu.ccv         for full-size alphanumeric mode
     h_eisuu.ccv       for half-size alphanumeric mode

     The rules for half-size alphanumeric mode are also used in the
     KUTEN input mode.

  Placement of the definition files
     Code conversion definition files can be placed in the following
     directories. cs00 looks for each file in these directories in
     the order below and uses the first file found.

     1. $HOME/.mle/locale/cs00/
     2. /usr/lib/mle/locale/cs00/

  Customizing name of conversion definition files
     To customize the names of the files, add the following
     configurable values to the file resources or config.
     CC_HR      for HIRAGANA mode
     CC_KT      for full-size KATAKANA mode
     CC_HKT     for half-size KATAKANA mode
     CC_EIS     for full-size alphanumeric mode
     CC_HEIS    for half-size alphanumeric mode

     (Example)
     In file resources:
          *xci*cs00.config.CC_HR:my_hiragana.ccv
     In file config:
          CC_HR = "my_hiragana.ccv"

  File format
     Each conversion rule consists of a line in the following
     format:

          string1 string2 [ number | s ]

     string1   Specifies an input string to be converted.

     string2   Specifies the string that results from converting
               string1.

     number | s
               A number (an integer) specifies how many characters
               of string1, counted from its last character, are to
               be re-converted. This number must be smaller than
               the length of string1. The default is 0.

               Instead of a number, 's' specifies that a
               "non-fixed" entry is not converted. "Non-fixed"
               means that an input string can match more than one
               rule. 's' and a number cannot be specified at the
               same time.

     Note: Characters enclosed in square brackets [ ] represent
     Japanese Kana. Uppercase (e.g. [KI]) represents a regular-size
     Kana. Lowercase (e.g. [tsu]) represents a small-size Kana.

     For example, to obtain "[KI][tsu][TO]" by inputting "kitto",
     the following rules must be defined:

          ki [KI]
          to [TO]
          tto [tsu] 2

     First, the beginning of the input string "kitto" matches "ki"
     and is converted to "[KI]". Second, the remaining input "tto"
     matches the rule "tto" and is converted to "[tsu]". Because
     that rule specifies 2, the last two characters of "tto" are
     re-evaluated. As they match "to", the string "to" is converted
     to "[TO]". As a whole, "[KI][tsu][TO]" is obtained.

     Other rule sets also produce "[KI][tsu][TO]" from "kitto". The
     two sets described below both do so, but each can cause
     problems.

     With the following set of rules, the input string "kitto" is
     also converted to "[KI][tsu][TO]".
     However, the string "ttttt" is converted to
     "[tsu][tsu][tsu][tsu]t":

          ki [KI]
          to [TO]
          tt [tsu] 1

     With the following set of rules, when you change the
     definition of "to", you must also change the definition of
     "tto":

          ki [KI]
          to [TO]
          tto [tsu][TO]

     The following describes the conversion rules for non-fixed
     "n". Assume that the following rules are defined:

          n [N] s
          na [NA]
          ni [NI]
          to [TO]
          tto [tsu][TO]

     The input string "n" matches the rule "n". But 's' is
     specified in the rule, and this string "n" is a non-fixed
     entry. Therefore, the string "n" is not converted to "[N]".
     If "a" is entered after "n", the string is converted to
     "[NA]". If "i" is entered after "n", the string is converted
     to "[NI]". "n" is converted to "[N]" only when "n" is fixed.

     If 's' is not specified, "n" is converted to "[N]", and when
     "a" or "i" is entered after "n", "[N]" changes to "[NA]" or
     "[NI]", respectively.

     The maximum length of a line is 1024 characters, and a newline
     character terminates the line. A line that starts with a hash
     sign "#" is a comment line. The delimiters between the fields
     in a rule are spaces or tabs.
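
     The rule format and the re-evaluation behaviour described
     above can be illustrated with a small Python model. This is
     only a sketch under stated assumptions, not the cs00
     implementation: the names Rule, parse_rules and convert are
     hypothetical, the longest-match strategy and the skipping of
     blank lines are assumptions, the "fixed" flag is a simplified
     stand-in for the non-fixed state, and escape sequences are not
     decoded.

```python
from dataclasses import dataclass

MAX_LINE = 1024  # maximum line length stated in the text


@dataclass(frozen=True)
class Rule:
    src: str            # string1: input string to be converted
    dst: str            # string2: result of converting string1
    reeval: int = 0     # trailing characters of string1 to re-evaluate
    hold: bool = False  # True when the third field is 's'


def parse_rules(text):
    """Parse rule lines (hypothetical helper; escapes are not decoded)."""
    rules = []
    for line in text.splitlines():
        if len(line) > MAX_LINE:
            raise ValueError("line longer than 1024 characters")
        # '#' starts a comment line; skipping blank lines is an assumption.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()  # fields are delimited by spaces or tabs
        if len(fields) not in (2, 3):
            raise ValueError(f"bad rule: {line!r}")
        src, dst = fields[0], fields[1]
        reeval, hold = 0, False
        if len(fields) == 3:
            if fields[2] == "s":
                hold = True
            else:
                reeval = int(fields[2])
                # The number must be smaller than the length of string1.
                if not 0 <= reeval < len(src):
                    raise ValueError(f"bad number in rule: {line!r}")
        rules.append(Rule(src, dst, reeval, hold))
    return rules


def convert(text, rules, fixed=True):
    """Convert text; longest match at each position is an assumption."""
    out = []
    i = 0
    while i < len(text):
        best = None
        for rule in rules:
            if text.startswith(rule.src, i):
                if best is None or len(rule.src) > len(best.src):
                    best = rule
        if best is None:
            out.append(text[i])  # no rule matches: keep the character
            i += 1
            continue
        # An 's' rule at the end of not-yet-fixed input stays unconverted.
        if best.hold and not fixed and i + len(best.src) == len(text):
            out.append(text[i:])
            break
        out.append(best.dst)
        # Step back so the last `reeval` characters are evaluated again.
        i += len(best.src) - best.reeval
    return "".join(out)


rules_a = parse_rules("ki [KI]\nto [TO]\ntto [tsu] 2\n")
rules_b = parse_rules("ki [KI]\nto [TO]\ntt [tsu] 1\n")
rules_n = parse_rules("n [N] s\nna [NA]\nni [NI]\n")

print(convert("kitto", rules_a))            # [KI][tsu][TO]
print(convert("ttttt", rules_b))            # [tsu][tsu][tsu][tsu]t
print(convert("n", rules_n, fixed=False))   # n
print(convert("na", rules_n, fixed=False))  # [NA]
```

     Under these assumptions the model reproduces the results
     quoted in the text, including the "ttttt" side effect of the
     "tt [tsu] 1" rule.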

     The following escape sequences are required to use control
     characters and other special characters in string1 or string2:

          \n    New line
          \r    Carriage return
          \t    Tab
          \f    Form feed
          \~    Space (0x20)
          \{    (
          \}    )
          \#    #
          \\    \
          \^    ^
          \0    Octal (\001, \012)
          \1    Octal (\100, \123)
          \x    Two-digit hexadecimal (\x01, \xff)
          \w    Four-digit hexadecimal (\w0101, \wabcd)
          \q    Eight-digit hexadecimal (\q00000101, \q8000cdab)
          \k    Code (Kuten code) (\k0101, \k1616)

FILES
     /usr/lib/mle/ja/cs00/cs00_m.dic
          Main dictionary for Kana-Kanji conversion

     /usr/lib/mle/ja/cs00/cs00_u.dic
          User dictionary for Kana-Kanji conversion

     /usr/lib/mle/ja/cs00/hiragana.ccv
          Code conversion rule definition file for HIRAGANA mode

     /usr/lib/mle/ja/cs00/katakana.ccv
          Code conversion rule definition file for full-size
          KATAKANA mode

     /usr/lib/mle/ja/cs00/h_katakana.ccv
          Code conversion rule definition file for half-size
          KATAKANA mode

     /usr/lib/mle/ja/cs00/eisuu.ccv
          Code conversion rule definition file for full-size
          alphanumeric mode

     /usr/lib/mle/ja/cs00/h_eisuu.ccv
          Code conversion rule definition file for half-size
          alphanumeric mode

NOTES
     mdicm(1) or udicm(1) should be used to modify cs00_m.dic or
     cs00_u.dic, respectively.
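
     The search order given under "Placement of the definition
     files" can be sketched in Python as follows. This is an
     illustration only, not part of cs00: the function name
     find_definition_file and its parameters are hypothetical, and
     it assumes that "locale" in the directory names stands for a
     locale name such as "ja", as the paths under FILES suggest.

```python
from pathlib import Path


def find_definition_file(filename, home, system_root="/usr/lib/mle",
                         locale="ja"):
    """Return the first existing definition file, or None if not found.

    Hypothetical helper: the user's directory is checked first, then
    the system directory, mirroring the documented search order.
    """
    candidates = [
        Path(home) / ".mle" / locale / "cs00" / filename,
        Path(system_root) / locale / "cs00" / filename,
    ]
    for path in candidates:
        if path.is_file():
            return path
    return None
```

     For example, find_definition_file("hiragana.ccv", Path.home())
     would return the user's copy when one exists and otherwise
     fall back to the system copy.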