MOEVM - Perl binding of libmoe.
    use MOEVM;
    my $vm = MOEVM->new(...);
    my $n = $vm->puts($text);
    my $n = $vm->puts($text, 1);
    my $n = $vm->gets($buf);
    my $number_of_discarded_characters = $vm->discard;
    my $input_charset_name = $vm->get_charset;
    my $output_charset_name = $vm->get_output_charset;
    MOEVM->set_charset_aliases($charset_name, $non_standard_charset_name1, ...);
    my $string = MOEVM->version_string;
    my $number = MOEVM->version_number;
    my ($output1, $output2, ...) = $vm->convert($input1, ...);
    my $output = $vm->convert($input1, ...);
    my $output = MOEVM::convert($input, $option1, ...);
This is a Perl binding of libmoe, a library for handling multi-octet character encodings:
http://pub.ks-and-ks.ne.jp/prog/libmoe/
creates a new automaton for handling multi-octet character encodings (a moevm). In the following, the words "input" and "output" are used from the point of view of the moevm.
specifies the character encoding conversions. converters must be a comma-separated list of the words described in Conversion specifiers.
specifies flags that change the behavior of conversion. flags must be a comma-separated list of the words described in Flag specifiers.
succeeding options apply to the input string.
succeeding options apply to the output string.
specifies the charset name. In addition to MIME charset names, some language specifications are accepted; they restrict the candidate encoding schemes for the input string. Acceptable languages are listed in Acceptable languages.
In most cases, conversion is set up automatically based on the CES, so you need not specify converters explicitly.
domestic (national) variants of ASCII are converted to US-ASCII,
characters are converted appropriately according to the CES bound to the input/output string,
converted to Unicode,
Fullwidth compatibility characters are converted to the corresponding halfwidth ones,
Halfwidth compatibility characters are converted to the corresponding fullwidth ones,
Code points in JIS C 6226 or JIS X 0208 that are not assigned to any character are converted into JIS X 0213 plane 1,
All code points in JIS C 6226 or JIS X 0208 are converted into JIS X 0213 plane 1,
Unicode characters with code points between 0x80 and 0x9F (inclusive) are converted to the Unicode characters that octets of the same values represent in Microsoft Windows Codepage 1252.
Conversion between some JIS X 0208 characters and Unicode characters that have similar glyphs (courtesy of Ambrose Li <acli@ada.dhs.org>).
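Two of the conversions listed above are purely mechanical and can be illustrated without libmoe. The sketch below does not use MOEVM: it relies on Perl's core Encode module and tr///, and the subroutine names are hypothetical. The fullwidth part covers only the fullwidth forms of ASCII (U+FF01 to U+FF5E), not every fullwidth compatibility character libmoe may handle.

```perl
use strict;
use warnings;
use Encode ();

# Fullwidth forms of ASCII, U+FF01..U+FF5E, map one-to-one onto
# U+0021..U+007E (a fixed offset of 0xFEE0). Other fullwidth and
# halfwidth compatibility characters are not handled in this sketch.
sub fullwidth_ascii_to_halfwidth {
    my ($s) = @_;
    $s =~ tr/\x{FF01}-\x{FF5E}/\x{21}-\x{7E}/;
    return $s;
}

# Reinterpret each character in U+0080..U+009F as the octet of the same
# value in Windows code page 1252. Characters whose octet Encode's cp1252
# table does not map are kept unchanged.
sub c1_as_cp1252 {
    my ($s) = @_;
    $s =~ s{([\x{80}-\x{9F}])}{
        my $ch     = $1;
        my $octet  = chr ord $ch;    # same value as a one-octet byte string
        my $mapped = eval { Encode::decode('cp1252', $octet, Encode::FB_CROAK) };
        defined $mapped ? $mapped : $ch;
    }ge;
    return $s;
}

binmode STDOUT, ':encoding(UTF-8)';
print fullwidth_ascii_to_halfwidth("\x{FF21}\x{FF22}\x{FF23}"), "\n";  # ABC
print c1_as_cp1252("\x{80}"), "\n";                                    # U+20AC EURO SIGN
```

The fixed 0xFEE0 offset is why a single tr/// suffices for fullwidth ASCII; the Codepage 1252 case has no such regularity, so a table (here Encode's) is required.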
use ``1/11 2/4 2/8 F'' instead of ``1/11 2/4 F'' to designate to G0 the charsets whose final octet is 4/0, 4/1, or 4/2,
the escape sequence ``1/11 2/8 4/2'' is output before every control character,
escape sequences for 7-bit single shifts are ignored,
discard characters that the CES bound to the output string cannot represent.
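The flags above use ISO 2022 column/row notation, where "c/r" denotes the octet c * 16 + r. The hypothetical helper below (not part of MOEVM) decodes that notation so the escape sequences can be read as bytes. For example, final octets 4/0, 4/1, and 4/2 are "@", "A", and "B", the only finals for which ISO 2022 permits the shortened form ESC $ F.

```perl
use strict;
use warnings;

# ISO 2022 column/row notation "c/r" denotes the octet c * 16 + r,
# e.g. 1/11 is 0x1B (ESC) and 2/4 is 0x24 ('$').
sub colrow_to_octets {
    my ($notation) = @_;
    return join '', map {
        my ($col, $row) = split m{/}, $_;
        chr($col * 16 + $row);
    } split ' ', $notation;
}

my @sequences = (
    '1/11 2/4 4/2',       # ESC $ B   : JIS X 0208-1983 to G0, short form
    '1/11 2/4 2/8 4/2',   # ESC $ ( B : JIS X 0208-1983 to G0, long form
    '1/11 2/8 4/2',       # ESC ( B   : ASCII to G0
);
for my $seq (@sequences) {
    my $hex = join ' ', map { sprintf '%02X', $_ } unpack 'C*', colrow_to_octets($seq);
    printf "%-18s => %s\n", $seq, $hex;
}
```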
The following words may be given instead of a MIME charset name for the input string. In that case, the encoding scheme is detected automatically (hopefully) among the ones listed for that word.
x-gb-18030-2000, cn-big5, utf-8, or x-euc-tw.
euc-jp, shift_jis, or utf-8.
euc-kr, x-johab, utf-8, or x-unified-hangul.
iso-8859-1, x-gb-18030-2000, cn-big5, x-euc-tw, euc-jp, shift_jis, euc-kr, x-johab, x-unified-hangul, or utf-8.
feeds the moevm with the string text. It returns the number of characters (not octets) that will be obtained via the method gets below.
If you know that text contains no incomplete (partial) representation of a character, you may pass a true value as the optional argument isflush.
appends the converted string to buffer. It returns the number of octets generated.
discards the characters kept in the moevm. It returns the number of discarded characters.
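The puts/gets/discard cycle exists because a character's octets may arrive in separate pieces. The sketch below does not use MOEVM; as a rough analogy only, it uses Perl's core Encode module with Encode::FB_QUIET, which returns what it could decode and leaves an incomplete trailing character in the buffer for the next chunk. The name decode_chunks is hypothetical, and the sketch assumes well-formed input (a genuinely invalid octet would stay pending forever).

```perl
use strict;
use warnings;
use Encode ();

# Decode octet chunks one at a time. With Encode::FB_QUIET, decode()
# returns what it could convert and leaves the unconverted tail (such as
# an incomplete multi-octet character) in $pending for the next round.
sub decode_chunks {
    my ($charset, @chunks) = @_;
    my $pending = '';
    my $text    = '';
    for my $chunk (@chunks) {
        $pending .= $chunk;
        $text    .= Encode::decode($charset, $pending, Encode::FB_QUIET);
    }
    # Octets left here never formed a complete character; they play the
    # role of what a moevm keeps (and what discard would drop).
    return ($text, $pending);
}

# "caf\xC3\xA9" is UTF-8 for "café"; the 2-octet "é" is split across chunks.
my ($text, $pending) = decode_chunks('UTF-8', "caf\xC3", "\xA9!");
printf "%d characters decoded, %d octets pending\n",
    length $text, length $pending;
```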
In list context, it returns the pair of MIME charset names of the input and output strings, in that order; otherwise it returns the MIME charset name of the input string.
In list context, it returns the pair of MIME charset names of the output and input strings, in that order; otherwise it returns the MIME charset name of the output string.
defines the given non-standard charset names as aliases of the canonical MIME charset name charset_name.
returns the version string of libmoe.
returns the version number of libmoe.
In list context, it applies the conversion to each argument independently and returns the list of converted strings.
In scalar context, it applies the conversion as if all arguments were concatenated and returns the string converted from the concatenation.
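The difference between the two contexts matters when a character's octets are split across arguments. The sketch below is a hypothetical stand-in for $vm->convert, not MOEVM itself: UTF-8 decoding via Perl's core Encode module plays the role of the conversion, and only the wantarray-based context handling mirrors the behavior described above.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical stand-in for $vm->convert, with UTF-8 decoding playing the
# role of the conversion. Only the context handling mirrors the text above.
sub convert_like {
    my @inputs = @_;
    if (wantarray) {
        # List context: every argument is converted on its own.
        return map { Encode::decode('UTF-8', $_) } @inputs;
    }
    # Scalar context: the arguments are converted as one concatenated string.
    return Encode::decode('UTF-8', join('', @inputs));
}

# The two octets of UTF-8 "é" (0xC3 0xA9) are split across the arguments.
my @each  = convert_like("caf\xC3", "\xA9");  # each piece is malformed on its own
my $whole = convert_like("caf\xC3", "\xA9");  # "café"
```

In list context each fragment is converted separately, so the split character cannot be reassembled; scalar context sees the whole sequence.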
creates a moevm with the options option1, ..., applies the conversion to input, and returns the converted string.
Kiyokazu SUTO <suto@ks-and-ks.ne.jp>
mbconv(1).