NAME

MOEVM - Perl binding of libmoe.

SYNOPSIS

  use MOEVM;
  my $vm = MOEVM->new(...);
  my $n = $vm->puts($text);
  my $n = $vm->puts($text, 1);
  my $n = $vm->gets($buf);
  my $number_of_discarded_characters = $vm->discard;
  my $input_charset_name = $vm->get_charset;
  my $output_charset_name = $vm->get_output_charset;
  MOEVM->set_charset_aliases($charset_name, $non_standard_charset_name1, ...);
  my $string = MOEVM->version_string;
  my $number = MOEVM->version_number;
  my ($output1, $output2, ...) = $vm->convert($input1, ...);
  my $output = $vm->convert($input1, ...);
  my $output = MOEVM::convert($input, $option1, ...);

DESCRIPTION

This is a Perl binding of libmoe, a library for handling multi-octet character encodings:

  http://pub.ks-and-ks.ne.jp/prog/libmoe/

MOEVM::new class name option ...

Creates a new automaton that handles multi-octet character encodings (a moevm). In the following, the words "input" and "output" are used from the viewpoint of the moevm.

Options which the method "new" accepts

convert-to=converters: specifies the character encoding conversion. converters must be a comma-separated list of words described under Conversion specifiers.
flag=flags: specifies flags that change the behavior of the conversion. flags must be a comma-separated list of words described under Flag specifiers.
input: the options that follow apply to the input string.
output: the options that follow apply to the output string.
charset=string: specifies the charset name. In addition to MIME charset names, some language names are accepted; a language name restricts the candidate encoding schemes of the input string. The accepted names are listed under Acceptable languages.

Conversion specifiers

In most cases, conversion is set up automatically based on the CES (character encoding scheme), so you need not specify converters explicitly.

ascii: domestic ASCII is converted to US-ASCII.
ces: converted appropriately according to the CES bound to the input/output string.
to-ucs: converted to Unicode.
f2h, full-to-half: fullwidth compatibility characters are converted to the corresponding halfwidth ones.
h2f, half-to-full: halfwidth compatibility characters are converted to the corresponding fullwidth ones.
jisx0213: code points in JIS C 6226 or JIS X 0208 that are bound to no character are converted into JIS X 0213 plane 1.
jisx0213-aggressive: all code points in JIS C 6226 or JIS X 0208 are converted into JIS X 0213 plane 1.
ms-latin1: Unicode characters with code points from 0x80 through 0x9F (inclusive) are converted to the Unicode characters that Microsoft Windows code page 1252 assigns to those code points.
ucs-to-jis0208-extra, jis0208-to-ucs-extra: converters between some JIS X 0208 and Unicode characters that have similar glyphs (courtesy of Ambrose Li <acli@ada.dhs.org>).

Flag specifiers

use-0x28-for-94x94inG0, 28: uses ``1/11 2/4 2/8 F'' instead of ``1/11 2/4 F'' to designate charsets with final octet 4/0, 4/1, or 4/2 to G0.
ac, ascii-at-control: the escape sequence ``1/11 2/8 4/2'' is output before every control character.
nossl, ignore-7bit-single-shift: escape sequences for 7-bit single shift are ignored.
dnc, discard-notprefered-char: discards characters that the CES bound to the output string cannot represent.

Acceptable languages

The following words may be given in place of a MIME charset name for the input string. In that case, the encoding scheme is detected automatically (on a best-effort basis) from among the candidates listed after each word.

c, cn, china, chinese: x-gb-18030-2000, cn-big5, utf-8, or x-euc-tw.
j, ja, jp, japan, japanese: euc-jp, shift_jis, or utf-8.
k, ko, kr, korea, korean: euc-kr, x-johab, utf-8, or x-unified-hangul.
cjk: iso-8859-1, x-gb-18030-2000, cn-big5, x-euc-tw, euc-jp, shift_jis, euc-kr, x-johab, x-unified-hangul, or utf-8.
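
As a rough sketch of how the options above fit together, the following builds an option list for new. It assumes that each option is passed as its own string argument, which the synopsis implies but does not state. The helper moevm_options is hypothetical and not part of MOEVM, and the library itself is used only if it is installed.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MOEVM): build an option list in which
# 'input' and 'output' select the string the following options apply to.
sub moevm_options {
    my ($input_charset, $output_charset, @flags) = @_;
    my @options = ('input',  "charset=$input_charset",
                   'output', "charset=$output_charset");
    # Flags are given as one comma-separated list.
    push @options, 'flag=' . join(',', @flags) if @flags;
    return @options;
}

# 'japanese' asks for detection among euc-jp, shift_jis, and utf-8.
my @options = moevm_options('japanese', 'utf-8', 'dnc');
print join(' ', @options), "\n";

# Assumption: every option is a separate string argument to new.
if (eval { require MOEVM; 1 }) {
    my $vm = MOEVM->new(@options);
    print "moevm created\n" if $vm;
}
```

Keeping the option list in an array makes it easy to reuse the same settings for both the object interface and the functional form of convert.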

MOEVM::puts moevm text [isflush]

Feeds the string text to moevm. Returns the number of characters (not the number of octets) that can then be obtained via the gets method below.

If you know that text does not end with an incomplete (partial) representation of a character, you may pass a true value as the optional argument isflush.

MOEVM::gets moevm buffer

Appends the converted string to buffer. Returns the number of octets generated.
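
The puts and gets methods can be combined into a simple streaming loop. The sketch below is one possible arrangement, not the author's prescribed usage: convert_chunks is a hypothetical helper, the option strings passed to new assume one string per option, and the isflush flag on the last chunk assumes that the input ends on a character boundary.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MOEVM): push chunks through a moevm and
# collect the converted octets. $vm is any object providing puts and gets.
sub convert_chunks {
    my ($vm, @chunks) = @_;
    my $converted = '';
    while (@chunks) {
        my $chunk = shift @chunks;
        if (@chunks) {
            # More input follows, so a character may be split across chunks.
            $vm->puts($chunk);
        }
        else {
            # Assumption: the last chunk ends on a character boundary.
            $vm->puts($chunk, 1);
        }
        # gets appends whatever has been converted so far to $converted.
        $vm->gets($converted);
    }
    return $converted;
}

# Use the real library only when it is installed.
if (eval { require MOEVM; 1 }) {
    my $vm = MOEVM->new('input', 'charset=japanese', 'output', 'charset=utf-8');
    print convert_chunks($vm, 'abc', 'def'), "\n";
}
```

Calling gets after every puts keeps the amount of text held inside the moevm small, which matters when the chunks come from a large file or a socket.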

MOEVM::discard moevm

Discards the characters held in moevm. Returns the number of discarded characters.

MOEVM::get_charset moevm

In list context, returns the MIME charset names of the input and output strings, in that order; otherwise returns the MIME charset name of the input string.

MOEVM::get_output_charset moevm

In list context, returns the MIME charset names of the output and input strings, in that order; otherwise returns the MIME charset name of the output string.
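
The two accessors differ only in which side comes first, and both depend on the calling context. The following sketch illustrates that; describe_charsets is a hypothetical helper, and the option strings passed to new assume one string per option.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MOEVM): describe both charsets of a moevm.
sub describe_charsets {
    my ($vm) = @_;
    # List context: (input charset, output charset).
    my ($input, $output) = $vm->get_charset;
    return "$input -> $output";
}

# Use the real library only when it is installed.
if (eval { require MOEVM; 1 }) {
    my $vm = MOEVM->new('input', 'charset=euc-jp', 'output', 'charset=utf-8');
    print describe_charsets($vm), "\n";
    # Scalar context: each accessor returns only its own side.
    my $input_only  = $vm->get_charset;
    my $output_only = $vm->get_output_charset;
    print "$input_only / $output_only\n";
}
```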

MOEVM::set_charset_aliases class name canonical charset name non standard charset name ...

Defines each non-standard charset name as an alias of the MIME charset name given as canonical charset name.

MOEVM::version_string

Returns the version string of libmoe.

MOEVM::version_number

Returns the version number of libmoe.

MOEVM::convert moevm input1 ...

In list context, applies the conversion to each argument independently and returns the list of converted strings.

In scalar context, applies the conversion as if all arguments were concatenated and returns the string converted from the concatenated string.

MOEVM::convert text option1 ...

Creates a moevm with the options option1 ..., applies the conversion to text, and returns the converted string.
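
The following sketch contrasts the calling conventions of convert. The helpers convert_each and convert_joined are hypothetical wrappers that only force a calling context, and the option strings assume one string per option. The library is used only if it is installed.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MOEVM): convert every input on its own.
sub convert_each {
    my ($vm, @inputs) = @_;
    my @outputs = $vm->convert(@inputs);    # list context
    return @outputs;
}

# Hypothetical helper (not part of MOEVM): convert the inputs as one string.
sub convert_joined {
    my ($vm, @inputs) = @_;
    my $output = $vm->convert(@inputs);     # scalar context
    return $output;
}

# Use the real library only when it is installed.
if (eval { require MOEVM; 1 }) {
    my @options = ('input', 'charset=japanese', 'output', 'charset=utf-8');
    my $vm = MOEVM->new(@options);

    my @parts = convert_each($vm, 'abc', 'def');
    print scalar(@parts), " separate strings\n";
    print convert_joined($vm, 'abc', 'def'), "\n";

    # Functional form: a moevm is created from the options for this one call.
    print MOEVM::convert('abc', @options), "\n";
}
```

The scalar form is the one to use when a multi-octet character may be split across the arguments, since the arguments are then treated as a single string.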

AUTHOR

Kiyokazu SUTO <suto@ks-and-ks.ne.jp>