libmoe-1.5.8.tar.gz (1523KB, 2004-11-21 18:34:48) is a gzipped tarball of a collection of functions for handling sequences of characters that consist of multiple octets. It includes a character encoding conversion tool, which was initially written for debugging this library. Although that was not my original aim, I believe it has turned out to be a very useful tool. You can view the ChangeLog:
libmoe-1.5.8-ChangeLog.txt (31KB, 2004-11-21 18:34:48), which is included in the above tarball.
libmoe-devel.tar.gz (1539KB, 2004-11-21 18:36:19) and its ChangeLog
libmoe-devel-ChangeLog.txt (32KB, 2004-11-21 18:36:19) are also available.
The main functionality is twofold. First, from a character encoded as multiple octets, compute a non-negative integer, called the Universal Code Point (UCP) in this document for convenience, that carries complete information about both the coded character set containing the character and the character's code point within that set. Second, reproduce the original octet sequence from that integer.
To build and install this library, you need a C compiler and libraries conforming to the ANSI standard. Further
If you build with the included Makefile, you need to tell your dynamic linker the directory (/usr/local/lib) in which the shared library is installed.
If you are installing on a Linux box, for example, add the line
/usr/local/lib
to the file /etc/ld.so.conf unless it already contains that line, and then issue the command
/sbin/ldconfig
The development version includes a Perl extension module that lets you use libmoe functionality in your Perl scripts.
After you have successfully installed libmoe, move into the subdirectory named perl, then try
perl Makefile.PL; make test
If the output looks fine, issue the command
make install
to install the relevant files.
For detailed usage, please see the man page MOEVM(3).
Please note that this feature is highly experimental and that its APIs may change from release to release for a while.
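This library can handle the following subset of the ISO 2022 escape sequences (written in column/row notation, so 1/11 is ESC, 2/5 is "%", 2/1 is "!" and 4/0 is "@"):
1/11 2/5 2/1 2/X 3/Y
and the trailing
1/11 2/5 4/0
where X * 0x10 + Y is the integer that the library assigns to each encoding.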
The library classifies coded character sets (CCSs) into six categories.
For a character in any other CCS, it is somewhat difficult to describe in natural language how its UCP is determined. Roughly speaking, we arrange all the code points into one big sequence, ordered by the above categorization and then by the final octet of the escape sequence designating each CCS. The UCP is then the bitwise OR of the character's index (starting from 0) in that sequence and 1U << 21.
The library supports a stateless encoding scheme, which we call "x-moe-internal", that can represent every UCP within a single document.
Any questions or comments about this page are greatly appreciated.
Almost all content on this site was written by Kiyokazu SUTO (i.e. me) unless otherwise noted. I wish to place all of it in the PUBLIC DOMAIN, even though some lawyers say that this is impossible in my country.