mbconv - Character encoding scheme converter
mbconv [options] <file> ...
This is an application of a library for handling multi-octet character strings:
http://pub.ks-and-ks.ne.jp/prog/libiso2mb.shtml
It was written mainly for debugging the library.
It reads input octet by octet from the files given on the command line (or from standard input if no file is specified), converts the character encoding scheme as specified by the command-line options (described below), and writes the result to standard output (or to a file specified by the -t or -a option).
displays a summary of options and exits.
output is appended to file.
specifies character encoding conversion. converters must be a comma-separated list of words described in Conversion specifiers.
specifies flags to change the behavior of conversion. flags must be a comma-separated list of words described in Flag specifiers.
succeeding options apply to the input stream.
MIME encoding conforming to RFC 2047 is performed. <string> is used as the charset name.
a line number (>= 1) is inserted at the beginning of each line.
succeeding options apply to the output stream.
output to file (truncated).
output width of each line.
specifies the charset name. In addition to MIME charset names, some language specifications are accepted; these restrict the candidate encoding schemes for the input stream. The accepted languages are listed in Acceptable languages.
specifies the canonical name for non-standard charset names. charset names must be a comma-separated list of charset names.
specifies the output format.
outputs the charset name of each input stream to stderr, in the form
file name:
charset name
if two or more files are specified on the command line, or
charset name
otherwise.
Note: for the output stream, converter setup is performed automatically based on the charset. So in most cases, you need not specify converters explicitly.
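The MIME encoding option above refers to RFC 2047 encoded-words. As a rough illustration of what such an encoded-word looks like, here is a minimal Python sketch. It is not taken from mbconv or its library: the function name encode_word is hypothetical, only the "B" (base64) form is shown, and the 75-character limit on encoded-words is ignored. It also assumes the charset name is one that Python's codecs recognize.

```python
import base64
from email.header import decode_header


def encode_word(text: str, charset: str) -> str:
    """Build one RFC 2047 "B" encoded-word: =?charset?B?base64?=

    Hypothetical helper for illustration; it does not split long
    input into several encoded-words as RFC 2047 requires.
    """
    payload = base64.b64encode(text.encode(charset)).decode("ascii")
    return f"=?{charset}?B?{payload}?="


word = encode_word("héllo", "utf-8")
print(word)

# Round-trip through the standard library decoder as a sanity check.
raw, charset = decode_header(word)[0]
print(raw.decode(charset))
```

The <string> given to the option plays the role of the charset argument here: it names the charset in the encoded-word and tells the receiver how to interpret the decoded octets.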
converted to Big Five,
converted to ISO-2022-CN,
converted in such a way that character sets are designated to G0 and invoked into GL,
converted to ISO-2022-KR,
converted to Shift_JIS,
Big Five converted to CNS 11643,
converted to UTF-8,
UTF-8 converted to Big Five or others,
UTF-8 converted to CNS 11643 or others,
UTF-8 converted to JIS X 0208 or others,
UTF-8 converted to KS X 1001 or others,
UTF-8 converted to GB2312 or others,
UTF-8 converted to one of koi8-r, koi8-u, windows-1250, ..., or windows-1258,
domestic ASCII converted to US-ASCII,
converted to CN-GB,
converted to EUC-JP,
converted to EUC-KR,
converted to EUC-TW,
converted appropriately according to the charset bound to the internal automaton,
Unicode characters with code points between 0x80 and 0x9F (inclusive) are converted to other Unicode characters as if they were the characters at those code points in Microsoft Windows code page 1252,
converted to JOHAB,
converted to CN-GB-ISOIR165.
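The specifier above that reinterprets code points 0x80 to 0x9F as Windows code page 1252 can be pictured with the following Python sketch. It is only an approximation under stated assumptions: the function name remap_c1_as_cp1252 is hypothetical, the mapping comes from Python's own cp1252 codec rather than from the library, and octets that Python's table leaves undefined (such as 0x81) are passed through unchanged, which may differ from what mbconv does.

```python
def remap_c1_as_cp1252(text: str) -> str:
    """Reinterpret U+0080..U+009F as the cp1252 characters at those octets."""
    out = []
    for ch in text:
        cp = ord(ch)
        if 0x80 <= cp <= 0x9F:
            try:
                ch = bytes([cp]).decode("cp1252")
            except UnicodeDecodeError:
                # Undefined in Python's cp1252 table; keep the character as is.
                pass
        out.append(ch)
    return "".join(out)


# U+0093 and U+0094 become the curly quotes found at 0x93 and 0x94 in cp1252.
print(remap_c1_as_cp1252("\x93hi\x94"))
```

This kind of remapping is typically useful for text that was labelled ISO-8859-1 but actually produced on Windows.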
use ``1/11 2/4 2/8 F'' instead of ``1/11 2/4 F'' to designate charsets with final octet 4/0, 4/1, or 4/2 to G0,
escape sequence ``1/11 2/8 4/2'' is output before every control character,
check for overlong encodings of UTF-8,
escape sequences for 7-bit single shift are ignored.
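Two of the flags above may be easier to follow with a small Python sketch. The first helper turns the column/row notation used for escape sequences (for example 1/11 is octet 0x1B, ESC) into raw octets; the second tests whether a single UTF-8 sequence is overlong, that is, uses more octets than its code point requires. Both function names are hypothetical and neither is taken from the library; how mbconv reacts to an overlong sequence is not described here.

```python
def col_row_to_bytes(seq: str) -> bytes:
    """Convert column/row notation such as "1/11 2/4" to raw octets."""
    out = bytearray()
    for item in seq.split():
        col, row = item.split("/")
        out.append(int(col) * 16 + int(row))
    return bytes(out)


# Smallest code point that legitimately needs 1, 2, 3, or 4 octets.
_MIN_CP = {1: 0x00, 2: 0x80, 3: 0x800, 4: 0x10000}


def is_overlong(seq: bytes) -> bool:
    """Return True if seq is one UTF-8 sequence longer than necessary."""
    lead = seq[0]
    if lead < 0x80:
        length, cp = 1, lead
    elif lead & 0xE0 == 0xC0:
        length, cp = 2, lead & 0x1F
    elif lead & 0xF0 == 0xE0:
        length, cp = 3, lead & 0x0F
    elif lead & 0xF8 == 0xF0:
        length, cp = 4, lead & 0x07
    else:
        raise ValueError("not a UTF-8 lead octet")
    if len(seq) != length:
        raise ValueError("wrong sequence length")
    for octet in seq[1:]:
        if octet & 0xC0 != 0x80:
            raise ValueError("bad continuation octet")
        cp = (cp << 6) | (octet & 0x3F)
    return cp < _MIN_CP[length]


print(col_row_to_bytes("1/11 2/4 2/8 4/2"))  # long form, final octet 4/2
print(col_row_to_bytes("1/11 2/4 4/2"))      # short form, final octet 4/2
print(is_overlong(b"\xc0\xaf"))              # "/" encoded in two octets
```

Overlong forms matter because a filter that looks only for the one-octet form of a character such as "/" would miss the longer encodings of the same code point.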
The following words may be given instead of a MIME charset name for the input stream. In that case, the coding scheme is automatically detected (hopefully) from among the schemes listed for that word.
cn-gb, cn-big5, utf-8, or x-euc-tw.
euc-jp, shift_jis, or utf-8.
euc-kr, x-johab, utf-8, or x-unified-hangul.
iso-8859-1, cn-gb, cn-big5, x-euc-tw, euc-jp, shift_jis, euc-kr, x-johab, x-unified-hangul, or utf-8.
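To give a feel for detection among a restricted set of candidates, here is a deliberately naive Python sketch. It is not the library's algorithm, which is not described here: it simply tries each candidate in order with Python's strict decoders and returns the first that accepts the whole input. The function name detect and the list JAPANESE are hypothetical, and the codec names are Python's, not mbconv's.

```python
from typing import Optional, Sequence


def detect(data: bytes, candidates: Sequence[str]) -> Optional[str]:
    """Return the first candidate codec that decodes data without error."""
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None


# Assumed Python codec names for the Japanese candidates listed above.
JAPANESE = ["euc_jp", "shift_jis", "utf-8"]

print(detect("日本語".encode("euc_jp"), JAPANESE))
print(detect("日本語".encode("shift_jis"), JAPANESE))
```

A first-match strategy like this is sensitive to candidate order and can misjudge short or ambiguous input, which is one reason a narrower candidate list tends to give more reliable results.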
Kiyokazu SUTO <suto@ks-and-ks.ne.jp>
This program is distributed with absolutely no warranty.
Anyone can use, modify, and redistribute this program without any restriction.