
Setup of w3m with Multiple character Encoding Extension


Start up flow

The original w3m (called w3m below; its executable file is assumed to have the same name) reads its configuration options from

in this order. Later specifications override previous ones.

The w3m with multiple character encoding extension (called w3mmee below; its executable file is assumed to have the same name) needs to know the set of encoding schemes considered for automatic detection, the encodings which your terminal accepts, how encodings and character sets are to be converted, the messages localized for your language, and so on. Hence its startup flow is somewhat complicated.

First, w3mmee examines the value of the environment variable ``W3MLANG'' (or ``LANG'' if ``W3MLANG'' is unset). It converts the letters in the value to lowercase, and interprets the value as having the form:

<language code>+"_"(underscore)+<country code>+"."(period)+<encoding>
For instance, if ``W3MLANG'' has the value ``ja_JP.UTF-8'', w3mmee will get

From these components, w3mmee composes file names:

  1. $LIB_DIR/w3mconfig
  2. $LIB_DIR/w3mconfig.<language code>
  3. $LIB_DIR/w3mconfig_<country code>
  4. $LIB_DIR/w3mconfig.<encoding>
  5. $LIB_DIR/w3mconfig.<language code>_<country code>
  6. $LIB_DIR/w3mconfig.<language code>.<encoding>
  7. $LIB_DIR/w3mconfig_<country code>.<encoding>
  8. $LIB_DIR/w3mconfig.<language code>_<country code>.<encoding>
and reads configuration options from these files in this order.
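
As a rough illustration of the naming scheme above, the following Python sketch derives the eight file names from a locale value. The function name config_candidates and the directory /usr/local/lib/w3mmee are made up for this example, and the sketch assumes that all three components are present; how w3mmee treats a value with missing components is not covered here.

```python
import re

def config_candidates(locale_value, lib_dir, base="w3mconfig"):
    """Return the eight candidate file names, in reading order.

    Assumption: the value has the full <language>_<country>.<encoding>
    form; partial values are rejected by this sketch.
    """
    m = re.fullmatch(r"([^_.]+)_([^_.]+)\.(.+)", locale_value.lower())
    if m is None:
        raise ValueError("expected <language>_<country>.<encoding>")
    lang, country, enc = m.groups()
    suffixes = [
        "",
        "." + lang,
        "_" + country,
        "." + enc,
        "." + lang + "_" + country,
        "." + lang + "." + enc,
        "_" + country + "." + enc,
        "." + lang + "_" + country + "." + enc,
    ]
    return [lib_dir + "/" + base + s for s in suffixes]

for name in config_candidates("ja_JP.UTF-8", "/usr/local/lib/w3mmee"):
    print(name)
```

Passing base="w3mmessages" gives the corresponding names for the message files.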

Next, it reads the expositions (descriptive texts) of the options displayed in the option setup panel from the files:

  1. $LIB_DIR/w3mmessages
  2. $LIB_DIR/w3mmessages.<language code>
  3. $LIB_DIR/w3mmessages_<country code>
  4. $LIB_DIR/w3mmessages.<encoding>
  5. $LIB_DIR/w3mmessages.<language code>_<country code>
  6. $LIB_DIR/w3mmessages.<language code>.<encoding>
  7. $LIB_DIR/w3mmessages_<country code>.<encoding>
  8. $LIB_DIR/w3mmessages.<language code>_<country code>.<encoding>
in this order. Only the lines of the form:
<option name>+"="(equal sign)+<exposition>
are recognized as definitions of expositions. Spaces at the beginning of a line, at the end of a line, before the equal sign, and after the equal sign are removed.
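
The line format can be sketched in Python as follows. This is only an illustration: the function name parse_message_line is hypothetical, and splitting at the first equal sign and ignoring lines with an empty option name are assumptions rather than documented behaviour.

```python
def parse_message_line(line):
    """Return (option_name, exposition), or None if the line is not a definition."""
    line = line.rstrip("\r\n")
    name, sep, text = line.partition("=")   # assumption: split at the first "="
    if not sep:
        return None                          # no equal sign: not a definition
    name, text = name.strip(" "), text.strip(" ")
    if not name:
        return None                          # assumption: empty names are ignored
    return name, text

print(parse_message_line("  tty_charset = Terminal charset  \n"))
```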

Finally, the files

$HOME/.w3mmee/config

per-user configuration file in the same format as $LIB_DIR/w3mconfig*,

$HOME/.w3mmee/messages

per-user message setup file in the same format as $LIB_DIR/w3mmessages*,

are read and evaluated in the same manner as $LIB_DIR/w3mconfig* and $LIB_DIR/w3mmessages*, respectively.


Mapping between locale codeset names and MIME charset names

The contents of this section apply only when you have configured w3mmee to use gettext().

When the return value of the gettext() function contains non-US-ASCII characters, the encoding of those characters must be converted to the internal one. gettext() determines the encoding of its output from the codeset name in the current locale, while w3mmee uses MIME charset names. Unfortunately, the codeset name and the MIME charset name for an encoding scheme generally differ from each other, so w3mmee needs a mapping table between them.

Though such a table is already built into w3mmee, it may well be insufficient in your environment. In that case, you can give w3mmee additional correspondences with the files

  1. $LIB_DIR/locale2mime
  2. $HOME/.w3mmee/locale2mime
each line of which must be of the form
<MIME charset name>+"="(equal sign)+<lang. spec>[+","(comma)+...]
where you may add optional spaces around "=" and ",". <lang. spec> must be a string of the form
<language code>+"_"+<country code>+"."+<codeset name>
where any (but not all) of <language code>, "_"+<country code>, or "."+<codeset name> may be omitted.
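
A minimal Python sketch of a parser for this line format is shown below. The names parse_locale2mime_line and LANG_SPEC are hypothetical, the sample codeset names are only illustrative, and the exact characters w3mmee allows inside each component are an assumption.

```python
import re

# Each component is optional, but an entirely empty <lang. spec> is rejected below.
LANG_SPEC = re.compile(
    r"(?P<lang>[^_.,=\s]+)?(?:_(?P<country>[^_.,=\s]+))?(?:\.(?P<codeset>[^,=\s]+))?"
)

def parse_locale2mime_line(line):
    """Return (mime_charset, [(language, country, codeset), ...])."""
    mime, sep, specs = line.partition("=")
    if not sep:
        raise ValueError("missing '='")
    result = []
    for spec in specs.split(","):
        spec = spec.strip()                  # optional spaces around ","
        m = LANG_SPEC.fullmatch(spec)
        if not spec or m is None:
            raise ValueError("bad <lang. spec>: %r" % spec)
        result.append((m["lang"], m["country"], m["codeset"]))
    return mime.strip(), result

print(parse_locale2mime_line("EUC-JP = ja_JP.eucJP, .ujis"))
```

Omitted components are reported as None in this sketch.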


New options concerning character encoding

The following is the list of new configuration options concerning character encoding added by the multiple character encoding extension.

mylang <string>

Specifies your language. Currently, the value of this option is used only to restrict the set of encoding schemes considered for autodetection.

For example, assume that you have specified

mylang cjk
and try to read a document with no charset specification. Then w3mmee tries to find the encoding scheme among

You can also specify a comma separated list of names of character encoding schemes. In this case, those encoding schemes are used as the candidates for autodetection.

mylang_charset <string>

Specifies the encoding scheme assumed for a document whose encoding scheme w3mmee fails to autodetect.

tty_charset <string>

Specifies encoding scheme of terminal I/O.

tty_initial_charset <string>

Using this option is deprecated. Please use tty_initial_input_charset and tty_initial_output_charset instead.

tty_initial_input_charset <string>

When an ISO 2022 conforming encoding scheme is specified with tty_charset, the initial state of the intermediate buffers of that encoding for the input stream from the tty can be set to that of the encoding scheme specified with this option.

tty_initial_output_charset <string>

When an ISO 2022 conforming encoding scheme is specified with tty_charset, the initial state of the intermediate buffers of that encoding for the output stream to the tty can be set to that of the encoding scheme specified with this option.

tty_input_converters <string>

Specifies conversions of encoding scheme and character set of terminal input.

Please use this option only if you completely understand behavior of the support library used by multiple character encoding extension.

tty_output_converters <string>

Specifies conversions of encoding scheme and character set of terminal output.

Please use this option only if you completely understand behavior of the support library used by multiple character encoding extension.

tty_fallback_converters <string>

If the terminal cannot display a character and no replacement string is specified for it, the conversions specified by this option are applied to the character.

Please use this option only if you completely understand behavior of the support library used by multiple character encoding extension.

input_charset <string>

Specifies the encoding scheme of a document which contains no charset specification, and makes w3mmee stop autodetecting the encoding scheme.

input_converters <string>

Specifies conversion of encoding scheme and character set of characters input from network or a local file.

Please use this option only if you completely understand behavior of the support library used by multiple character encoding extension.

output_charset <string>

When a document contains no charset specification and w3mmee fails to autodetect the encoding scheme of the document, w3mmee assumes that the encoding scheme of the document is the one specified by this option.

If the document contains a form requiring text input, the arguments are passed to the action of the form after conversion to that encoding. Currently this is the only case affected by this option.

output_converters <string>

Specifies conversion of encoding scheme and character set of characters output to network or a local file.

Please use this option only if you completely understand behavior of the support library used by multiple character encoding extension.

search_converters <string>

Specifies conversion of encoding scheme and character set of characters for regular expression search.

Please look at the section "Conversion specifiers" in manual page of a utility included in the support library used by multiple character encoding extension.

process_charset <string>

Specifies encoding schemes for strings which may be passed to a local process, such as arguments for bookmark registration program.

<string> must be a space separated list of charset specifications of the following form:

<sep1>+<regular expression for process name>+<sep2>+<charset>
or
<charset>
Each space separated token is treated as the first form if its first character is non-alphanumeric. Otherwise it is regarded as the second form. In the first form, if <sep1> is "(", "{", "[", "<", or "^", <sep2> must be ")", "}", "]", ">", or "$", respectively. Otherwise <sep2> must equal <sep1>. <sep1> and <sep2> are treated as part of the regular expression only if they are "^" and "$", respectively. The second form is an abbreviation of
"^.*$"+<charset>

Given a process name, the regular expressions are matched against the name in order. The charset corresponding to the first expression that matches is adopted.
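
The token rules above can be sketched in Python as follows. This is not w3mmee's own parser: the function names are hypothetical, Python's re module stands in for whatever regular expression engine w3mmee uses, the process names in the example are only illustrative, and taking the last occurrence of <sep2> as the end of the expression is an assumption.

```python
import re

CLOSERS = {"(": ")", "{": "}", "[": "]", "<": ">", "^": "$"}

def parse_process_charset(value):
    """Turn the option value into an ordered list of (regex, charset) pairs."""
    rules = []
    for token in value.split():
        if token[0].isalnum():
            rules.append(("^.*$", token))        # second form
            continue
        sep1 = token[0]
        sep2 = CLOSERS.get(sep1, sep1)
        # Assumption: the last occurrence of <sep2> ends the expression.
        body, found, charset = token[1:].rpartition(sep2)
        if not found or not charset:
            raise ValueError("malformed token: " + token)
        if sep1 == "^":                          # "^" and "$" belong to the regex
            body = "^" + body + "$"
        rules.append((body, charset))
    return rules

def charset_for(process_name, rules):
    """Return the charset of the first rule whose expression matches."""
    for pattern, charset in rules:
        if re.search(pattern, process_name):
            return charset
    return None

rules = parse_process_charset("/bookmark/EUC-JP ^w3mhelp$ISO-8859-1 UTF-8")
print(charset_for("w3mbookmark", rules))
```

Placing a bare <charset> token last, as in the example, makes it act as a default for every process name not matched earlier.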

tty_character_conversion <character range> <replacement string>

Specifies characters which your terminal can't handle. In place of any character in the range, w3mmee outputs to the terminal the first applicable item in the list:

  1. the character itself if no string specification or if the string is "NULL" (without quotes, case sensitive),
  2. the string specified by this option unless it is "REJECT" (without quotes, case sensitive),
  3. the string specified by the option tty_character_replacement,
  4. the character "?" (question mark).

If options of this type appear twice and one range includes the other, the more specific one is adopted. If the ranges merely overlap, only the overlapping part is overwritten by the later specification.
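
The four-step fallback list above can be read as the following Python sketch. The function name terminal_output and its parameters are hypothetical; spec stands for the replacement string given to tty_character_conversion (None when no string is given) and default_replacement for the value of tty_character_replacement (None when unset).

```python
def terminal_output(char, spec, default_replacement=None):
    """Pick what is written to the terminal for a character in the range."""
    if spec is None or spec == "NULL":
        return char                      # 1. the character itself
    if spec != "REJECT":
        return spec                      # 2. the string given by this option
    if default_replacement is not None:
        return default_replacement       # 3. tty_character_replacement
    return "?"                           # 4. question mark

print(terminal_output("\u00e9", "REJECT"))
```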

tty_character_replacement <string>

Specifies default replacement string for characters which your terminal can't handle.

view_buf <string>

Specifies a format string for messages representing documents in buffers when mouse support is disabled (including the case where mouse support was disabled at configure time).

view_buf_with_mouse <string>

Specifies a format string for messages representing documents in buffers when mouse support is enabled.

omitted <string>

Specifies replacement string when middle part of a long URI is omitted.

ul_marks <string>

Specifies comma separated list of strings leading items of <ul> construct.

ul_type_disc <string>

Specifies a string leading items of <ul> of which type attribute is "disc".

ul_type_circle <string>

Specifies a string leading items of <ul> of which type attribute is "circle".

ul_type_square <string>

Specifies a string leading items of <ul> of which type attribute is "square".

small_img_alt <string>

Specifies replacement string for small images.

hr_rule <string>

Specifies a string used to draw <hr>.

menu_frame <string>

Specifies a comma separated list of menu frame components starting with left-top corner, left to right, and top to bottom.

rule <string>

Specifies a comma separated list of table borders in the order:

  1. center,
  2. left edge,
  3. top,
  4. left-top corner,
  5. right edge,
  6. vertical bar,
  7. right-top corner,
  8. bottom,
  9. left-bottom corner,
  10. horizontal bar,
  11. right-bottom corner.

rule_bold <string>

Specifies a comma separated list of table bold face borders in the order:

  1. center,
  2. left edge,
  3. top,
  4. left-top corner,
  5. right edge,
  6. vertical bar,
  7. right-top corner,
  8. bottom,
  9. left-bottom corner,
  10. horizontal bar,
  11. right-bottom corner.

message_about_config_save <string>

The option setup panel has an additional item for choosing whether the new setup will be saved to $HOME/.w3mmee/config. This option specifies the exposition of that item.

charset_cname <string>

Specifies a canonical name for non-standard charset names, in the form

<canonical name>+"="(equal sign)+<comma separated list of charset names>
No space is allowed around the equal sign or the commas. Charset names are case insensitive.

For example, to treat a page containing charset specification ``charset=SHIFT-JIS'' as if its charset is ``Shift_JIS'', please add the line

charset_cname shift_jis=shift-jis
to your config file.

If there are two options of this type defining the same canonical name, the latter overrides the former.
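
As an illustration of these rules, the Python sketch below builds an alias table from a sequence of charset_cname values and looks up a name in it. The function names are hypothetical, the alias x-sjis is used only as sample data, and returning canonical names in lowercase is a simplification of this sketch.

```python
def build_cname_table(option_values):
    """Map each canonical name to its alias list; a later option replaces an earlier one."""
    table = {}
    for value in option_values:
        canonical, sep, aliases = value.partition("=")
        if not sep:
            raise ValueError("missing '=' in " + value)
        table[canonical.lower()] = [a.lower() for a in aliases.split(",")]
    return table

def canonical_name(charset, table):
    """Return the canonical name for charset, compared case-insensitively."""
    name = charset.lower()
    for canonical, aliases in table.items():
        if name in aliases:
            return canonical
    return name

table = build_cname_table(["shift_jis=x-sjis", "shift_jis=shift-jis"])
print(canonical_name("SHIFT-JIS", table))
```

In the example, the second option defines the same canonical name as the first, so only shift-jis remains an alias.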

unicode_width <string>

Specifies the name of a character width table. Recognized names are as follows (names are case insensitive).

xterm
The same table as in xterm-147. Newer versions of xterm may use a different one.
EastAsianWidth_AmbiguousToNarrow, eaw_a2n
Conforms to UAX #11, and characters marked as ``Ambiguous'' are assigned width 1.
EastAsianWidth_AmbiguousToWide, eaw_a2w
Conforms to UAX #11, and characters marked as ``Ambiguous'' are assigned width 2.

prefer_charset <boolean>

Specifies whether or not a buffer is reshaped when the character encoding scheme specified with a meta tag differs from the autodetected one.


New miscellaneous options

The following is the list of new configuration options not concerning character encoding. They are listed in this document because the original w3m does not recognize them for various reasons (because my patch was rejected, or because I have not yet ported the related code to the original w3m out of laziness).

accept_encoding  <encoding name> <media type> <argv[0]> <path to command>

Binds the value <encoding name> of the HTTP header field "content-encoding" to the MIME type <media type> and to a filter program which decodes contents encoded with the method identified by <encoding name>. For this option to work, you also need to bind <media type> to a file name extension by adding a line

<media type> <the extension>
to the file $HOME/.mime.types.

If options of this type appear twice or more with the same encoding name, the last specification is adopted.

language_extension <string>

Specifies a comma separated list of file extensions which stand for content languages.

If a file has multiple extensions, the extensions listed in this option are skipped when w3mmee determines the content type of the file.

search_across_lines <boolean value>

Specifies whether regular expression search across multiple lines is enabled or not.

concurrent <number>

Specifies the maximum number of processes used to load documents.

concurrent_per_server <number>

Specifies the maximum number of processes used to load documents from each server.

follow_redirection <number>

Specify how many redirections should be followed.

request_header <string>

Specify optional HTTP request header to be added. The headers

Host, Pragma, Cache-Control, Content-Length

are always assigned with values generated by w3mmee, and your specifications are ignored. The headers

User-Agent, Accept, Accept-Encoding, Accept-Language

are assigned with values generated by w3mmee unless you explicitly specify them. The headers

Content-Type, Referer

are assigned with values which you specify only if there is no other appropriate value. The headers

Cookie, Cookie2

are assigned with values which you specify only if cookie support in w3mmee is disabled by compile option, by command line option, or by configuration option. Otherwise w3mmee decides their values.

If options of this type appear twice or more with the same header name, the last specification is adopted.

http_version <string>

Specifies the HTTP version used for each request. The acceptable values are "1.1" and "1.0" (without the double quotation marks). Any other value is silently ignored, and the version is set to "1.1".

anchor_num_style <string>

Specifies the style for referring to anchors in a formatted dump of a document. It is passed to the sprintf function together with the number (starting with 1) in the list of all links in the document, so it must contain exactly one sprintf conversion specification "%d".

img_num_style <string>

Specifies the style for referring to images in a formatted dump of a document. It is passed to the sprintf function together with the number (starting with 1) in the list of all links in the document, so it must contain exactly one sprintf conversion specification "%d".

label_withinpage_style <string>

Specifies the style of the optional line number and column information for links to labels within the same document in a formatted dump of a document. It is passed to the sprintf function together with the line number and the column (both starting with 1), so it must contain exactly two sprintf conversion specifications "%d".
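
The "%d" requirement of these three style options can be illustrated with Python's % operator, which behaves like C sprintf for this simple case. The function name apply_num_style and the sample style strings are hypothetical, and the check shown is this sketch's own, not necessarily what w3mmee does.

```python
def apply_num_style(style, *numbers):
    """Format numbers with a sprintf-like style after checking its "%d" count."""
    # "%%" is a literal percent sign, so it is removed before counting.
    count = style.replace("%%", "").count("%d")
    if count != len(numbers):
        raise ValueError("style needs exactly %d '%%d'" % len(numbers))
    return style % numbers

print(apply_num_style("[%d]", 3))
print(apply_num_style("(L%d,C%d)", 12, 5))
```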

link_num_url <string>

When link references are made in a formatted output of a document, <string> is used as the URI of the document.

scroll_amount <number>

When a cursor moving command is issued and cursor goes outside current view, view scrolls <number> lines or columns.

mailcap_entry <string>

Specifies a mailcap entry of maximal priority, which is intended for changing an external viewer temporarily.

Options of this type can appear more than once.

browsecap <string>

Specify a comma separated list of browsecap files.

browsecap_entry <string>

Specifies a browsecap entry of maximal priority, which is intended for changing an external browser temporarily.

Options of this type can appear more than once.

wrap_line <boolean>

Specify whether or not to wrap a line wider than screen width.

wrap_line_when_dump <boolean>

Specify whether or not to wrap a line wider than screen width when dumping a buffer. If this option is set to true value, option wrap_line is also set to true value.

line_truncated <character>

Specify the indicator of truncated lines.

line_continued <character>

Specify the indicator of continued lines.

preload_image <boolean>

Specify whether to load inline images before actually displayed or not.

img_valign <position>

Specifies the default vertical alignment of inline images. <position> must be one of D (stands for "default"), T (stands for "top"), M (stands for "middle"), or B (stands for "bottom"). D is almost the same as B, but differs somewhat for small images.

table_valign <position>

Specifies the default vertical alignment in tables. <position> must be one of T (stands for "top"), M (stands for "middle"), or B (stands for "bottom").

when_redirected <behaviour>

Specifies the behaviour when an HTTP request with a method other than GET or HEAD is redirected with HTTP response code 301 or 302. <behaviour> must be one of

0
always follows the redirection with the original request method,
1
always follows the redirection with the GET method,
2
always ignores the redirection,
3
asks at run time.

frame_color <color>

Specify color of frame borders.

auto_pixel_per_char <boolean>

Specify whether or not number of pixels per character can be auto-detected.

auto_pixel_per_line <boolean>

Specify whether or not number of pixels per line can be auto-detected.

try_extensions <string>

Specifies a comma separated list of file extensions. When it has failed to open a local file, w3mmee appends each of the extensions to the name of the file, and retries to open a file with the new name.

You can specify "*" (asterisk without quotes) as an item in the list, which is expanded to the comma separated list of all the file extensions bound to content encoding methods (".Z,.bz2,.gz" by default, see accept_encoding option).
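
A short Python sketch of how the list and the "*" item might expand into candidate file names is given below. The function name candidate_names is hypothetical, and the sketch assumes that each item includes its leading dot, as in the default list quoted above.

```python
def candidate_names(path, try_extensions, encoding_extensions=(".Z", ".bz2", ".gz")):
    """Return the file names to retry, in order, after opening path has failed."""
    extensions = []
    for item in try_extensions.split(","):
        if item == "*":
            # "*" stands for every extension bound to a content encoding method.
            extensions.extend(encoding_extensions)
        else:
            extensions.append(item)
    return [path + ext for ext in extensions]

print(candidate_names("README", ".txt,*"))
```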

edit_remote_source <boolean>

Specify whether or not you want to edit cached sources of remote pages.

remove_traling_spaces <boolean>

Specify whether or not trailing spaces of each formatted line should be removed.

target_frame <string>

Specifies comma separated list of names of frames. Each item in the list is matched in order against names of frames in a frame set being rendered, and the buffer at top-left corner of the first found frame is set to the current buffer.

select_implies_accept <boolean>

If this option is set to true value, a buffer selected in a persistent buffer selection menu is automatically set to the current buffer.

search_form_text <boolean>

Specifies whether or not search commands search texts in values of controls of "form" elements.

menu_y_preference <position>

Specifies the way to determine the vertical position of popup menus which include very many items. <position> must be one of

C
selected item is placed at the cursor position if possible,
A
menu position is adjusted so that whole menu is visible if possible.

template_frame_color <color>

Specify color of borders of frames which are defined in $RC_DIR/template.

menu_buffer_name_color <color>

Specify color of title line of menu buffers.

buffername <string>

Specifies the format string of the title line displayed at the (almost) bottom of the screen. Each substring of the form ``%''+<one character> is replaced with:

%A
indicator to show asynchronous loading is incomplete,
%B
name of the buffer,
%I
characters (with opening and closing brackets "[" and "]") to show various information about the buffer:

F
the buffer is in "forever mode" (like "less" command),
S
the source of buffer is obtained via HTTPS protocol,
I
the buffer includes an inline image of which loading is incomplete.

%L
URI of current link,
%N
information (current line number, current column, etc.) related to cursor position (and more),
%P
coordinate of the frame on which the buffer is displayed,
%U
current URI of the buffer,
%+<character not listed above>
<character> itself.
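
The replacement rule for the escapes listed above can be sketched in Python as follows. The function name expand_title and the values dictionary are hypothetical; the real values come from w3mmee's internal state. Copying a trailing lone "%" unchanged is an assumption.

```python
LISTED = "ABILNPU"

def expand_title(fmt, values):
    """Expand %-escapes; values maps the letters A, B, I, L, N, P, U to strings."""
    out = []
    i = 0
    while i < len(fmt):
        ch = fmt[i]
        if ch == "%" and i + 1 < len(fmt):
            key = fmt[i + 1]
            if key in LISTED:
                out.append(values.get(key, ""))
            else:
                out.append(key)          # unlisted character stands for itself
            i += 2
        else:
            out.append(ch)               # ordinary text (or a trailing lone "%")
            i += 1
    return "".join(out)

print(expand_title("%B %U %%", {"B": "w3mmee", "U": "http://example.org/"}))
```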

select_menu_title <string>

Specifies the format string of the title of each item in the buffer selection menu. Each substring of the form ``%''+<one character> is replaced in the same way as for the buffername option.

mouse_double_click_interval <number>

If a mouse button is clicked twice within <number> milliseconds, it is regarded as one double click.

auto_raise <boolean>

Specifies whether or not a buffer automatically becomes a current buffer when it becomes visible.

force_coursor_origin <boolean>

Specifies whether or not cursor is moved to the buffer at left-top corner of display when rendering frame set.

save_file_name_template <string>

Defines a name template for downloaded files. When a file name is generated, ``%F'' and ``%s'' in the template are replaced with the guessed name, ``%X'' with the extension part of the guessed name, and ``%B'' with the rest of the guessed name after the extension part is removed.
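
The template expansion can be sketched in Python as follows. The function name expand_save_name and the sample template are hypothetical. The sketch uses os.path.splitext, so the extension part is taken to be the last dot-suffix including its dot; whether w3mmee splits names in exactly this way is an assumption.

```python
import os.path

def expand_save_name(template, guessed_name):
    """Replace %F, %s, %X, and %B in template using the guessed file name."""
    base, ext = os.path.splitext(guessed_name)
    mapping = {"F": guessed_name, "s": guessed_name, "X": ext, "B": base}
    out = []
    i = 0
    while i < len(template):
        if template[i] == "%" and i + 1 < len(template) and template[i + 1] in mapping:
            out.append(mapping[template[i + 1]])
            i += 2
        else:
            out.append(template[i])      # anything else is copied unchanged
            i += 1
    return "".join(out)

print(expand_save_name("downloads/%B-copy%X", "report.tar.gz"))
```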

save_directory_mode <string>

Defines the mode of newly created directories when the names of downloaded files include directory parts. It must be a numeric expression which "strtoul(3)" accepts, or an "rwxrwxrwx" style string. To suppress the creation of new directories, specify "0" or "---------".
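
The two accepted notations can be converted to a numeric mode as in the Python sketch below. The function name parse_directory_mode is hypothetical, and reading numbers with the C base-0 convention ("0x" for hexadecimal, a leading "0" for octal) is an assumption about how strtoul(3) is called.

```python
def parse_directory_mode(text):
    """Convert a save_directory_mode value to an integer mode."""
    if len(text) == 9 and set(text) <= set("rwx-"):
        mode = 0
        for index, char in enumerate(text):
            if char == "-":
                continue
            if char != "rwx"[index % 3]:
                raise ValueError("misplaced flag in " + text)
            mode |= 1 << (8 - index)     # leftmost flag is the highest bit
        return mode
    # Assumption: numbers follow the base-0 convention of strtoul(3).
    lowered = text.lower()
    if lowered.startswith("0x"):
        return int(lowered, 16)
    if lowered.startswith("0") and len(lowered) > 1:
        return int(lowered, 8)
    return int(lowered, 10)

print(oct(parse_directory_mode("rwxr-xr-x")))
```

A result of 0 corresponds to the documented way of suppressing directory creation.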

counting_attribute_limit <number>

Exclusive upper limit of values of attributes "cellpadding", "cellspacing", "colspan", "rowspan", "size", and "vspace".

http_authenticate_cache_max <string>

Specifies a comma separated list of definitions of the maximum number of times each cached *-Authenticate: header may be re-used. Each definition must be in one of the forms

+"="(equal sign)+<number>
defines the value for the specified authentication scheme,
"*"(asterisk)+"="+<number>
modifies the values for all authentication schemes,
<number>
abbreviation of "*"+"="+<number>.
A negative value implies no limit.

http_authenticate_cache_expire <string>

Specifies a comma separated list of definitions of the expiration time, in seconds, of each cached *-Authenticate: header. Each definition must be in one of the forms

+"="(equal sign)+<number>
defines the value for the specified authentication scheme,
"*"(asterisk)+"="+<number>
modifies the values for all authentication schemes,
<number>
abbreviation of the 2nd form.
A negative value implies no expiration.

make_extviewer_buffer <boolean>

Specifies whether or not to make buffers for external viewers.


Enhancement of string expansion in mailcap entry

w3mmee recognizes the following additional %-escapes when expanding strings in mailcap entries.

%h

The host part of URI.

%p

The port part of URI.

%u

The whole URI.

%{<test>?<yes>:<no>}

First, %<test> is tested to see whether it expands to something. Note that "%" is prepended to <test>. If it expands to anything, including the empty string, <yes> is processed. Otherwise <no> is processed. If <yes> is omitted, it is treated as if <test> were copied to that place. If <no> is omitted and the expansion of <test> fails, the whole escape is replaced with the empty string.


browsecap -- External browser capability file

w3mmee includes a mechanism to automatically determine the external browser invoked on a URI, based on the scheme part of the URI. The bindings between external browsers and schemes are given by "browsecap" files. w3mmee tries to scan the two files

  1. $LIB_DIR/browsecap
  2. $HOME/.w3mmee/browsecap
and builds a binding table in the same manner as for "mailcap" files.

The file format is also the same as that of "mailcap" files. The only exception is that the first field of each entry must be of the form

<scheme>+"/"(slash)+<method>
where the currently supported <method> is "post", "get", or "download". The <method> part may be "*" (asterisk), which is treated as a usual wildcard. If the <method> part is "post", the arguments which would be passed to a CGI program are passed to the matched external browser on its standard input.

If the relevant URI contains a query string and the query string includes a component like <word>=<value>, an escape sequence of the form %{<word>} expands to <value>. Furthermore, the escape sequence %? expands to the whole query string (excluding the leading question mark).
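
The two query string escapes can be sketched in Python as follows. The function name expand_query_escapes and the sample command line are hypothetical. The sketch assumes that components are separated by "&", that values are inserted without decoding, that the first occurrence of a word wins, and that an unknown word expands to nothing; none of these details is stated above.

```python
import re
from urllib.parse import urlsplit

def expand_query_escapes(command, uri):
    """Expand %{<word>} and %? in command using the query string of uri."""
    query = urlsplit(uri).query
    fields = {}
    for part in query.split("&"):
        word, sep, value = part.partition("=")
        if sep:
            fields.setdefault(word, value)   # assumption: first occurrence wins
    def replace(match):
        if match.group(0) == "%?":
            return query                     # whole query string, without "?"
        return fields.get(match.group(1), "")
    return re.sub(r"%\?|%\{([^}]*)\}", replace, command)

print(expand_query_escapes("viewer --query=%{q} --all=%?",
                           "http://example.org/search?q=w3m&lang=ja"))
```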

The browsecap facility is also used to determine the editor used to edit the source file of a buffer, the formatted image of a buffer, the value of an input control of text type of a form element, or the contents of a textarea control of a form element. An entry is adopted for this purpose if its first field matches "x-edit/buffer", "x-edit/screen", "x-edit/inputtext", or "x-edit/textarea", respectively.

Parser of mailcap and browsecap entries in w3mmee recognizes new flags "x-internal", "x-cgioutput", "x-type", "x-uri", "x-netpath", "x-match=<regexp>", and "x-nc-match=<regexp>".

If the flag "x-internal" is set in an entry, the entry is restricted to internal use, such as determining the editor process described above. I recommend setting this flag in entries for such editors.

If the flag "x-cgioutput" is set, the program determined by the entry is treated as if it is a CGI program, that is, various environment variables are set before calling the program and lines before the first empty line in output of the program are parsed as HTTP response header.

Flag "x-type" is only recognized in mailcap. The string generated from the command part of an entry with this flag is treated as a MIME type name. A document which matches this entry is processed as if its content were of that type.

Flag "x-uri" is only recognized in browsecap. The string generated from the command part of an entry with this flag is treated as a URI which should be processed by w3mmee instead of the original one.

Flag "x-netpath" is only recognized in browsecap. URIs with schemes defined by entries with this flag must be of "net_path" type (see section 3 of RFC2396).

Flags "x-match=<regexp>" and "x-nc-match=<regexp>" are only recognized in browsecap. They are mutually exclusive, and if both are set for one entry, the latter one is adopted. If one of them is set, <regexp> is matched against the whole URI (case-insensitively for "x-nc-match=<regexp>"), and the entry is adopted only when the match succeeds. When "test=..." is also set, the results are ANDed to determine whether or not to adopt the entry.


$RC_DIR/template

You can design the whole screen layout as you prefer by defining a frame set in a file named "template" in your personal configuration directory ($RC_DIR).

The frame set must include a frame with the name "_main" (without the double quotation marks). All buffers loaded afterwards are displayed at the position of that frame.

In "cols" and "rows" attributes of the frame set, w3mmee accepts extra length specifications of the following form:

<number>c

<number> times the width of one US-ASCII character (which can be modified with the pixel_per_char option if necessary),

<number>l

<number> times the height of one US-ASCII character (which can be modified with the pixel_per_line option if necessary).

For example, assuming that pixel_per_char is 8 and that pixel_per_line is 16, 80c stands for 640, and 25l stands for 400.
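
The arithmetic of this example can be written as a small Python sketch. The function name length_in_pixels is hypothetical, and the sketch assumes that <number> is an integer.

```python
def length_in_pixels(spec, pixel_per_char, pixel_per_line):
    """Convert a "<number>c" or "<number>l" length specification to pixels."""
    if spec.endswith("c"):
        return int(spec[:-1]) * pixel_per_char   # character widths
    if spec.endswith("l"):
        return int(spec[:-1]) * pixel_per_line   # character heights (lines)
    raise ValueError("not a <number>c or <number>l specification: " + spec)

print(length_in_pixels("80c", 8, 16))
print(length_in_pixels("25l", 8, 16))
```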

You can use the following "about://" URIs as the sources of frames in $RC_DIR/template:

about://current-buffer
current buffer (implementation is incomplete, and currently useless),
about://menu/<menu name>
persistent version of a menu with name <menu name> defined in $RC_DIR/menu,
about://select-menu
synonym for about://menu/SelectBuffer,
about://process-list
buffer to show list of all subprocesses.


Menus

Texts in popup menus are held in (almost) the same way as usual buffers. So most search functions and most cursor moving functions for user buffers also work for popup menus.

Further, you can pick up and make (persistent) copies of buffers implementing popup menus via the "about://menu" URI. Many menu functions (excluding ones which depend heavily on "popup"ing) apply to these copies.

Note that the name of persistent version of buffer selection menu is "SelectBuffer", though the name of popup version of buffer selection menu is "Select".

(Mainly) to define mouse operations on the title line, a special menu named "Lastline" is initially defined. The labels of the items of this menu are concatenated horizontally, and inserted before (i.e. to the left of) the string generated from the value of the buffername option. If the position passed to the user function "GOTO_XY" is on the title line, the function bound to the item at that position is performed.

In key definitions in $RC_DIR/menu, you can specify some words (case insensitive) for special purposes.

FALLBACK

Usually in menu buffers, after failing to find a menu specific key binding, w3mmee searches the key bindings for non-menu buffers. However, when a menu has a function bound to this word, w3mmee stops searching further and invokes that function.

HIDE

Label fields in menu definitions are mandatory, and usually they appear when menus are displayed. However, the labels of the items bound to this word are not shown at all.


Symbolic key notations

W3mmee accepts symbolic notations in various key definitions.

END, PGDN, HOME, PGUP, CR, LF, KP-END, KP-PGDN, KP-HOME, KP-PGUP, KP-INS, F<number>, MOUSE-CLICK-<number>, MOUSE-DCLICK-<number>, MOUSE-DOUBLE-CLICK-<number> (synonym for MOUSE-DCLICK-<number>), MOUSE-DRAG-<number>, MOUSE-MOVE-<number>.

Further it accepts the following notations which do not represent real key strokes. They are used for menu function "M:POSITIONAL" to determine subfunction when it has received a position not inside menu.

MENU-OUTSIDE, MENU-FRAME-TOP, MENU-FRAME-BOTTOM, MENU-FRAME-LEFT, MENU-FRAME-RIGHT.

Character ranges

The first argument of tty_accept_character or of tty_reject_character must be of the following form. For Unicode characters,

"U+"+<hexadecimal notation of Unicode>
or
"U+"+<hexadecimal notation of Unicode of starting character in the range>+"-"+<hexadecimal notation of Unicode of ending character in the range>
For non-Unicode characters,
"I+"+<internal representation of character>
or
"I+"+<internal representation of starting character in the range>+"-"+<internal representation of ending character in the range>

``Internal representation'' of non-Unicode character is computed as follows. First determine an integer S after ISO 2022 classification of character set:

Then, for a 94, 96, or 94x94 set, let F be the final octet of the designating sequence in ISO 2022 encoding. For a 94 set which needs the additional intermediate octet 2/1 in its designating sequence, add 0x40 to F. For a non-ISO 2022 character set, the support library assigns each character set an integer to identify the set. We adopt that integer as F.

Finally order all the codepoints representable in the character set, and assign all codepoints with numbers C starting with 0, in that order.

Hexadecimal notations S, F, C joined with ``+'' (plus sign) compose ``internal representation''.
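
For the Unicode forms given at the start of this section, a parser can be sketched in Python as follows. The function name parse_unicode_range is hypothetical, and the "I+" forms are deliberately left out because they depend on the internal representation described above.

```python
def parse_unicode_range(text):
    """Parse "U+XXXX" or "U+XXXX-YYYY" into an inclusive (start, end) pair."""
    if not text.startswith("U+"):
        raise ValueError("not a Unicode character range: " + text)
    first, sep, last = text[2:].partition("-")
    start = int(first, 16)
    end = int(last, 16) if sep else start    # a single character is a range of one
    if end < start:
        raise ValueError("range ends before it starts: " + text)
    return start, end

print(parse_unicode_range("U+3000-303F"))
```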

F and C are optional, and their default values are


Any questions or comments about this page are greatly appreciated.

Almost all contents in this site are written by Kiyokazu SUTO (i.e. me) unless especially noted. I want to put all of them into the PUBLIC DOMAIN, even though some lawyers mention that it is impossible in my country.