Q:
We set up a local httpd server to test a set of documents and scripts that are to be published on Monday. We found that OmniWeb uses different character encodings when sending a form field to a CGI script.
Example with a visible textfield:
<FORM ACTION="../cgi-bin/anmelden" METHOD="get">
<INPUT TYPE="text" NAME="name" VALUE="ß">
<INPUT NAME="bAnmeldung" TYPE="submit" VALUE="Anmeldung">
</FORM>
Result request:
http://jupiter/../cgi-bin/anmelden?name=%DF&bAnmeldung=Anmeldung
Example with a hidden textfield:
<FORM ACTION="../cgi-bin/anmelden" METHOD="get">
<INPUT TYPE="hidden" NAME="name" VALUE="ß">
<INPUT NAME="bAnmeldung" TYPE="submit" VALUE="Anmeldung">
</FORM>
Result request:
http://jupiter/../cgi-bin/anmelden?name=%FB&bAnmeldung=Anmeldung
We need this to decode umlauts and similar characters (very common in Europe). As you can see, although both forms contain the same character, the first form sends "%DF" and the second sends "%FB". The %FB would be correct, but %DF is an "ë" and not a "ß" as it is supposed to be.
We verified the result with Netscape 1.1, which returns the wrong result in both cases.
A:
This is a bug in OmniWeb. NEXTSTEP uses a nonstandard character encoding for the upper half of the character set; apparently OmniWeb isn't correctly translating from the NEXTSTEP encoding to ISO-8859-1 for hidden textfields. In NeXT's encoding, a ``szlig'' (or ``germandbls'' as it's labelled in NeXT's character charts) is 0xFB, but in ISO-8859-1 (aka ISO-Latin-1, the standard character set for WWW/HTML) the same character is encoded as 0xDF. So the %DF sent for the visible textfield is the correct value on the wire, and the %FB sent for the hidden textfield is the untranslated NeXT byte. Any server running on a NEXTSTEP machine should translate everything between the two character sets as needed.
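As a rough illustration of that translation step, here is a minimal Python sketch. The function names and the table are hypothetical, not part of any NeXT or server API. Only the 0xFB (germandbls) and 0xDF (ë) entries come from the discussion above; the umlaut entries are assumptions that should be checked against NeXT's character charts, and a real translator would need the full 128-entry table.

```python
from urllib.parse import quote_from_bytes, unquote_to_bytes

# Partial NEXTSTEP -> character table (hypothetical, for illustration only).
NEXT_TO_CHAR = {
    0xFB: "ß",  # germandbls, as stated in the answer above
    0xDF: "ë",  # as observed in the question above
    # Assumed entries; verify against NeXT's character charts.
    0x85: "Ä", 0x96: "Ö", 0x9A: "Ü",
    0xD9: "ä", 0xF0: "ö", 0xF6: "ü",
}

# Inverse table, keyed by the ISO-8859-1 byte of each character.
LATIN1_TO_NEXT = {ch.encode("latin-1")[0]: b for b, ch in NEXT_TO_CHAR.items()}


def next_to_latin1(data: bytes) -> bytes:
    """Translate NEXTSTEP-encoded bytes to ISO-8859-1."""
    out = bytearray()
    for b in data:
        if b < 0x80:
            out.append(b)  # lower half is plain ASCII in both sets
        elif b in NEXT_TO_CHAR:
            out += NEXT_TO_CHAR[b].encode("latin-1")
        else:
            raise ValueError(f"no mapping for NEXTSTEP byte 0x{b:02X}")
    return bytes(out)


def latin1_to_next(data: bytes) -> bytes:
    """Translate ISO-8859-1 bytes to the NEXTSTEP encoding."""
    out = bytearray()
    for b in data:
        if b < 0x80:
            out.append(b)
        elif b in LATIN1_TO_NEXT:
            out.append(LATIN1_TO_NEXT[b])
        else:
            raise ValueError(f"no mapping for ISO-8859-1 byte 0x{b:02X}")
    return bytes(out)


def fix_query_value(value: str) -> str:
    """Re-encode one percent-encoded form value from NEXTSTEP to ISO-8859-1."""
    return quote_from_bytes(next_to_latin1(unquote_to_bytes(value)))
```

With this table, fix_query_value("%FB") yields "%DF", which is the value the hidden textfield should have sent. Raising an error on unmapped bytes is a deliberate choice in this sketch, so that gaps in the partial table are noticed rather than silently passed through.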
This can be a subtle problem because all of these character sets are identical in the lower 128 characters (they're all derived from ASCII), so it isn't immediately obvious when a program isn't handling the upper-half characters correctly, at least not until you start using diacriticals, ligatures and the like.
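One way to make such problems visible is to check a request for bytes in the upper half before trusting its contents. The following is a small sketch under that assumption; the helper name high_bytes is hypothetical.

```python
from urllib.parse import unquote_to_bytes


def high_bytes(query: str) -> list:
    """Return the bytes above 0x7F found in a percent-encoded query string."""
    return [b for b in unquote_to_bytes(query) if b >= 0x80]


# Pure ASCII content is the same in every ASCII-derived character set.
print(high_bytes("bAnmeldung=Anmeldung"))

# Upper-half bytes are where the character sets disagree: read as
# ISO-8859-1, 0xDF is "ß" while 0xFB is "û".
for b in high_bytes("name=%DF") + high_bytes("name=%FB"):
    print(hex(b), bytes([b]).decode("latin-1"))
```

A request for which high_bytes returns an empty list is unaffected by the choice of character set; anything else needs to be interpreted in the encoding the client actually used.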