Closed Bug 210289 Opened 21 years ago Closed 20 years ago

Gecko violates HTTP-spec and breaks web pages by not defaulting to ISO-8859-1 when range 0x80-0x9F is unused, and instead using the previous page charset

Categories

(Core :: Internationalization, defect)

Not set
major

VERIFIED INVALID

People

(Reporter: jhartmann, Assigned: smontagu)

Details

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; de-DE; rv:1.4b) Gecko/20030516 Mozilla Firebird/0.6
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; de-DE; rv:1.4b) Gecko/20030516 Mozilla Firebird/0.6

The given URL is only an example. And if I remember right, this is an old
problem, that is the same in Phoenix/Firebird and Mozilla 1.x, at least in
German builds.

When browsing German pages with typographic characters, e.g. quotation marks,
they don't show correctly, even in the source view. For instance, instead
of „ and “ the browser shows � and �. (I didn't change any default character set
settings.) And when copying the source from Firebird to any other program, e.g.
Windows Editor, the original quotation marks are not copied; � appears again.

When viewing the same pages (and viewing/copying their source) with other
browsers (IE or Opera), everything works fine.

Reproducible: Always

Steps to Reproduce:
1. View a German page with typographic characters, e.g. German quotation marks.
2. Look at what is displayed.
3. Compare with what should be there (or what shows up in other browsers).

Actual Results:
Doesn't look as it should. Partly unreadable.

Expected Results:  
Show (and copy) the right characters.

This makes it impossible to (correctly) read German text with quotes and other
typography.
Hi Jörg, 
the release build you're using is quite old already. Please try a current
nightly with a new profile. This works for me here with the 20030622 build on W2K.

Hello Jörg,
your build is already fairly old. Get a current nightly and
start it with a new profile. Everything works here for me.
The page displays properly on Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US;
rv:1.4) Gecko/20030624 (aka 1.4 RC 3)
Marking WORKSFORME due to lack of response from reporter.
If more information appears later, please feel free to reopen the bug.
Thanks.
Status: UNCONFIRMED → RESOLVED
Closed: 21 years ago
Resolution: --- → WORKSFORME
I did test this with newer builds, including the German 0.61 - and I'm not the 
only one reporting this problem, as you can read here: http://phoenix.stw.
uni-duisburg.de/forum/viewtopic.php?t=216

It seems that Mozilla and Firebird have a problem choosing the right character
encoding, at least when the page itself has no charset attribute set (so that
the HTML/XHTML default ISO-8859-1 should be used by Mozilla) and the user comes
to this page through a click from a page with a charset that doesn't support
the characters used in the page being loaded. That is, Mozilla seems not to choose
the spec default (ISO-8859-1) in all situations where it should. At least this
is what is discussed at the given link (see above).
Status: RESOLVED → UNCONFIRMED
Resolution: WORKSFORME → ---
related bugs:
http://bugzilla.mozilla.org/show_bug.cgi?id=205518
http://bugzilla.mozilla.org/show_bug.cgi?id=150958
probably http://bugzilla.mozilla.org/show_bug.cgi?id=156815
http://bugzilla.mozilla.org/show_bug.cgi?id=212039

Mozilla should _not_ use anything other than the HTTP-default ISO-8859-1 (see 
spec) if this charset could be suitable and no other charset is specified either 
within HTTP-header or within the document itself. (Of course if the characters 
used in the document include those not available within ISO-8859-1, then this 
charset is "not suitable". But if the characters used are all within 
ISO-8859-1's range, there's no reason to select a different charset by default, 
which Mozilla does: in many cases ISO-8859-15 for no apparent reason, in other 
cases the charset of pages viewed before which is simply stupid.)

This indeed leads to improper display of non-ASCII characters on pages that 
don't specify their charset within meta tags but use nothing else than HTTP's 
default ISO-8859-1, only because a different-charset page is viewed before _or_ 
NO PAGE IS VIEWED BEFORE which often leads to the selection of ISO-8859-15 for 
no reason that one can see. (No, ISO-8859-15 is _not_ set in OS-prefs or the 
like ...)
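
For reference, ISO-8859-1 and ISO-8859-15 differ in only eight byte positions, all at 0xA4 or above. A minimal sketch using Python's standard codecs lists them; Python and the variable names are illustration choices here, not anything taken from Gecko.

```python
# List the byte values that decode differently in ISO-8859-1 and ISO-8859-15.
differing = [
    b for b in range(256)
    if bytes([b]).decode("iso-8859-1") != bytes([b]).decode("iso-8859-15")
]

for b in differing:
    one = bytes([b])
    # Show the byte, then its ISO-8859-1 and ISO-8859-15 interpretations.
    print(f"0x{b:02X}: {one.decode('iso-8859-1')!r} vs {one.decode('iso-8859-15')!r}")
```

Both encodings map 0x80-0x9F to C1 control codes, so a wrong pick between the two would affect characters such as the currency sign versus the euro sign, but would not by itself change how bytes in the 0x80-0x9F range are shown.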
Summary: character display problems: German quotation marks and other typographic characters don't display correctly → resulting in the display of '?' instead of non-ASCII-characters (again: depending on page before) Mozilla violates both HTTP-spec and reasonable HTML4-interpretation by not chosing ISO-8859-1 as default when no charset is specified and range 0x80-0x9F is …
Note: This is for charset-coding auto-detect: off (so Mozilla should just use 
the default -- ISO-8859-1 -- which it does not).
belongs in the Browser->Internationalization component.
Assignee: blake → smontagu
Component: General → Internationalization
Product: Firebird → Browser
QA Contact: asa → ylong
Version: unspecified → Trunk
It seems this is not really related to what the default charset is set to.
The page in the URL field doesn't have the HTML charset meta tag. If you clear the
cache and turn auto-detect on (View | Character Coding | Auto-detect |
Universal), Mozilla will detect the page as windows-1252, and the page displays properly.
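
The byte values help explain why windows-1252 makes a difference here. In windows-1252 the German quotation marks are the bytes 0x84 and 0x93, which fall in the 0x80-0x9F range that ISO-8859-1 assigns to non-printing C1 control codes. A small Python sketch shows this; the sample string and the helper name are made up for illustration.

```python
import unicodedata


def uses_c1_range(data: bytes) -> bool:
    """Return True if any byte lies in 0x80-0x9F (hypothetical helper)."""
    return any(0x80 <= b <= 0x9F for b in data)


# German low and high typographic quotation marks around a sample word.
raw = "„Hallo“".encode("windows-1252")
print(raw)
print(uses_c1_range(raw))

# Decoding the same bytes as ISO-8859-1 yields control characters, not quotes.
as_latin1 = raw.decode("iso-8859-1")
print(unicodedata.category(as_latin1[0]))

# Decoding as windows-1252 recovers the original text.
print(raw.decode("windows-1252"))
```

A page that contains such bytes is therefore not using only the printable part of ISO-8859-1, whatever default the browser applies.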
Hi Yuying (#8),

just read above, e.g. #6 (only two comments up from yours), and then write 
again or just kill your post. Thanks.

Hi Mike (#7),

no, this is _not_ about Internationalization. At least it _should not_ be, since 
ISO-8859-1 is simply the default (for Englanders as well as others). And this 
is _not_ about auto-detection of "foreign" charsets. To the contrary, it's about 
using the default --ISO-8859-1-- when auto-detect is switched off.
(In reply to comment #4)
> (so that 
> the HTML/XHTML default ISO-8859-1 should be used by Mozilla)

Neither HTML nor XHTML have such a default. Can you quote the relevant part of
the spec? the forum link in comment 4 doesn't work.

In fact, http://www.w3.org/TR/html4/charset.html#h-5.2.2 is pretty clear that
Latin1 is not allowed to be used as default value:
"Therefore, user agents must not assume any default value for the "charset"
parameter."


this bug is invalid.

(In reply to comment #9)
> no, this is _not_ about Internationalization.

this is just how bugs are classified in bugzilla - charset related issues belong
to the internationalization component.
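
For context, the HTML 4.01 section cited above lists the sources a user agent consults, highest priority first: the charset parameter of the HTTP Content-Type header, then a META declaration, then a charset attribute on the element that links to the resource. A rough Python sketch of the first two steps follows. The function name, the regular expression, and the 1024-byte scan window are assumptions for illustration and are not Gecko's code. What to use when nothing is declared is exactly what this bug disputes, so the sketch takes it as a parameter.

```python
import re
from email.message import Message
from typing import Optional

# Loose pattern for a charset declared inside a <meta ...> tag (assumption:
# good enough for illustration, not a full HTML parser).
META_CHARSET = re.compile(
    rb"<meta[^>]+charset\s*=\s*[\"']?([A-Za-z0-9_.:-]+)", re.IGNORECASE
)


def pick_charset(content_type: Optional[str], body: bytes, fallback: str) -> str:
    """Hypothetical helper: choose a charset by declared-source priority."""
    # 1. charset parameter of the HTTP Content-Type header, if present.
    if content_type:
        msg = Message()
        msg["Content-Type"] = content_type
        charset = msg.get_content_charset()
        if charset:
            return charset
    # 2. META declaration near the start of the document.
    match = META_CHARSET.search(body[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    # 3. Nothing declared: the caller decides (heuristics, user setting, ...).
    return fallback


print(pick_charset("text/html; charset=UTF-8", b"", "windows-1252"))
print(pick_charset("text/html", b"<html><body>no declaration</body></html>", "windows-1252"))
```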
Status: UNCONFIRMED → RESOLVED
Closed: 21 years ago → 20 years ago
Resolution: --- → INVALID
> Neither HTML nor XHTML have such a default.
> Can you quote the relevant part of the spec?

Well, I quote myself from above: "Mozilla should _not_ use anything other than
the HTTP-default ISO-8859-1 (see spec) if this charset could be suitable and no
other charset is specified either within HTTP-header or within the document itself."

So it's in the HTTP-spec (as the headline states):
http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.7.1 clearly states
"When no explicit charset parameter is provided by the sender, media subtypes of
the "text" type are defined to have a default charset value of "ISO-8859-1" when
received via HTTP." Which is what Mozilla does with (non-local) web pages.

The paragraph you quote from http://www.w3.org/TR/html4/charset.html#h-5.2.2 is
both irrelevant and a misunderstanding of the HTTP spec(s). The HTTP spec does not
say what a charset parameter should be if it is absent (since if it is absent,
then it's just absent), but it defines what character encoding a user agent
should use when there's no charset info given through the headers. So the HTML
statement about this having "proved useless because some servers don't allow a
"charset" parameter to be sent" is quite nonsense since this is a case where the
HTTP-spec default helps: A user agent should use the default ISO-8859-1 "when no
explicit charset parameter is provided by the sender" (which is the server). So
of course, "user agents must not assume any default value for the "charset"
parameter" (since if there's none, then there's none), but if there's no
"charset parameter" then there comes RFC 2616 Sec. 3.7.1: "media subtypes of the
"text" type" (like web pages) "are defined to have a default charset value of
"ISO-8859-1"". Read this: The "charset parameter" of HTTP (provided by the
server) is different from the "charset value" of the "media subtype" (to be used
by the user agent). But the HTTP spec clearly tells what the latter should be if
the first is nonexistent.

> the forum link in comment 4 doesn't work.

Just take off the line-break and it will work just fine.

> this bug is invalid

No, it's not. (We've been through this.)
Status: RESOLVED → UNCONFIRMED
Resolution: INVALID → ---
By the way, the default for XML (and so for XHTML) is UTF-8:
http://www.w3.org/TR/1998/REC-xml-19980210#charencoding
hm indeed, http://phoenix.stw.uni-duisburg.de/forum/viewtopic.php?t=216 works
now. it didn't when I last checked.

Sorry, indeed, XML and XHTML have a default of UTF-8, and Mozilla has code to
use that. separate bug though if it doesn't.

>So it's in the HTTP-spec (as the headline states):

HTTP is irrelevant in this case, as HTML overrides that spec. HTML clearly
states that latin1 is NOT to be used as default charset.

unless of course this bug is about non-html files transferred over HTTP...

>The paragraphe you quote from http://www.w3.org/TR/html4/charset.html#h-5.2.2 is
>both irrelevant and misunderstanding the HTTP spec(s)

if you say so. tell the W3C to change that part of the spec I guess.

this bug is still invalid.
Status: UNCONFIRMED → RESOLVED
Closed: 20 years ago → 20 years ago
OS: Windows XP → All
Hardware: PC → All
Resolution: --- → INVALID
It's a relic of the late 1980's (when ISO-8859-1 was considered 'international
enough') that ISO-8859-1 was stipulated as the default encoding. However, no
_sane_ modern implementation does that. Insisting that Mozilla use ISO-8859-1 as
the default is rather Western-Eurocentric and that wouldn't be accepted well by
W3C and other internationally-minded standard organizations.
Verified invalid. We've been through this multiple times with the HTML working
group -- higher level transport protocols override lower level ones in these
situations.
Status: RESOLVED → VERIFIED
Summary: Mozilla violates both HTTP-spec and reasonable HTML4-interpretation by not chosing ISO-8859-1 as default when no charset is specified and range 0x80-0x9F is not used (instead it takes the charset specified by the page viewed before) resulting in the displ… → Gecko violates HTTP-spec and breaks web pages by not defaulting to ISO-8859-1 when range 0x80-0x9F is unused, and instead using the previous page charset