If Transitional DOCTYPE does not end in //EN, standards mode is triggered

RESOLVED WONTFIX

Status

17 years ago
15 years ago

People

(Reporter: kairo, Assigned: dbaron)

Tracking

({compat})

Trunk
x86
Linux
compat
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: WONTFIX?)

(Reporter)

Description

17 years ago
I have two documents containing basically the same content, but written in
different languages, therefore having different DOCTYPEs.

I read bug 42525 - Additional Comments From Henri Sivonen 2000-11-01 13:51

[...]
* Among others, no doctype, HTML 2.0, HTML 3.2 and these doctype declarations
trigger the quirks layout mode:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
[...]

My documents use the following DOCTYPEs:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//en">
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//de">

the URLs for those documents are (same order as DOCTYPEs above):
http://www.kairo.at/error404-main.html.en
http://www.kairo.at/error404-main.html.de

If you look at the documents, you'll see that the contents of the .en document get
almost centered vertically: it uses a table height of 70%, which is only
rendered that way in quirks mode.
In the .de document, there is no centering, because standards mode is triggered.

I believe that //en or //de in the DOCTYPE shouldn't result in treating the document
that differently, or you would have to assume that English documents are buggier than
those in other languages...
(Reporter)

Updated

17 years ago
Blocks: 34662
reassigning to dbaron, who wrote that code.
Assignee: attinasi → dbaron
*** Bug 101686 has been marked as a duplicate of this bug. ***
dbaron's new doctype sniffing code has a list of quirky doctypes. Anything else
gets the standards treatment for forward compatibility purposes.

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//de">
is not on the list of quirky doctypes, because it was thought not to exist on the
Web, as there is no such W3C doctype. The public identifier is bogus, because it
contains "de" where the right chars would be "EN".

The substring "EN" indicates the natural language in which the normative version
of the markup language specification is written. The string is always "EN" for
HTML, because only the English version of the specification is normative.

Also, in the real public identifier, the string "EN" should be in upper case.
Composer 4.x emits a doctype with bogus case, so the matching of quirky doctypes
is case-insensitive.

Now that I think about it, I have seen this once before in the Mozilla
newsgroups. Someone had substituted "EN" with "PL" for a Polish page.
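
The list-based sniffing described above can be sketched roughly as follows. This is a minimal illustration, not Mozilla's actual code: the function name `layout_mode` and the set `QUIRKY_PUBLIC_IDS` are hypothetical, the set holds only the public identifiers quoted in this bug rather than the real list, and the system identifier (the DTD URL) is ignored for simplicity.

```python
# Illustrative subset only: the public identifiers quoted in this bug,
# stored in lower case so the lookup is case-insensitive.
QUIRKY_PUBLIC_IDS = {
    "-//w3c//dtd html 4.01 transitional//en",
    "-//w3c//dtd html 4.0 transitional//en",
}


def layout_mode(public_id):
    """Return "quirks" or "standards" for a DOCTYPE public identifier.

    public_id is None when the document has no DOCTYPE at all.
    The system identifier is deliberately ignored in this sketch.
    """
    if public_id is None:
        return "quirks"
    # Exact match against the known list, ignoring case.
    if public_id.lower() in QUIRKY_PUBLIC_IDS:
        return "quirks"
    # Anything unknown gets standards mode for forward compatibility.
    return "standards"


print(layout_mode("-//W3C//DTD HTML 4.0 Transitional//en"))
print(layout_mode("-//W3C//DTD HTML 4.0 Transitional//de"))
```

Under these assumptions the //en identifier matches the list despite its case, while the //de identifier is unknown and therefore falls through to standards mode, which is the behavior reported in this bug.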
Note: the //EN part of the DOCTYPE does *not* refer to the language of the
document. It refers to the language of the DTD that defines the HTML elements.
Therefore, it is always English.

To change the language of the document, you should use the 'lang' attribute on
the 'html' element, as in:

   <html lang="de">
     ...
   </html>
Keywords: compat
*** Bug 102495 has been marked as a duplicate of this bug. ***

Comment 6

17 years ago
Why is the 'standards treatment' not the same as the 'EN' treatment, when 'EN'
is the standard?
This would help many visitors of such incorrect websites.
>Why is the 'standards treatment' not the same as the 'EN' treatment, when
>'EN' is the standard?

There are too many sites out there that use the HTML 4.0 Transitional doctype
and don't comply with the W3C Recommendations. However, the main reason is bug
22274.

See http://www.hut.fi/u/hsivonen/doctype on how to use the standards mode with
transitional docs.

BTW, KaiRo, how did you happen to put //de there? Is there an erroneous tutorial
somewhere out there?
> Why is the 'standards treatment' not the same as the 'EN' treatment, when 'EN'
> is the standard?

We have a list of known existing doctypes that should trigger quirks mode (as
should the absence of a doctype).  To be future-compatible, all other doctypes
trigger standards mode.  I'm hesitant to add variants for all sorts of languages
to this list just because a handful of people fiddle with public identifiers
thinking that they represent the language of the document.  See:

http://www.people.fas.harvard.edu/~dbaron/mozilla/doctypes
(Reporter)

Comment 9

17 years ago
If I understand you correctly, this is a web designer's bug, and we don't want
to add additional "buggy" DOCTYPES to our quirks list - so this is a WONTFIX and
it should be marked as that...

BTW, the pages I initially used to file this bug are fixed and now show correct
doctypes (but also doctypes that trigger standards mode and not quirks mode).
(Reporter)

Comment 10

17 years ago
BTW, Henri, this was no erroneous documentation, this was just wrong guessing at
my side :(

Comment 11

17 years ago
*** Bug 110265 has been marked as a duplicate of this bug. ***

Comment 12

17 years ago
In my opinion a good browser should be able to handle mistakes of web designers,
at least the most frequent ones.
IE, Opera, Netscape and even Mozilla up to 0.9.4 can do it. Why not 0.9.5 and newer
ones?
If I understand you correctly, there can't be anything other than 'EN' after
'//'. So why don't we ignore these two letters and assume there is 'EN'?
I don't know if this is the right way to handle this, but you can't correct
every web designer, and the end user doesn't care whose fault it is. The only
thing he can notice is: every other browser can handle it and Mozilla can't.
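
The suggestion in this comment (ignore the two letters after the last '//' and assume 'EN') could look roughly like the sketch below. It only illustrates the proposal, which was not adopted; `assume_english`, `layout_mode_ignoring_language` and the short `QUIRKY_PUBLIC_IDS` set are hypothetical names and data chosen for the example.

```python
import re

# Illustrative subset only, in lower case for case-insensitive lookup.
QUIRKY_PUBLIC_IDS = {
    "-//w3c//dtd html 4.01 transitional//en",
    "-//w3c//dtd html 4.0 transitional//en",
}

# A two-letter language field at the very end of the public identifier.
_LANGUAGE_FIELD = re.compile(r"//[A-Za-z]{2}$")


def assume_english(public_id):
    """Rewrite a trailing two-letter language field to //EN."""
    return _LANGUAGE_FIELD.sub("//EN", public_id)


def layout_mode_ignoring_language(public_id):
    """Look up the identifier after forcing its language field to EN."""
    normalized = assume_english(public_id).lower()
    return "quirks" if normalized in QUIRKY_PUBLIC_IDS else "standards"


print(layout_mode_ignoring_language("-//W3C//DTD HTML 4.0 Transitional//de"))
```

With this normalization the //de identifier from the original report would be treated like its //EN counterpart. The cost is that the lookup no longer treats the public identifier as an opaque string.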
The reason is that we're trying to parse doctype declarations in a
forward-compatible way.  See http://mozilla.org/docs/web-developer/quirks/

Marking as WONTFIX.
Status: NEW → RESOLVED
Last Resolved: 17 years ago
Resolution: --- → WONTFIX

Comment 14

17 years ago
Do you really think there will be any language other than English for HTML in
the future?
I don't. But many websites with this bug are a fact and will be there in the future.
4 doesn't qualify as many.

Comment 16

17 years ago
So you won't fix it. OK.
I don't know how many sites with that bug are out there, but I can imagine there
are more than four.
Perhaps, when I have time, I will make a working version for myself.

Comment 17

17 years ago
I found a similar problem at http://www.clubic.com This time it's with //FR. 
This page uses an obviously incorrect doctype of !DOCTYPE HTML PUBLIC
"-//W3C//DTD HTML 3.2//FR". I feel like Mozilla's doctype detection should not
be so tight with obviously old and what should be quirks mode doctypes. I think
it would be better if the doctype detection were more of a pattern matching, so
"-//W3C//DTD HTML 3. anything" would force quirks mode. I don't see the harm of
this for compatibility. I'd say the same thing about 4.0 Transitional. I
seriously doubt that HTML 4.0 will ever support a different language, so why not
do pattern matching (or just match the substring) instead of an exact string
match for older doctypes? (Just noticed that I appear to be saying the same
thing as comment #12.)

BTW, this site reportedly worked in Mozilla in 0.9.4 and previous and also
Netscape 6.2 and previous.

Comment 18

17 years ago
In your comment:

------- Additional Comment #17 From Tim Powell 2002-02-06 08:36 -------

I found a similar problem at http://www.clubic.com This time it's with //FR. 
This page uses an obviously incorrect doctype of !DOCTYPE HTML PUBLIC
"-//W3C//DTD HTML 3.2//FR".

This would be an evangelism issue. If you find sites that are incorrectly using 
the doctype, then please feel free to open a new bug pointing to the URL and 
assign the bug to evangelism. Our evang team can contact those sites letting 
them know the appropriate constructs.

Comment 19

17 years ago
Evangelism shouldn't be necessary for this type of issue. It wastes many
people's time: Web developers that need to change doctypes for one browser when
others work, Mozilla evangelists that need to try to persuade web developers to
make the change, users and bugzilla QA who try to figure out the problem,
Mozilla developers who get bug reports and then need to try and figure out if
this is a real bug or a markup problem. And the end result after successfully
persuading the web developer to change things is exactly what? Mozilla now works
like it should have in the first place? There's no benefit here. I can see doing
evangelism for broken nav4 versus mozilla/netscape 6 code or for DHTML. There's
little logic to this. I'm failing to see how treating older and somewhat broken
doctypes as quirks hurts anybody.

Because of the real time drain and for compatibility (see the compat keyword's
description) this bug should be reopened and fixed.
Without question, the person who put //FR in the doctype made a mistake.

tpowell@databeam.com, do you have a proposed algorithm described in more detail
than "DWIM"? Try searching the discussions on the W3C's mailing lists and in
comp.infosystems.www.* and you'll find that elaborate DWIM gets more confusing
than using a simple list of doctypes.

Comment 21

17 years ago
> Without question, the person who put //FR in the doctype made a mistake.

Yes. No argument there. How should Mozilla handle the mistake? Should it
intentionally break a page (handle as strict) that works in other browsers? It's
obvious from the DTD that this should be handled by quirks.

> do you have a proposed algorithm described in more detail than "DWIM"

I thought I was being clear above, but in case I wasn't, I'm suggesting that
Mozilla just do substring matches instead of exact string matches of the
following DTDs and treat them as quirks: -//W3C//DTD HTML 4.01 T, -//W3C//DTD
HTML 4.0 T, -//W3C//DTD HTML 3., and -//W3C//DTD HTML 2.

I suggest 4.0 and 4.01 T to make up for likely typos in the word Transitional,
but don't feel strongly about this. I would also think that the comparisons
should be done case insensitively. It sounds like from comment #3 that that is
already the case, so that's good.
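
The matching proposed in this comment can be sketched as below. This is one reading of the proposal, not code from Mozilla: the listed strings are treated as prefixes of the public identifier, the comparison ignores case, and the system identifier is not considered. The names `QUIRKY_PREFIXES` and `layout_mode_by_prefix` are hypothetical.

```python
# The four strings from the comment, lower-cased, treated as prefixes.
QUIRKY_PREFIXES = (
    "-//w3c//dtd html 4.01 t",
    "-//w3c//dtd html 4.0 t",
    "-//w3c//dtd html 3.",
    "-//w3c//dtd html 2.",
)


def layout_mode_by_prefix(public_id):
    """Return "quirks" if the public identifier starts with a listed prefix.

    public_id is None when the document has no DOCTYPE at all.
    """
    if public_id is None:
        return "quirks"
    # str.startswith accepts a tuple and checks each prefix in turn.
    if public_id.lower().startswith(QUIRKY_PREFIXES):
        return "quirks"
    return "standards"


print(layout_mode_by_prefix("-//W3C//DTD HTML 3.2//FR"))
print(layout_mode_by_prefix("-//W3C//DTD HTML 4.01//EN"))
```

Under this reading the //FR and //DE variants mentioned in this bug would get quirks mode, and so would any other identifier that happens to begin with one of the four strings, whatever follows.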

The reason we do this is that guessing algorithms just aren't forward
compatible.  You seem to have forgotten about (or never known about) the problem
that we behaved differently for 4.01 Transitional doctypes depending on whether
the URI used was the TR/REC-html40/ URL or the TR/REC-html40-YYYYMMDD/ URL
because it changed whether the word "loose" was in the first 25 characters of
the URI.

Comment 23

17 years ago
I fail to see how a substring match is "guessing". I remain (blissfully?) 
unaware of any problem with loose in a 4.01 URI but it sounds awful. It seems 
to me that Mozilla could quite reasonably consider all HTML 2.x and HTML 3.x 
DTDs as needing quirks mode regardless of URI. 

This would also be consistent with IE6 at least as its doctype handling is 
documented:
http://msdn.microsoft.com/workshop/author/dhtml/reference/objects/doctype.asp

It would also seem that the matching for 4.0x Transitional could ignore the 
language such as //EN (but care about the URI).
*** Bug 104372 has been marked as a duplicate of this bug. ***
*** Bug 159625 has been marked as a duplicate of this bug. ***
*** Bug 166260 has been marked as a duplicate of this bug. ***
*** Bug 170949 has been marked as a duplicate of this bug. ***

Comment 28

16 years ago
*** Bug 192515 has been marked as a duplicate of this bug. ***

Comment 29

16 years ago
Even
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//DE">
gets rendered in standards mode, which is definitely not a Good Idea(tm).
Please reopen.
Two arguments for reopening:
1. Many non-English web pages that are typical candidates for being
rendered in quirks mode (as they have totally broken HTML, not only in the
DOCTYPE) have this, and so they don't work in Mozilla.
2. If such a wrong DOCTYPE doesn't show that the page author didn't understand
HTML (and so will quite likely make many mistakes that he didn't see in MSIE), I
don't know what would.
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
I'm not convinced.
Status: REOPENED → RESOLVED
Last Resolved: 17 years ago → 16 years ago
Resolution: --- → WONTFIX
OK, reopening, although only to add a few doctypes to the list, not to modify
general parsing of what should be an opaque string.
Status: RESOLVED → REOPENED
Priority: -- → P2
Resolution: WONTFIX → ---
Target Milestone: --- → mozilla1.5alpha
Here is a list of candidates:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//DE">
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//DE">
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//DE">
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//DE"> 
I'm quite sure 'strict', 'frameset' and all those with a URL are not hit by
this, as editors don't set them as a default (which will be "fixed" by the
author). I also found some //FR and one //NL in the dups.
Priority: P2 → --
Target Milestone: mozilla1.5alpha → ---
FWIW, doctype sniffing is confusing already (not to myself but to people who
haven't read the code). Every time doctype sniffing changes, it becomes more
confusing. And besides, changing this now would make Mozilla and Safari behave
differently, making things more confusing.

> as editors don't set them as a default

Do you mean some bogo-editors set //DE doctypes by default? Or that authors
change the editor defaults to //DE?
> And besides, changing this now would make Mozilla and Safari behave
differently, making things more confusing.

They already behave differently when it comes to SGML comments and image alignment. Also,
KHTML always accepts CSS lengths without a unit, but these are the three points
breaking so many pages.

> Do you mean some bogo-editors set //DE doctypes by default? Or that authors
change the editor defaults to //DE?

I hope only the last.

I have checked all the duplicates of this bug and all of them
seem to have fixed their DOCTYPEs:

Bugzilla   URL                       DOCTYPE
bug 101686 http://news.mail.ru/      no DOCTYPE at all
bug 102495 http://www.tvtv.de        is using //EN
bug 110265 http://www.tvtv.de        is using //EN
bug 104372 http://www.clubic.com     is using //EN
bug 159625 URL is 404 but other pages at http://www.talkline.de is using //EN
bug 166260 http://linuxfocus.org/Deutsch/July2002/article239.shtml using //EN
bug 170949 URL is 404 but other pages at http://www.everlage.de/ is using //EN
bug 192515 no URL provided

As far as I can tell, there are very few sites using these invalid DOCTYPEs,
and for those that do, is their layout really broken because of it?
Does anyone know of a major site that is suffering from this problem?

IMO, this problem can and should be handled through evangelization.

Comment 37

16 years ago
*** Bug 209378 has been marked as a duplicate of this bug. ***
*** Bug 221661 has been marked as a duplicate of this bug. ***
Do we still want to add to the list of DOCTYPEs?
QA Contact: petersen → ian
Whiteboard: WONTFIX?
WONTFIX. Not enough sites depend on this.
Status: REOPENED → RESOLVED
Last Resolved: 16 years ago → 15 years ago
Resolution: --- → WONTFIX